EOL Darwin Core Archive #731

mo-nathan · 2021-01-02T00:35:09Z

Implements the taxon-based Darwin Core Archive (DwCA) that EOL expects. Involves a bit of renaming and refactoring to differentiate between the two forms of DwCA.

To test:

1. Go to a page where you can download a report (e.g., the Download link from a Species List page).
2. Select the 'Taxon Darwin Core Archive for EOL' option.
3. Download and check out the resulting zip file.
Note that this is not expected to pass through the GBIF validator since it has no occurrence data.

JoeCohen · 2021-01-03T04:28:29Z

I ran the manual test; unzipped the downloaded file, resulting in 3 well-formed files.
I don't understand the intended use of the multimedia.csv. In particular, how does one relate the images to the taxa?

mo-nathan · 2021-01-04T01:56:39Z

@JoeCohen The first column of the multimedia.csv should match the first column of the taxa.csv file. So if you look at a line in multimedia.csv you should be able to figure out which taxon it is associated with. Similarly if you search for an id from taxa.csv you should find all the images for that taxon in multimedia.csv (as URLs of course).
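
The lookup described above can be sketched in a few lines of Ruby. This is only an illustration: the header names and sample rows are made up, and the only thing taken from the comment is that the first column of multimedia.csv matches the first column of taxa.csv.

```ruby
require "csv"

# Hypothetical sample data; the real files come from the downloaded zip.
TAXA_CSV = <<~CSV
  id,name
  1,Amanita muscaria
  2,Boletus edulis
CSV

MULTIMEDIA_CSV = <<~CSV
  id,url
  1,https://example.org/images/10.jpg
  1,https://example.org/images/11.jpg
  2,https://example.org/images/20.jpg
CSV

# Group image URLs (second column) by the id in the first column.
def images_by_taxon(multimedia_text)
  CSV.parse(multimedia_text, headers: true).each_with_object({}) do |row, result|
    (result[row[0]] ||= []) << row[1]
  end
end

# Map each taxon name to its image URLs via the shared first-column id.
def taxa_with_images(taxa_text, multimedia_text)
  images = images_by_taxon(multimedia_text)
  CSV.parse(taxa_text, headers: true).each_with_object({}) do |row, result|
    result[row[1]] = images.fetch(row[0], [])
  end
end

taxa_with_images(TAXA_CSV, MULTIMEDIA_CSV).each do |name, urls|
  puts "#{name}: #{urls.length} image(s)"
end
```

Because several multimedia rows can share one id, a run of identical values at the top of the first column is expected.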

JoeCohen · 2021-01-04T03:54:49Z

Thanks @mo-nathan. I wasn't looking far enough down the first column of the multimedia file; I saw a bunch of "1"'s and stopped looking.
Another question:
Which images is it picking? (I exported my observations, but it's not using all of my images for those Observations.)

mo-nathan · 2021-01-09T20:11:13Z

Here's the feedback from Jen Hammock @ EOL:

"Looks to me like you're almost there. You do need a couple of things, mostly in your meta, a couple in your media file.

The taxonID column in the taxa file should go to http://rs.tdwg.org/dwc/terms/taxonID

And the first four columns in your media file should go to

http://purl.org/dc/terms/identifier
http://purl.org/dc/terms/type
http://purl.org/dc/terms/format
http://rs.tdwg.org/ac/terms/accessURI

And the values in the http://purl.org/dc/terms/identifier column should be unique in the file. You can re-use the values from the http://rs.tdwg.org/ac/terms/accessURI column for this if you like.

Finally, your http://rs.tdwg.org/dwc/terms/taxonID column should also appear in the media file, to provide the taxon mappings. Does that make sense?"

I sent her another zip file based on the new code changes and will update this PR when I hear back.
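
Two of the points in the feedback (unique identifier values, and a taxonID column in the media file) are easy to check before sending another zip. The following is a rough sketch, not part of this PR. It assumes, purely for illustration, that the media file's header row uses the term URIs as column names; in the real archive the columns are mapped to terms in meta.xml. The method name `media_problems` is hypothetical.

```ruby
require "csv"

# Term URIs quoted in the feedback above.
IDENTIFIER = "http://purl.org/dc/terms/identifier"
TAXON_ID = "http://rs.tdwg.org/dwc/terms/taxonID"

# Return a list of problems found in the media file text.
# Assumption: the header row contains the term URIs themselves.
def media_problems(csv_text)
  table = CSV.parse(csv_text, headers: true)
  return ["missing identifier column"] unless table.headers.include?(IDENTIFIER)

  problems = []
  problems << "missing taxonID column" unless table.headers.include?(TAXON_ID)
  # Any identifier that occurs more than once violates the uniqueness rule.
  dups = table[IDENTIFIER].tally.select { |_id, count| count > 1 }.keys
  problems << "duplicate identifiers: #{dups.join(", ")}" unless dups.empty?
  problems
end
```

An empty result means both checks passed; it says nothing about the other requirements in the feedback.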

codeclimate · 2021-01-27T04:42:59Z

app/classes/report/darwin/gbif_images.rb

+        query.project(attribute(:images, :id),
+                      attribute(:images, :when),
+                      attribute(:images, :copyright_holder),
+


Empty line detected around arguments.

codeclimate · 2021-01-27T04:43:00Z

app/classes/report/darwin/gbif_images.rb

+                      attribute(:observations, :long),
+                      attribute(:observations, :alt),
+                      attribute(:observations, :notes),
+


Empty line detected around arguments.

codeclimate · 2021-01-27T04:43:00Z

app/classes/report/darwin/gbif_images.rb

+                      attribute(:names, :text_name),
+                      attribute(:names, :author),
+                      attribute(:names, :rank),
+


Empty line detected around arguments.

codeclimate · 2021-01-27T04:43:00Z

app/classes/report/darwin/gbif_images.rb

+                      attribute(:locations, :west),
+                      attribute(:locations, :high),
+                      attribute(:locations, :low),
+


Empty line detected around arguments.

codeclimate · 2021-01-27T04:43:00Z

app/classes/report/darwin/gbif_images.rb

+
+                      attribute(:users, :name),
+                      attribute(:users, :login),
+


Empty line detected around arguments.

codeclimate · 2021-03-13T17:22:42Z

app/classes/report/darwin/eol_taxa.rb

+            family = extract_name(level) if level.start_with?("Family")
+            return [family, kingdom] if family && kingdom
+          end
+          return ["", kingdom] if kingdom


Add empty line after guard clause.

codeclimate · 2021-03-13T17:22:42Z

app/classes/report/darwin/eol_taxa.rb

+
+      private
+
+        def parse_genus(name)


Inconsistent indentation detected.

codeclimate · 2021-03-13T17:22:42Z

app/classes/report/darwin/eol_taxa.rb

+          name.split(' ')[0]
+        end
+
+        def higher_taxa(row)


Inconsistent indentation detected.

codeclimate · 2021-03-13T17:22:42Z

app/classes/report/darwin/eol_taxa.rb

+          result
+        end
+
+        def parse_classification(classification)


Inconsistent indentation detected.

codeclimate · 2021-03-13T17:22:42Z

app/classes/report/darwin/eol_taxa.rb

+          nil
+        end
+
+        def extract_name(level)


Inconsistent indentation detected.

coveralls · 2021-03-13T17:29:49Z

coverage: 94.426% (-0.1%) from 94.523%
when pulling 4894c73 on eol-dwca
into 0e94b1f on main.

codeclimate · 2021-06-20T19:03:00Z

app/classes/report/darwin/eol_taxa.rb

+          level.split(": _")[1].chomp("_")
+        end
+
+        def genus_classification(genus)


Inconsistent indentation detected.
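
For readers unfamiliar with the classification strings, here is a small sketch of what `extract_name` does with one line. The input format ("Rank: _Name_") is inferred from the snippets in this review, and `family_and_kingdom` is a hypothetical helper written for this example, not the method in eol_taxa.rb.

```ruby
# Pull the bare name out of one classification line such as
# "Family: _Amanitaceae_" (format assumed from the snippet above).
def extract_name(level)
  level.split(": _")[1].chomp("_")
end

# Hypothetical helper: scan a multi-line classification and return
# [family, kingdom], with "" for a missing family.
def family_and_kingdom(classification)
  family = kingdom = nil
  classification.each_line(chomp: true) do |level|
    kingdom = extract_name(level) if level.start_with?("Kingdom")
    family = extract_name(level) if level.start_with?("Family")
  end
  [family.to_s, kingdom]
end

p family_and_kingdom("Kingdom: _Fungi_\nFamily: _Amanitaceae_")
```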

codeclimate · 2021-06-20T19:03:00Z

app/classes/report/darwin/eol_taxa.rb

+
+      private
+
+        def parse_genus(name)


Use 2 (not 4) spaces for indentation.

codeclimate · 2021-06-20T19:03:00Z

app/classes/report/darwin/eol_taxa.rb

+          name.split(' ')[0]
+        end
+
+        def higher_taxa(row)


Use 2 (not 4) spaces for indentation.

codeclimate · 2021-06-20T19:03:00Z

app/classes/report/darwin/eol_taxa.rb

+          result
+        end
+
+        def parse_classification(classification)


Use 2 (not 4) spaces for indentation.

codeclimate · 2021-06-20T19:03:00Z

app/classes/report/darwin/eol_taxa.rb

+          nil
+        end
+
+        def extract_name(level)


Use 2 (not 4) spaces for indentation.

codeclimate · 2022-01-29T15:38:38Z

app/classes/report/darwin/eol_taxa.rb

+          level.split(": _")[1].chomp("_")
+        end
+
+        def genus_classification(genus)


Use 2 (not 4) spaces for indentation.

codeclimate · 2022-01-29T15:38:38Z

app/classes/report/darwin/eol_taxa.rb

+      private
+
+        def parse_genus(name)
+          name.split(' ')[0]


Prefer double-quoted strings unless you need single quotes to avoid extra backslashes for escaping.

codeclimate · 2022-11-07T00:58:09Z

app/classes/report/darwin/eol_taxa.rb

+      private
+
+        def parse_genus(name)
+          name.split(' ')[0]


Argument ' ' is redundant because it is implied by default.
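
A sketch of what the suggested change would look like. With no argument (and `$;` left at its default of nil), `String#split` splits on runs of whitespace, which is the same behavior as passing a single space.

```ruby
# Return the first whitespace-separated word of a name.
def parse_genus(name)
  name.split[0]
end

puts parse_genus("Amanita muscaria var. formosa")
```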

codeclimate · 2022-11-07T00:58:09Z

app/classes/report/eol.rb

+    # generate CSV & meta.xml and bundle into a Zip
+    def render
+      filename = "#{::Rails.root}/public/dwca/eol_meta.xml"
+      content << ["meta.xml", File.open(filename).read]


Use File.read.
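
A minimal sketch of the suggested form. `File.read(path)` opens the file, returns its contents, and closes the handle, whereas `File.open(path).read` leaves the handle open until it is garbage collected. The `read_meta` name and the temporary file are only for illustration.

```ruby
require "tempfile"

# Read a whole file without leaving an open file handle behind.
def read_meta(path)
  File.read(path)
end

Tempfile.create("meta") do |file|
  file.write("<archive/>")
  file.flush
  puts read_meta(file.path)
end
```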

codeclimate · 2022-11-07T00:58:09Z

app/classes/report/gbif.rb

+    # generate CSV & meta.xml and bundle into a Zip
+    def render
+      filename = "#{::Rails.root}/public/dwca/gbif_meta.xml"
+      content << ["meta.xml", File.open(filename).read]


Use File.read.

codeclimate · 2022-11-07T00:58:09Z

app/classes/report/row_extensions.rb

-        locality = locality.blank? ? county : "#{county}, #{locality}"
-        county = nil
+      @country, @state, @county, @locality = val.split(", ", 4)
+      if @county && !@county.sub!(/ (Co\.|Parish)$/, "")


Use a guard clause (return unless @county && !@county.sub!(/ (Co\.|Parish)$/, "")) instead of wrapping the code inside a conditional expression.
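
A standalone sketch of the guard-clause form, under some assumptions: it uses local variables instead of the instance variables in row_extensions.rb, replaces Rails' `blank?` with plain Ruby, and assumes the body after the guard still folds a non-county third field into the locality, as the removed lines did. The method name `split_location` is hypothetical here.

```ruby
# Split "Country, State, County, Locality". If the third field ends in
# " Co." or " Parish", strip that suffix and keep it as the county;
# otherwise treat the third field as part of the locality.
def split_location(val)
  country, state, county, locality = val.split(", ", 4)
  # sub! returns nil when nothing was replaced, so the guard returns
  # early when there is no third field or it really is a county.
  return [country, state, county, locality] unless county && !county.sub!(/ (Co\.|Parish)$/, "")

  locality = locality.to_s.empty? ? county : "#{county}, #{locality}"
  [country, state, nil, locality]
end

p split_location("USA, California, Alameda Co., Berkeley")
p split_location("USA, California, Berkeley, Tilden Park")
```

Note that `sub!` does double duty: it strips the suffix in place and its return value signals whether the field was a county at all.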

codeclimate · 2022-11-07T00:58:12Z

Code Climate has analyzed commit 183e8e7 and detected 16 issues on this pull request.

Here's the issue category breakdown:

Category	Count
Style	16

View more on Code Climate.

mo-nathan · 2024-02-17T21:33:46Z

This PR also includes changes to support the GBIF Darwin Core Archive. However, there are some issues with the resulting ZIP file.

1. The new eml.xml file. I don't remember this being required the last time I was really playing with this code, but it now appears to be required to pass the GBIF validator (https://www.gbif.org/tools/data-validator). It has some dynamic info (dates) that should be consistent with when the archive is created. Right now I just dropped in a hard-coded version of the file to get the GBIF validator to pass.
2. I created an output file when running the code locally. The result is here: https://mushroomobserver.org/gbif.zip. After adding the eml.xml file mentioned above, I created another archive that is here: https://mushroomobserver.org/valid-gbif.zip. This version technically passes the validator, but it has a bunch of warnings that should be reviewed and potentially fixed. The current results of running this file through the validator are:

The file can be indexed by GBIF
Some issues were detected by the validator:
Resource Structure | The EML document does not validate against the schema
Metadata Content | The description of the dataset is missing or too short; the resource creator is missing or is incomplete

Warning	Count
Modified date invalid	97709
Multimedia date invalid	97709
Continent derived from coordinates	97524
Taxon match higherrank	1500
Taxon match none	336
Country coordinate mismatch	213
Presumed negated longitude	205
Country derived from coordinates	184
Country invalid	113
Zero coordinate	28
Presumed negated latitude	17
Presumed swapped coordinate	1
Geodetic datum assumed WGS84	97709

These should be reviewed and cleaned up where reasonable.
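
One possible way to replace the hard-coded eml.xml with a generated one is to render it from a template at archive-creation time. The sketch below only shows the date handling: the template is a heavily trimmed, hypothetical stand-in (a real eml.xml needs many more elements to validate), and `render_eml` is not an existing method in this PR.

```ruby
require "date"
require "erb"

# Hypothetical, trimmed template; only the date placeholder matters here.
EML_TEMPLATE = <<~XML
  <dataset>
    <pubDate><%= date %></pubDate>
  </dataset>
XML

# Render the template with the given date in ISO 8601 form
# (defaults to the day the archive is built).
def render_eml(date = Date.today)
  ERB.new(EML_TEMPLATE).result_with_hash(date: date.iso8601)
end

puts render_eml
```

Passing the date in as an argument keeps the output reproducible in tests while defaulting to the build date in normal use.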

mo-nathan added 10 commits January 1, 2021 10:42

Add EOL report with taxa.csv

d2b2906

Add EOL report

6c1a9e2

Make Rubocop happy

a71c88c

Rename Darwin::Images to Darwin::ObservationImages

1bc89c0

Rename Report::Dwca to Report::Gbif

6845eb9

Change "darwin" to "gbif" where appropriate

028e472

Rename GBIF download file from dwca.zip to gbif.zip

f9e1519

Add test for Report::Eol

ca52edc

Remove dead code

011d366

Add eol_meta.xml

9e9730e

Updates to the eol.zip based on feedback from Jen Hammock from EOL

a7a914c

mo-nathan added 5 commits January 16, 2021 13:19

Generalize EOL report to apply to all MO taxa

5a95772

Add name constraints to EOL DwCA

0d3247a

Latest updated from Jen @ EOL

18d84de

Create DwCA format that is distinct from GBIF so GBIF export can be b…

1422217

…etter optimized and refined

Added explicit GBIF report type and did some optimizing to get rid of…

70b1ea5

… some of waster memory the old way used to generate

codeclimate bot reviewed Jan 27, 2021

mo-nathan added 8 commits January 27, 2021 21:16

Rubocop fixes

84c415c

Rubocop fixes

687b846

Rename TaxonImages to EolImages

5a1c502

Rename ImageTaxa to EolTaxa

d314b0c

Fix issue in parents

c0cb413

Fix rank string issue

66b61e8

Fix rank bleed between rows

3e24b75

Allow all taxa at or below Genus and include classification information

757161e

mo-nathan marked this pull request as draft February 8, 2021 01:05

mo-nathan added 4 commits February 28, 2021 16:47

Initial genus lookup cache

2c50196

Merge branch 'master' into eol-dwca

61b292b

Remove taxonID from GBIF multimedia.csv

09ce233

Make OccurrenceID in GBIF observations.csv always use mushroomobserve…

65cb56e

…r.org

codeclimate bot reviewed Mar 13, 2021

Merge branch 'master' into eol-dwca

bd940e6

codeclimate bot reviewed Jun 20, 2021

Merge branch 'master' into eol-dwca

e7af552

codeclimate bot reviewed Jan 29, 2022

mo-nathan added 3 commits August 10, 2022 20:03

Merge branch 'master' into eol-dwca

f10a367

Fix failing specs

0454ce7

Merge branch 'master' into eol-dwca

183e8e7

codeclimate bot reviewed Nov 7, 2022

mo-nathan added 3 commits October 21, 2023 17:09

Merge branch 'main' into eol-dwca

6c77ad5

A bit of cleanup on the observations download page

70614d6

Merge branch 'main' into eol-dwca

d16df4f

Add example eml.xml file

4894c73
