Searching on DwC field datasetName #3006

rondlg · 2020-09-18T01:49:23Z

Hi,
how do I search for occurrences with a particular value in the datasetName field?

I can see the data in the record page but don't know which field it is in the advanced search list.

If that field isn't available to search on, it would be really helpful if it could be added.

Thanks

Sharon

rukayaj · 2022-01-27T08:24:14Z

I guess this is similar to the request made in #3026

Here's some more info about our particular use case: We need to group some records in multiple datasets together, and then download them. If I do a free text search like https://www.gbif.org/occurrence/search?q=Artsprosjekt_55-12_PolyNor for "Artsprosjekt_55-12_PolyNor" (one of the project names) then I get the correct records, but there is no way to download them. We've been publishing the project name for each record under 'datasetName' (e.g. https://www.gbif.org/occurrence/3436305215).

This functionality is necessary for our researchers. If there's no planned work on this, maybe @MortenHofft you have another idea for a searchable field that we could add the datasetName to, as a work-around?
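For anyone wanting to script the free-text search described above, here is a minimal sketch that builds the equivalent request URL against the public occurrence search API. It assumes the API's `q`, `limit` and `offset` parameters behave like the portal's free-text search; the helper name `build_search_url` is made up for illustration. It only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode

# Public GBIF API endpoint for occurrence search.
API_BASE = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(text, limit=20, offset=0):
    """Build a paged free-text occurrence search URL (hypothetical helper)."""
    # q is the free-text parameter; limit/offset control paging.
    params = {"q": text, "limit": limit, "offset": offset}
    return f"{API_BASE}?{urlencode(params)}"


url = build_search_url("Artsprosjekt_55-12_PolyNor")
print(url)
```

Note that a free-text search can match the string in any field, so results would probably still need to be filtered on datasetName afterwards, and paging through results like this is not a substitute for a proper download.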

MortenHofft · 2022-01-27T09:22:06Z

I cannot really think of any beyond those mentioned in the referenced issue.

Which is essentially using publisher, institution, eventIds and collection (when appropriate of course - they shouldn't be misused just to group records that aren't in fact a collection). Or split into multiple datasets. I cannot think of any other way to group records across or within datasets.

Perhaps others can think of another approach? @ManonGros? If not and there is a strong request, then the danger is that we will see bad data that misuse e.g. collectionCode as a hack to achieve what is needed. That would be a shame.

rukayaj · 2022-01-27T09:40:05Z

Is projectID the projectID from the EML? Pity it's not on a record level... I suppose one could argue that the grouped datasets are all part of a 'collection', kind of? Is collectionCode that bad a hack do you think? I see in the definition it says 'identifying the collection or data set from which the record was derived'.

ManonGros · 2022-01-27T09:57:17Z

I cannot think of any alternative (other than the ones listed in the other issue). The collection code hack isn't ideal, especially in the context of specimen records (for observations it would make more sense).

Yes the projectID is from the EML so it is for all the records in a given dataset. This is the same problem as the networks (they include whole datasets). I suppose we could:

investigate whether projectID or networks could be at the record level (although this wasn't their intended purpose and it might be difficult to do)
or consider making the datasetName field searchable (that might be better)
or have/use a new term (I am not sure about that).

@ahahn-gbif do you have any input on the topic? (the question is how to aggregate/download records that are part of several datasets)

MortenHofft · 2022-01-27T10:05:02Z

Should it be possible to be part of multiple "projects/datasets"?

ahahn-gbif · 2022-01-27T10:21:02Z

ProjectID in GBIF (and EML metadata) is presently given preference for projects run by or through GBIF (BID, BIFA, CESP and friends). The term is not (to my knowledge) defined again at record level in Darwin Core, so the limitation is, as recognized, that a) a projectID is applied at dataset level, and b) not more than one projectID can be assigned to a dataset. In that sense, I would advise against that choice.

Overloading any DwC term to find a work-around for some practical need is not a good idea. https://dwc.tdwg.org/terms/#dwc:datasetName is defined as "The name identifying the data set from which the record was derived.". If that is factually correct in the data, then we would not want to encourage using other terms against their actual definition.

If there is a recognized need in the community to be able to search this term through the user interface, this may be a change request. It is quite possibly not a wide-spread user demand, so that my question would be how often it is used (yearly reporting? regularly?), and by which kind of "customers". Is it possibly more an API access option that would satisfy this need?

rukayaj · 2022-01-28T10:58:13Z

I would actually think this is quite a common scenario, and that there are many field projects which go out on yearly collection trips, taking specimens which go into several collections. And then of course it's necessary for the individual projects to be able to see only their specimens.

rondlg · 2022-01-28T18:23:14Z

My 2-penneth is that it's definitely common at my institution to want to do this kind of thing and it's not easy to do right now.

There are a few things that we use the datasetName field for. Usually it is something with funding but not always:

The name of a digitization project
An expedition
A Research Project
A Lab
etc. etc.

Users have asked us how to retrieve the data associated with one or more of the above. Sometimes it's to show funders that a goal was achieved, either in a single institution or across multiple institutions. At other times we would like to be able to include or reference GBIF datasets for a particular datasetName on our web properties.

The example I give here is to our Rapid Inventories project that has been going for decades. They would like to be able to retrieve everything from a given expedition and the records cut not only across institutions but also across taxa.

Maybe this is tied up with events, I dunno but if it is we still need something simple for users and providers to work with.

I'll show my ignorance, but is there a place to mint IDs for projects/expeditions? If there is, great; if not, we are stuck with datasetName.

Our CMS allows us to record multiple projects per occurrence.
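To make the multiple-projects-per-occurrence case concrete, here is a rough sketch of grouping already-retrieved records by datasetName on the client side. It assumes multiple values are packed into one string with a `|` separator (the usual Darwin Core convention for lists); the sample records and the `key` field are invented for illustration.

```python
from collections import defaultdict


def group_by_dataset_name(records, sep="|"):
    """Map each datasetName value to the keys of the records that carry it."""
    groups = defaultdict(list)
    for rec in records:
        # Treat a missing or null datasetName as "no project".
        raw = rec.get("datasetName") or ""
        for name in (part.strip() for part in raw.split(sep)):
            if name:
                groups[name].append(rec["key"])
    return dict(groups)


# Hypothetical sample records, not real GBIF data.
records = [
    {"key": 1, "datasetName": "Expedition A"},
    {"key": 2, "datasetName": "Expedition A | Lab B"},
    {"key": 3, "datasetName": None},
]
groups = group_by_dataset_name(records)
print(groups)
```

Record 2 ends up in both groups, which is the behaviour a single-valued field like collectionCode cannot give us.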

ManonGros · 2022-02-03T09:22:50Z

I think @albenson-usgs also mentioned the need for aggregating specific occurrences across datasets. If I remember correctly, the collectionCode was/is used for that purpose.

rondlg · 2022-02-03T16:19:19Z

collectionCode is a problem for us to use in this regard because it is used at a much higher level. For example to distinguish between the "Bird" collection and the "Fossil Herps" collection. These values are also unitary.

rukayaj · 2022-02-08T09:57:45Z

> Should it be possible to be part of multiple "projects/datasets"?

We've just had a request for this: gbif-norway/helpdesk#90

timrobertson100 · 2022-02-09T10:32:19Z

Please see gbif/pipelines#662 where we intend to implement multivalue dataset ID and name search capabilities shortly.

MortenHofft added api idea labels Sep 23, 2020

dagendresen mentioned this issue Jan 26, 2022

Structured search on datasetname term so we can group and download records gbif-norway/helpdesk#88

Closed

MortenHofft mentioned this issue Feb 8, 2022

Add search support for datasetName and datasetID gbif/pipelines#662

Closed

MortenHofft mentioned this issue Feb 20, 2024

can i filter the data in such a way that i can see how many projects dealt with a certain species in 2023 in germany, e.g. how many projects dealt with the recording of amphibians in 2023 in germany? #5194

Open
