Searching on DwC field datasetName #3006
Comments
I guess this is similar to the request made in #3026. Here's some more info about our particular use case: we need to group some records from multiple datasets together and then download them. If I do a free text search like https://www.gbif.org/occurrence/search?q=Artsprosjekt_55-12_PolyNor for "Artsprosjekt_55-12_PolyNor" (one of the project names), then I get the correct records, but there is no way to download them. We've been publishing the project name for each record under 'datasetName' (e.g. https://www.gbif.org/occurrence/3436305215). This functionality is necessary for our researchers. If there's no planned work on this, maybe @MortenHofft you have another idea for an additional searchable field we can add the datasetName to, as a work-around?
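Until datasetName is searchable, one possible stopgap is to page through the free text results via the API and keep only the records whose datasetName matches exactly. This is a minimal sketch, not an official GBIF recipe. It assumes the public occurrence search endpoint at `https://api.gbif.org/v1/occurrence/search` accepts the same `q` parameter as the website URL above, and that each returned record carries a `datasetName` key. The helper names `build_search_url` and `filter_by_dataset_name` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed API endpoint; the thread itself only shows the website search URL.
API_URL = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(project_name, limit=300, offset=0):
    """Build the URL for one page of free text search results."""
    params = {"q": project_name, "limit": limit, "offset": offset}
    return API_URL + "?" + urlencode(params)


def filter_by_dataset_name(records, project_name):
    """Keep only records whose datasetName equals project_name exactly.

    Free text search may also match other fields, so the exact
    comparison is done client-side (assumes a 'datasetName' key).
    """
    return [r for r in records if r.get("datasetName") == project_name]
```

The records passed to `filter_by_dataset_name` would be the parsed JSON result list of each page; fetching and paging are left out so the sketch stays free of network calls.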
I cannot really think of any beyond those mentioned in the referenced issue, which essentially means using publisher, institution, eventIds and collection (when appropriate of course - they shouldn't be misused just to group records that aren't in fact a collection). Or split into multiple datasets. I cannot think of any other way to group records across or within datasets. Perhaps others can think of another approach? @ManonGros? If not, and there is a strong request, then the danger is that we will see bad data that misuses e.g. collectionCode as a hack to achieve what is needed. That would be a shame.
Is projectID the projectID from the EML? Pity it's not on a record level... I suppose one could argue that the grouped datasets are all part of a 'collection', kind of? Is collectionCode that bad a hack, do you think? I see in the definition it says 'identifying the collection or data set from which the record was derived'.
I cannot think of any alternative (other than the ones listed in the other issue). The collection code hack isn't ideal, especially in the context of specimen records (for observations it would make more sense). Yes the projectID is from the EML so it is for all the records in a given dataset. This is the same problem as the networks (they include whole datasets). I suppose we could:
@ahahn-gbif do you have any input on the topic? (The question is how to aggregate/download records that are part of several datasets.)
Should it be possible to be part of multiple "projects/datasets"?
ProjectID in GBIF (and EML metadata) is presently given preference for projects run by or through GBIF (BID, BIFA, CESP and friends). The term is not (to my knowledge) defined again at record level in Darwin Core, so the limitation is, as recognized, that a) a projectID is applied at dataset level, and that b) not more than one projectID can be assigned to the dataset. In that sense, I would advise against that choice. Overloading any DwC term to find a work-around for some practical need is not a good idea. https://dwc.tdwg.org/terms/#dwc:datasetName is defined as "The name identifying the data set from which the record was derived." If that is factually correct in the data, then we would not want to encourage using other terms against their actual definition. If there is a recognized need in the community to be able to search this term through the user interface, this may be a change request. It is quite possibly not a widespread user demand, so my question would be how often it is used (yearly reporting? regularly?), and by which kind of "customers". Is it possibly more an API access option that would satisfy this need?
I would actually think this is quite a common scenario, and that there are many field projects which go out on yearly collection trips, taking specimens which go into several collections. And then of course it's necessary for the individual projects to be able to see only their specimens.
My 2-penneth is that it's definitely common at my institution to want to do this kind of thing and it's not easy to do right now. There are a few things that we use the datasetName field for. Usually it is something with funding but not always: The name of a digitization project. Users have asked us how to retrieve the data associated with one or more of the above. Sometimes it's to show funders that a goal was achieved, either in a single institution or across multiple institutions, or we would like to be able to include/reference GBIF datasets for a particular datasetName on our web properties. The example I give here is our Rapid Inventories project, which has been going for decades. They would like to be able to retrieve everything from a given expedition, and the records cut not only across institutions but also across taxa. Maybe this is tied up with events, I dunno, but if it is we still need something simple for users and providers to work with. I'll show my ignorance, but is there a place to mint IDs for projects/expeditions? If there is, great; if not, we are stuck with datasetName. Our CMS allows us to record multiple projects per occurrence.
I think @albenson-usgs also mentioned the need for aggregating specific occurrences across datasets. If I remember correctly, the collectionCode was/is used for that purpose.
collectionCode is a problem for us to use in this regard because it is used at a much higher level, for example to distinguish between the "Bird" collection and the "Fossil Herps" collection. These values are also unitary.
We've just had a request for this: gbif-norway/helpdesk#90
Please see gbif/pipelines#662, where we intend to implement multivalue dataset ID and name search capabilities shortly.
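To illustrate what a multivalue datasetName could look like on the data side, here is a small sketch that checks whether a record belongs to a given project when several names are packed into one string. It assumes the names are separated by a vertical bar, which is the usual Darwin Core convention for list-like values; it does not describe how gbif/pipelines#662 actually implements the search. The function names `dataset_names` and `in_project` are hypothetical.

```python
def dataset_names(record, separator="|"):
    """Split a record's datasetName into individual, trimmed names.

    Assumes multiple names are joined with a vertical bar;
    a missing or empty value yields an empty list.
    """
    raw = record.get("datasetName") or ""
    return [part.strip() for part in raw.split(separator) if part.strip()]


def in_project(record, project_name):
    """Return True if project_name is one of the record's dataset names."""
    return project_name in dataset_names(record)
```

Matching on whole names rather than substrings avoids one project name accidentally matching another that merely contains it.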
Hi,
how do I search for occurrences with a particular value in the datasetName field?
I can see the data in the record page but don't know which field it is in the advanced search list.
![image](https://user-images.githubusercontent.com/702709/93545824-06b83e00-f927-11ea-94c2-384f0c091f34.png)
If that field isn't available to search on, it would be really helpful if it could be added.
Thanks
Sharon