fix: allow single-valued controlled vocabulary fields in Solr schema #11320
base: develop
Conversation
This was changed in #8601 to support indexing values in multiple languages. I think that's something we'd want to keep. Is there some other way to accomplish what you want with collapsing? |
Ok I see! That makes sense. Hmm, I have two ideas:
|
I haven't tested but I think either of those would work, though both have ~minor drawbacks (the first means you have to change the schema if you turn i18n on/off, the latter means more fields). What's the use case for the collapse query you're doing? (Is it something multiple institutions will want to do and that doesn't have an alternative?) |
Yes, I see those drawbacks as well. The latter drawback (more fields) sounds preferable to me. It might work to go that way, and possibly additionally only produce those extra fields if a feature flag is set? This issue arose because we are doing work on our UI related to the improved "related datasets" prototype. For each of our search results, we'd like to show related items and group them by their type. The type is captured in a single-valued CVOC field, which we can't group by unless it is indexed in a non-multiValued field. This use case is currently somewhat specific to our instance, but it's possible that others would also want to group search results by a single-valued CVOC field. |
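For readers following the thread, here is a minimal sketch of how a collapse filter of the kind discussed here could be assembled for the Search API. The helper name `build_collapse_search_url` is hypothetical; the base URL and the `journalArticleType` field are taken from the example in the PR description. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode, quote


def build_collapse_search_url(base_url, field, query="*:*", item_type="dataset"):
    """Build a Search API URL whose filter query collapses results on `field`.

    Hypothetical helper for illustration; not part of Dataverse.
    """
    params = {
        "type": item_type,
        "q": query,
        # Solr's collapse query parser; Solr rejects it on multiValued fields.
        "fq": "{!collapse field='%s'}" % field,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_collapse_search_url(
    "https://dataverse.harvard.edu/api/search", "journalArticleType"
)
print(url)
```

Against a schema where the field is multiValued, a request like this is the one that fails with "Collapsing not supported on multivalued fields".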
To be reviewed in Tech Hours |
@vera - we had a tech hour discussion of internationalization and possible ways we could improve i18n support that would also avoid making these fields multivalued. We didn't come up with anything small enough to think about in the short term, though.
In the call we thought that option 1 above (adding a check to see if multiple languages are in use) would be preferable, but now I think that would still leave an issue with some external CVOC scripts: the ones that store both the identifier and the human-readable form in the Solr field. That's controlled by the retrieval filtering in the config, so Dataverse could check whether that is being done and use single/multi value as needed, but if you need to use collapse on a field where such an external script is in use, it wouldn't work. (To be clear, I think code could be written so that single/multivalue is always set correctly for what the metadatablock, i18n, and external scripts need. The problem would just be if you needed to do collapse on a field that was multi and you didn't want to change the external vocab script to allow it to be single value.) If that still works for your use case(s), I think we're OK with a PR for it (I can help with figuring out the retrieval query logic to set such fields to single/multi based on the CVocConf).
We didn't think just adding a second field would be something we'd want to maintain in the main repository unless/until there are some additional use cases from other instances. If you want to go this route, we'd suggest maintaining this feature as a fork for now. (If your overall feature for linking datasets is something that gets into the main repo, we could look at adding the extra field(s) required as part of that PR.)
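The single/multi decision described above could be sketched roughly as follows. This is only an illustration of the three conditions mentioned in this thread (the TSV declaration, i18n, and external vocabulary scripts); `FieldInfo`, its attributes, and `needs_multivalued` are hypothetical names, not Dataverse's actual model or API.

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    # All attributes are hypothetical stand-ins for illustration.
    name: str
    allow_multiples: bool           # declared multi-valued in the metadata block TSV
    parent_allow_multiples: bool    # parent field declared multi-valued
    is_controlled_vocabulary: bool
    external_script_stores_extra_values: bool  # e.g. identifier plus readable form


def needs_multivalued(field: FieldInfo, multiple_languages_in_use: bool) -> bool:
    """Return True if the Solr field would have to be multiValued."""
    if field.allow_multiples or field.parent_allow_multiples:
        return True
    # Indexing a CVOC value in several languages puts several values in one field.
    if field.is_controlled_vocabulary and multiple_languages_in_use:
        return True
    # An external vocabulary script may index more than one value per entry.
    if field.external_script_stores_extra_values:
        return True
    return False


article_type = FieldInfo(
    name="journalArticleType",
    allow_multiples=False,
    parent_allow_multiples=False,
    is_controlled_vocabulary=True,
    external_script_stores_extra_values=False,
)
print(needs_multivalued(article_type, multiple_languages_in_use=False))
print(needs_multivalued(article_type, multiple_languages_in_use=True))
```

Under these assumptions, collapse would only be usable on a field for which this function returns False.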
We also had some discussion of whether facets or grouping could be used to do what you wanted without requiring a single value field, or whether post-processing the solr result to group results would be usable (enough performance given the hopefully small list of items for a given dataset - assuming your queries were all for one dataset at a time). I don't think we know enough about solr or your use case to know if these or other solr features would be viable though. (We were confused a bit by the 3d* fields you listed above - we were guessing that you weren't trying to use collapse on those and that they were just other examples of fields that are single cvv with i18n making them multiple solr fields.) I hope that gives you a way forward. Let us know if you want to close this PR or keep it open for you to make changes. |
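As a rough illustration of the post-processing idea mentioned above, the sketch below groups already-retrieved result documents by a field on the client side. The document shape (plain dicts) and the helper name `group_by_field` are assumptions, and using the first value of a multiValued field as the group key is a simplification that may not suit fields indexed in several languages.

```python
from collections import defaultdict


def group_by_field(docs, field, missing_key="(none)"):
    """Group result documents by the value of `field`.

    Hypothetical helper: a multiValued Solr field comes back as a list,
    so the first value is used as the group key (an assumption).
    """
    groups = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        key = values[0] if values and values[0] is not None else missing_key
        groups[key].append(doc)
    return dict(groups)


# Illustrative documents only; not real search results.
docs = [
    {"id": "a", "journalArticleType": ["research article"]},
    {"id": "b", "journalArticleType": ["abstract"]},
    {"id": "c", "journalArticleType": ["research article"]},
    {"id": "d"},
]
grouped = group_by_field(docs, "journalArticleType")
for key, items in grouped.items():
    print(key, [d["id"] for d in items])
```

This runs once over the result list, so it should be cheap for the small per-dataset result sets assumed in the comment above.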
What this PR does / why we need it:
A controlled vocabulary metadata field in our custom metadata blocks was incorrectly marked multiValued in the Solr schema. This caused errors when performing Solr queries with grouping on that field.
Example: https://dataverse.harvard.edu/api/search?&type=dataset&q=*:*&fq={!collapse%20field=%27journalArticleType%27} leads to "Search Syntax Error: Error from server at http://dvn-cloud-solr.lib.harvard.edu:8983/solr/collection1: org.apache.solr.search.SyntaxError: Collapsing not supported on multivalued fields".
Previously, controlled vocabulary fields were always set as multiValued="true" in the Solr schema, even when neither they nor their parent fields were declared to be multivalued in the TSV file. I'm not sure why. If there is a reason for this, I would be interested to know.
This affects the following fields in the standard metadata blocks, which will now be single-valued within the Solr schema:
3d3DTechnique
3dExportedFileFormat
3dLightingSetup
3dUnit
journalArticleType
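One way to check how the fields listed above are declared in a given schema.xml is sketched below. The sample XML is illustrative only and not copied from the real schema; treating a missing multiValued attribute as false is an assumption, since in Solr the field type can supply the default.

```python
import xml.etree.ElementTree as ET

AFFECTED_FIELDS = [
    "3d3DTechnique",
    "3dExportedFileFormat",
    "3dLightingSetup",
    "3dUnit",
    "journalArticleType",
]


def multivalued_status(schema_xml, names):
    """Map each field name found in the schema to its multiValued flag."""
    root = ET.fromstring(schema_xml)
    status = {}
    for field in root.iter("field"):
        name = field.get("name")
        if name in names:
            # Assumption: a missing attribute is treated as "false".
            status[name] = field.get("multiValued", "false").lower() == "true"
    return status


# Illustrative fragment, not the real Dataverse schema.xml.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <field name="journalArticleType" type="text_en" multiValued="false" stored="true" indexed="true"/>
  <field name="3dUnit" type="text_en" multiValued="true" stored="true" indexed="true"/>
</schema>"""

status = multivalued_status(SAMPLE_SCHEMA, AFFECTED_FIELDS)
for name, is_multi in sorted(status.items()):
    print(name, "multiValued" if is_multi else "single-valued")
```

To check a real file, the same function could be given the contents of schema.xml read from disk.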
Which issue(s) this PR closes:
/
Special notes for your reviewer:
/
Suggestions on how to test this:
Use the updated schema.xml to index datasets and verify correct behavior.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
/
Is there a release notes update needed for this change?:
/
Additional documentation:
/