Description
I wanted to summarise one of the outcomes of today's call and clarify a comment made in the PRC feedback document:
I suggest that the specification would mandate that at least one 'required' attribute is used but would not define which one it is. Over time, Refget Collections specification would create recommendations for the use of 'required' fields for different domains e.g. for genome archives this might be to have TWO different Refget Collections digests: 'names, lengths, sequences' and 'accessions, lengths, sequences'.
In the ADR from 2023-07-12, we decided that the only mandatory properties in a sequence collection would be `lengths` and `names`.
The argument made today was that by requiring lengths and names to be present, we're potentially forcing these attributes into use cases where they are not relevant or, in some cases, not available. The example given was that of a CRAM file that contains a digest for each sequence but does not contain the length.
The argument in favour of having required fields is one of interoperability. Guaranteeing the presence of the two fields helps make different services compatible, because they always share common ground.
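To make the two positions concrete, here is a minimal sketch of how each rule might be checked. It is only one reading of the proposal: the function names, the idea of an implementation declaring its own list of required attributes, and the placeholder digest strings are all assumptions for illustration and are not taken from the specification.

```python
from typing import Mapping, Sequence

# Attributes made mandatory by the 2023-07-12 ADR.
ADR_REQUIRED = ("lengths", "names")


def valid_under_current_rule(collection: Mapping[str, Sequence]) -> bool:
    """Current rule: every collection must carry both lengths and names."""
    return all(attr in collection for attr in ADR_REQUIRED)


def valid_under_proposed_rule(
    collection: Mapping[str, Sequence],
    declared_required: Sequence[str],
) -> bool:
    """Proposed rule (hypothetical reading): at least one attribute must be
    declared as required, and every declared attribute must be present.
    The specification itself would not say which attributes those are."""
    if len(declared_required) == 0:
        return False
    return all(attr in collection for attr in declared_required)


# A CRAM-like case: per-sequence digests are known, lengths are not.
# The digest strings are placeholders, not real refget digests.
cram_like = {"sequences": ["SQ.placeholder1", "SQ.placeholder2"]}

# A coordinate-system case: names and lengths, but no sequences.
coordinate_system = {"names": ["chr1", "chr2"], "lengths": [1000, 2000]}

print(valid_under_current_rule(cram_like))                                # False
print(valid_under_proposed_rule(cram_like, ["sequences"]))                # True
print(valid_under_current_rule(coordinate_system))                        # True
print(valid_under_proposed_rule(coordinate_system, ["names", "lengths"]))  # True
```

Under this reading, the CRAM-like collection is rejected by the current rule but accepted by the proposed one. The cost is that two services can no longer assume they share any particular attribute, which is the interoperability concern raised above.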
Reading back the ADR from 2023-07-12, the rationale seems to be less about why `lengths` and `names` should be made mandatory and more about why `sequences` should not be made mandatory, because that would have prevented the coordinate-space use case from being implemented. I think a similar argument can be made about other use cases we might not have envisioned.
@raskoleinonen, please correct me if I misrepresented your point
@nsheff @sveinugu @andrewyatz please chime in as any change would have to be made relatively soon.