Should lengths and names be required properties in every sequence collection? #72

I wanted to summarise one of the outcomes of today's call and clarify a comment made in the PRC feedback document:
> I suggest that the specification would mandate that at least one 'required' attribute is used but would not define which one it is. Over time, Refget Collections specification would create recommendations for the use of 'required' fields for different domains e.g. for genome archives this might be to have TWO different Refget Collections digests: 'names, lengths, sequences' and 'accessions, lengths, sequences'.

In the [ADR from 2023-07-12](https://github.com/ga4gh/seqcol-spec/blob/master/docs/decision_record.md#2023-07-12---required-attributes-are-lengths-and-names), we decided that the only mandatory properties in a sequence collection would be `lengths` and `names`.
The argument made today was that by requiring `lengths` and `names` to be present, we're potentially forcing these attributes into use cases where they are not relevant or, in some cases, not available. The example given was that of a CRAM file that contains a digest for each sequence but does not contain the lengths.
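
To make the two positions concrete, here is a minimal sketch of how each rule would treat a CRAM-like collection. The helper names, the digest strings, and the reading of "at least one attribute" as "at least one non-empty attribute" are my own assumptions for illustration, not part of the specification.

```python
# Illustrative only: these helpers and the placeholder digests are hypothetical,
# not part of the seqcol specification or any reference implementation.

# Current rule (ADR 2023-07-12): `lengths` and `names` are mandatory.
REQUIRED_BY_ADR = ("lengths", "names")


def missing_required(seqcol, required=REQUIRED_BY_ADR):
    """Return the required attributes that are absent from a collection."""
    return [attr for attr in required if attr not in seqcol]


def has_any_attribute(seqcol):
    """Relaxed rule (assumed reading): at least one non-empty attribute."""
    return any(seqcol.values())


# A CRAM-like collection: sequence digests only, no lengths or names.
cram_like = {"sequences": ["SQ.placeholder1", "SQ.placeholder2"]}

# A coordinate-system collection: names and lengths, no sequences.
coordinate_system = {"names": ["chr1", "chr2"], "lengths": [1000, 2000]}

print(missing_required(cram_like))          # ['lengths', 'names']
print(missing_required(coordinate_system))  # []
print(has_any_attribute(cram_like))         # True
```

Under the current rule the CRAM-like collection is rejected, while under the relaxed rule both collections pass, at the cost of no longer sharing a guaranteed common attribute.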

The argument in favour of having required fields is one of interoperability. Guaranteeing the presence of the two fields helps make different services compatible, because they always share common ground.

Reading back the [ADR from 2023-07-12](https://github.com/ga4gh/seqcol-spec/blob/master/docs/decision_record.md#2023-07-12---required-attributes-are-lengths-and-names), the rationale seems to be less about why `lengths` and `names` should be mandatory and more about why `sequences` should not be: requiring it would have prevented the coordinate-space use case from being implemented. I think a similar argument can be made about other use cases we might not have envisioned.

@raskoleinonen, please correct me if I misrepresented your point.

@nsheff @sveinugu @andrewyatz please chime in as any change would have to be made relatively soon.

