1. Refget sequences: a GA4GH-approved standard for individual sequences
-2. Refget sequence collections: a standard for collections of sequences, under review
 
 ## What is the refget sequences standard?
 
-The original refget handled sequences only. Refget enables access to reference sequences using an identifier derived from the sequence itself.
+The original refget standard, now called *Refget sequences*, handles sequences only.
+Refget sequences enables access to reference sequences using an identifier derived from the sequence itself.
+
 
 ## What is the refget sequence collections standard?
 
-*Sequence Collections*, or `seqcol` for short, standardizes unique identifiers for collections of sequences. Seqcol identifiers can be used to identify genomes, transcriptomes, or proteomes -- anything that can be represented as a collection of sequences. The seqcol protocol provides:
+*Refget sequence collections*, or `seqcol` for short, standardizes unique identifiers for collections of sequences. Seqcol identifiers can be used to identify genomes, transcriptomes, or proteomes -- anything that can be represented as a collection of sequences. The seqcol protocol provides:
 
 - implementations of an algorithm for computing sequence identifiers;
 - a lookup service to retrieve sequences given a seqcol identifier
@@ -8,6 +8,82 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S
 
 [TOC]
 
+## 2024-11-20 Level 2 return values should not return transient attributes
+
+### Decision
+
+Level 2 return values should not return transient attributes.
+
+### Rationale
+
+We debated what the `/collection?level=2` endpoint should do with transient attributes, because their level 2 representations are not stored. One train of thought was that it could return the level 1 representation; the other was that it should include nothing. We decided that the purer approach is to include neither, that is, to omit transient attributes from the level 2 response.
+
+Another option was something like `?level=highest`, which would return level 2 representations for everything that has one, but level 1 representations for transient attributes.
+
+We decided that even if the level 2 response lacks that information, clients can get it from the `?level=1` endpoint. Alternatively, implementations could specify their own way.
+
+
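As a rough illustration of the decision above, the sketch below shows one way a server could assemble `/collection` responses so that level 2 omits transient attributes while level 1 still lists them. The record layout, the function name `collection_response`, and the placeholder digest strings are assumptions made for this example; none of them come from the specification.

```python
# Hypothetical stored record: level 1 digests exist for every attribute,
# but level 2 content is kept only for non-transient attributes.
RECORD = {
    "level1": {
        "names": "placeholder-digest-names",
        "lengths": "placeholder-digest-lengths",
        "sequences": "placeholder-digest-sequences",
        "sorted_name_length_pairs": "placeholder-digest-snlp",
    },
    "level2": {
        "names": ["chr1", "chr2"],
        "lengths": [100, 200],
        "sequences": ["SQ.placeholder1", "SQ.placeholder2"],
    },
}

# Assumption for this example: this attribute is flagged transient in the schema.
TRANSIENT = frozenset({"sorted_name_length_pairs"})


def collection_response(record, level, transient=TRANSIENT):
    """Build the body for ?level=1 or ?level=2 from a stored record."""
    if level == 1:
        # Level 1 lists a digest for every attribute, transient or not.
        return dict(record["level1"])
    if level == 2:
        # Level 2 leaves transient attributes out entirely.
        return {k: v for k, v in record["level2"].items() if k not in transient}
    raise ValueError("level must be 1 or 2")
```

A client that needs the digest of a transient attribute would then request `?level=1`, as the rationale suggests.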
+## 2024-11-20 Custom modifiers should live in the schema under the `ga4gh` key
+
+### Decision
+
+Any global custom modifiers should live under a `ga4gh` key in the schema. Right now, this includes `inherent`, `transient`, and `passthru`.
+Local modifiers (currently just `collated`) will continue to live, raw, under the attribute they describe.
+
+
+### Rationale
+
+We want to follow the standard used in the other specs (VRS), and it also seems fine to have a place to lump together our custom modifiers.
+We thought we could also do this for `collated`, as a local modifier, but opted not to for now because there is only one, it is a boolean, and it is not actually used for anything in the spec at the moment; it is only there because it could be useful for visualizing elements in a collection. The additional complexity of another layer just for this seems pointless at this point.
+
+### Linked issues
+
+- <https://github.com/ga4gh/refget/issues/84>
+
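To make the layout in this decision concrete, here is a minimal sketch of a schema written as a Python dictionary, with global modifiers grouped under `ga4gh` and the local `collated` modifier left on each attribute. The attribute list, the choice of which attribute is transient, and the helper names `global_modifier` and `is_collated` are illustrative assumptions, not the published schema.

```python
# Illustrative schema only; the attribute selection is an assumption.
SCHEMA = {
    "type": "object",
    "properties": {
        "names": {"type": "array", "collated": True, "items": {"type": "string"}},
        "lengths": {"type": "array", "collated": True, "items": {"type": "integer"}},
        "sequences": {"type": "array", "collated": True, "items": {"type": "string"}},
        "sorted_name_length_pairs": {
            "type": "array",
            "collated": False,
            "items": {"type": "string"},
        },
    },
    "required": ["names", "lengths", "sequences"],
    # Global custom modifiers are grouped under one key.
    "ga4gh": {
        "inherent": ["names", "sequences"],
        "transient": ["sorted_name_length_pairs"],
        "passthru": [],
    },
}


def global_modifier(schema, modifier):
    """Return the set of attributes listed under a global `ga4gh` modifier."""
    return set(schema.get("ga4gh", {}).get(modifier, []))


def is_collated(schema, attribute):
    """Read the local `collated` modifier straight from the attribute."""
    return bool(schema["properties"][attribute].get("collated", False))
```

With this layout, a server reads `global_modifier(SCHEMA, "transient")` once for the whole schema, while `is_collated(SCHEMA, "names")` is answered by the attribute itself.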
+## 2024-11-13 Attributes can be designated as `passthru` or `transient`.
+
+### Decision
+
+We add two new attribute qualifiers: `transient` and `passthru`.
+
+- Passthru attributes are not digested in the transition from level 2 to level 1. Most attributes of the canonical (level 2) seqcol representation are digested to create the level 1 representation. But sometimes we have an attribute for which digesting makes little sense. These attributes are passed through the transformation, so they show up in the level 1 representation in the same form as in the level 2 representation. Thus, we refer to them as passthru attributes.
+
+- Transient attributes are not retrievable from the `/attribute` endpoint. Most attributes of the sequence collection can be retrieved through the `/attribute` endpoint. However, some attributes may not be retrievable. For example, this could happen for an attribute that we intend to be used primarily as an identifier. In this case, we don't necessarily want to store in the database the original content that went into the digest, because it might be redundant. We really just want the final attribute. These attributes are called transient because the content of the attribute is no longer stored and is therefore no longer retrievable.
+
+Also, a few other related decisions we finalized:
+- The `/collection` endpoint's level 2 collection representation should exclude transient attributes.
+- The `/attribute` endpoint will not provide anything for either transient or passthru attributes.
+- Can passthru or transient attributes be inherent? They could, but it probably doesn't really make sense. Nevertheless, there's no reason to state that they cannot be.
+
+### Rationale
+
+As we worked on more advanced attributes, and with the addition of the `/attribute` endpoint, we realized that the schema needs a bit more power to specify the behavior of attributes. For the basic seqcol attributes (`names`, `lengths`, `sequences`) and the original endpoint, the general algorithm and basic qualifiers (`required`, `inherent`, `collated`) suffice to describe the representation. But some more nuanced attributes require additional qualifiers to describe their intention and how the server should behave for the `/attribute` endpoint. For example, `sorted_name_length_pairs` and `sorted_sequences` are intended to provide alternative tailored identifiers and comparisons, and are not necessarily useful for independent attribute lookup. Similarly, custom extra attributes, like author or alias, may be simple appendages that don't need the complex digesting procedure we use for the basic attributes. To flag such attributes in a way that can govern slightly different server expectations, we need a couple of additional advanced attribute qualifiers. For this purpose, we added the `passthru` and `transient` qualifiers.
+
+### Linked issues
+
+- <https://github.com/ga4gh/refget/issues/86>
+
+
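The two qualifiers in this decision can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: compact, key-sorted `json.dumps` output stands in for the canonical JSON serialization that the specification defines, and the names `to_level1` and `attribute_lookup`, along with the in-memory `store`, are hypothetical.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, then base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(value) -> bytes:
    # Assumption: a stand-in for the spec's canonicalization step.
    return json.dumps(value, separators=(",", ":"), sort_keys=True).encode("utf-8")


def to_level1(level2, passthru=frozenset()):
    """Digest each attribute, except passthru ones, which are copied unchanged."""
    return {
        name: value if name in passthru else sha512t24u(canonical_json(value))
        for name, value in level2.items()
    }


def attribute_lookup(store, name, digest, transient=frozenset(), passthru=frozenset()):
    """Return stored content, or None for transient and passthru attributes."""
    if name in transient or name in passthru:
        return None
    return store.get((name, digest))


# Usage: "author" is a hypothetical passthru attribute.
level2 = {"names": ["chr1", "chr2"], "author": "example"}
level1 = to_level1(level2, passthru={"author"})
```

Here `level1["author"]` keeps its level 2 value, while `level1["names"]` becomes a 32-character digest string.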
+## 2024-10-02 Minimal schema should now require sequences, and lengths should not be inherent.
+
+### Decision
+
+We will update the minimal schema with two changes: (1) move `sequences` into `required`, and (2) remove `lengths` from `inherent`. The final qualifiers would be:
+- required: names, lengths, and sequences
+- inherent: names, sequences
+
+
+### Rationale
+
+Originally, there was a good rationale for making sequences not required: it allowed coordinate systems to be represented as a seqcol.
+But with the new `/attribute` endpoint, there's a better way to handle this, using the `name_length_pairs` and `sorted_name_length_pairs` attributes.
+Then, with sequences required, it does not make sense for lengths to be inherent, because they are computable from sequences.
+So essentially, the `/attribute` endpoint allows us to move away from handling coordinate systems as top-level entities and toward using the `/attribute` endpoint for coordinate systems.
+
+### Linked issues
+
+- <https://github.com/ga4gh/refget/issues/72>
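The updated qualifiers can be read as two small checks, sketched below. The sketch assumes that inherent attributes are the ones that feed into a collection's identity, so dropping `lengths` from that set means it no longer affects the identifier. The function names and the placeholder digests are hypothetical.

```python
# Qualifier lists taken from the decision above.
REQUIRED = ("names", "lengths", "sequences")
INHERENT = ("names", "sequences")


def missing_required(collection, required=REQUIRED):
    """List the required attributes that a collection does not provide."""
    return [attr for attr in required if attr not in collection]


def inherent_subset(level1, inherent=INHERENT):
    """Keep only the attributes assumed to contribute to the collection's identity."""
    return {attr: level1[attr] for attr in inherent if attr in level1}


# Usage with placeholder digests (not real values).
level1 = {
    "names": "placeholder-digest-names",
    "lengths": "placeholder-digest-lengths",
    "sequences": "placeholder-digest-sequences",
}
```

A collection without `sequences` now fails the required check, and `inherent_subset(level1)` shows that `lengths` is left out of the identity calculation.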
 
 ## 2024-10-02 The `/collection` and `/attribute` endpoints will both be `REQUIRED`
 
@@ -96,7 +172,7 @@ In the future if the number of proposed ancillary attributes grows, it could mov
 
 ### Linked issues
 
--<https://github.com/ga4gh/seqcol-spec/issues/71>
+-<https://github.com/ga4gh/refget/issues/71>
 
 
 ## 2024-02-21 We will specify core sequence collection attributes and a process for adding new ones
@@ -120,9 +196,9 @@ Choosing to host this list as a list of issues allows the list to always be up t
## 2021-09-21 - Order will be recognized by digesting arrays in the given order, and unordered digests will be handled as extensions through additional attributes
@@ -854,7 +930,7 @@ To conclude, option A seems simple and straightforward, satisfies for a basic im
 
 ### Linked issues
 
-- https://github.com/ga4gh/seqcol-spec/issues/5
+- https://github.com/ga4gh/refget/issues/5
 
 ### Known limitations
 
@@ -877,7 +953,7 @@ However, there are also scenarios for which the order of sequences in a collecti