docs/sequences/README.md
11 additions & 11 deletions
@@ -35,7 +35,7 @@ Implementers can check if their refget implementations conform to the specificat
## Protocol essentials
-
All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters and HTTP headers, and return text or other allowed formatting as requested by the user. Successful requests result with HTTP status code 200 and have the appropriate text encoding in the response body as defined for each endpoint. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism.
+
All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters and HTTP headers, and return text or other allowed formatting as requested by the user. Successful requests result in HTTP status code 200 and have the appropriate text encoding in the response body as defined for each endpoint. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism.
The response for sequence retrieval has a character set of US-ASCII and consists solely of the requested sequence or sub-sequence with no line breaks. Other formatting of the response sequence may be allowed by the server, subject to standard negotiation with the client via the Accept header.
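As an illustration of the request shape described above, here is a minimal Python sketch that builds (but does not send) a retrieval request. The base URL is hypothetical, and the `/sequence/<checksum>` path and the `text/plain` Accept value are assumptions made for this example rather than details stated in this part of the document.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_sequence_request(base_url, checksum, start=None, end=None, accept="text/plain"):
    """Build (but do not send) a GET request for a sequence or sub-sequence."""
    # Assumed path layout: <base_url>/sequence/<checksum>
    url = f"{base_url.rstrip('/')}/sequence/{quote(checksum, safe='')}"
    # Only include the optional query parameters the caller supplied.
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    if params:
        url += "?" + urlencode(params)
    # The Accept header is where alternative response formats would be negotiated.
    return Request(url, headers={"Accept": accept}, method="GET")


req = build_sequence_request(
    "https://refget.example.org",  # hypothetical endpoint
    "6aef897c3d6ff0c78aff06ac189178dd",
    start=10,
    end=20,
)
print(req.full_url)
print(req.get_header("Accept"))
```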
@@ -99,25 +99,25 @@ The policies and processes used to perform user authentication and authorization
## Checksum calculation
The recommended checksum algorithms are `MD5` (a 32 character HEX string) and a SHA-512-based system called `ga4gh` (a base64 URL-safe string, see later for details). Servers MUST support sequence retrieval by one or more of these algorithms, and are encouraged to support all to maximize interoperability. An older algorithm called `TRUNC512` existed in version 1.0.0 of refget but is now deprecated in favour of the GA4GH sequence checksum string. It is possible to translate between the `ga4gh` and `TRUNC512` systems however `TRUNC512` usage SHOULD be discouraged.
-
When calculating the checksum for a sequence, all non-base symbols (\n, spaces, etc) must be removed and then uppercase the rest. The allowed alphabet for checksum calculation is uppercase ASCII (`0x41`-`0x5A` or `A-Z`).
+
When calculating the checksum for a sequence, all non-base symbols (\n, spaces, etc) must be removed and then the rest uppercased. The allowed alphabet for checksum calculation is uppercase ASCII letters (`0x41`-`0x5A` or `A-Z`).
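One possible reading of this normalisation rule, sketched in Python: every character that is not an ASCII letter is dropped and the remainder is uppercased before hashing. The function names `normalise` and `md5_checksum` are hypothetical, and treating every non-letter as a "non-base symbol" is an assumption of this sketch.

```python
import hashlib
import string

ASCII_LETTERS = set(string.ascii_letters)


def normalise(sequence: str) -> str:
    """Drop everything that is not an ASCII letter, then uppercase the rest."""
    kept = "".join(ch for ch in sequence if ch in ASCII_LETTERS)
    return kept.upper()


def md5_checksum(sequence: str) -> str:
    """Return the 32-character hexadecimal MD5 checksum of the normalised sequence."""
    return hashlib.md5(normalise(sequence).encode("ascii")).hexdigest()


# Line breaks, spaces and letter case do not change the checksum.
print(md5_checksum("ac gt\n") == md5_checksum("ACGT"))
```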
Resulting hexadecimal checksum strings shall be considered case insensitive. 0xa is equivalent to 0xA.
## refget Checksum Algorithm
-
The refget checksum algorithm is called `ga4gh`. It is based and derived from work carried out by the GA4GH VRS group. It is defined as follows:
+
The refget checksum algorithm is called `ga4gh`. It is based on and derived from work carried out by the GA4GH VRS group. It is defined as follows:
- SHA-512 digest of a sanitised sequence
- A base64 url encoding of the first 24 bytes of that digest
- The addition of `SQ.` to the string
-
Services may also implement the older `TRUNC512` representation of a truncated SHA-512 digest and is compatible with the above `ga4gh` string. See later in this specification for implementation details of the TRUNC512 algorithm and conversion between `ga4gh` and `TRUNC512`.
+
Services may also implement the older `TRUNC512` representation of a truncated SHA-512 digest, which uses similar ideas to the above `ga4gh` string. See later in this specification for implementation details of the TRUNC512 algorithm and conversion between `ga4gh` and `TRUNC512`.
-
A `ga4gh` digest of `ACGT`MUST result in the string `SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2`.
+
For example, the `ga4gh` digest of `ACGT`is the string `SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2`.
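The three steps of the `ga4gh` definition can be sketched in Python as follows. The input is assumed to be already sanitised, and the function names are hypothetical. The two conversion helpers additionally assume that `TRUNC512` is the hexadecimal encoding of the same 24 bytes, which is not stated in this part of the document.

```python
import base64
import hashlib

PREFIX = "SQ."


def ga4gh_digest(sanitised_sequence: str) -> str:
    """SHA-512, keep the first 24 bytes, base64 URL-safe encode, prefix with SQ."""
    digest = hashlib.sha512(sanitised_sequence.encode("ascii")).digest()
    return PREFIX + base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_to_trunc512(identifier: str) -> str:
    """Assumption: TRUNC512 is the hex form of the same 24 truncated bytes."""
    if not identifier.startswith(PREFIX):
        raise ValueError("expected an identifier starting with 'SQ.'")
    return base64.urlsafe_b64decode(identifier[len(PREFIX):]).hex()


def trunc512_to_ga4gh(hex_digest: str) -> str:
    """Inverse of ga4gh_to_trunc512, under the same assumption."""
    return PREFIX + base64.urlsafe_b64encode(bytes.fromhex(hex_digest)).decode("ascii")


print(ga4gh_digest("ACGT"))  # SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
```

Because 24 bytes encode to exactly 32 base64 characters, no `=` padding appears in the result.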
## Namespace of the checksums
The requested checksum can optionally be prefixed with a namespace describing the type of algorithm being used.
-
For example using md5 `md5:6aef897c3d6ff0c78aff06ac189178dd` and `6aef897c3d6ff0c78aff06ac189178dd` should return the same sequence and using ga4gh `ga4gh:SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2` and `SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2` should also return the same sequence.
+
For example using md5 `md5:6aef897c3d6ff0c78aff06ac189178dd` and `6aef897c3d6ff0c78aff06ac189178dd` should return the same sequence and similarly using ga4gh `ga4gh:SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2` and `SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2` should return the same sequence.
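A server-side sketch of how the optional namespace might be separated from the checksum, so that prefixed and bare forms resolve to the same lookup key. The function name `split_namespace` is hypothetical, and splitting on the first `:` is an assumption of this example.

```python
def split_namespace(identifier: str):
    """Return (namespace, checksum); namespace is None when no prefix is given."""
    namespace, separator, value = identifier.partition(":")
    if not separator:
        # No colon present: the whole string is the bare checksum.
        return None, identifier
    return namespace, value


for requested in (
    "md5:6aef897c3d6ff0c78aff06ac189178dd",
    "6aef897c3d6ff0c78aff06ac189178dd",
    "ga4gh:SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2",
    "SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2",
):
    print(split_namespace(requested))
```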
## Unique Identifiers
Refget optionally allows the use of namespaced identifiers in place of the digest. The identifier prefixed by a namespace to form a CURIE for example:
@@ -366,7 +366,7 @@ An array of strings listing the type identifiers supported. Values used should b
<code>subsequence_limit</code><br/>
int or null
</td><td>
-
An integer giving the maximum length of sequence which may be requested using <code>start</code> and/or <code>end</code> query parameters or <code>Range</code> header. <code>null</code> values or values lower than 1 or mean the server has no imposed limit.
+
An integer giving the maximum length of sequence which may be requested using <code>start</code> and/or <code>end</code> query parameters or <code>Range</code> header. <code>null</code> values or values lower than 1 mean the server has no imposed limit.
</td></tr>
</table>
</td></tr>
@@ -468,7 +468,7 @@ Any bytes added for formatting to the returned output should not be taken in to
Refget implementations MUST support the `MD5` identifier space and SHOULD support the `ga4gh` identifier. Non-standard identifiers are allowed but they MUST conform to the following requirements:
-
1. Non-standard identifiers must be based on an algorithm, which uses normalised sequence content as input
+
1. Non-standard identifiers must be based on an algorithm that uses normalised sequence content as input
2. The algorithm used SHOULD be a hash function
3. Non-standard identifiers must not clash with the `MD5` and `ga4gh` identifier space
- Note `ga4gh` is allowed to grow in length should collisions in the current implementation be detected
@@ -482,14 +482,14 @@ Examples on how to implement both algorithm schemes in [Python](pub/ga4gh_and_TR
## Design Rationale
-
This section details behind key API decisions.
+
This non-normative section provides the details behind key API decisions.
### Checksum Input Normalisation
-
Key to generating reproducible checksums is the normalisation algorithm applied to sequence input. This API is based on the requirements of SAM/BAM, CRAM Reference Registry and VMC specifications. Both of these specs' own normalisation algorithms are detailed below:
+
Key to generating reproducible checksums is the normalisation algorithm applied to sequence input. This API is based on the requirements of SAM/BAM, CRAM Reference Registry and VMC specifications. These specifications' own normalisation algorithms are detailed below:
- SAM/BAM
-
- All characters outside of the inclusive range `33` (`0x21`/`!`) and`126` (`0x7E`/`~`) are stripped out
+
- All characters outside of the inclusive range `33` (`0x21`/`!`) through`126` (`0x7E`/`~`) are stripped out
- All lower-case characters are converted to upper-case
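The two SAM/BAM steps listed above could be sketched in Python as below. The function name `sam_normalise` is hypothetical, and the sketch covers only the two rules shown here.

```python
def sam_normalise(sequence: str) -> str:
    """Keep characters with code points 33 to 126 inclusive, then uppercase."""
    kept = "".join(ch for ch in sequence if 33 <= ord(ch) <= 126)
    return kept.upper()


# Whitespace and line breaks fall outside the range and are stripped;
# printable symbols such as '*' fall inside it and are kept.
print(sam_normalise("ac gt\n*n"))
```

Unlike the refget rule, which restricts the alphabet to `A-Z`, this range retains printable non-letter characters.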