docs/seqcols/README.md
Lines changed: 6 additions & 14 deletions
@@ -29,27 +29,19 @@ Finally, the protocol defines several recommended procedures that will improve c
 
 Sequence collections represent fundamental concepts, making the specification adaptable to a wide range of use cases.
 A primary goal is to enable sequence collection (seqcol) digests to replace or complement the human-readable identifiers currently used for reference genomes (e.g., "hg38" or "GRCh38").
-Unfortunately, these simple identifiers often refer to references with subtle (or not so subtle) differences. Such variation leads to fundamental issues in analyses relying on reference genomes, undermining the utility of these identifiers.
+Unfortunately, these simple identifiers often refer to references with subtle (or not so subtle) differences. Such variation leads to fundamental issues in analyses relying on reference genomes, undermining the utility of these identifiers.
 
 Unique identifiers, such as those provided by the NCBI Assembly database, partially address this problem by unambiguously identifying specific assemblies. However, this approach has limitations:
 
 - It depends on a central authority, which excludes custom genomes and doesn't cover all reference providers.
-- Centralized identifiers alone cannot *confirm* identity, as identity also depends on the genome's content.
-- It does not address the related challenge of determining compatibility among reference genomes. Analytical results or annotations based on different references may still be integrable if certain conditions are met, but current tools and standards lack the means to formalize and simplify compatibility comparisons.
+- Centralized identifiers alone cannot *confirm* identity, as identity also depends on the genome's content.
+- It does not address the related challenge of determining compatibility among reference genomes. Analytical results or annotations based on different references may still be integrable if certain conditions are met, but current tools and standards lack the means to formalize and simplify compatibility comparisons.
 
 The [refget Sequences standard](../sequences/README.md) provides a partial solution applicable to individual sequences, such as a single chromosome.
 However, refget Sequences does not directly address collections of sequences, such as a linear reference genome.
-Building on refget Sequences, the *Sequence Collections* specification introduces foundational concepts that support diverse use cases, including:
-
--**Accessing sequences**: *As a data analyst, I want to know which sequences are in a specific collection so I can analyze them further.*
--**Comparing collections**: *As a data analyst, I want to compare the sequence collections used in two separate analyses to assess the compatibility of their resulting data.*
--**Annotation curation**: *As a data curator for SNP data, I want an unambiguous reference genome identifier upon which my SNP annotations can be interpreted, so I can compare them with confidence*.
--**Extracting subsets**: *As a data analyst, I want to extract specific sequences, such as those composing the chromosomes or karyotype of a genome.*
--**Validating submissions**: *As a submission system, I need to determine the exact content of a sequence collection to validate data file submissions.*
--**Embedding identifiers**: *As a software developer, I want to embed a sequence collection identifier in my tool's output, allowing downstream tools to identify the exact sequence collection used.*
--**Checking compatibility**: *As a data analyst using published data, I have a chromosome sizes file (a set of lengths and names) and want to determine whether a given sequence collection is length- or name-compatible with this file.*
--**Genome browser integration**: *As a genome browser, I use one sequence collection for the displayed coordinate system and want to check if a digest representing a given BED file's coordinate system is compatible with it.*
--**Annotating unknown references**: *As a data processor, I encounter input data without reference genome information and want to generate a sequence collection digest to attach, enabling further processing with seqcol features.*
+Building on refget Sequences, the *Sequence Collections* specification introduces foundational concepts that support diverse use cases.
+
+For detailed user stories and concrete examples of how seqcol addresses real-world challenges, see the [User Stories and Use Cases](user_stories.md) document.