Status: Closed
Labels: refget-seqcol, schema-term (Proposals for terms in the core schema)
Description
We decided to start with two schemas: a minimal schema, which we are posting now as the one to implement, and an extended schema, whose terms are still being evaluated for possible inclusion in the minimal schema. Here are drafts of both for comment and revision:
Minimal seqcol schema
```yaml
description: "A collection of biological sequences, defined by the GA4GH Sequence Collections standard."
$id: "/schemas/seqcol_base"
version: 0.1.0
type: object
properties:
  lengths:
    type: array
    collated: true
    description: "Number of elements, such as nucleotides or amino acids, in each sequence."
    items:
      type: integer
  names:
    type: array
    collated: true
    description: "Human-readable identifiers of each sequence (e.g. chromosome names or accessions)."
    items:
      type: string
  sequences:
    type: array
    collated: true
    description: "Digests of sequences computed using the GA4GH digest algorithm (sha512t24u)."
    items:
      type: string
  sorted_name_length_pairs:
    type: array
    description: "Sorted digests of names+lengths pairs, computed following the seqcol specification."
    items:
      type: string
required:
  - lengths
  - names
inherent:
  - lengths
  - names
  - sequences
```

Extended seqcol schema
```yaml
$ref: "/schemas/seqcol_base"
$id: "/schemas/seqcol_extended"
properties:
  masks:
    type: array
    collated: true
    description: "Digests of subsequence masks indicating subsequences to be excluded from an analysis, such as repeats."
    items:
      type: string
  priorities:
    type: array
    collated: true
    description: "Annotation of whether each sequence is a primary or secondary component in the collection."
    items:
      type: boolean
  topologies:
    type: array
    collated: true
    description: "Annotation of whether each sequence represents a linear or other topology."
    items:
      type: string
      enum: ["circular", "linear"]
      default: "linear"
  molecule_types:
    type: array
    collated: true
    description: "Designation of the type of molecule for each sequence, such as RNA, DNA, or protein."
    items:
      type: string
  alphabets:
    type: array
    collated: true
    description: "The set of characters actually present in each sequence."
    items:
      type: string
  alphabet_domains:
    type: array
    collated: true
    description: "The set of characters that could be included in each sequence."
    items:
      type: string
```
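The `sequences` and `sorted_name_length_pairs` descriptions in the minimal schema refer to the GA4GH sha512t24u digest. As a rough sketch, assuming sha512t24u means SHA-512 truncated to its first 24 bytes and base64url-encoded, and assuming each name-length pair is serialized as a canonical JSON object `{"length": ..., "name": ...}` before digesting (the draft does not spell this step out), the two could be computed as below. The helper names are hypothetical, and `canonical_json` only approximates RFC 8785 canonicalization for plain strings and integers.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to the first 24 bytes, base64url-encoded."""
    truncated = hashlib.sha512(data).digest()[:24]
    # 24 bytes always encode to exactly 32 base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def canonical_json(obj) -> bytes:
    # Assumption: sorted keys and compact separators approximate RFC 8785 (JCS)
    # for simple strings and integers; this is not a full JCS implementation.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sorted_name_length_pairs(names: list[str], lengths: list[int]) -> list[str]:
    """Hypothetical helper: digest each {"length", "name"} object, then sort the digests."""
    if len(names) != len(lengths):
        raise ValueError("names and lengths are collated and must have the same length")
    digests = [
        sha512t24u(canonical_json({"length": length, "name": name}))
        for name, length in zip(names, lengths)
    ]
    return sorted(digests)


if __name__ == "__main__":
    names = ["chr1", "chr2", "chrM"]
    lengths = [248956422, 242193529, 16569]
    pairs = sorted_name_length_pairs(names, lengths)
    print(pairs)
    # Reversing the sequence order leaves the sorted pair digests unchanged.
    assert pairs == sorted_name_length_pairs(names[::-1], lengths[::-1])
```

Because the pair digests are sorted, the attribute does not depend on the order of sequences in the collection, unlike the collated `names` and `lengths` arrays.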
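One way to read `collated: true` is that each such array holds one element per sequence, so every collated array in a collection must have the same length; `sorted_name_length_pairs` is not collated and is left out of that check. The sketch below hand-copies a subset of both drafts into Python (the names `ITEM_TYPES`, `check_collection`, and `with_default_topologies` are hypothetical) and checks required attributes, item types, equal collated lengths, and the `topologies` enum. It is not a full JSON Schema validator, and treating the per-item `default: "linear"` as a fill-in value for a missing `topologies` array is an assumption.

```python
from typing import Any

# Hypothetical hand-copied subset of the minimal and extended drafts:
# collated attribute name -> Python type expected for each array item.
ITEM_TYPES: dict[str, type] = {
    "lengths": int,
    "names": str,
    "sequences": str,
    "masks": str,
    "priorities": bool,
    "topologies": str,
    "molecule_types": str,
    "alphabets": str,
    "alphabet_domains": str,
}
REQUIRED = ("lengths", "names")
TOPOLOGIES = {"circular", "linear"}


def check_collection(seqcol: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    for key in REQUIRED:
        if key not in seqcol:
            problems.append(f"missing required attribute: {key}")

    array_lengths: dict[str, int] = {}
    for key, item_type in ITEM_TYPES.items():
        if key not in seqcol:
            continue
        values = seqcol[key]
        if not isinstance(values, list):
            problems.append(f"{key} must be an array")
            continue
        array_lengths[key] = len(values)
        for i, value in enumerate(values):
            # bool is a subclass of int in Python, so reject booleans where integers are expected.
            bad_bool = item_type is int and isinstance(value, bool)
            if bad_bool or not isinstance(value, item_type):
                problems.append(f"{key}[{i}] should be {item_type.__name__}")

    # Collated arrays must all have one element per sequence.
    if len(set(array_lengths.values())) > 1:
        problems.append(f"collated arrays differ in length: {array_lengths}")

    topologies = seqcol.get("topologies")
    if isinstance(topologies, list):
        for i, value in enumerate(topologies):
            if isinstance(value, str) and value not in TOPOLOGIES:
                problems.append(f"topologies[{i}] must be one of {sorted(TOPOLOGIES)}")
    return problems


def with_default_topologies(seqcol: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with topologies set to "linear" for every sequence, if absent."""
    filled = dict(seqcol)
    filled.setdefault("topologies", ["linear"] * len(seqcol.get("lengths", [])))
    return filled


if __name__ == "__main__":
    collection = {"lengths": [248956422, 16569], "names": ["chr1", "chrM"]}
    print(check_collection(collection))  # prints []
    print(check_collection({"lengths": [1, 2], "names": ["chr1"]}))
    print(with_default_topologies(collection)["topologies"])  # prints ['linear', 'linear']
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a draft collection at once.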