[ENH] Add "study" DatasetType to organize a collection of source and derivative datasets #1972
d0d5c37 to fb4f5a4
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files (coverage diff):

| Metric | master | #1972 |
|----------|--------|--------|
| Coverage | 82.15% | 82.15% |
| Files | 17 | 17 |
| Lines | 1530 | 1530 |
| Hits | 1257 | 1257 |
| Misses | 273 | 273 |
FWIW, we conversed with @effigies and he brought up an interesting argument (IMHO not contradicting this one per se): ATM any BIDS dataset (raw or derivative) which already contains some subdatasets under
edit: related, and linked below, is #2103, highlighting the same situation with a "raw" dataset containing "derivatives/"
@effigies I wonder if we should extend
I'm skeptical of that need. I would expect your
And
Subdatasets should be validatable BIDS datasets in their own right, avoiding the need for a top-level dataset_description.json to modify how they are intended to be validated.
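To make the "validatable in their own right" idea concrete, here is a minimal sketch that treats every directory carrying its own `dataset_description.json` as an independent dataset. The helper name `find_subdatasets` and the choice to look only at direct children of `sourcedata/` and `derivatives/` are assumptions for illustration; neither the specification nor the validator defines this lookup.

```python
import json
from pathlib import Path

# Hypothetical helper, not part of the BIDS validator or bidsschematools.
# Assumption: candidate subdatasets are direct children of these directories.
CANDIDATE_PARENTS = ("sourcedata", "derivatives")


def find_subdatasets(root: Path) -> dict[str, str]:
    """Map each subdataset path (relative to root) to its declared DatasetType."""
    found = {}
    for parent in CANDIDATE_PARENTS:
        base = root / parent
        if not base.is_dir():
            continue
        for child in sorted(base.iterdir()):
            description = child / "dataset_description.json"
            if description.is_file():
                meta = json.loads(description.read_text())
                # A missing DatasetType defaults to "raw" in BIDS.
                found[child.relative_to(root).as_posix()] = meta.get("DatasetType", "raw")
    return found
```

Each path returned this way could then be handed to a validator separately, without consulting the top-level `dataset_description.json`.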
I think this overall needs more specification. What are valid directories in a
I think a project dataset is barely worth specifying if we don't validate at least the raw data subdataset. Possibly we should have rules for indicating where validators should look for subdatasets. In OpenNeuroDerivatives, we use
I have yet to "process" this, but a quick side idea inspired by #1928: I wonder if there is a hierarchy here: project (everything common) -> raw (current default, requires having sub- folder(s)) -> derivative (more stuff could be added). Every next level adds capabilities but includes all of the prior one, since a derivative could include raw in it. Or do we already have something which invalidates that?
They are already there, that's somewhat the point here -- that we are already defining the structure of all those folders, nothing new to add.
I think the presence of subdatasets is not really the differentiation here, and formalization of rules for their validation is orthogonal to this issue. Having in mind my prior observation that "raw" is pretty much "a project with data in sub-* folders", we might be circling back to that issue of requiring
It could even become nice in that we would make it easy for people to start their "raw BIDS datasets" as "project BIDS datasets" where they plan (README, code/ etc.) until they start populating them with data, thus becoming
9c68bf8 to 6f236ce
@effigies Following your idea, I have now added a "warning" (to reflect the level of the analogous SubjectFolders check in "raw" BIDS). I guess, in principle, we could take this as an opportunity to revert
While discussing with @jbpoline we wondered whether `study` would be a better descriptor to use here than `project`. We also mention "study" in various places in BIDS, which seems to align nicely here:
❯ git grep study
src/CHANGES.md:- \[FIX] update physio bids name in longitudinal study page examples [#863](https://github.com/bids-standard/bids-specification/pull/863) ([Remi-Gau](https://github.com/Remi-Gau))
src/appendices/coordinate-systems.md:The following template identifiers are RECOMMENDED for individual- and study-specific reference
src/appendices/coordinate-systems.md:In the case of multiple study templates, additional names may need to be defined.
src/appendices/coordinate-systems.md:| study | Custom space defined using a group/study-specific template. This coordinate system requires specifying an additional file to be fully defined. |
src/appendices/hed.md:numerical values that are similar across the recordings in the study.
src/appendices/hed.md:repository on GitHub should be used to validate the study event annotations.
src/common-principles.md: unless when appropriate given the study goals, for example, when scanning babies.
src/introduction.md:> The data used in the study were organized using the
src/modality-specific-files/genetic-descriptor.md: "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
src/modality-specific-files/intracranial-electroencephalography.md:Note that the date and time information SHOULD be stored in the study key file
src/modality-specific-files/magnetic-resonance-spectroscopy.md:acquisition parameters in filenames is helpful or necessary to distinguish datasets in a given study.
src/modality-specific-files/motion.md:Note that the onsets of the recordings SHOULD be stored in the study key file [(`scans.tsv`)](../modality-agnostic-files.md#scans-file).
src/modality-specific-files/positron-emission-tomography.md:This entity is OPTIONAL if only one tracer is used in the study,
src/modality-specific-files/task-events.md:Please mind that this does not imply that only so called "event related" study designs
src/schema/objects/common_principles.yaml: A set of neuroimaging and behavioral data acquired for a purpose of a particular study.
src/schema/objects/common_principles.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/common_principles.yaml: A person or animal participating in the study.
src/schema/objects/entities.yaml: For example, this should be used when a study includes two T1w images -
src/schema/objects/entities.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/entities.yaml: A person or animal participating in the study.
src/schema/objects/enums.yaml:study:
src/schema/objects/enums.yaml: value: study
src/schema/objects/enums.yaml: display_name: study
src/schema/objects/enums.yaml: Custom space defined using a group/study-specific template.
src/schema/objects/metadata.yaml: Reference to the study/studies on which the implementation is based.
src/schema/objects/metadata.yaml: The version of the HED schema used to validate HED tags for study.
tools/schemacode/src/bidsschematools/tests/data/broken_dataset_description.json:"EthicsApprovals": ["The original study from which this BIDS example dataset was derived was approved by the Ethics committee of Ghent University Hospital with identifier EC 2017/1103."]
and the mentions of "project" are not particularly aligned. So I think we should just make it a "study", hence renaming accordingly.
This reverts commit a3c12f8, where I tried to introduce it in bids-standard#1741, but it required a little more detailing.
Idea from @effigies while discussing this PR at BIDS Maintainers meeting 2025
…ith SubjectFolders check Also adjusted wording to be aligned too
While discussing with @jbpoline we wondered whether `study` would be a better descriptor to use here than `project`. One rationale is that [BEP035](https://bids.neuroimaging.io/extensions/beps/bep_035.html) (attn @bids-standard/bep035) on Mega-analysis introduces a `study-` entity as a grouping element, so it would match natively. We also mention "study" in various places in BIDS (see the `git grep study` output quoted earlier in this conversation), which seems to align nicely here, while the mentions of "project" are not particularly aligned. So I think we should just make it a "study", hence renaming accordingly.
154a8cc to 315c08f
…dataset Well, any BIDS dataset is a "study" dataset, but the point there is that ATM for both "raw" and "derivative" types we expect to have sub- folders. That was the prior behavior, which should not be affected by this PR. This should address the review comment of @effigies https://github.com/bids-standard/bids-specification/pull/1972/files#r2142639988
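The behavior described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the schema rule added in this PR: the helper names `expects_subject_folders` and `lacks_expected_subject_folders` are hypothetical, and the sketch deliberately says nothing about the issue level (warning vs. error).

```python
import json
from pathlib import Path

# Hypothetical helpers for illustration; not the validator's actual API.
# Per the commit message: "raw" and "derivative" keep expecting sub- folders.
TYPES_EXPECTING_SUBJECTS = {"raw", "derivative"}


def expects_subject_folders(description: dict) -> bool:
    # A missing DatasetType defaults to "raw" in BIDS.
    return description.get("DatasetType", "raw") in TYPES_EXPECTING_SUBJECTS


def lacks_expected_subject_folders(root: Path) -> bool:
    """True when sub- folders are expected for this dataset but none exist."""
    description = json.loads((root / "dataset_description.json").read_text())
    if not expects_subject_folders(description):
        return False
    return not any(p.is_dir() and p.name.startswith("sub-") for p in root.iterdir())
```

With this shape, a dataset declaring `"DatasetType": "study"` is never flagged for missing `sub-` folders, while the existing expectation for "raw" and "derivative" datasets stays unchanged.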
Works for me.
Sorry, I do not see how this relates to this PR, since having "valid BIDS datasets in sourcedata/" seems to point to a "derivative BIDS" dataset by its definition.
edit: the point is that a "study" dataset could even be empty to start with, start collecting various other
@effigies who, among maintainers, do you think might also be interested to review this PR?
But what are you validating if there are no valid subdatasets? Why are you running the validator?
edit: formerly it was "project" but then renamed to "study" for better alignment
This PR was initially submitted as #1861, but I made the mistake of combining it with a discussion of transformations of existing projects' layouts into such a BIDS project dataset. Please refer to that PR for examples, but otherwise let's concentrate here on the discussion of this specific proposed change.
dataset_description.json
.TODOs: