Home

Welcome to the Data-Flow-SOP Wiki

The Data Flow SOP describes the process of taking data from sequencing to availability in Turbo, including how data is organized in the Raw_sequencing_folder.

The repository structure is as follows:

Data-Flow-SOP
├── create_directories.py
├── create_higher_level_dirs.py
├── illumina
│   ├── move_files_to_directories_illumina.py
│   └── rename_samples_illumina.sh
├── nanopore
│   ├── move_files_to_dirs_nanopore.py
│   └── rename_samples_nanopore.sh
├── pics
│   ├── globus-raw-fastq-nanopore.png
│   ├── globus_raw_fastq.png
│   ├── transfer-nanoQC-results.png
│   └── transfer-qcd-results.png
├── processing-hybrid-samples.md
├── processing-illumina-samples.md
├── processing-nanopore-samples.md
└── README.md

Script Descriptions

create_directories.py
A Python script to create a Project folder for organizing Illumina and Nanopore data (see New Sequence Data Structure).
create_higher_level_dirs.py
A Python script to set up a new Sequence_data directory structure (see New Sequence Data Structure).
illumina/move_files_to_directories_illumina.py
Moves Illumina samples that pass QC metrics based on the QCD pipeline.
illumina/rename_samples_illumina.sh
A Bash script to rename Illumina samples according to predefined rules.
nanopore/move_files_to_dirs_nanopore.py
Moves Nanopore samples that pass QC metrics based on the nanoQC pipeline.
nanopore/rename_samples_nanopore.sh
A Bash script to rename long-read Nanopore samples based on specific criteria.
processing-hybrid-samples.md
Contains instructions for processing hybrid samples.
processing-illumina-samples.md
Guides users through processing Illumina (short-read) data.
processing-nanopore-samples.md
Contains steps for processing Nanopore (long-read) data.
README.md
Provides an overview of the repository and instructions for usage.

Folder Descriptions

illumina/
Contains scripts for moving and renaming Illumina samples.
nanopore/
Includes scripts for organizing and renaming Nanopore samples.
pics/
Stores screenshots used in documentation for processing Illumina and Nanopore data.

New Sequence Data Structure

Important: If your project directory already contains a Sequence_data folder, rename it to old_Sequence_data before proceeding.
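The rename can be done by hand or scripted. Below is a minimal sketch, assuming a hypothetical helper named backup_sequence_data (it is not part of this repository) that refuses to overwrite an existing old_Sequence_data folder:

```python
from pathlib import Path


def backup_sequence_data(project_dir):
    """Rename Sequence_data to old_Sequence_data inside project_dir.

    Hypothetical helper for illustration; not part of the repository.
    Returns the new path, or None if there was no Sequence_data folder.
    """
    project = Path(project_dir)
    current = project / "Sequence_data"
    backup = project / "old_Sequence_data"

    if not current.is_dir():
        # Nothing to rename, so the new structure can be created directly.
        return None
    if backup.exists():
        # Avoid silently clobbering an earlier backup.
        raise FileExistsError(f"{backup} already exists; move it aside first")

    current.rename(backup)
    return backup
```

Calling backup_sequence_data("/path/to/project") before running create_higher_level_dirs.py would leave the old data untouched under old_Sequence_data.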

The new Sequence_data directory structure created by the create_higher_level_dirs.py script is organized as follows:

/Users/Dhatrib/Desktop/Project_Test/Sequence_data/
├── assembly
│   └── illumina
├── illumina_fastq
├── metadata
│   ├── AGC_submission
│   ├── plasmidsaurus
│   └── sample_lookup
└── variant_calling
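For readers who want to see how a layout like the one above can be produced, here is a minimal sketch. It is not the actual create_higher_level_dirs.py; the function name make_higher_level_dirs and its single argument are assumptions, and only the folder names are taken from the tree above.

```python
from pathlib import Path

# Folder names copied from the tree above, relative to Sequence_data.
HIGHER_LEVEL_DIRS = [
    "assembly/illumina",
    "illumina_fastq",
    "metadata/AGC_submission",
    "metadata/plasmidsaurus",
    "metadata/sample_lookup",
    "variant_calling",
]


def make_higher_level_dirs(project_dir):
    """Create the Sequence_data skeleton under project_dir.

    Hypothetical stand-in for create_higher_level_dirs.py; the real
    script may take different arguments. Returns the Sequence_data path.
    """
    root = Path(project_dir) / "Sequence_data"
    for rel in HIGHER_LEVEL_DIRS:
        # parents=True creates intermediate folders such as metadata/;
        # exist_ok=True makes repeated runs harmless.
        (root / rel).mkdir(parents=True, exist_ok=True)
    return root
```

Because of exist_ok=True, running the function twice on the same project folder leaves the existing structure as it is.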

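The create_directories.py script then populates illumina_fastq with a dated project folder (2025-01-24_Plate1-to-Plate3 in this example), its QC subfolders, and a clean_fastq_qc_pass_samples folder, as shown below: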

/Users/Dhatrib/Desktop/Project_Test/Sequence_data/
├── assembly
│   └── illumina
├── illumina_fastq
│   ├── 2025-01-24_Plate1-to-Plate3
│   │   ├── failed_qc_samples
│   │   ├── neg_ctrl
│   │   ├── passed_qc_samples
│   │   └── raw_fastq
│   └── clean_fastq_qc_pass_samples
├── metadata
│   ├── AGC_submission
│   ├── plasmidsaurus
│   └── sample_lookup
└── variant_calling
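
The additional folders in the tree above could be created along these lines. This is a sketch under stated assumptions, not the actual create_directories.py: the function name make_run_dirs and its parameters are hypothetical, and only the folder names come from the tree.

```python
from pathlib import Path

# Subfolders shown under the dated project folder in the tree above.
RUN_SUBDIRS = ["failed_qc_samples", "neg_ctrl", "passed_qc_samples", "raw_fastq"]


def make_run_dirs(sequence_data_dir, run_name):
    """Create a dated project folder and its QC subfolders.

    Hypothetical stand-in for create_directories.py; the real script
    may take different arguments. Returns the dated folder's path.
    """
    fastq_root = Path(sequence_data_dir) / "illumina_fastq"
    run_dir = fastq_root / run_name

    for sub in RUN_SUBDIRS:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)

    # Sibling folder of the dated folder, as shown in the tree.
    (fastq_root / "clean_fastq_qc_pass_samples").mkdir(parents=True, exist_ok=True)
    return run_dir
```

For the example shown, the call would be make_run_dirs("/Users/Dhatrib/Desktop/Project_Test/Sequence_data", "2025-01-24_Plate1-to-Plate3").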
