Dhatri Badri edited this page Jan 21, 2025 · 5 revisions

Welcome to the Data-Flow-SOP Wiki

The Data Flow SOP describes how to take data from sequencing through to making it available on Turbo. This involves organizing the data in the Raw_sequencing_folder.

The repository structure is as follows:

Data-Flow-SOP
├── create_directories.py
├── create_higher_level_dirs.py
├── illumina
│   ├── move_files_to_directories_illumina.py
│   └── rename_samples_illumina.sh
├── nanopore
│   ├── move_files_to_dirs_nanopore.py
│   └── rename_samples_nanopore.sh
├── pics
│   ├── globus-raw-fastq-nanopore.png
│   ├── globus_raw_fastq.png
│   ├── transfer-nanoQC-results.png
│   └── transfer-qcd-results.png
├── processing-hybrid-samples.md
├── processing-illumina-samples.md
├── processing-nanopore-samples.md
└── README.md

Script Descriptions

  • create_directories.py
    A Python script to create the project folders used to organize Illumina and Nanopore data (see New Sequence Data Structure).

  • create_higher_level_dirs.py
    A Python script to set up a new Sequence_data directory structure (see New Sequence Data Structure).

  • illumina/move_files_to_directories_illumina.py
    Moves Illumina samples that pass QC, as assessed by the QCD pipeline.

  • illumina/rename_samples_illumina.sh
    A Bash script to rename Illumina samples according to predefined rules.

  • nanopore/move_files_to_dirs_nanopore.py
    Moves Nanopore samples that pass QC, as assessed by the nanoQC pipeline.

  • nanopore/rename_samples_nanopore.sh
    A Bash script to rename long-read Nanopore samples based on specific criteria.

  • processing-hybrid-samples.md
    Contains instructions for processing hybrid samples.

  • processing-illumina-samples.md
    Guides users through processing Illumina (short-read) data.

  • processing-nanopore-samples.md
    Contains steps for processing Nanopore (long-read) data.

  • README.md
    Provides an overview of the repository and instructions for usage.
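To illustrate what "moving samples that pass QC" can look like, here is a minimal sketch of a pass/fail sort. It is not the code in move_files_to_directories_illumina.py or move_files_to_dirs_nanopore.py. It assumes a hypothetical qc_pass mapping (sample name to True/False) standing in for whatever the QCD or nanoQC output actually provides, and it assumes FASTQ files are named "<sample>_....fastq.gz". The folder names come from the New Sequence Data Structure tree; the function name sort_by_qc is hypothetical.

```python
import shutil
from pathlib import Path


def sort_by_qc(raw_dir: Path, passed_dir: Path, failed_dir: Path,
               qc_pass: dict[str, bool]) -> dict[str, list[str]]:
    """Hypothetical sketch: move each sample's FASTQ files by QC outcome.

    qc_pass maps sample name -> True if the sample passed QC. This mapping is
    an ASSUMED stand-in for the real QCD/nanoQC results.
    File names are ASSUMED to start with "<sample>_" (so sample names must not
    themselves contain underscores) and to end in .fastq.gz.
    """
    passed_dir.mkdir(parents=True, exist_ok=True)
    failed_dir.mkdir(parents=True, exist_ok=True)
    moved: dict[str, list[str]] = {"passed": [], "failed": []}
    # sorted() materializes the glob first, so moving files cannot disturb iteration
    for fastq in sorted(raw_dir.glob("*.fastq.gz")):
        sample = fastq.name.split("_", 1)[0]
        if sample not in qc_pass:
            continue  # no QC result: leave the file in raw_dir
        key = "passed" if qc_pass[sample] else "failed"
        dest = passed_dir if qc_pass[sample] else failed_dir
        shutil.move(str(fastq), str(dest / fastq.name))
        moved[key].append(fastq.name)
    return moved
```

Samples without a QC result are deliberately left in place so they can be reviewed by hand rather than silently filed as failures.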

Folder Descriptions

  • illumina/
    Contains scripts for moving and renaming Illumina samples.

  • nanopore/
    Includes scripts for organizing and renaming Nanopore samples.

  • pics/
    Stores screenshots used in documentation for processing Illumina and Nanopore data.

New Sequence Data Structure

Important: If your project directory already contains a Sequence_data folder, rename it to old_Sequence_data before proceeding.
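The rename step above can be done by hand, but a small guard helps avoid clobbering an earlier backup. This is a minimal sketch, assuming the project directory is passed in as a path; the function name archive_existing_sequence_data is hypothetical and is not part of the repository.

```python
from pathlib import Path
from typing import Optional


def archive_existing_sequence_data(project_dir: Path) -> Optional[Path]:
    """Hypothetical helper: rename Sequence_data to old_Sequence_data.

    Returns the new path, or None if there is no Sequence_data folder.
    Refuses to overwrite an existing old_Sequence_data folder.
    """
    current = project_dir / "Sequence_data"
    archived = project_dir / "old_Sequence_data"
    if not current.is_dir():
        return None  # nothing to archive
    if archived.exists():
        raise FileExistsError(f"{archived} already exists; resolve it manually first")
    current.rename(archived)
    return archived
```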

The new Sequence_data directory structure created by the create_higher_level_dirs.py script is organized as follows:

/Users/Dhatrib/Desktop/Project_Test/Sequence_data/
├── assembly
│   └── illumina
├── illumina_fastq
├── metadata
│   ├── AGC_submission
│   ├── plasmidsaurus
│   └── sample_lookup
└── variant_calling

The create_directories.py script then adds a run folder (here 2025-01-24_Plate1-to-Plate3) and a clean_fastq_qc_pass_samples folder under illumina_fastq, as shown below:

/Users/Dhatrib/Desktop/Project_Test/Sequence_data/
├── assembly
│   └── illumina
├── illumina_fastq
│   ├── 2025-01-24_Plate1-to-Plate3
│   │   ├── failed_qc_samples
│   │   ├── neg_ctrl
│   │   ├── passed_qc_samples
│   │   └── raw_fastq
│   └── clean_fastq_qc_pass_samples
├── metadata
│   ├── AGC_submission
│   ├── plasmidsaurus
│   └── sample_lookup
└── variant_calling
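The run-level layout above can be sketched as follows. This is not the code in create_directories.py; the run-name format "<YYYY-MM-DD>_Plate<first>-to-Plate<last>" is inferred from the single example folder, and the names RUN_SUBDIRS, run_folder_name, and create_run_dirs are hypothetical.

```python
from datetime import date
from pathlib import Path

# Per-run subfolders from the tree above.
RUN_SUBDIRS = ["failed_qc_samples", "neg_ctrl", "passed_qc_samples", "raw_fastq"]


def run_folder_name(run_date: date, first_plate: int, last_plate: int) -> str:
    """ASSUMED naming convention, inferred from 2025-01-24_Plate1-to-Plate3."""
    return f"{run_date.isoformat()}_Plate{first_plate}-to-Plate{last_plate}"


def create_run_dirs(sequence_data: Path, run_name: str) -> Path:
    """Hypothetical sketch: create illumina_fastq/<run_name>/ and its QC subfolders.

    Also ensures the shared illumina_fastq/clean_fastq_qc_pass_samples folder exists.
    Safe to re-run: existing folders are left as they are.
    """
    fastq_root = sequence_data / "illumina_fastq"
    run_dir = fastq_root / run_name
    for sub in RUN_SUBDIRS:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    (fastq_root / "clean_fastq_qc_pass_samples").mkdir(parents=True, exist_ok=True)
    return run_dir
```

For example, create_run_dirs(sequence_data, run_folder_name(date(2025, 1, 24), 1, 3)) would produce the 2025-01-24_Plate1-to-Plate3 folder shown above. Unlike the higher-level step, this one uses exist_ok=True because clean_fastq_qc_pass_samples is shared across runs.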