ATAC-seqPipeline

This ATAC-seq pipeline provides several tools for analyzing ATAC-seq experiments. It is configured to automatically submit jobs through a SLURM scheduler.

Installation:

  • GitHub
    Clone the repository with git clone https://github.com/ScrippsPipkinLab/ATAC-seqPipeline.git
  • conda
    Install the environment with conda env create --file envs/ATACseq_env.yml (see the combined example after this list).
  • SLURM
    A SLURM environment configured to submit jobs is required.
  • Singularity (optional)
    If you prefer a container with the environment pre-built, build one from the provided definition file with singularity build ATACseq.sif envs/Singularity (the image file name, ATACseq.sif here, is your choice).
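
A typical installation clones the repository and then creates the conda environment from the repository's environment file. The commands below are standard, but the environment name passed to conda activate is only a guess; use whatever name envs/ATACseq_env.yml actually defines.

git clone https://github.com/ScrippsPipkinLab/ATAC-seqPipeline.git
cd ATAC-seqPipeline
conda env create --file envs/ATACseq_env.yml
conda activate ATACseq   # replace "ATACseq" with the name defined in envs/ATACseq_env.yml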

Usage:

Sample metadata is entered in a simple tab-delimited text file. Include the columns below, in the order shown; see core/exp122_ssheet.txt for a template, and the sketch after this list for an illustrative example.

  1. SampleName
    Unique sample name. It must be different for each replicate of each sample; when in doubt, use Sample_ReplicateNumber.
  2. Read1
    Path to the R1 FASTQ file. Future versions will support single-end reads.
  3. Read2
    Path to the R2 FASTQ file.
  4. Status
    The group of replicates that represents the sample; the status is the same across replicates. For example, this could be an RNAmir construct, an organ, or even a cell type.
  5. CT
    Control or treatment status, denoted by C or T respectively.
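
For illustration only (sample names, paths, and group labels below are hypothetical, and fields must be separated by tabs; match the exact layout of core/exp122_ssheet.txt, including whether a header row is expected), a sheet with one control group and one treatment group, two replicates each, might look like:

WT_1    /path/to/WT_1_R1.fastq.gz    /path/to/WT_1_R2.fastq.gz    WT    C
WT_2    /path/to/WT_2_R1.fastq.gz    /path/to/WT_2_R2.fastq.gz    WT    C
KO_1    /path/to/KO_1_R1.fastq.gz    /path/to/KO_1_R2.fastq.gz    KO    T
KO_2    /path/to/KO_2_R1.fastq.gz    /path/to/KO_2_R2.fastq.gz    KO    T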

The provided example notebook walks through how to access the pipeline's functions in an interactive Jupyter session. In brief, you can import the Python library and set up your experiment with:

import ATACseqPipeline

# data_path: empty directory for pipeline output; app_path: location of the installed pipeline.
# dry_run=True configures a dry run (see the example notebook for details).
myexp = ATACseqPipeline.Pipeline(data_path='path/to/empty/dir', dry_run=True, app_path='/ATACseqPipeline')
# Register samples from the tab-delimited sample sheet described above.
myexp.from_ssheet(ssheet_path='ATACseqPipeline/data/exp122_ssheet.txt')

Once the sample sheet has been configured, the entire pipeline can be run with:

myexp.main()

To monitor your submitted jobs as they run and complete, run the following in the shell. You can also run this directly in the notebook by putting "!" before each command.

squeue -u your_username
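
For example, a notebook cell that checks the queue would simply contain the same command with the "!" prefix:

!squeue -u your_username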

Abort running jobs with core/cancel_job.sh.
This pipeline submits several jobs that depend on each other, and the total runtime can reach several hours depending on node availability. You can also run individual parts of the pipeline; the example Python notebook provides an in-depth walkthrough of this. The output of every step is saved, which produces a large amount of data (over 50 gigabytes for a 6-sample experiment). It is up to the user to delete files saved in /data.
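
To keep an eye on how much space the output is consuming, you can run the standard du utility against the directory you passed as data_path (the path below is the placeholder from the setup example above):

du -sh path/to/empty/dir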


Please feel free to raise issues on GitHub or shoot me an email:
Shashank Nagaraja
[email protected]
