First commit
Rina Ahmed-Begrich committed Jul 22, 2024
0 parents commit b4d5519
Showing 39 changed files with 1,690 additions and 0 deletions.
15 changes: 15 additions & 0 deletions .editorconfig
@@ -0,0 +1,15 @@
# EditorConfig is awesome: http://EditorConfig.org

# top-most EditorConfig file
root = true

[*]
end_of_line = lf
insert_final_newline = true
charset = utf-8
indent_style = space
indent_size = 4

[*.{yml,yaml}]
indent_style = space
indent_size = 2
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
*.smk linguist-language=Python
Snakefile linguist-language=Python
18 changes: 18 additions & 0 deletions .github/workflows/conventional-prs.yml
@@ -0,0 +1,18 @@
name: PR
on:
  pull_request_target:
    types:
      - opened
      - reopened
      - edited
      - synchronize

jobs:
  title-format:
    runs-on: ubuntu-latest
    steps:
      - uses: amannn/[email protected]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          validateSingleCommit: true
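
Note on the workflow above: it checks that pull request titles follow the Conventional Commits format (`type(scope): subject`). The following Python sketch illustrates the kind of check involved. It is not the action's implementation; the list of allowed types and the helper name `is_conventional_title` are assumptions made for illustration.

```python
import re

# Assumed set of commit types (a commonly used Conventional Commits list);
# the action's actual configuration may differ.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# type, optional (scope), optional "!" for breaking changes, then ": subject"
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title looks like a Conventional Commits header."""
    match = TITLE_RE.match(title)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

Under these assumptions a title such as `feat: add UMI deduplication` passes, while the title of this commit, `First commit`, would not.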
54 changes: 54 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,54 @@
name: Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]


jobs:
  Formatting:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Formatting
        uses: github/super-linter@v4
        env:
          VALIDATE_ALL_CODEBASE: false
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VALIDATE_SNAKEMAKE_SNAKEFMT: true

  Linting:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Lint workflow
        uses: snakemake/[email protected]
        with:
          directory: .
          snakefile: workflow/Snakefile
          args: "--lint"

  Testing:
    runs-on: ubuntu-latest
    needs:
      - Linting
      - Formatting
    steps:
      - uses: actions/checkout@v2

      - name: Test workflow
        uses: snakemake/[email protected]
        with:
          directory: .test
          snakefile: workflow/Snakefile
          args: "--use-conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache --all-temp"

      - name: Test report
        uses: snakemake/[email protected]
        with:
          directory: .test
          snakefile: workflow/Snakefile
          args: "--report report.zip"
17 changes: 17 additions & 0 deletions .github/workflows/release-please.yml
@@ -0,0 +1,17 @@
on:
  push:
    branches:
      - main

name: release-please

jobs:
  release-please:
    runs-on: ubuntu-latest
    steps:

      - uses: GoogleCloudPlatform/release-please-action@v2
        id: release
        with:
          release-type: go # just keep a changelog, no version anywhere outside of git tags
          package-name: <repo>
15 changes: 15 additions & 0 deletions .gitignore
@@ -0,0 +1,15 @@
results/**
resources/**
.snakemake
.snakemake/**
.venv/**
.env/**
.DS_Store
!.gitignore
!.gitattributes
!.editorconfig

# Custom additions
Notes.md
.vscode/*
.snakemake-workflow-catalog.yml
79 changes: 79 additions & 0 deletions .test/config/config.yml
@@ -0,0 +1,79 @@

# optional: define output folder here
# default: "./results"
output: null

# define samplesheet here
samplesheet: ".test/config/samples.tsv"

get_genome:
  database: "ncbi"
  assembly: "GCF_000006785.2"
  fasta: Null
  gff: Null

cutadapt:
  fivep_adapter: Null
  threep_adapter: "ATCGTAGATCGGAAGAGCACACGTCTGAA"
  default: ["-q 10 ", "-m 22 ", "-M 52", "--overlap=3"]

umi_extraction:
method: "regex"
pattern: "^(?P<umi_0>.{2}).*(?P<umi_1>.{5})$"
umi_dedup: ["--edit-distance-threshold=0", "--read-length"]

star:
  index: Null
  genomeSAindexNbases: 9
  multi: 10
  sam_multi: 1
  intron_max: 1
  default: [
    "--readFilesCommand zcat ",
    "--outSAMstrandField None ",
    "--outSAMattributes All ",
    "--outSAMattrIHstart 0 ",
    "--outFilterType Normal ",
    "--outFilterMultimapScoreRange 1 ",
    "-o STARmappings ",
    "--outSAMtype BAM Unsorted ",
    "--outStd BAM_Unsorted ",
    "--outMultimapperOrder Random ",
    "--alignEndsType EndToEnd"]

extract_features:
  biotypes: ["rRNA", "tRNA"]
  CDS: ["protein_coding"]

bedtools_intersect:
  defaults: ["-v ", "-s ", "-f 0.2"]

deeptools:
  libtype: "sense" # parameter not used yet. Only sense direction currently handled # TODO #
  bin_size: 1
  normalize: "CPM"

annotate_orfs:
  window_size: 30
  sorf_max_length: 300
  sorf_min_length: 45
  orf_start_codon_table: 11
  orf_stop_codon: ["TAA", "TAG", "TGA"]
  orf_longest_only: False

shift_reads:
  window_size: 30
  read_length: [27, 45]
  # rpf_read_length: [30, 45]
  # qti_read_length: [27, 45]
  rnaseq_read_length: [0, 1000]
  end_alignment: "3prime"
  shift_table: "config/shift_table/shift_table.csv"
  export_bam: False
  export_bigwig: True
  skip_shifting: False
  skip_length_filter: True


multiqc:
  config: "config/multiqc_config.yml"
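
Note on the `umi_extraction` pattern in the config above: the regular expression uses Python-style named groups, `umi_0` for the first 2 bases of a read and `umi_1` for the last 5 bases. The sketch below shows what the pattern captures on a made-up read. It assumes the two groups are concatenated into one UMI, as regex-based UMI extractors such as UMI-tools do; the function name `split_umi` and the example read are hypothetical and not part of the workflow.

```python
import re

# Pattern copied verbatim from the config above.
UMI_PATTERN = re.compile(r"^(?P<umi_0>.{2}).*(?P<umi_1>.{5})$")


def split_umi(read: str):
    """Split a read into (UMI, remaining insert); None if the read is too short.

    Assumption: the UMI is the 5' group followed by the 3' group.
    """
    match = UMI_PATTERN.match(read)
    if match is None:
        return None
    umi = match.group("umi_0") + match.group("umi_1")
    # Everything between the two UMI groups is the actual insert.
    insert = read[match.end("umi_0"):match.start("umi_1")]
    return umi, insert


# Hypothetical read: 2 nt UMI + 9 nt insert + 5 nt UMI
print(split_umi("AC" + "GGGTTTCCC" + "TTGCA"))
```

Reads shorter than 7 nt cannot satisfy both groups and yield `None` in this sketch.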
4 changes: 4 additions & 0 deletions .test/config/samples.tsv
@@ -0,0 +1,4 @@
sample condition replicate lib_prep data_folder fq1
RPF-RTP1 RPF-RTP 1 mpusp .test/data RPF-RTP1_R1_001.fastq.gz
RPF-RTP2 RPF-RTP 2 mpusp .test/data RPF-RTP2_R1_001.fastq.gz
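
Note on the sample sheet above: each row names a sample together with the folder (`data_folder`) and file name (`fq1`) of its FASTQ file. The sketch below shows one way such a sheet could be read and turned into per-sample input paths. It assumes the columns are tab-separated and that the path is simply `data_folder` joined with `fq1`; the helper `fastq_paths` is hypothetical, and the workflow's own parsing code is not part of this excerpt.

```python
import csv
import io
from pathlib import PurePosixPath

# The two test samples from the sheet above, with tabs restored (assumption).
SAMPLE_SHEET = (
    "sample\tcondition\treplicate\tlib_prep\tdata_folder\tfq1\n"
    "RPF-RTP1\tRPF-RTP\t1\tmpusp\t.test/data\tRPF-RTP1_R1_001.fastq.gz\n"
    "RPF-RTP2\tRPF-RTP\t2\tmpusp\t.test/data\tRPF-RTP2_R1_001.fastq.gz\n"
)


def fastq_paths(sheet_text: str) -> dict:
    """Map each sample name to the path of its FASTQ file."""
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    return {
        row["sample"]: str(PurePosixPath(row["data_folder"]) / row["fq1"])
        for row in reader
    }


for sample, path in fastq_paths(SAMPLE_SHEET).items():
    print(sample, path)
```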

Binary file added .test/data/RPF-RTP1_R1_001.fastq.gz
Binary file not shown.
Binary file added .test/data/RPF-RTP2_R1_001.fastq.gz
Binary file not shown.
41 changes: 41 additions & 0 deletions LICENSE
@@ -0,0 +1,41 @@
Copyright License for non-commercial scientific research purposes

Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use files from this repository (Files). By downloading and/or using the Files, you acknowledge that you have read these terms and conditions, understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not download and/or use the Files. Any infringement of the terms of this agreement will automatically terminate your rights under this License.

Ownership / Licensees
The Files and the associated materials have been developed at the Max Planck Unit for the Science of Pathogens (hereinafter "MPUSP") by members of the Bioinformatics Platform (hereinafter “developers”).
Any copyright or patent right is owned by and proprietary material of the Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (hereinafter “MPG”; MPUSP and MPG hereinafter collectively “Max-Planck”) hereinafter the “Licensor”.

License Grant
Licensor grants you (Licensee) personally a single-user, non-exclusive, non-transferable, free of charge right:
• To download the Files on computers owned, leased or otherwise controlled by you and/or your organization;
• To use the Files for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects;
• To modify, adapt, translate or create derivative works based upon the Files.
Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes. The Files may not be reproduced, modified and/or made available in any form to any third party without Max-Planck’s prior written permission. Third party funded research at academic institution is considered non-Commercial Use. By downloading the Files, you agree not to reverse engineer it.

The Licensee is encouraged to provide feedback on the use, results, modifications, bugs and technical questions or publications to [email protected].

No Distribution
The Files and the license herein granted shall not be copied, shared, distributed, re-sold, offered for re-sale, transferred or sub-licensed in whole or in part except that you may make one copy for archive purposes only and the use of the Files by other researchers at your institution under the conditions of this license.

Disclaimer of Representations and Warranties
You expressly acknowledge and agree that the Files results from basic research, is provided “AS IS”, may contain errors, and that any use of the Files is at your sole risk. LICENSOR MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE FILES, NEITHER EXPRESS NOR IMPLIED, AND THE ABSENCE OF ANY LEGAL OR ACTUAL DEFECTS, WHETHER DISCOVERABLE OR NOT. Specifically, and not to limit the foregoing, licensor makes no representations or warranties (i) regarding the merchantability or fitness for a particular purpose of the Files, (ii) that the use of the Files will not infringe any patents, copyrights or other intellectual property rights of a third party, and (iii) that the use of the Files will not cause any damage of any kind to you or a third party.

Limitation of Liability
Because this File License Agreement qualifies as a donation, according to Section 521 of the German Civil Code (Bürgerliches Gesetzbuch – BGB) Licensor as a donor is liable for intent and gross negligence only. If the Licensor fraudulently conceals a legal or material defect, they are obliged to compensate the Licensee for the resulting damage.
Licensor shall be liable for loss of data only up to the amount of typical recovery costs which would have arisen had proper and regular data backup measures been taken. The foregoing applies also to Licensor’s legal representatives or assistants in performance. Any further liability shall be excluded. Patent claims generated through the usage of the Files cannot be directed towards the copyright holders.
The Files are provided in state of development the licensor defines. If modified or extended by Licensee, the Licensor makes no claims about the fitness of the Files and is not responsible for any problems such modifications cause.

No Maintenance Services
You understand and agree that Licensor is under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Files. Licensor nevertheless reserves the right to update, modify, or discontinue the Software at any time.
Defects of the Files must be notified in writing to the Licensor with a comprehensible description of the error symptoms. The notification of the defect should enable the reproduction of the error. The Licensee is encouraged to communicate any use, results, modification or publication.

Publications using the Files
You acknowledge that the Files are a valuable scientific resource and agree to appropriately acknowledge MPUSP in any publication making use of the Files.

Commercial licensing opportunities
For commercial uses of the Files, please send an email to [email protected].

Miscellaneous
This Agreement shall be governed by the laws of the Federal Republic of Germany except for the UN Sales Convention.
This License Text is itself licensed under Creative Commons BY-NC-SA 4.0 https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
87 changes: 87 additions & 0 deletions README.md
@@ -0,0 +1,87 @@
# snakemake-bacterial-riboseq

![Platform](https://img.shields.io/badge/platform-all-green)
[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0.0-brightgreen.svg)](https://snakemake.github.io)
[![GitHub actions status](https://github.com/MPUSP/snakemake-bacterial-riboseq/workflows/Tests/badge.svg?branch=main)](https://github.com/MPUSP/snakemake-bacterial-riboseq/actions?query=branch%3Amain+workflow%3ATests)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)

---

A Snakemake workflow for the analysis of bacterial riboseq data.

- [snakemake-bacterial-riboseq](#snakemake-bacterial-riboseq)
  - [Usage](#usage)
  - [Workflow overview](#workflow-overview)
  - [Installation](#installation)
  - [Authors](#authors)
  - [References](#references)

## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=MPUSP%2Fsnakemake-bacterial-riboseq).

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI.

## Workflow overview

TODO: include first part of the figure here.

## Installation

**Step 1: Clone this repository**

```bash
git clone https://github.com/MPUSP/snakemake-bacterial-riboseq.git
cd snakemake-bacterial-riboseq
```

**Step 2: Install dependencies**

It is recommended to install snakemake and run the workflow with `conda`, `mamba` or `micromamba`.

```bash
# download Miniconda3 installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond by 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
```

**Step 3: Create snakemake environment**

This step creates a new conda environment called `snakemake-bacterial-riboseq`.

```bash
# create new environment with dependencies & activate it
mamba create -c conda-forge -c bioconda -n snakemake-bacterial-riboseq snakemake pandas
conda activate snakemake-bacterial-riboseq
```

### Additional tools

**Important note:**

All other dependencies for the workflow are **automatically pulled as `conda` environments** by snakemake when the workflow is run with the `--use-conda` parameter (recommended).


## Authors

- Dr. Rina Ahmed-Begrich
  - Affiliation: [Max-Planck-Unit for the Science of Pathogens](https://www.mpusp.mpg.de/) (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-0656-1795
- Dr. Michael Jahn
  - Affiliation: [Max-Planck-Unit for the Science of Pathogens](https://www.mpusp.mpg.de/) (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-3913-153X
  - github page: https://github.com/m-jahn


Visit the MPUSP github page at https://github.com/MPUSP for more info on this workflow and other projects.

## References

- Essential tools are linked in the top section of this document
- The sequencing library preparation is based on the publication:
> McGlincy, N. J., & Ingolia, N. T. _Transcriptome-wide measurement of translation by ribosome profiling_. Methods, 126, 112–129, **2017**. https://doi.org/10.1016/J.YMETH.2017.05.028.