diff --git a/.gitignore b/.gitignore
index 23a8550..0cf8249 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,10 @@ resources/**
 !.gitignore
 !.gitattributes
 !.editorconfig
+**.conda
+**.condarc
+**.cache
+**.java
 
 # Custom additions
 Notes.md
diff --git a/README.md b/README.md
index 107f29f..a01ba01 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # snakemake-bacterial-riboseq
 
 ![Platform](https://img.shields.io/badge/platform-all-green)
-[![Snakemake](https://img.shields.io/badge/snakemake-≥7.0.0-brightgreen.svg)](https://snakemake.github.io)
+[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg)](https://snakemake.github.io)
 [![Tests](https://github.com/MPUSP/snakemake-bacterial-riboseq/actions/workflows/main.yml/badge.svg)](https://github.com/MPUSP/snakemake-bacterial-riboseq/actions/workflows/main.yml)
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1D355C.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -33,26 +33,27 @@ If you use this workflow in a paper, don't forget to give credits to the authors
 ## Workflow overview
 
 <!-- include logo-->
+
 <img src="resources/images/logo.png" align="center" />
 
-----------
+---
 
 This workflow is a best-practice workflow for the analysis of ribosome footprint sequencing (Ribo-Seq) data.
 The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps:
 
- 1. Obtain genome database in `fasta` and `gff` format (`python`, [NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/))
-    1. Using automatic download from NCBI with a `RefSeq` ID
-    2. Using user-supplied files
- 2. Check quality of input sequencing data (`FastQC`)
- 3. Cut adapters and filter by length and/or sequencing quality score (`cutadapt`)
- 4. Deduplicate reads by unique molecular identifier (UMI, `umi_tools`)
- 5. Map reads to the reference genome (`STAR aligner`)
- 6. Sort and index for aligned seq data (`samtools`)
- 7. Filter reads by feature type (`bedtools`)
- 8. Generate summary report for all processing steps (`MultiQC`)
- 9. Shift ribo-seq reads according to the ribosome's P-site alignment (`R`, `ORFik`)
- 10. Calculate basic gene-wise statistics such as RPKM (`R`, `ORFik`)
- 11. Return report as HTML and PDF files (`R markdown`, `weasyprint`)
+1. Obtain genome database in `fasta` and `gff` format (`python`, [NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/))
+   1. Using automatic download from NCBI with a `RefSeq` ID
+   2. Using user-supplied files
+2. Check quality of input sequencing data (`FastQC`)
+3. Cut adapters and filter by length and/or sequencing quality score (`cutadapt`)
+4. Deduplicate reads by unique molecular identifier (UMI, `umi_tools`)
+5. Map reads to the reference genome (`STAR aligner`)
+6. Sort and index for aligned seq data (`samtools`)
+7. Filter reads by feature type (`bedtools`)
+8. Generate summary report for all processing steps (`MultiQC`)
+9. Shift ribo-seq reads according to the ribosome's P-site alignment (`R`, `ORFik`)
+10. Calculate basic gene-wise statistics such as RPKM (`R`, `ORFik`)
+11. Return report as HTML and PDF files (`R markdown`, `weasyprint`)
 
 If you want to contribute, report issues, or suggest features, please get in touch on [github](https://github.com/MPUSP/snakemake-bacterial-riboseq).
 
@@ -67,18 +68,7 @@ cd snakemake-bacterial-riboseq
 
 **Step 2: Install dependencies**
 
-It is recommended to install snakemake and run the workflow with `conda`, `mamba` or `micromamba`.
-
-```bash
-# download Miniconda3 installer
-wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
-# install Conda (respond by 'yes')
-bash miniconda.sh
-# update Conda
-conda update -y conda
-# install Mamba
-conda install -n base -c conda-forge -y mamba
-```
+It is recommended to install snakemake and run the workflow with `conda` or `mamba`. [Miniforge](https://conda-forge.org/download/) is the preferred conda-forge installer and includes `conda`, `mamba` and their dependencies.
 
 **Step 3: Create snakemake environment**
 
@@ -94,7 +84,6 @@ conda activate snakemake-bacterial-riboseq
 
 All other dependencies for the workflow are **automatically pulled as `conda` environments** by snakemake, when running the workflow with the `--use-conda` parameter (recommended).
 
-
 ## Running the workflow
 
 ### Input data
@@ -122,7 +111,7 @@ Ribosome footprint sequencing data in `*.fastq.gz` format. The currently support
 Some configuration parameters of the pipeline may be specific for your data and library preparation protocol. The options should be adjusted in the `config.yml` file. For example:
 
 - Minimum and maximum read length after adapter removal (see option `cutadapt: default`). Here, the test data has a minimum read length of 15 + 7 = 22 (2 nt on 5'end + 5 nt on 3'end), and a maximum of 45 + 7 = 52.
-- Unique molecular identifiers (UMIs). For example, the protocol by [McGlincy & Ingolia, 2017](https://doi.org/10.1016/J.YMETH.2017.05.028) creates a UMI that is located on both the 5'-end (2 nt) and the 3'-end (5 nt). These UMIs are extracted with `umi_tools` (see options `umi_extraction: method` and `pattern`).
+- Unique molecular identifiers (UMIs). For example, the protocol by [McGlincy &amp; Ingolia, 2017](https://doi.org/10.1016/J.YMETH.2017.05.028) creates a UMI that is located on both the 5'-end (2 nt) and the 3'-end (5 nt). These UMIs are extracted with `umi_tools` (see options `umi_extraction: method` and `pattern`).
 
 Example configuration files for different sequencing protocols can be found in `resources/protocols/`.
 
@@ -144,13 +133,13 @@ snakemake --dry-run
 To run the complete workflow with test files using **`conda`**, execute the following command. The definition of the number of compute cores is mandatory.
 
 ```bash
-snakemake --cores 10 --use-conda --directory .test
+snakemake --cores 10 --sdm conda --directory .test
 ```
 
-To run the workflow with **singularity**, use:
+To run the workflow with **singularity** / **apptainer**, use:
 
 ```bash
-snakemake --cores 10 --use-singularity --use-conda --directory .test
+snakemake --cores 10 --sdm conda apptainer --directory .test
 ```
 
 ### Parameters
@@ -218,11 +207,11 @@ This table lists all parameters that can be used to run the workflow.
   - ORCID profile: https://orcid.org/0000-0002-3913-153X
   - github page: https://github.com/m-jahn
 
-
 Visit the MPUSP github page at https://github.com/MPUSP for more info on this workflow and other projects.
 
 ## References
 
 - Essential tools are linked in the top section of this document
 - The sequencing library preparation is based on the publication:
+
 > McGlincy, N. J., & Ingolia, N. T. _Transcriptome-wide measurement of translation by ribosome profiling_. Methods, 126, 112–129, **2017**. https://doi.org/10.1016/J.YMETH.2017.05.028.
diff --git a/workflow/Snakefile b/workflow/Snakefile
index ab52982..f1551e0 100644
--- a/workflow/Snakefile
+++ b/workflow/Snakefile
@@ -8,13 +8,14 @@
 #          may apply.                                   #
 # ----------------------------------------------------- #
 
+conda:
+    "envs/global.yml"
+
 import os
 import pandas as pd
 from datetime import date
 from snakemake.utils import min_version
 
-# min_version("7.0")
-
 __author__ = "Rina Ahmed-Begrich, Michael Jahn"
 __year__ = str(date.today()).split("-")[0]
 
diff --git a/workflow/envs/global.yml b/workflow/envs/global.yml
new file mode 100644
index 0000000..8c334a1
--- /dev/null
+++ b/workflow/envs/global.yml
@@ -0,0 +1,6 @@
+name: global
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - pandas=2.2.2
diff --git a/workflow/notebooks/report.Rmd b/workflow/notebooks/report.Rmd
index dea786f..f32828a 100644
--- a/workflow/notebooks/report.Rmd
+++ b/workflow/notebooks/report.Rmd
@@ -140,7 +140,6 @@ Imported results:
 # tables
 df_annotated <- read_csv(snakemake@input$orfs_annotated, show_col_types = FALSE)
 df_features <- read_csv(snakemake@input$orfs_features, show_col_types = FALSE)
-df_features <- mutate(df_features, sample_name = str_remove(sample_name, "\\.bam$"))
 
 # import original and processed read data
 bam <- lapply(snakemake@input[["bam_filtered"]], ORFik::readBam)
@@ -153,6 +152,11 @@ names(bam_shift) <- str_split_i(snakemake@input[["bam_shifted"]], "\\/", -1) %>%
 
 sample_names <- names(bam)
 
+# condition table
+df_conditions <- df_features %>%
+  dplyr::select(sample_name, condition, replicate) %>%
+  distinct()
+
 # GRanges
 load(snakemake@input$granges)
 
@@ -192,21 +196,32 @@ head(df_features)
 - the default behavior is to filter out reads that are too short to be reliable footprints (config option `shift_reads: read_length`)
 
 ```{r, echo = FALSE, warning = FALSE, fig.width = figwidth, fig.height = figheight}
-plot_read_number <- c(bam, bam_shift) %>%
+df_read_number <- c(bam, bam_shift) %>%
   lapply(length) %>%
   unlist() %>%
-  enframe() %>%
+  enframe(name = "sample_name", value = "count") %>%
   mutate(status = rep(c("original", "filtered"), each = length(sample_names))) %>%
-  ggplot(aes(y = fct_rev(name), x = value, fill = status)) +
-  geom_col(position = "dodge") +
+  left_join(df_conditions, by = "sample_name") %>%
+  mutate(replicate = fct_inseq(factor(replicate)))
+
+plot_read_number <- df_read_number %>%
+  ggplot(aes(y = fct_rev(condition), x = count, fill = replicate)) +
+  geom_col(data = . %>% filter(status == "original"), position = position_dodge(width = 0.9), alpha = 0.4) +
+  geom_col(data = . %>% filter(status == "filtered"), position = position_dodge(width = 0.9), alpha = 1.0) +
+  geom_text(
+    data = . %>% filter(status == "original"),
+    position = position_dodge(0.9), hjust = 0, size = 3,
+    aes(x = count + max(count / 100), label = paste0(round(count / 10^6, 1), "M"), color = replicate)
+  ) +
   geom_text(
+    data = . %>% filter(status == "filtered"),
     position = position_dodge(0.9), hjust = 0, size = 3,
-    aes(x = value + max(value / 100), label = paste0(round(value / 10^6, 1), "M"), color = status)
+    aes(x = count + max(count / 100), label = paste0(round(count / 10^6, 1), "M"), color = replicate)
   ) +
   labs(
     x = "n total reads", y = "",
     title = "Read number for all samples",
-    subtitle = "color: before and after filtering by size"
+    subtitle = "color: replicate, transparency: before and after filtering by size"
   ) +
   custom_theme(legend.position = "bottom") +
   scale_fill_manual(values = custom_colors) +
@@ -225,7 +240,7 @@ print(plot_read_number)
 
 ### Read length distribution
 
-- this figure shows read length distribution for all samples and data types
+- this figure shows read length distribution for all samples
 - mapped read lengths from ribosome profiling data are usually variable
 - often one can see a bimodal distribution, with the larger reads being more reliable (representing genuine footprints)
 - shorter read fractions (e.g. shorter than a regular footprint) are ususally discarded
@@ -239,22 +254,27 @@ df_read_lengths <- lapply(bam, function(file) {
     enframe(name = "read_length", value = "read_count") %>%
     mutate(read_count = as.numeric(read_count))
 }) %>%
-  bind_rows(.id = "name")
+  bind_rows(.id = "sample_name") %>%
+  left_join(df_conditions, by = "sample_name") %>%
+  mutate(replicate = fct_inseq(factor(replicate)))
 
 plot_read_length <- df_read_lengths %>%
-  group_by(name) %>%
+  group_by(condition, replicate) %>%
   mutate(read_count = read_count / sum(read_count) * 100) %>%
   ggplot(aes(
     x = as.numeric(read_length),
-    y = read_count
+    y = read_count,
+    color = condition,
+    linetype = replicate
   )) +
   geom_step() +
   labs(
     x = "read length", y = "frequency (% total reads)",
-    title = "Read lengths distribution, per sample"
+    title = "Read lengths distribution, per condition",
+    subtitle = "color: condition, line type: replicate"
   ) +
   custom_theme() +
-  facet_wrap(~name) +
+  facet_wrap(~condition) +
   scale_color_manual(values = custom_colors)
 
 if (export_figures) {
@@ -599,26 +619,27 @@ print(plot_orfs_anno)
 ```{r, echo = FALSE, warning = FALSE}
 df_feat_long <- df_features %>%
   dplyr::select(
-    group_name, sample_name, countRFP, cpmRFP, fpkmRFP, floss,
+    group_name, condition, replicate, countRFP, cpmRFP, fpkmRFP, floss,
     entropyRFP, disengagementScores, RRS, RSS, ORFScores, ioScore,
     startCodonCoverage, startRegionCoverage, startRegionRelative
   ) %>%
-  mutate(across(!c("group_name", "sample_name"), ~ log10(abs(.x)))) %>%
-  pivot_longer(cols = !c("group_name", "sample_name"), names_to = "feature", values_to = "value")
+  mutate(across(!c("group_name", "condition", "replicate"), ~ log10(abs(.x)))) %>%
+  pivot_longer(cols = !c("group_name", "condition", "replicate"), names_to = "feature", values_to = "value")
 ```
 
 ```{r, echo = FALSE, warning = FALSE, fig.width = figwidth, fig.height = figwidth}
 plot_data_features <- df_feat_long %>%
   filter(!is.infinite(value), !is.na(value)) %>%
-  group_by(sample_name, feature) %>%
+  group_by(condition, feature, group_name) %>%
+  summarize(value = mean(value, na.rm = TRUE)) %>%
   mutate(bins = cut_interval(value, n = 40, labels = FALSE)) %>%
   group_by(bins, .add = TRUE) %>%
   summarize(freq = n(), .groups = "drop") %>%
-  ggplot(aes(x = bins, y = freq, color = sample_name)) +
+  ggplot(aes(x = bins, y = freq, color = condition)) +
   geom_step() +
   labs(
     title = "Distribution of log10 transformed features",
-    subtitle = "x-axis is rescaled for compatibility",
+    subtitle = "mean of replicates; x-axis is rescaled for compatibility",
     x = "log10 normalized score", y = "frequency"
   ) +
   custom_theme(aspect.ratio = 1, legend.position = "bottom") +
@@ -655,8 +676,10 @@ df_fpkm <- df_features %>%
   pivot_wider(id_cols = group_name, names_from = sample_name, values_from = fpkmRFP) %>%
   dplyr::select(-group_name)
 
-plot_sample_corr <- ggpairs(df_fpkm) +
+plot_sample_corr <- ggpairs(df_fpkm, aes(color = "")) +
   labs(title = "Correlation of log10 transformed FPKM") +
+  scale_color_manual(values = alpha(custom_colors, 0.7)) +
+  scale_fill_manual(values = alpha(custom_colors, 0.5)) +
   custom_theme()
 
 if (export_figures) {
@@ -670,6 +693,55 @@ if (export_figures) {
 print(plot_sample_corr)
 ```
 
+#### Variation of read counts between replicates {-}
+
+- this figure shows a histogram of the coeffcient of variation (CV, or relative standard deviation)
+- the CV was calculated based on all replicates of a sample
+- a distribution where the majority of genes have a CV below 25% can be considered good
+- if no replicates were used, this step is omitted
+
+```{r, echo = FALSE, warning = FALSE, fig.width = figwidth, fig.height = figwidth}
+plot_cv <- df_features %>%
+  dplyr::select(group_name, condition, replicate, fpkmRFP) %>%
+  filter(fpkmRFP != 0) %>%
+  group_by(group_name, condition) %>%
+  summarize(
+    mean_fpkm = mean(fpkmRFP, na.rm = TRUE),
+    sd_fpkm = sd(fpkmRFP),
+    cv_fpkm = sd_fpkm / mean_fpkm
+  ) %>%
+  ungroup() %>%
+  mutate(
+    bins = cut_width(cv_fpkm, width = 0.05),
+    bins = as.numeric(str_extract(bins, "[0-9]*\\.[0-9]*"))
+  ) %>%
+  group_by(condition, bins, .add = TRUE) %>%
+  summarize(freq = n(), .groups = "drop") %>%
+  ggplot(aes(x = bins, y = freq, color = condition)) +
+  geom_step(show.legend = FALSE) +
+  labs(
+    title = "Distribution of coefficient of variation (CV)",
+    x = "CV", y = "frequency"
+  ) +
+  custom_theme(aspect.ratio = 1, legend.position = "bottom") +
+  facet_wrap(~condition) +
+  theme(axis.text.x = element_blank()) +
+  scale_color_manual(values = rep_len(
+    custom_colors,
+    length.out = length(unique(df_features$condition))
+  ))
+
+if (export_figures) {
+  save_plot(
+    plot_sample_corr,
+    path = export_plots,
+    width = figwidth, height = figwidth
+  )
+}
+
+print(plot_cv)
+```
+
 #### Principal component analysis {-}
 
 - this figure shows the correlation between samples/replicates for selected scores
@@ -691,16 +763,22 @@ if (!"PC3" %in% colnames(df_pca)) {
   df_pca$PC3 <- 1
 }
 
+df_pca <- left_join(
+  df_pca,
+  df_conditions,
+  by = "sample_name"
+)
+
 plot_pca <- df_pca %>%
   ggplot(aes(
-    x = PC1, y = PC2, size = PC3, color = sample_name,
-    label = sample_name
+    x = PC1, y = PC2, size = PC3, color = condition,
+    label = replicate
   )) +
   geom_point() +
-  geom_text_repel(size = 3, point.padding = 10) +
+  geom_text_repel(size = 3, point.padding = 10, show.legend = FALSE) +
   labs(
     title = "PCA of log10 transformed FPKM values",
-    subtitle = "point size encodes PC3, color encodes sample"
+    subtitle = "point size encodes PC3, color encodes condition, point label is replicate"
   ) +
   custom_theme(aspect.ratio = 1, legend.position = "bottom") +
   scale_color_manual(values = rep_len(
@@ -742,7 +820,7 @@ For all other feedback, contact the author(s).
   - ORCID profile: https://orcid.org/0000-0002-0656-1795
   - github page: https://github.com/rabioinf
 
-Visit the MPUSP github page at https://github.com/MPUSP for more info on this workflow and other projects.
+Visit the MPUSP github page at https://github.com/MPUSP for more info about this workflow and other projects.
 
 ## Data accessability
 
diff --git a/workflow/rules/postprocessing.smk b/workflow/rules/postprocessing.smk
index cd17f42..c6e6cf1 100644
--- a/workflow/rules/postprocessing.smk
+++ b/workflow/rules/postprocessing.smk
@@ -74,6 +74,7 @@ rule shift_reads:
 # -----------------------------------------------------
 rule feature_stats:
     input:
+        samples=config["samplesheet"],
         fasta=rules.get_genome.output.fasta,
         gff_rna=rules.extract_features.output.gff,
         gff_cds=rules.extract_mRNA_features.output.gff,
diff --git a/workflow/scripts/feature_stats.R b/workflow/scripts/feature_stats.R
index e8026d6..caea5c9 100644
--- a/workflow/scripts/feature_stats.R
+++ b/workflow/scripts/feature_stats.R
@@ -20,6 +20,7 @@ suppressWarnings({
 
 # import genome sequence
 messages <- c("importing genome sequence and annotation")
+samplesheet <- snakemake@input[["samples"]]
 genome_fasta <- snakemake@input[["fasta"]]
 genome_gff_rna <- snakemake@input[["gff_rna"]]
 genome_gff_cds <- snakemake@input[["gff_cds"]]
@@ -105,6 +106,18 @@ df_stats <- compute_features(
 ) %>%
   as_tibble()
 
+# add sample and replicate information to feature table
+df_stats <- mutate(df_stats, sample_name = str_remove(sample_name, "\\.bam$"))
+df_samples <- read_tsv(samplesheet, show_col_types = FALSE) %>%
+  dplyr::rename(sample_name = sample) %>%
+  dplyr::select(sample_name, condition, replicate)
+df_stats <- left_join(
+  df_stats, df_samples,
+  by = "sample_name"
+)
+df_stats <- df_stats %>%
+  relocate(seqnames, group_name, sample_name, condition, replicate)
+
 
 # EXPORT RESULTS
 # ------------------------------