Code to perform analyses on the 16S rRNA data of the Bangladeshi infant gut microbiota project (Files to be uploaded soon!)
For questions please contact Maria Ioanna Papadaki (papadaki.mg@gmail.com)
This repository contains the analysis scripts for our study of gut microbiota development in a Bangladeshi infant population (BBGUT cohort) and how it compares with a Belgian infant cohort (BABEL cohort) during early life. The STORMS checklist for this study is also provided. Each script walks through the downstream analysis and the visualization of its results. All main figures and supplementary data can be reproduced with the following scripts:
Function: Exploration of the healthy gut microbiota development in Bangladeshi infants:
- DMM
- GMMs
- Alpha diversity
- Composition
- Covid-19
Data generated:
- Figure 1a, b, c, e
- Figure 4a, b
- Supplementary Figure 5a
- Supplementary Figure 6
- Supplementary Figure 12
- Supplementary Table 1
- Supplementary Table 9
Function: Alpha diversity and Supplementary figures (BBGUT cohort)
Data generated:
- Supplementary Figure 5b
- Supplementary Figure 10
Function: ASV saturation plot with increasing sample size (number of infants)
Data generated:
- Supplementary Figure 1a,b
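As a rough illustration of what an ASV saturation curve measures, the following Python sketch counts how many distinct ASVs have been seen as infants are added in random order, averaged over several random orderings. It is not the repository's script: the function name `asv_accumulation`, the input layout (infant ID mapped to a set of ASV IDs), the number of permutations, and the toy data are all assumptions made for illustration.

```python
import random


def asv_accumulation(asvs_by_infant, n_permutations=100, seed=0):
    """Mean number of distinct ASVs observed as infants are added in random order.

    asvs_by_infant: dict mapping infant ID -> set of ASV IDs detected in that infant
    (hypothetical input layout). Element k-1 of the result is the mean number of
    distinct ASVs seen across the first k infants of each random ordering.
    """
    rng = random.Random(seed)
    infants = sorted(asvs_by_infant)
    totals = [0] * len(infants)
    for _ in range(n_permutations):
        rng.shuffle(infants)
        seen = set()
        for k, infant in enumerate(infants):
            seen |= asvs_by_infant[infant]  # add this infant's ASVs to the pool
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]


# Invented toy data: three infants, three ASVs in total.
toy = {"inf1": {"ASV1", "ASV2"}, "inf2": {"ASV2", "ASV3"}, "inf3": {"ASV1"}}
curve = asv_accumulation(toy)
print(curve)
```

A curve that flattens as more infants are added suggests that additional sampling would reveal few new ASVs.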
Function: LMM of alpha diversity (observed, Shannon) across GMMs
Data generated:
- Supplementary Table 2
Function: LMM of alpha diversity (observed, Shannon) over time
Data generated:
- Supplementary Table 2
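For readers unfamiliar with the two alpha diversity measures named above, here is a minimal Python sketch of how they are commonly defined. It assumes that "observed" means the number of taxa with a non-zero count and that the Shannon index uses the natural logarithm. The function names and the toy counts are illustrative, and the mixed-model step itself is not sketched.

```python
import math


def observed_richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Invented toy sample: counts for four taxa, one of them absent.
sample = [10, 10, 0, 20]
print(observed_richness(sample), shannon_index(sample))
```

Observed richness ignores how reads are distributed across taxa, whereas the Shannon index also reflects evenness, which is why the two are often reported together.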
Function: Normalized 2-month bacterial abundance changes across periods (P-P, P-L, L-L)
Data generated:
- Supplementary Table 8
Function: BBGUT – BABEL cohort comparison
- DMM (for BABEL, for BBGUT use DMM results from full dataset)
- Alpha diversity (comparison)
- Beta diversity (comparison)
Data generated:
- Figure 2c
- Supplementary Table 3
Function: BBGUT – BABEL comparison
- Prevalence changes of common bacterial genera over time
- Alpha diversity (comparison)
Data generated:
- Figure 2a
- Supplementary Table 4
Function: BBGUT – BABEL comparison
- Top 15 most abundant genera per cohort during year 1
- Order of appearance of most abundant genera (y1)
- Rank (order of appearance) correlations between the two cohorts (y1)
- Bacterial prevalence correlations between the two cohorts
- Differential abundances
Data generated:
- Figure 2b, d
- Supplementary Figure 7a, b
- Supplementary Figure 8a, b
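The between-cohort correlations listed above can be illustrated with a rank correlation. The Python sketch below computes a Spearman correlation of genus prevalence between two cohorts using only the standard library. Spearman is an assumption here, since this README does not state which coefficient the script uses. The helper names, the genus labels, and the prevalence values are invented for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented prevalence values (fraction of samples in which each genus is detected).
bbgut = {"genus_A": 0.9, "genus_B": 0.6, "genus_C": 0.3, "genus_D": 0.1}
babel = {"genus_A": 0.8, "genus_B": 0.7, "genus_C": 0.1, "genus_D": 0.2}
shared = sorted(set(bbgut) & set(babel))  # compare only genera present in both
rho = spearman([bbgut[g] for g in shared], [babel[g] for g in shared])
print(round(rho, 3))
```

The same function applies to order-of-appearance ranks: pass each cohort's rank per genus instead of its prevalence.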
Function: PCoA, dbRDA analysis in BBGUT cohort
- Genus-level PCoA for the 3 GMM stages
- Distance-based redundancy analysis (dbRDA) for metadata covariates
- PCoA of infant and maternal samples
Data generated:
- Figure 1a
- Figure 3a
- Supplementary Table 5
Function: dbRDA analysis to find covariates that explain the variation of the most abundant bacterial genera in the BBGUT cohort
Data generated:
- Figure 3b
- Supplementary Table 6
Function: Maturation setbacks in the BBGUT cohort
- Setbacks in the BBGUT cohort (all changes)
- Calculate maturation score
- Setback association to disease events
Data generated:
- Figure 3c
- Figure 3d
- Figure 4c
- Supplementary Figure 9a, b
- Supplementary Table 7
Function: Exploring Segatella in the BBGUT cohort
- Segatella prevalence across GMM stages and health groups
- HSA threshold calculation
- Monthly proportion of HSA in ad hoc disease and HTP samples
- HSA metadata
- Heatmap for samples with HSA spikes per child over time
Data generated:
- Figure 5 a, b, c
- Supplementary Figure 4a, b
- Supplementary Figure 13a, b
- Supplementary Table 10
Function: Segatella ASV abundance across health groups
Data generated:
- Supplementary Figure 14
Folder containing the data for running all scripts of the main analysis
Scripts used to create the final ASV table using the pre-processed raw data
Function: OTU annotation with RDP taxonomy (RDP set 19)
Function: Decontamination using the decontam R package
Function: Create final phyloseq object