malariagen/vector_gwas_exploration


GSoC 2025: Cloud-Native Tools for Detecting Insecticide Resistance in Malaria Mosquitoes

Organization: Wellcome Sanger Institute, Tree of Life
Project: Google Summer of Code 2025
Contributor: Mohamed Laarej
Mentors: Jon Brenas, Anastasia Hernandez-Koutoucheva, Chris Clarkson
GSoC Report: Final Report

Project Overview

This repository contains the toolkit developed during Google Summer of Code 2025 to improve the detection of novel insecticide resistance in Anopheles mosquitoes. The core deliverable is a two-phase analytical pipeline designed to separate genuine resistance signals from artifacts in genomic data with strong population structure.

The toolkit consists of the two pipeline phases (items 1 and 2) and a companion dashboard (item 3):

  1. A sensitive GWAS scan to cast a wide net and identify all potential resistance-associated SNPs.
  2. A rigorous verification phase using a suite of advanced statistical models (Logistic Regression, Mixed-Effects, and Bayesian) to filter out false positives caused by confounding.
  3. An interactive visualization dashboard built with Bokeh to allow researchers to intuitively explore the results from a genome-wide scale down to a single SNP.
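
To illustrate the first phase, the sketch below runs a Pearson chi-squared test on a 2x2 carrier-by-phenotype table for each SNP and keeps every SNP below a lenient threshold. It is a minimal, standard-library sketch and not the repository's scanner: the function names, SNP identifiers, counts, and the 0.01 threshold are all assumptions made for this example.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 d.o.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def p_value_1dof(chi2):
    """Upper-tail probability of a chi-squared variable with 1 d.o.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def scan(tables, alpha=0.01):
    """Return {snp_id: p_value} for SNPs whose p-value is below alpha."""
    hits = {}
    for snp_id, (a, b, c, d) in tables.items():
        p = p_value_1dof(chi2_2x2(a, b, c, d))
        if p < alpha:
            hits[snp_id] = p
    return hits


# Hypothetical counts per SNP, in the order:
# (carrier & resistant, carrier & susceptible,
#  non-carrier & resistant, non-carrier & susceptible)
example_tables = {
    "snp_A": (68, 32, 32, 68),  # carriers are mostly resistant
    "snp_B": (50, 50, 50, 50),  # no association at all
}
candidates = scan(example_tables)
print(sorted(candidates))
```

A lenient threshold fits a scan whose job is to cast a wide net; the candidates it returns are inputs to verification, not final results.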

A key scientific finding of this project was that confounding from population structure can produce spurious signals stronger than true biological signals. This finding supports the need for a two-phase design in which every hit from the scan is re-tested with models that account for population structure.
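
The effect of population structure can be reproduced with a small worked example. In the sketch below, a SNP has no effect on resistance inside either population, yet pooling the two populations yields a large chi-squared statistic because carrier frequency and resistance rate both differ between populations. A Cochran-Mantel-Haenszel (CMH) test, used here only as a simple stand-in for the repository's verification models, stratifies by population and removes the signal. All counts and names are invented for illustration.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel statistic (no continuity correction)
    over a list of 2x2 tables, one table per stratum."""
    diff = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n  # observed minus expected count
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return diff * diff / var if var > 0 else 0.0


# Cell order: (carrier & resistant, carrier & susceptible,
#              non-carrier & resistant, non-carrier & susceptible)
# Population A: 80% carriers, 80% resistant, SNP and phenotype independent.
population_a = (64, 16, 16, 4)
# Population B: 20% carriers, 20% resistant, SNP and phenotype independent.
population_b = (4, 16, 16, 64)

strata = [population_a, population_b]
pooled = tuple(sum(cells) for cells in zip(*strata))

pooled_stat = chi2_2x2(*pooled)
stratified_stat = cmh_statistic(strata)
print(f"pooled chi2 = {pooled_stat:.2f}, stratified CMH = {stratified_stat:.2f}")
```

The pooled statistic is large even though the within-population association is exactly zero, which is the kind of false positive the verification phase is meant to catch.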

Repository Structure

gsoc-malaria-ir-detection/
├── data/ # Processed data and simulation results (excluded from Git)
├── notebooks/ # Exploratory and implementation notebooks
│ ├── 01_simulation.ipynb
│ ├── 02_mixed_effects_modeling.ipynb
│ └── ...
├── output/ # Generated HTML dashboards and other outputs
├── src/
│ ├── __init__.py
│ ├── data/ # Data loading & simulation
│ ├── models/ # Statistical model implementations (e.g., mixed models, Bayesian)
│ ├── viz/ # Dashboard builder script
│ ├── analysis/ # GWAS analyses and pipelines
│ │ └── gwas/
│ │   └── ...
│ └── utils/ # Shared helpers
├── tests/ # Unit tests for core modules
├── poetry.lock # Dependency lockfile
├── pyproject.toml # Poetry dependency config
├── README.md
└── .gitignore

Installation Instructions

Prerequisites

  • Git
  • Poetry (used by the setup commands below to install dependencies)

Setup

# Clone the repository
git clone git@github.com:malariagen/vector_gwas_exploration.git
cd vector_gwas_exploration

# Install dependencies
poetry install

# Activate the virtual environment
# (on Poetry 2.x this command requires the poetry-plugin-shell plugin)
poetry shell

Running Notebooks

jupyter notebook

Example Usage

Step 1: Generate the mock data by running the notebook

# (Inside the poetry shell, start Jupyter)
jupyter notebook
# -> Now, open and run `notebooks/08_generate_mock_data.ipynb`

Step 2: Build the standalone HTML dashboard

# (From your terminal, still in the poetry shell)
python src/viz/build_explorer.py

Step 3: View the result

# Open the newly created file in your web browser:
# output/gwas_explorer.html
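
A genome-wide panel of the kind the dashboard presents usually needs two derived values per SNP: a cumulative x position across contigs and a -log10(p) score. The sketch below shows that data preparation in plain Python. It is not taken from `src/viz/build_explorer.py`; the function name, the contig lengths, and the input tuples are placeholders chosen for the example.

```python
import math

# Placeholder contig lengths in base pairs; NOT real genome values.
CONTIG_LENGTHS = {"2R": 1000, "2L": 800, "3R": 900}


def manhattan_coordinates(results, contig_lengths):
    """Convert (contig, position, p_value) tuples into
    (cumulative_x, minus_log10_p, contig) points sorted by x."""
    # Each contig starts where the previous one ended.
    offsets = {}
    running = 0
    for contig, length in contig_lengths.items():
        offsets[contig] = running
        running += length

    points = []
    for contig, position, p_value in results:
        p_value = max(p_value, 1e-300)  # guard against log10(0)
        points.append((offsets[contig] + position, -math.log10(p_value), contig))
    return sorted(points)


# Hypothetical scan output: (contig, position, p-value)
results = [("2L", 10, 1e-8), ("2R", 500, 0.5), ("3R", 1, 1e-3)]
points = manhattan_coordinates(results, CONTIG_LENGTHS)
for x, score, contig in points:
    print(contig, x, round(score, 2))
```

Keeping the contig name on each point lets a plotting layer color by contig and filter down to a single region or SNP.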

Progress Tracking

  • Phenotype Loading: Contributed functions to the main malariagen_data API.
  • Data Simulation: Built a ResistanceSimulator to create realistic test data with confounding.
  • Statistical Models: Implemented a suite of three verification models (Logistic, Mixed-Effects, Bayesian).
  • GWAS Scanner: Built and validated a robust Chi-squared scanner.
  • Validation: Performed rigorous positive and negative control tests, uncovering the strong effect of population structure.
  • CI/CD: Set up a GitHub Actions pipeline with black and ruff for code quality.
  • Interactive Dashboard: Built a complete, three-panel dashboard prototype with Bokeh.
  • Full Genome Scan: The final computational run to generate the real dataset for the dashboard is the next step.
  • Hierarchical Bayesian Model: Implementation of a more advanced model to control for confounding is a key piece of future work.
  • Cloud packaging (Docker)
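
The idea behind simulating test data with confounding can be sketched in a few lines. The function below is a hypothetical, much-simplified illustration and not the repository's `ResistanceSimulator`: it draws carrier status and phenotype independently within each population, so any pooled association comes from population structure alone. The population names, frequencies, and function names are assumptions.

```python
import random


def simulate_confounded_samples(n_per_population=1000, seed=0):
    """Simulate samples in which a SNP has no causal effect, but carrier
    frequency and resistance rate both differ between populations."""
    rng = random.Random(seed)
    # (carrier frequency, resistance rate); illustrative values only.
    populations = {"pop_A": (0.8, 0.8), "pop_B": (0.2, 0.2)}
    samples = []
    for name, (carrier_freq, resistance_rate) in populations.items():
        for _ in range(n_per_population):
            samples.append(
                {
                    "population": name,
                    "carrier": rng.random() < carrier_freq,
                    # Phenotype depends on population only, never on the SNP.
                    "resistant": rng.random() < resistance_rate,
                }
            )
    return samples


def rate(samples, population, key):
    """Fraction of samples from `population` for which `key` is True."""
    subset = [s for s in samples if s["population"] == population]
    return sum(s[key] for s in subset) / len(subset)


samples = simulate_confounded_samples(n_per_population=2000, seed=42)
for pop in ("pop_A", "pop_B"):
    print(pop, round(rate(samples, pop, "carrier"), 2),
          round(rate(samples, pop, "resistant"), 2))
```

Data generated this way can serve as a negative control: a scanner that flags the SNP is reacting to population structure rather than to a real effect.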
