GitHub - Lfulcrum/mbiotaDB: A Python toolbox for creating databases of microbiota count data

mbiotadb

This repository contains a set of software tools to support meta-analyses of 16S rRNA gene survey count data from studies of the human microbiota.

These tools facilitate the creation of a relational database of microbial count data and associated metadata. They were designed around time series 16S rRNA gene survey count data (and associated metadata), but could be adapted to work with count data from other gene markers and with studies that do not have time series data.

The format of input data is based on that available from studies found on Qiita (free account registration required). Data of a similar format from other data sources should work.

All tools are written in Python. PostgreSQL is used as the relational database management system (RDBMS). SQLAlchemy's object-relational mapper (ORM) is used to perform database manipulations.
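To make the idea of a relational layout for time series count data concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module so that it runs without a PostgreSQL server, which is a deliberate substitution: the repository itself uses PostgreSQL through SQLAlchemy's ORM, and the real schema lives in model.py. All table and column names below are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: names are illustrative and NOT taken from model.py.
SCHEMA = """
CREATE TABLE subject (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id         INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subject(id),
    day        INTEGER NOT NULL
);
CREATE TABLE taxon_count (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    taxon     TEXT NOT NULL,
    n         INTEGER NOT NULL,
    PRIMARY KEY (sample_id, taxon)
);
"""


def build_demo_db():
    """Create an in-memory database holding one subject sampled on two days."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO subject (id, label) VALUES (1, 'S1')")
    conn.executemany(
        "INSERT INTO sample (id, subject_id, day) VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 1, 7)],
    )
    conn.executemany(
        "INSERT INTO taxon_count (sample_id, taxon, n) VALUES (?, ?, ?)",
        [
            (1, "Bacteroides", 120),
            (1, "Prevotella", 30),
            (2, "Bacteroides", 95),
            (2, "Prevotella", 60),
        ],
    )
    conn.commit()
    return conn


def time_series(conn, subject_label, taxon):
    """Return (day, count) rows for one taxon in one subject, ordered by day."""
    query = """
        SELECT sample.day, taxon_count.n
        FROM taxon_count
        JOIN sample  ON sample.id = taxon_count.sample_id
        JOIN subject ON subject.id = sample.subject_id
        WHERE subject.label = ? AND taxon_count.taxon = ?
        ORDER BY sample.day
    """
    return conn.execute(query, (subject_label, taxon)).fetchall()
```

Keeping counts in their own table, keyed by sample and taxon, means a time series for any subject and taxon is a single join across the three tables rather than a scan over per-study files.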

Inventory of code

Here is a brief overview of the different software and data components that can be found in this repository:

main.py: A script used to informally test various parts of data cleaning, parsing and database functionality. Functions in this script can be used to insert (Qiita-derived) studies into the database.
model.py: The current SQLAlchemy model for the PostgreSQL database.
config.py: A small script that parses database and Qiita configuration from database.ini.
creator/: A package containing tools to parse data into appropriate objects and create/manipulate database tables and entries. Newer implementations of some scripts in this package can be found in the wip/ package, but still need to be fully integrated with the rest of the system.
creator/bib_parser.py: A script to parse bibliographic information from XML files (downloaded from Qiita).
creator/count_parser.py: A script to parse count, lineage and sequence variant (ASV) data found in BIOM files into Count objects.
creator/prep_parser.py: A script to parse sample preparation and processing metadata from data files.
creator/sample_parser.py: A script to parse sample and subject metadata from data files.
creator/transact.py: Utility script to create and remove tables from the database.
creator/csv_cleaner.py: Utility script to clean data from CSV files containing sample, subject and preparation metadata.
downloader/qiita_downloader.py: A web scraper to search Qiita, collect data files, scrape processing metadata and download bibliographic data for studies of interest. This script has been adapted for command-line use and is independent of any functionality in other code in this repository. For further information, see downloader/README.md.
test/: A package to support testing of various software components (for use with pytest).
data/: A directory containing example input data (organised by type). These data are sometimes used in tests (found in test/).
debug_tools/: Tools that may be helpful in debugging some scripts (e.g. to inspect files containing heterogeneous metadata and for parser profiling).
wip/: Package containing code that is work in progress (wip).
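As an illustration of the kind of work config.py is described as doing, the following sketch reads one section of an INI file with the standard library's configparser and turns it into a PostgreSQL connection URL. The section name (postgresql), the key names (host, port, database, user, password) and both function names are assumptions made for this example; the actual layout of database.ini and the functions in config.py may differ.

```python
from configparser import ConfigParser
from urllib.parse import quote


def read_section(path, section):
    """Return the key/value pairs of one INI section as a plain dict."""
    # interpolation=None keeps literal '%' characters (e.g. in passwords) intact.
    parser = ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in {path}")
    return dict(parser.items(section))


def postgres_url(params):
    """Build a PostgreSQL URL from the (assumed) keys of the section."""
    # Percent-encode user and password so characters such as '@' or ' '
    # cannot be confused with URL delimiters.
    user = quote(params["user"], safe="")
    password = quote(params["password"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{params['host']}:{params['port']}/{params['database']}"
    )
```

A call such as postgres_url(read_section("database.ini", "postgresql")) would then produce a URL string of the form that SQLAlchemy's create_engine accepts, provided the file uses the assumed section and keys.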

Dependencies and Installation

There is currently no way to install code in this repository as a Python package. A user is advised to create a virtual environment, e.g. using conda, and install the following Python packages before testing functionality of the provided scripts:

selenium
biom-format
pandas
sqlalchemy
pint
biopython
networkx

To use the Qiita Downloader, only the selenium package is required. For further details please refer to Qiita Downloader documentation (downloader/README.md).

To execute test scripts, pytest is also required.
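Since the code is not installable as a package, it can be useful to confirm that the environment has everything listed above before running any script. The sketch below is not part of the repository; it maps each package name to the module name it is imported under (biom-format is imported as biom, biopython as Bio) and reports which ones cannot be found. The function name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# Package name (as installed) -> top-level module name (as imported).
REQUIRED = {
    "selenium": "selenium",
    "biom-format": "biom",
    "pandas": "pandas",
    "sqlalchemy": "sqlalchemy",
    "pint": "pint",
    "biopython": "Bio",
    "networkx": "networkx",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in required.items()
        if find_spec(module) is None
    ]
```

Calling missing_packages() inside the activated virtual environment returns an empty list when all seven packages are available; any names it returns still need to be installed. Passing {"pytest": "pytest"} checks the additional requirement for running the tests.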

TODO

Integrate perturbation fact parsing and population.
Integrate wip code.

This code was written to accompany a master's thesis and is currently maintained by William Roberts-Sengier.
