Adding QIIME2 plugin into FEAST_beta #8

Open
wants to merge 75 commits into
base: FEAST_beta
26c8690
prevent writing out proportions. Should be written in wrappers.
cameronmartino Sep 17, 2019
143b4d7
add the FEAST citation
cameronmartino Sep 18, 2019
6eb25a9
add init
cameronmartino Sep 18, 2019
e211d04
add barebones method for source tracking
cameronmartino Sep 18, 2019
1569ab9
add R script for FEAST st
cameronmartino Sep 18, 2019
fddec96
add defaults from file
cameronmartino Sep 18, 2019
0676805
defaults and desc. for each variable
cameronmartino Sep 18, 2019
1aadf8e
q2 plugin setup intial
cameronmartino Sep 18, 2019
66130dd
add missing imports and hard-coded version
cameronmartino Sep 18, 2019
85787bf
add plugin descp
cameronmartino Sep 18, 2019
8cecca7
q2 setup files
cameronmartino Sep 18, 2019
7132fde
rm import all
cameronmartino Sep 18, 2019
09169fc
semi-working install plut base test files
cameronmartino Sep 18, 2019
fbec57b
fix few naming bugs
cameronmartino Sep 18, 2019
40d3c51
fix minor bugs
cameronmartino Sep 18, 2019
13c62c3
clean up
cameronmartino Sep 18, 2019
aae9fb7
add testing data
cameronmartino Sep 18, 2019
e596df4
convert to flake8 specs
cameronmartino Sep 18, 2019
ad8f0ac
flake8 and test passing updates
cameronmartino Sep 18, 2019
ea80750
update expexted
cameronmartino Sep 18, 2019
f40e51e
add citations
cameronmartino Sep 18, 2019
0b8bd2d
add make file and manifest for pip
cameronmartino Sep 18, 2019
a1c78ee
add q2 readme
cameronmartino Sep 19, 2019
8fc04c5
update sub-script name
cameronmartino Sep 19, 2019
4ee84a1
typo in readme
cameronmartino Sep 19, 2019
b50b8aa
update feast R script
cameronmartino Sep 28, 2019
3ba2eda
fix bad link
cameronmartino Sep 28, 2019
bfd568c
Update README.md
liashenhav Sep 28, 2019
991482f
Update README.md
liashenhav Sep 28, 2019
32acaa8
Update README.md
liashenhav Sep 28, 2019
c8d5196
Update README.md
liashenhav Sep 28, 2019
9efabce
Update README.md
liashenhav Sep 28, 2019
04eae17
Update README.md
liashenhav Sep 28, 2019
34895c1
Update README.md
liashenhav Sep 28, 2019
c86bf73
Update README.md
liashenhav Sep 28, 2019
34d4788
Update README.md
liashenhav Sep 28, 2019
24e6be5
move data
cameronmartino Sep 28, 2019
cd211d0
fix plugin name
cameronmartino Sep 28, 2019
9f4ed29
update tutorial with viz and tests
cameronmartino Sep 29, 2019
154ad25
update
cameronmartino Sep 29, 2019
3613c9b
fix readme
cameronmartino Sep 29, 2019
e0de0d7
update tutorial name
cameronmartino Sep 29, 2019
33e88db
Delete DIABIMMUNE.md
cameronmartino Sep 29, 2019
c88b3f3
add backhad
cameronmartino Sep 29, 2019
44cd9eb
Merge branch 'FEAST_beta' of https://github.com/cameronmartino/FEAST …
cameronmartino Sep 29, 2019
b3233ee
DIABIMMUNE heatmap delete
cameronmartino Sep 29, 2019
ad1f264
DIABIMMUNE meta delete
cameronmartino Sep 29, 2019
b3b9a23
DIABIMMUNE prop delete
cameronmartino Sep 29, 2019
12e8b81
DIABIMMUNE table delete
cameronmartino Sep 29, 2019
55f1677
fix link
cameronmartino Sep 29, 2019
5c2defb
Merge branch 'FEAST_beta' of https://github.com/cameronmartino/FEAST …
cameronmartino Sep 29, 2019
83493e4
fix link
cameronmartino Sep 29, 2019
d0b34cf
add barplot wrapper
cameronmartino Oct 20, 2019
279c193
update output format for mixing proportions
cameronmartino Oct 20, 2019
a978500
updae tutorial data
cameronmartino Oct 20, 2019
7ed171f
update tutorial text
cameronmartino Oct 20, 2019
61103f5
update readme
cameronmartino Oct 20, 2019
a56239d
Update setup.cfg
liashenhav Oct 29, 2019
f8889ed
fix case issue with q2 dir
cameronmartino Oct 30, 2019
1becf9b
DOC: slight formatting fix in README
fedarko Jan 29, 2020
4bb0aca
DOC: Rename example metadata from QZA to TSV file
fedarko Jan 29, 2020
cbb5d4e
BUG/TST: update q2 test re: renamed metadata file
fedarko Jan 29, 2020
bdea423
BUG: Remove .o and .so files from src/
fedarko Jan 29, 2020
8f6edc1
TST/BUG: Fix case sensitivity in q2-FEAST tests
fedarko Jan 29, 2020
4bb943d
BUG: also fix case sensitivity thing in setup.cfg
fedarko Jan 29, 2020
bbb3a51
DEV: make sure that needed test pkgs are installed
fedarko Jan 29, 2020
80be134
DEV: add tentative developer docs
fedarko Jan 29, 2020
8fc6b72
DOC: clean up and reorganize q2-feast readme
fedarko Jan 29, 2020
283d595
MNT: add gitignore; remove empty lines from README
fedarko Jan 29, 2020
80a7a87
DOC: fix up formatting/etc in q2-feast readme
fedarko Jan 29, 2020
94132fd
DOC: indent code in dev docs in readme
fedarko Jan 29, 2020
3e33d02
DOC: remove extraneous trailing spaces
fedarko Jan 29, 2020
c59f82e
DEV/DOC: simplify + add context to dev setup docs
fedarko Jan 29, 2020
11e6a45
Merge pull request #1 from fedarko/FEAST_beta
cameronmartino Feb 21, 2020
00f5384
fix metadata big when no shared IDs
cameronmartino Aug 19, 2020
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
# a few things we don't want to include in the repo -- based on github's
# default .gitignore for python repos
.coverage
__pycache__/
*.egg-info/
3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,3 @@
include README.md
include COPYING.txt
include q2_FEAST/citations.bib
4 changes: 4 additions & 0 deletions Makefile
@@ -0,0 +1,4 @@
.PHONY: test

test:
	nosetests -v -s q2_feast --with-coverage --cover-package=q2_feast
9 changes: 1 addition & 8 deletions R/FEAST.R
@@ -19,9 +19,6 @@
#' @param COVERAGE A numeric value indicating the rarefaction depth (default = minimal sequencing depth within each group of sink
#' and its corresponding sources).
#' @param different_sources_flag A Boolean value indicating the source-sink assignment.
#' @param dir_path A path to an output .txt file.
#' @param outfile the prefix for saving the output file.
#' different_sources_flag = 1 if different sources are assigned to each sink , otherwise = 0.
#' @return P - an \eqn{S1} by \eqn{S2} matrix, where \eqn{S1} is the number sinks and \eqn{S2}
#' is the number of sources (including an unknown source). Each row in matrix \eqn{P} sums to 1.
#' \eqn{Pij} is the contribution of source j to sink i.
@@ -41,8 +38,7 @@
#' }
#'
#' @export
FEAST <- function(C, metadata, EM_iterations = 1000, COVERAGE = NULL ,different_sources_flag,
dir_path, outfile){
FEAST <- function(C, metadata, EM_iterations = 1000, COVERAGE = NULL ,different_sources_flag){

###1. Parse metadata and check it has the correct header (i.e., Env, SourceSink, id)
if(sum(colnames(metadata)=='Env')==0) stop("The metadata file must contain an 'Env' column naming the source environment for each sample.")
@@ -143,9 +139,6 @@ FEAST <- function(C, metadata, EM_iterations = 1000, COVERAGE = NULL ,different_
colnames(proportions_mat) <- c(envs_sources, "Unknown")
rownames(proportions_mat) <- envs_sink
# proportions_mat[is.na(proportions_mat)] <- 999

setwd(dir_path)
write.table(proportions_mat, file = paste0(outfile,"_source_contributions_matrix.txt"), sep = "\t")
return(proportions_mat)

}
5 changes: 0 additions & 5 deletions README.md
@@ -21,7 +21,6 @@ Software Requirements and dependencies
Packages <- c("Rcpp", "RcppArmadillo", "vegan", "dplyr", "reshape2", "gridExtra", "ggplot2", "ggthemes")
install.packages(Packages)
lapply(Packages, library, character.only = TRUE)

```


@@ -147,7 +146,3 @@ Output -
| infant gut 2 |Adult gut 1 | Adult gut 2| Adult gut 3| Adult skin 1 | Adult skin 2| Adult skin 3| Soil 1 | Soil 2 | unknown|
| ------------- | ------------- |------------- |------------- |------------- |------------- |------------- |------------- |------------- |------------- |
| 5.108461e-01 | 9.584116e-23 | 4.980321e-12 | 2.623358e-02|5.043635e-13 | 8.213667e-59| 1.773058e-10 | 2.704118e-14 | 3.460067e-02 | 4.283196e-01 |
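
As a quick sanity check on the example output above, the contributions in a row should sum to approximately 1. The following is a minimal Python sketch using the values copied from that row; the variable names are illustrative and not part of FEAST.

```python
# Contributions copied from the example output row above.
contributions = {
    "infant gut 2": 5.108461e-01,
    "Adult gut 1": 9.584116e-23,
    "Adult gut 2": 4.980321e-12,
    "Adult gut 3": 2.623358e-02,
    "Adult skin 1": 5.043635e-13,
    "Adult skin 2": 8.213667e-59,
    "Adult skin 3": 1.773058e-10,
    "Soil 1": 2.704118e-14,
    "Soil 2": 3.460067e-02,
    "unknown": 4.283196e-01,
}

# Each row of the mixing proportions matrix is expected to sum to 1
# (up to the rounding of the printed values).
total = sum(contributions.values())

# The column with the largest estimated contribution.
top_source = max(contributions, key=contributions.get)

print(f"row sum = {total:.6f}")
print(f"largest contributor: {top_source}")
```

A row sum that is far from 1 would suggest that a column was dropped or that the table was read with the wrong orientation.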




97 changes: 97 additions & 0 deletions q2_feast/README.md
@@ -0,0 +1,97 @@
# FEAST QIIME 2 Plugin


A central challenge in analyzing microbial communities is their composition: each community typically comprises several source environments, including contaminants and other microbial communities that interacted with the sampled habitat. To account for this structure, we developed FEAST (Fast Expectation-mAximization microbial Source Tracking), a ready-to-use, scalable framework that can simultaneously estimate the contributions of thousands of potential source environments in a timely manner, thereby helping unravel the origins of complex microbial communities. Specifically, FEAST quantifies the fraction, or proportion, of different microbial samples (sources) in a target microbial community (sink) by leveraging its structure and measuring the respective similarities between the sink community and potential source environments. For more details, see Shenhav et al., Nature Methods 2019 (https://www.nature.com/articles/s41592-019-0431-x).

## Installation

If you have not already done so, activate your QIIME 2 environment.

```shell
source activate qiime2-20xx.x
```
Next, we need to ensure that some dependencies are installed.

```shell
conda install -c bioconda -c conda-forge -c r bioconductor-phyloseq r-devtools r-magrittr r-dplyr r-vgam r-tidyr r-vegan r-reshape2 r-rcpp r-rcpparmadillo r-gridextra r-ggplot2 r-ggthemes
```

Now we will install FEAST and the q2-plugin.

```R
# the main FEAST package: start an R session (run `R` in the shell), then:
devtools::install_github("cozygene/FEAST")
quit()
```
```shell
# the QIIME2 plugin
pip install git+https://github.com/cozygene/FEAST.git
```

## Tutorial

A QIIME2 tutorial is available [here](https://github.com/cozygene/FEAST/q2_feast/tutorials/DIABIMMUNE.md)

## The q2-FEAST commands

The QIIME 2 implementation of FEAST provides two commands.

1. The first command, `microbialtracking`, performs the source tracking and
   outputs a table of mixing proportions of the semantic type
   `FeatureTable[Frequency]`.

2. The second command, `barplot`, takes the output from the previous step and
   creates an interactive stacked barplot of source contributions to each sink.

   Note that this command creates a single visualization from a single
   mixing proportions table. If you have many samples in your
   proportions table, you can create multiple visualizations by splitting up
   the table using the `qiime feature-table filter-samples` command
   in QIIME 2.

![](tutorials/etc/backhed-barplot.png)

## Setting up a development environment for q2-FEAST

This section contains instructions on how to set up a development environment
of q2-FEAST, at least as of writing.

1. Activate your QIIME 2 conda environment.

2. Fork this git repository, then clone your fork to your system.

3. Install the R dependencies as shown in the installation instructions at the
   top of this file (`conda install -c bioconda ...`).
   However, don't install FEAST using `devtools::install_github()` quite yet.

4. Using your favorite shell (e.g. bash), navigate into the folder this fork
   was cloned into (the folder that contains this `README.md`).

5. Open up R, then run the
   [following command](https://stackoverflow.com/a/34513358/10730311):
   ```r
   > devtools::install()
   ```
   This will install the FEAST R package from the current directory.
   If you get prompted to update package versions, it's probably fine to do
   that.

6. We've installed FEAST's R prerequisites, as well as the actual FEAST R
   package, but we haven't installed the Python infrastructure needed to
   get this working with QIIME 2 yet. Let's do that!

   Exit out of R back to the shell. Run the following command:
   ```bash
   $ pip install -e .[dev]
   ```
   This will install the q2-FEAST package from the current directory, along
   with its `dev` requirements (which are needed to run its tests).

7. We're almost done! Run the following commands to test that FEAST and
   q2-FEAST are properly installed:
   ```bash
   $ qiime dev refresh-cache
   $ make test
   ```
   If these commands succeed, you should be good to start developing!
Empty file added q2_feast/__init__.py
Empty file.
35 changes: 35 additions & 0 deletions q2_feast/_feast_defaults.py
@@ -0,0 +1,35 @@
# Configuration file where you can set the parameter default values and
# descriptions.
DEFAULT_SHARED = None
DEFAULT_EMITR = 1000
DEFAULT_DIFFS = True

DESC_META = ('Sample metadata file containing sources'
             ' and sinks for source tracking.')
DESC_TBL = ('Feature table file containing sources'
            ' and sinks for source tracking.')
DESC_MP = ('The mixing proportions returned from FEAST.'
           ' The mixing proportions table is an S1 by '
           'S2 matrix P, where S1 is the number of sinks '
           'and S2 is the number of sources (including '
           'an unknown source). Each row in matrix P sums'
           ' to 1. Pij is the contribution of source j to '
           'sink i. If Pij == NA it indicates that source '
           'j was not used in the analysis of sink i.')
DESC_ENVC = ('Sample metadata column with a description '
             'of the sampled environment (e.g., human gut).')
DESC_SSC = ('Sample metadata column with labels for a source or a sink.'
            ' All the sub-classes in this column must be in'
            ' either source_ids or sink_ids.')
DESC_SOURCEID = (
    'Comma-separated list (without spaces) of class ids '
    'contained in source_sink_column to be considered as sources.')
DESC_SINKID = ('Comma-separated list (without spaces) of class ids '
               'contained in source_sink_column to be considered as sinks.')
DESC_SHARED = ('Sample metadata column with the Sink-Source id.'
               ' When using multiple sinks, each tested with the '
               'same group of sources.')
DESC_EMITR = ('A numeric value indicating the number of EM iterations.')
DESC_DIFFS = ('A Boolean value indicating the source-sink assignment.'
              ' Different-sources is True if different sources are assigned'
              ' to each sink, otherwise different-sources should be False.')
178 changes: 178 additions & 0 deletions q2_feast/_method.py
@@ -0,0 +1,178 @@
import os
import tempfile
import subprocess
import pandas as pd
from qiime2 import Metadata
from ._feast_defaults import (DEFAULT_DIFFS,
                              DEFAULT_EMITR)


def run_commands(cmds, verbose=True):
    """
    This function is a script runner.
    It was obtained from https://github.com/ggloor
    /q2-aldex2/blob/master/q2_aldex2/_method.py
    """
    if verbose:
        print("Running external command line application(s). This may print "
              "messages to stdout and/or stderr.")
        print("The command(s) being run are below. These commands cannot "
              "be manually re-run as they will depend on temporary files that "
              "no longer exist.")
    for cmd in cmds:
        if verbose:
            print("\nCommand:", end=' ')
            print(" ".join(cmd), end='\n\n')
        subprocess.run(cmd, check=True)
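
The runner above relies on `subprocess.run(..., check=True)` raising `subprocess.CalledProcessError` when the external command exits with a non-zero status. A minimal, self-contained sketch of that pattern follows. The helper name `run_checked` is hypothetical, and a Python one-liner stands in for the R script so that the example runs without R installed.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd; raise RuntimeError carrying the return code if it fails."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            "external command failed (return code %d)" % err.returncode
        ) from err


# Stand-in for the R script: a Python one-liner that exits with code 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    run_checked(failing_cmd)
    outcome = "succeeded"
except RuntimeError as err:
    outcome = str(err)
print(outcome)
```

Chaining with `from err` keeps the original `CalledProcessError` in the traceback, which can help when diagnosing failures in the external script.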


def feast_format(fmeta: pd.DataFrame,
                 source_sink_column: str,
                 source_ids: list,
                 sink_ids: list,) -> pd.DataFrame:
    """
    Helper function to format metadata for FEAST.
    """

    # ensure that all sub-cats in SourceSink are represented
    missing_ = list(set(fmeta[source_sink_column])
                    - set(source_ids + sink_ids))
    if len(missing_) > 0:
        raise ValueError(('All of the sub-classes of %s must'
                          ' be given as a source or sink;'
                          ' the sub-class(es) [%s] are missing.')
                         % (str(source_sink_column),
                            ', '.join(map(str, missing_))))
    # rename ids in source and sink columns (only if all rep.)
    rename_ = {**{id_: 'Source' for id_ in source_ids},
               **{id_: 'Sink' for id_ in sink_ids}}
    # assign the result back instead of replacing in place on a column
    # selection, which newer pandas versions may not propagate
    fmeta[source_sink_column] = fmeta[source_sink_column].replace(rename_)

    return fmeta
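
The validation and relabelling done by this helper can be illustrated without pandas. The sketch below is a simplified stand-in, not the plugin's code: the function name `relabel_source_sink` and the example labels are hypothetical.

```python
def relabel_source_sink(labels, source_ids, sink_ids):
    """Map each class label to 'Source' or 'Sink'; fail on unassigned labels."""
    # Every label in the column must be claimed by one of the two lists.
    missing = sorted(set(labels) - set(source_ids) - set(sink_ids))
    if missing:
        raise ValueError("unassigned sub-class(es): " + ", ".join(missing))
    mapping = {**{c: "Source" for c in source_ids},
               **{c: "Sink" for c in sink_ids}}
    return [mapping[label] for label in labels]


# Hypothetical values of a source/sink metadata column.
labels = ["mother", "infant", "mother", "soil"]
relabelled = relabel_source_sink(labels, ["mother", "soil"], ["infant"])
print(relabelled)
```

Raising before any relabelling happens means a partially renamed column is never handed to FEAST.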


def microbialtracking(table: pd.DataFrame,
                      metadata: Metadata,
                      environment_column: str,
                      source_sink_column: str,
                      source_ids: list,
                      sink_ids: list,
                      shared_id_column: str,
                      em_iterations: int = DEFAULT_EMITR,
                      different_sources: bool = DEFAULT_DIFFS) -> pd.DataFrame:

    # split the ids used for sources and sinks
    source_ids = source_ids.split(",")
    sink_ids = sink_ids.split(",")

    # create metadata formatted for FEAST
    # check if there are shared ids.
    # currently FEAST requires an id
    # column but in future versions it will
    # be an optional param.
    if shared_id_column is not None:
        keep_cols = [environment_column,
                     source_sink_column,
                     shared_id_column]
        rename_cols = ['Env', 'SourceSink', 'id']
    else:
        keep_cols = [environment_column,
                     source_sink_column]
        rename_cols = ['Env', 'SourceSink']

    # import and check all columns given are in dataframe
    metadata = metadata.to_dataframe()
    # replace separator character in metadata
    metadata = metadata.replace('_', '-',
                                regex=True)
    metadata.index = metadata.index.astype(str)
    metadata.index = [ind.replace('_', '-')
                      for ind in metadata.index]
    # check columns are in metadata
    if not all([col_ in metadata.columns for col_ in keep_cols]):
        raise ValueError('Not all columns given are present in the'
                         ' sample metadata file. Please check that'
                         ' the input columns are in the given metadata.')

    # keep only those columns
    feast_meta = metadata.dropna(subset=keep_cols)
    feast_meta = feast_meta.loc[:, keep_cols]

    # filter the metadata & table so they are matched
    table = table.T
    shared_index = list(set(table.columns) & set(feast_meta.index))
    feast_meta = feast_meta.reindex(shared_index)
    table = table.loc[:, shared_index]

    # format the sub-classes for source-sink
    feast_meta = feast_format(feast_meta,
                              source_sink_column,
                              source_ids,
                              sink_ids)
    if shared_id_column is not None:
        # encode the shared SourceSink id column
        # with numerics ranging from 1-N
        shared_ = set(metadata[shared_id_column])
        rename_ = {id_: str(int(i) + 1) for i, id_ in enumerate(shared_)}
        feast_meta[shared_id_column] = \
            feast_meta[shared_id_column].replace(rename_)
    if not different_sources or shared_id_column is None:
        # get source-sink
        source_index = feast_meta[feast_meta[source_sink_column] == 'Source'].index
        sink_index = feast_meta[feast_meta[source_sink_column] == 'Sink'].index
        # set sources IDs to NA
        feast_meta.loc[source_index, shared_id_column] = 'NA'
        # give sinks IDs (each sink gets a different ID)
        for i, sid_ in enumerate(sink_index):
            feast_meta.loc[sid_, shared_id_column] = str(int(i) + 1)
        # shared_ = set(metadata.loc[sink_index, shared_id_column].values)
        # rename_ = {id_: str(int(i)) for i, id_ in enumerate(shared_)}
        # feast_meta[shared_id_column].replace(to_replace=rename_,
        #                                      inplace=True)

        # make sure that if no "shared_id_column" supplied IDs not used
        different_sources = False

    # rename those columns for FEAST
    feast_meta.columns = rename_cols

    # if there are different sources
    if different_sources:
        different_sources = 1
    else:
        different_sources = 0

    # save all intermediate files into tmp dir
    with tempfile.TemporaryDirectory() as temp_dir_name:
        # save the tmp dir locations
        biom_fp = os.path.join(temp_dir_name, 'input.tsv')
        map_fp = os.path.join(temp_dir_name, 'input.map.txt')
        summary_fp = os.path.join(temp_dir_name, 'output.proportions.txt')

        # Need to manually specify header=True for Series (i.e. "meta"). It's
        # already the default for DataFrames (i.e. "table"), but we manually
        # specify it here anyway to alleviate any potential confusion.
        table.to_csv(biom_fp, sep='\t', header=True)
        feast_meta.to_csv(map_fp, sep='\t', header=True)

        # build command for FEAST
        cmd = ['source_tracking.R',
               biom_fp,
               map_fp,
               different_sources,
               summary_fp]
        cmd = list(map(str, cmd))

        try:
            run_commands([cmd])
        except subprocess.CalledProcessError as e:
            raise Exception("An error was encountered while running FEAST"
                            " in R (return code %d), please inspect stdout"
                            " and stderr to learn more." % e.returncode)

        # if run was successful import the data and return
        proportions = pd.read_csv(summary_fp, index_col=0).T
        proportions.index.name = "sampleid"

        return proportions
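
Two steps in this function, matching sample IDs between the table and the metadata and encoding the shared IDs as `1..N`, both iterate over Python sets, whose ordering is arbitrary. The sketch below shows one possible deterministic variant that sorts before use. It is an illustration under that assumption, not what the code in this diff does, and the helper names `match_samples` and `encode_ids` are hypothetical.

```python
def match_samples(table_ids, metadata_ids):
    """Return the sample IDs present in both inputs, in sorted order."""
    return sorted(set(table_ids) & set(metadata_ids))


def encode_ids(values):
    """Encode distinct values as the strings '1'..'N' (sorted for stability)."""
    return {value: str(i + 1) for i, value in enumerate(sorted(set(values)))}


# Hypothetical sample IDs and shared-ID labels.
table_ids = ["s1", "s2", "s3", "s5"]
metadata_ids = ["s3", "s1", "s4"]
shared = match_samples(table_ids, metadata_ids)
codes = encode_ids(["famB", "famA", "famB"])
print(shared, codes)
```

Sorting makes the intermediate files, and therefore the row order of the resulting proportions table, reproducible across runs.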