Commit
Fixed a bug with kinship in lmm and updated documentation
Elior Rahmani authored and Elior Rahmani committed Jun 25, 2017
1 parent 01d23fc commit 02af6e3
Showing 16 changed files with 33 additions and 18 deletions.
Binary file modified docs/source/.DS_Store
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/source/conf.py
Expand Up @@ -64,10 +64,10 @@
#
# The short X.Y version.
#version = u'0.0.1'
-version = u'1.0.3'
+version = u'1.0.4'
# The full version, including alpha/beta/rc tags.
#release = u'0.0.1'
-release = u'1.0.3'
+release = u'1.0.4'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
Expand Down
2 changes: 1 addition & 1 deletion docs/source/download.rst
Expand Up @@ -2,7 +2,7 @@
Download and installation
=========================

-The latest version of GLINT is available on github `here`_. Please read the associated `README file`_ on github for details about downloading and installing GLINT.
+The latest version of GLINT and the tutorial files are available on github `here`_. Please read the associated `README file`_ on github for details about downloading and installing GLINT.


.. _here: https://github.com/cozygene/glint/releases/
Expand Down
3 changes: 2 additions & 1 deletion docs/source/ewas.rst
Expand Up @@ -155,12 +155,13 @@ Performs EWAS on the data using linear mixed model (LMM). This is an implementat

The output file named *results.glint.lmm.txt** includes a list of the sites, sorted by their association p-value. The output file includes the following columns: ID (CpG identifiers), chromosome (chromosome number of the site), MAPINFO (position of the site in the genome), p-value, q-value, intercept , V1 (coefficient of the first covariate),..., Vn (coefficient of the last covaraite, beta (the coefficient of the site under test), statistic (the test statistic), sigma-e (an estimate of sigma_e), sigma-g (an estimate of sigma_g), UCSC_RefGene_Name (name of the gene that is closest to this site), Relation_to_UCSC_CpG_Island (category)

.. _--kinship:

**--kinship**

The kinship matrix for modelling the inter-individual similarity in the data that is required for the LMM. GLINT allows two options:

-- User-supplied kinship - users can suplly a text file with samples by samples kinship matrix (with no row or column headers).
+- User-supplied kinship - users can suplly a text file with samples by samples kinship matrix (tab-delimited and with no row or column headers).
 - *refactor* - the ReFACTor algorithm can be used for constructing the kinship matrix. If this option is used then ReFACTor is executed for selecting the top informative sites in the data. The kinship matrix is then constructed by calculatign the empirical covariance matrix of the samples based on the selected sites.

For example::
Expand Down
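As an illustration of the file format described in this hunk (tab-delimited, samples by samples, no row or column headers), here is a minimal sketch that computes an empirical covariance between samples and writes it out. The helper names `sample_covariance` and `write_kinship` are hypothetical and not part of GLINT, and the per-sample centering used here is an assumption; the diff does not show how GLINT normalizes the data when it builds the matrix for the *refactor* option.

```python
import csv
import io


def sample_covariance(data):
    """Empirical covariance between samples.

    `data` is a list of rows, one row per site and one column per sample.
    Centering each sample across sites is an assumption of this sketch.
    """
    n_sites = len(data)
    n_samples = len(data[0])
    means = [sum(row[j] for row in data) / n_sites for j in range(n_samples)]
    cov = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            total = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            cov[i][j] = total / (n_sites - 1)
    return cov


def write_kinship(matrix, handle):
    """Write the matrix tab-delimited, with no row or column headers."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(matrix)


# Toy input: 3 sites (rows) by 3 samples (columns).
data = [
    [0.1, 0.2, 0.3],
    [0.2, 0.4, 0.1],
    [0.3, 0.6, 0.2],
]
kinship = sample_covariance(data)

# An in-memory buffer stands in for the text file passed to --kinship.
buffer = io.StringIO()
write_kinship(kinship, buffer)
kinship_text = buffer.getvalue()
```

Writing to a real file only requires replacing the buffer with `open(path, "w", newline="")`.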
2 changes: 1 addition & 1 deletion docs/source/howtocite.rst
Expand Up @@ -4,7 +4,7 @@ How to cite GLINT?

If you use GLINT in any published work, please cite the paper describing it:

-Rahmani, Elior, Reut Yedidim, Liat Shenhav, Regev Schweiger, Omer Weissbrod, Noah Zaitlen, and Eran Halperin. "GLINT: a user-friendly toolset for the analysis of high-throughput DNA-methylation array data." Bioinformatics, *in press*.
+Rahmani, Elior, Reut Yedidim, Liat Shenhav, Regev Schweiger, Omer Weissbrod, Noah Zaitlen, and Eran Halperin. "GLINT: a user-friendly toolset for the analysis of high-throughput DNA-methylation array data." Bioinformatics 2017; 33 (12): 1870-1872.


In addition:
Expand Down
4 changes: 2 additions & 2 deletions docs/source/input.rst
Expand Up @@ -50,7 +50,7 @@ will load the methylation data matrix in the *datafile.txt* file. See the tutori

**--covarfile**

-Path to a file containing samples by covariates matrix. The first row may be a row of headers - the names of the covariates, and the first column should include sample identifiers. The first row, if provided headers, may include the field "ID" at the beginning. If a row of headers is not provided then GLINT will automatically generate a name for each covariate. The file can be either tab-delimited, comma-delimited or space-delimited. The matrix entries are not allowed to include quotes.
+Path to a file containing samples by covariates matrix. The first row may be a row of headers - the names of the covariates, and the first column should include sample identifiers. The first row, if provided headers, may include the field "ID" at the beginning. If a row of headers is not provided then GLINT will automatically generate a name for each covariate. The file can be either tab-delimited, comma-delimited or space-delimited. The matrix entries are not allowed to include quotes, and covariates must be numeric (i.e. categorial covariates should be encoded numerically).

For example, adding the following to your GLINT command::

Expand All @@ -72,7 +72,7 @@ will provide the covariates matrix in the *covariates.txt* file. See the tutoria

**--phenofile**

-Path to a file containing samples by phenotypes matrix. The first row may be a row of headers - the names of the phenotypes, and the first column should include sample identifiers. The first row, if provided headers, may include the field "ID" at the beginning. If a row of headers is not provided then GLINT will automatically generate a name for each phenotype. The file can be either tab-delimited, comma-delimited or space-delimited. The matrix entries are not allowed to include quotes.
+Path to a file containing samples by phenotypes matrix. The first row may be a row of headers - the names of the phenotypes, and the first column should include sample identifiers. The first row, if provided headers, may include the field "ID" at the beginning. If a row of headers is not provided then GLINT will automatically generate a name for each phenotype. The file can be either tab-delimited, comma-delimited or space-delimited. The matrix entries are not allowed to include quotes, and phenotypes must be numeric (i.e. categorial phenotypes should be encoded numerically).

For example, adding the following to your GLINT command::

Expand Down
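The requirement added in this hunk, that categorical covariates and phenotypes be encoded numerically, can be met in more than one way. The sketch below shows two common options in plain Python: integer codes and 0/1 indicator columns. Both helpers (`encode_categorical`, `indicator_columns`) are hypothetical names for illustration and are not part of GLINT; which encoding is appropriate depends on the analysis, and the documentation in this commit does not prescribe one.

```python
def encode_categorical(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def indicator_columns(values):
    """Return one 0/1 column per category, omitting the first (reference) level."""
    levels = sorted(set(values))
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels[1:]
    }


# A two-level covariate fits in a single numeric column.
gender = ["F", "M", "M", "F"]
gender_codes, gender_mapping = encode_categorical(gender)

# A covariate with more than two levels is often split into indicator columns,
# because integer codes would impose an ordering on the categories.
batch = ["a", "b", "c", "a"]
batch_indicators = indicator_columns(batch)
```

The resulting numeric columns can then be written into the covariates or phenotypes file in place of the original labels.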
1 change: 1 addition & 0 deletions docs/source/plots.rst
Expand Up @@ -13,6 +13,7 @@ Plots

This argument allows to plot several types of plots, as desribed bellow.

+.. note:: For using `--plot`_ when working on a remote server (e.g., via SSH), make sure X11-forwarding is enabled.

.. note:: The example commands described bellow assume that the user generated `GLINT files`_ with covariates file and phenotypes file.

Expand Down
14 changes: 10 additions & 4 deletions docs/source/tutorial.rst
Expand Up @@ -5,16 +5,20 @@ Quick start tutorial


This tutorial will quickly walk you through the basic functionality of GLINT.
-For this tutorial we use subset of a public dataset from GEO (accession ID GSE77716_; the dataset is described in details in Rahmani et al. [1]_). In order to run this tutorial you will need to `download GLINT`_ and to get the tutorial files from here_. The tutorial files include:
+For this tutorial we use subset of a public dataset from GEO (accession ID GSE77716_; the dataset is described in details in Rahmani et al. [1]_). In order to run this tutorial you will need to download GLINT and get the tutorial files (see here_).
+
+The tutorial files include:

 - *datafile.txt* - 50,000 sites by 96 samples matrix of methylation levels
-- *covariates.txt* - covariates matrix, each column corresponds to one covariate
+- *covariates.txt* - covariates matrix, each column corresponds to one (numeric) covariate
 - *phenotypes.txt* - phenotypes matrix, each column corresponds to one phenotype

.. Files and figures generated by this tutorial can be found under the 'results' directory in the tutorial files directory.
Bellow is a set of simple commands, together composing a full pipeline of EWAS analysis (after raw data normalization). The commands bellow assume the user downloaded GLINT (see `Download and installation`_) and added the tutorial files into the software's root directory. For more details about any specific argument see the documentation.

+.. note:: Avoid execution errors by using the latest release of GLINT and the tutorial files, and make sure to follow the installation instructions in the README_ file.
+
1. **Create GLINT files**

First, we start by saving a binary version of our data: the methylation data file, covariates and phenotypes of interest. This step will allow a substantial speed-up in all following commands. Navigate to the GLINT directory and run the following:
Expand All @@ -40,6 +44,8 @@ This command generates a figure titled *pcs_plot.png*, showing scatter plots of
:width: 60%
:align: center

+.. note:: For using `--plot`_ when working on a remote server (e.g., via SSH), make sure X11-forwarding is enabled.
+
3. **Remove outliers**

For this tutorial we consider samples with values more extreme than 4 sandard deviations (SDs) in their first two PCs as outliers. Following that definition, we currently have 2 outliers in the data, as reflected in the top panel of the *pcs_plot.png* figure.
Expand Down Expand Up @@ -122,9 +128,9 @@ Finally, in our example we found a single significant association in chromosome
|

-.. _here: https://github.com/cozygene/glint/releases/download/1.0.2/Tutorial_files.zip
+.. _here: download.html

-.. _download GLINT: download.html
+.. _README: https://github.com/cozygene/glint

.. _Download and installation: download.html

Expand Down
8 changes: 8 additions & 0 deletions docs/source/versions.rst
Expand Up @@ -5,6 +5,12 @@ Versions
The latest version of GLINT is available on github `here`_.
Below are details about each of the versions released so far.

+GLINT 1.0.4
+^^^^^^^^^^^
+*June 25, 2017*
+
+* Fixed a bug with the `--kinship`_ argument when using `--lmm`_
+
GLINT 1.0.3
^^^^^^^^^^^
*February 10, 2017*
Expand Down Expand Up @@ -43,6 +49,8 @@ The first release of GLINT!

.. _here: https://github.com/cozygene/glint/releases/

+.. _--lmm: ewas.html#lmm
+.. _--kinship: ewas.html#kinship
.. _--refactor: tissueheterogeneity.html#refactor
.. _--houseman: tissueheterogeneity.html#houseman
.. _--rmpoly: datamanagement.html#rmpoly
Expand Down
2 changes: 1 addition & 1 deletion parsers/lmm_parser.py
Expand Up @@ -82,7 +82,7 @@ def run(self, args, meth_data, pheno, output_perfix, covars = None):

if type(args.kinship) == file: #kinship is provided via file
logging.info("loading kinship from %s" % args.kinship.name)
-kinship = common.loadtxt(args.kinship)
+kinship = common.loadtxt(args.kinship, dtype = float)

elif args.kinship == 'refactor': # kinship and data to test are the same
# todo if --lmm provided with --refactor there is no need to run refactor twice in order to find ranked sites.
Expand Down
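The fix in this hunk asks `common.loadtxt` to return the kinship values as floats. The diff does not show what `common.loadtxt` does internally, so the sketch below is only a stand-in, written in plain Python, for what loading a user-supplied kinship file as floats amounts to: each tab-separated field is converted with `float`, and the result is checked to be square. The function name `parse_kinship` is hypothetical and not part of GLINT.

```python
import io


def parse_kinship(handle):
    """Read a tab-delimited, header-free kinship matrix and return rows of floats."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines, e.g. a trailing newline at end of file
        # float() raises ValueError on non-numeric fields such as a stray header.
        rows.append([float(field) for field in line.split("\t")])
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("kinship matrix must be square (samples by samples)")
    return rows


# An in-memory buffer stands in for the file object held in args.kinship.
kinship = parse_kinship(io.StringIO("1.0\t0.5\n0.5\t1.0\n"))
```

Converting eagerly has the side effect that a file with row or column headers fails at load time instead of producing a matrix of strings.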
Binary file modified tests/ewas/.DS_Store
Binary file not shown.
Binary file modified tests/methylation_data/.DS_Store
Binary file not shown.
Binary file modified tests/methylation_data/files/.DS_Store
Binary file not shown.
Binary file modified tests/refactor/.DS_Store
Binary file not shown.
11 changes: 5 additions & 6 deletions tests/tutorial/test_tutorial.py
Expand Up @@ -2,9 +2,9 @@
import os

CUR_DIR = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
-TUTORIAL_FILES_DIR = "docs"
+TUTORIAL_FILES_DIR = "tutorial"

-TUTORIAL_FILES = ["docs/datafile.txt", "docs/covariates.txt", "docs/phenotypes.txt"]
+TUTORIAL_FILES = ["tutorial/datafile.txt", "tutorial/covariates.txt", "tutorial/phenotypes.txt"]

def check_tutorial_files_exist():
for path in TUTORIAL_FILES:
Expand All @@ -14,17 +14,17 @@ def check_tutorial_files_exist():

class TutorialTester():
TUTORIAL_CMDS = [
-"python glint.py --datafile docs/datafile.txt --covarfile docs/covariates.txt --phenofile docs/phenotypes.txt --gsave",
+"python glint.py --datafile tutorial/datafile.txt --covarfile tutorial/covariates.txt --phenofile tutorial/phenotypes.txt --gsave",
"python glint.py --datafile datafile.glint --plot --plotpcs --numpcs 2 --out pcs_plot",
"python glint.py --datafile datafile.glint --maxpcstd 1 4 --gsave --out data_cleaned",
"python glint.py --datafile data_cleaned.glint --refactor --k 6 --covar age gender chip1 chip2 chip3 chip4 chip5 chip6 chip7 chip8 --gsave --out data_cleaned_v2",
"python glint.py --datafile data_cleaned_v2.glint --epi --covar rc1 rc2 rc3 rc4 rc5 rc6 --gsave --out data_final",
"python glint.py --datafile data_final.glint --ewas --linreg --pheno y1 --covar age gender rc1 rc2 rc3 rc4 rc5 rc6 epi1 --stdth 0.01 --rmxy --rmns --rmpoly",
"python glint.py --plot --qqplot --manhattan --results results.glint.linreg.txt",
"python glint.py --datafile data_final.glint --ewas --linreg --pheno y1 --covar age gender epi1 --stdth 0.01 --rmxy --rmns --rmpoly --plot --qqplot --manhattan --out unadjusted",
-"python glint.py --datafile docs/datafile.txt --gsave --out newdata",
+"python glint.py --datafile tutorial/datafile.txt --gsave --out newdata",
"python glint.py --datafile datafile.glint --txtsave",
-"python replace_missing_values.py --datafile docs/datafile.txt --chr NA --maxs 0.03 --maxi 0.03",
+"python replace_missing_values.py --datafile tutorial/datafile.txt --chr NA --maxs 0.03 --maxi 0.03",
]
def __init__(self):
logging.info("Testing Started on TutorialTester")
Expand All @@ -40,4 +40,3 @@ def run_tutorial_commands(self):
logging.error("error in tutorial command: '%s'" % cmd)
exit(2)
print "PASS"

Binary file modified tests/utils/.DS_Store
Binary file not shown.
