README.md (+54 -52: 54 additions & 52 deletions)
@@ -1,6 +1,6 @@
1
1
# TimeSweeper
2
2
3
-
Timesweeper is a python package for detecting positive selective sweeps from time-series genomic sampling using convolutional neural networks.
3
+
Timesweeper is a package for detecting positive selective sweeps from time-series genomic sampling using convolutional neural networks.
4
4
5
5
Experiments and figures for the Timesweeper manuscript can be found here: https://github.com/SchriderLab/timesweeper-experiments
6
6
@@ -33,13 +33,13 @@ Timesweeper is built as a series of modules that are chained together to build a
33
33
1. Either based on the `example_demo_model.slim` example
34
34
2. Or by using stdpopsim to generate a SLiM script
35
35
2. Simulate demographic model with time-series sampling
36
-
1. `simulate_custom` if using custom SLiM script
37
-
2. `simulate_stdpopsim` if using a SLiM script output by stdpopsim
36
+
1. `timesweeper sim_custom` if using custom SLiM script
37
+
2. `sim_stdpopsim` if using a SLiM script output by stdpopsim
38
38
3. Note: If available, we suggest using a job submission platform such as SLURM to parallelize simulations. This is by far the most resource- and time-intensive part of the module.
39
-
3. Preprocess simulated vcfs by merging with `process_vcfs.sh`
40
-
4. Create features for the neural network with `make_training_features.py`
41
-
5. Train networks with `nets.py`
42
-
6. Run `timesweeper.py` on VCF of interest using trained models and input data
39
+
3. Preprocess simulated vcfs by merging with `process`
40
+
4. Create features for the neural network with `condense`
41
+
5. Train networks with `train`
42
+
6. Run `detect` on VCF of interest using trained models and input data
43
43
44
44
---
45
45
@@ -54,13 +54,15 @@ cd timeSeriesSweeps
54
54
make
55
55
```
56
56
57
-
Otherwise you can install dependencies with:
57
+
Otherwise you can install dependencies the long way with:
@@ -87,7 +89,7 @@ For any given experiment run you will need a YAML configuration file (see `examp
87
89
- **Mutation Rate** (`mut rate`) - just overwrites the stdpopsim mutation rate in case you'd like to fiddle with it.
88
90
- **Generation Time** (`gen time`) - allows conversions between generations and continuous time.
89
91
90
-
Example config file:
92
+
Example config file for a stdpopsim simulation run:
91
93
92
94
```{yaml}
93
95
#General
@@ -122,8 +124,8 @@ A flexible wrapper for a SLiM script that assumes you have a demographic model a
122
124
- `dumpFile`: similar to `outFile`, this is where the intermediate simulation state is saved in case of mutation loss or other problems with a replicate.
YAML_CONFIG YAML config file with all cli options defined.
@@ -172,8 +174,8 @@ optional arguments:
172
174
173
175
For use with SLiM scripts that have been generated using stdpopsim's `--slim-script` option to output the model. This allows for out-of-the-box demographic models downloaded straight from the catalog, which stdpopsim adds to regularly. Some information must be taken from the model definition so that the wrapper knows which population to sample from, how to scale values if rescaling the simulation, and more. These are described in detail both in the module's help message and in the doc section above, "Configs required for both types of simulation".
usage: timesweeper process cli [-h] [-w WORK_DIR] --sample_sizes SAMPLE_SIZES
272
274
[SAMPLE_SIZES ...]
273
275
274
276
optional arguments:
@@ -283,8 +285,8 @@ optional arguments:
283
285
sample chroms from slim. Must match the number of
284
286
entries in the -y flag.
285
287
286
-
$python process_vcfs.py yaml -h
287
-
usage: process_vcfs.py yaml [-h] YAMLCONFIG
288
+
$ timesweeper process yaml -h
289
+
usage: timesweeper process yaml [-h] YAML CONFIG
288
290
289
291
positional arguments:
290
292
YAML CONFIG YAML config file with all cli options defined.
@@ -295,18 +297,18 @@ optional arguments:
295
297
296
298
### Make Training Data (`condense`)
297
299
298
-
VCFs merged using `process_vcfs.py` are read inas allele frequencies using scikit-allel, and depending on the scenario (neut/hard/soft) the central or locus under selection is pulled out and aggregated forall replicates. This labeled ground-truth data from simulations is then saved as a dictionary in a pickle filefor easy access and low disk usage.
300
+
VCFs merged using `timesweeper process` are read in as allele frequencies using scikit-allel, and depending on the scenario (neut/hard/soft) the central locus or the locus under selection is pulled out and aggregated for all replicates. This labeled ground-truth data from simulations is then saved as a dictionary in a pickle file for easy access and low disk usage.
299
301
300
302
This module also allows adding missingness to the training data when the real data Timesweeper will be used on contains missingness. To do this, add the `-m <val>` flag, where `val` is in [0,1] and is used as the parameter of a binomial draw for each allele per timestep to set it as present/missing. We show in the manuscript that some missingness is viable (e.g. `val=0.2`); however, high missingness (e.g. `val=0.5`) will result in terrible performance and should be avoided. Ideally this value should reflect the missingness present in the real data input to Timesweeper, so that the network is trained on data resembling what it will see.
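
To make the binomial draw concrete, here is a minimal sketch of per-allele, per-timestep masking. It is not Timesweeper's actual implementation: the function name `add_missingness`, the `(timepoints, loci)` array layout, and the use of `NaN` as the "missing" marker are all assumptions made for illustration.

```python
import numpy as np


def add_missingness(freqs, missing_rate, rng=None):
    """Return a copy of `freqs` with entries randomly marked as missing.

    `freqs` is assumed to be a (timepoints, loci) array of allele
    frequencies; NaN is a hypothetical sentinel for "missing".
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng

    # One binomial draw (n=1) per allele per timestep: 1 means "missing".
    is_missing = rng.binomial(1, missing_rate, size=freqs.shape).astype(bool)

    masked = np.array(freqs, dtype=float)  # copy so the input is untouched
    masked[is_missing] = np.nan
    return masked


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    example = rng.random((20, 51))  # 20 timepoints, 51 loci (made-up sizes)
    masked = add_missingness(example, 0.2, rng)
    print(f"fraction missing: {np.isnan(masked).mean():.3f}")
```

With `missing_rate=0.2`, roughly a fifth of the entries end up masked, which is the regime described above as viable.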
301
303
302
304
Note: the process of retrieving known-selection sites is based on the mutation type labels contained in VCF INFO fields output by SLiM. It currently assumes the mutation type where selection is being introduced is identified as "m2"; if you use a custom SLiM model and change the mutation type, this module should be modified to scan for it.
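
As a rough illustration of what scanning for a mutation type involves, the sketch below pulls out the positions of records whose INFO field carries a given mutation type. It is not the code Timesweeper uses: the helper name `selected_site_positions` is hypothetical, and it assumes SLiM writes the mutation type as an integer `MT` key in INFO (so "m2" appears as `MT=2`). Check your own VCF output before relying on that.

```python
def selected_site_positions(vcf_lines, mut_type=2):
    """Return positions of VCF records whose INFO has MT=<mut_type>.

    Assumes tab-separated VCF records with INFO in the 8th column and
    the SLiM mutation type stored under an `MT` key (an assumption).
    """
    positions = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        pos, info = int(fields[1]), fields[7]

        # INFO is a semicolon-separated list of KEY=VALUE pairs.
        info_dict = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)

        # Allow comma-separated values in case a record lists several types.
        if str(mut_type) in info_dict.get("MT", "").split(","):
            positions.append(pos)
    return positions


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tT\t1000\tPASS\tMID=5;S=0;MT=1",
        "1\t250\t.\tA\tT\t1000\tPASS\tMID=9;S=0.05;MT=2",
    ]
    print(selected_site_positions(demo))
```

If a custom model introduces selection on a different mutation type, passing a different `mut_type` is the kind of change the note above refers to.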
YAML CONFIG YAML config file with all cli options defined.
@@ -350,8 +352,8 @@ optional arguments:
350
352
Timesweeper's neural network architecture is a shallow 1D CNN implemented in Keras 2 with a TensorFlow backend that trains extremely fast on CPUs with very little RAM needed. Assuming all previous steps were run, it can be trained and evaluated on hold-out test data with a single-line invocation.
Handler script for neural network training and prediction for TimeSweeper
357
359
Package. Will train two models: one for the series of timepoints generated
@@ -366,8 +368,8 @@ optional arguments:
366
368
Identifier for the experiment used to generate the
367
369
data. Optional, but helpful in differentiating runs.
368
370
369
-
$python nets.py cli -h
370
-
usage: nets.py cli [-h] [-w WORK_DIR]
371
+
$ timesweeper train cli -h
372
+
usage: timesweeper train cli [-h] [-w WORK_DIR]
371
373
372
374
optional arguments:
373
375
-h, --help show this help message and exit
@@ -376,8 +378,8 @@ optional arguments:
376
378
Should contain pickled training data from simulated vcfs processed using
377
379
process_vcf.py.
378
380
379
-
$python nets.py yaml -h
380
-
usage: nets.py yaml [-h] YAMLCONFIG
381
+
$ timesweeper train yaml -h
382
+
usage: timesweeper train yaml [-h] YAML CONFIG
381
383
382
384
positional arguments:
383
385
YAML CONFIG YAML config file with all cli options defined.
@@ -406,8 +408,8 @@ Timesweeper will optionally run frequency increment test if the generation time
406
408
Timesweeper also has a `--benchmark` flag for testing accuracy on simulated data. It searches the input data for the mutation type identifiers, allowing detection accuracy to be benchmarked on data that has a ground truth.