Commit 16f6778

Updated readme for 1.0 release
1 parent 0b1c9d5 commit 16f6778

1 file changed

+54
-52
lines changed

README.md

@@ -1,6 +1,6 @@
11
# TimeSweeper
22

3-
Timesweeper is a python package for detecting positive selective sweeps from time-series genomic sampling using convolutional neural networks.
3+
Timesweeper is a package for detecting positive selective sweeps from time-series genomic sampling using convolutional neural networks.
44

55
Experiments and figures for the Timesweeper manuscript can be found here: https://github.com/SchriderLab/timesweeper-experiments
66

@@ -33,13 +33,13 @@ Timesweeper is built as a series of modules that are chained together to build a
3333
1. Either based on the `example_demo_model.slim` example
3434
2. Or by using stdpopsim to generate a SLiM script
3535
2. Simulate demographic model with time-series sampling
36-
1. `simulate_custom` if using custom SLiM script
37-
2. `simulate_stdpopsim` if using a SLiM script output by stdpopsim
36+
1. `timesweeper sim_custom` if using custom SLiM script
37+
2. `sim_stdpopsim` if using a SLiM script output by stdpopsim
3838
3. Note: If available, we suggest using a job submission platform such as SLURM to parallelize simulations. This is by far the most resource- and time-intensive part of the workflow.
39-
3. Preprocess simulated vcfs by merging with `process_vcfs.sh`
40-
4. Create features for the neural network with `make_training_features.py`
41-
5. Train networks with `nets.py`
42-
6. Run `timesweeper.py` on VCF of interest using trained models and input data
39+
3. Preprocess simulated vcfs by merging with `process`
40+
4. Create features for the neural network with `condense`
41+
5. Train networks with `train`
42+
6. Run `detect` on VCF of interest using trained models and input data
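
As a rough illustration of the SLURM note in step 2, the sketch below splits a replicate count into chunks and prints one `timesweeper sim_custom --rep-range` command per chunk, for example to paste into an array-job script. The helper names `rep_ranges` and `sim_commands` are hypothetical, and treating the second `--rep-range` value as exclusive is an assumption; check `timesweeper sim_custom -h` for the actual convention before relying on it.

```python
import shlex


def rep_ranges(total_reps, reps_per_job):
    """Split replicates 0..total_reps-1 into (start, end) chunks.

    ASSUMPTION: `end` is treated as exclusive here. The real
    --rep-range convention may differ; verify with the help message.
    """
    if total_reps < 0 or reps_per_job < 1:
        raise ValueError("total_reps must be >= 0 and reps_per_job >= 1")
    return [
        (start, min(start + reps_per_job, total_reps))
        for start in range(0, total_reps, reps_per_job)
    ]


def sim_commands(total_reps, reps_per_job, config="example_config.yaml"):
    """Build one shell command string per chunk of replicates."""
    commands = []
    for start, end in rep_ranges(total_reps, reps_per_job):
        args = [
            "timesweeper", "sim_custom",
            "--rep-range", str(start), str(end),
            "yaml", config,
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Print the commands; each line could become one SLURM array task.
    for command in sim_commands(total_reps=1000, reps_per_job=250):
        print(command)
```

Printing the commands rather than running them keeps the sketch independent of any particular scheduler setup.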
4343

4444
---
4545

@@ -54,13 +54,15 @@ cd timeSeriesSweeps
5454
make
5555
```
5656

57-
Otherwise you can install dependencies with:
57+
Otherwise you can install dependencies the long way with:
5858

5959
```{bash}
6060
git clone git@github.com:SchriderLab/timeSeriesSweeps.git
6161
6262
conda env create -f blinx.yml
6363
64+
conda activate blinx
65+
6466
pip install .
6567
```
6668

@@ -87,7 +89,7 @@ For any given experiment run you will need a YAML configuration file (see `examp
8789
- **Mutation Rate** (`mut rate`) - just overwrites the stdpopsim mutation rate in case you'd like to fiddle with it.
8890
- **Generation Time** (`gen time`) - allows conversions between generations and continuous time.
8991

90-
Example config file:
92+
Example config file for a stdpopsim simulation run:
9193

9294
```{yaml}
9395
#General
@@ -122,8 +124,8 @@ A flexible wrapper for a SLiM script that assumes you have a demographic model a
122124
- `dumpFile`: similar to `outFile`, this is where the intermediate simulation state is saved in case of mutation loss or other problems with a replicate.
123125

124126
```
125-
$ python simulate_custom.py -h
126-
usage: simulate_custom.py [-h] [--threads THREADS]
127+
$ timesweeper sim_custom -h
128+
usage: timesweeper sim_custom [-h] [--threads THREADS]
127129
[--rep-range REP_RANGE REP_RANGE]
128130
{yaml,cli} ...
129131
@@ -140,8 +142,8 @@ optional arguments:
140142
be simulated for reps. This is to allow for easy SLURM
141143
parallel simulations.
142144
143-
$ python simulate_custom.py cli -h
144-
usage: simulate_custom.py cli [-h] [-w WORK_DIR] -i SLIM_FILE
145+
$ timesweeper sim_custom cli -h
146+
usage: timesweeper sim_custom cli [-h] [-w WORK_DIR] -i SLIM_FILE
145147
[--slim-path SLIM_PATH] [--reps REPS]
146148
147149
optional arguments:
@@ -157,8 +159,8 @@ optional arguments:
157159
Path to SLiM executable.
158160
--reps REPS Number of replicate simulations to run if not using rep-range.
159161
160-
python simulate_custom.py yaml -h
161-
usage: simulate_custom.py yaml [-h] YAML_CONFIG
162+
timesweeper sim_custom yaml -h
163+
usage: timesweeper sim_custom yaml [-h] YAML_CONFIG
162164
163165
positional arguments:
164166
YAML_CONFIG YAML config file with all cli options defined.
@@ -172,8 +174,8 @@ optional arguments:
172174

173175
For use with SLiM scripts that have been generated using stdpopsim's `--slim-script` option to output the model. This allows out-of-the-box use of demographic models downloaded straight from the stdpopsim catalog, which is regularly extended. Some information must be taken from the model definition so that the wrapper knows which population to sample from, how to scale values if rescaling the simulation, and more. These are described in detail both in the help message of the module and in the doc section above, "Configs required for both types of simulation".
174176

175-
```$ python simulate_stdpopsim.py -h
176-
usage: simulate_stdpopsim.py [-h] [-v] [--threads THREADS]
177+
```$ timesweeper sim_stdpopsim -h
178+
usage: timesweeper sim_stdpopsim [-h] [-v] [--threads THREADS]
177179
[--rep-range REP_RANGE REP_RANGE]
178180
{yaml,cli} ...
179181
@@ -191,8 +193,8 @@ optional arguments:
191193
be simulated for reps. This is to allow for easy SLURM
192194
parallel simulations.
193195
194-
python simulate_stdpopsim.py cli -h
195-
usage: simulate_stdpopsim.py cli [-h] -i SLIM_FILE --reps REPS [--pop POP]
196+
timesweeper sim_stdpopsim cli -h
197+
usage: timesweeper sim_stdpopsim cli [-h] -i SLIM_FILE --reps REPS [--pop POP]
196198
--sample_sizes SAMPLE_SIZES
197199
[SAMPLE_SIZES ...] --years-sampled
198200
YEARS_SAMPLED [YEARS_SAMPLED ...]
@@ -235,8 +237,8 @@ optional arguments:
235237
--slim-path SLIM_PATH
236238
Path to SLiM executable.
237239
238-
$ python simulate_stdpopsim.py yaml -h
239-
usage: simulate_stdpopsim.py yaml [-h] YAML CONFIG
240+
$ timesweeper sim_stdpopsim yaml -h
241+
usage: timesweeper sim_stdpopsim yaml [-h] YAML CONFIG
240242
241243
positional arguments:
242244
YAML CONFIG YAML config file with all cli options defined.
@@ -251,8 +253,8 @@ This module splits the multivcf files (which are just multiple concatenated VCF
251253

252254

253255
```
254-
$ python process_vcfs.py -h
255-
usage: process_vcfs.py [-h] [--vcf-header VCF_HEADER] [--threads THREADS]
256+
$ timesweeper process -h
257+
usage: timesweeper process [-h] [--vcf-header VCF_HEADER] [--threads THREADS]
256258
{yaml,cli} ...
257259
258260
Splits and re-merges VCF files to prepare for fast feature creation.
@@ -267,8 +269,8 @@ optional arguments:
267269
new files.
268270
--threads THREADS Number of processes to parallelize across.
269271
270-
$ python process_vcfs.py cli -h
271-
usage: process_vcfs.py cli [-h] [-w WORK_DIR] --sample_sizes SAMPLE_SIZES
272+
$ timesweeper process cli -h
273+
usage: timesweeper process cli [-h] [-w WORK_DIR] --sample_sizes SAMPLE_SIZES
272274
[SAMPLE_SIZES ...]
273275
274276
optional arguments:
@@ -283,8 +285,8 @@ optional arguments:
283285
sample chroms from slim. Must match the number of
284286
entries in the -y flag.
285287
286-
$ python process_vcfs.py yaml -h
287-
usage: process_vcfs.py yaml [-h] YAML CONFIG
288+
$ timesweeper process yaml -h
289+
usage: timesweeper process yaml [-h] YAML CONFIG
288290
289291
positional arguments:
290292
YAML CONFIG YAML config file with all cli options defined.
@@ -295,18 +297,18 @@ optional arguments:
295297

296298
### Make Training Data (`condense`)
297299

298-
VCFs merged using `process_vcfs.py` are read in as allele frequencies using scikit-allel, and depending on the scenario (neut/hard/soft) the central or locus under selection is pulled out and aggregated for all replicates. This labeled ground-truth data from simulations is then saved as a dictionary in a pickle file for easy access and low disk usage.
300+
VCFs merged using `timesweeper process` are read in as allele frequencies using scikit-allel, and depending on the scenario (neut/hard/soft) the central or locus under selection is pulled out and aggregated for all replicates. This labeled ground-truth data from simulations is then saved as a dictionary in a pickle file for easy access and low disk usage.
299301

300302
This module also allows for adding missingness to the training data in case the real data Timesweeper will be used on contains missing calls. To do this, add the `-m <val>` flag, where `val` is in [0,1] and is used as the parameter of a binomial draw for each allele per timestep to set it as present or missing. We show in the manuscript that some missingness is viable (e.g. `val=0.2`); however, high missingness (e.g. `val=0.5`) will result in terrible performance and should be avoided. Ideally this value should reflect the missingness present in the real data input to Timesweeper, so that the network is trained on data resembling what it will see.
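
To make the missingness step concrete, here is a minimal sketch of masking each allele frequency independently with probability `val`. It is not the package's implementation: the function name `add_missingness`, the nested-list layout (one inner list per timestep), and the use of `None` as the missing marker are all assumptions made for illustration.

```python
import random

MISSING = None  # hypothetical marker for a masked value


def add_missingness(freqs, val, seed=None):
    """Mask entries of `freqs` with probability `val`.

    `freqs` is a list of timesteps, each a list of allele frequencies
    (an assumed layout). Each entry gets one independent Bernoulli
    draw: with probability `val` it is replaced by MISSING.
    """
    if not 0.0 <= val <= 1.0:
        raise ValueError("val must be in [0, 1]")
    rng = random.Random(seed)
    return [
        [MISSING if rng.random() < val else freq for freq in timestep]
        for timestep in freqs
    ]


if __name__ == "__main__":
    example = [[0.1, 0.2, 0.3], [0.15, 0.25, 0.4]]
    print(add_missingness(example, val=0.2, seed=1))
```

With `val=0` the data pass through unchanged, and with `val=1` every entry is masked, which bounds the behavior described above.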
301303

302304
Note: the process of retrieving known-selection sites is based on the mutation type labels contained in VCF INFO fields output by SLiM. It currently assumes that the mutation type under selection is identified as "m2"; if you use a custom SLiM model and change the mutation type, this module must be modified to scan for the new label.
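
The check described in the note can be sketched as follows. This is an illustration rather than the module's actual code: it assumes the mutation type appears in the INFO column as an `MT=<n>` key (so "m2" would be `MT=2`), and the helper names `info_to_dict` and `is_selected_site` as well as the sample record are hypothetical.

```python
def info_to_dict(info):
    """Parse a VCF INFO string such as 'S=0.05;MT=2' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag-style entries without '=' are stored as True.
        fields[key] = value if sep else True
    return fields


def is_selected_site(vcf_line, mut_type="2"):
    """Return True if the record's INFO field has MT equal to mut_type.

    ASSUMPTION: the mutation type is stored under the 'MT' key.
    Change `mut_type` if a custom model uses a different type.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 8:  # INFO is the eighth VCF column
        return False
    return info_to_dict(cols[7]).get("MT") == mut_type


if __name__ == "__main__":
    record = "1\t2500\t.\tA\tT\t1000\tPASS\tS=0.05;DOM=0.5;MT=2;AC=3"
    print(is_selected_site(record))
```

Passing `mut_type` as a parameter shows where a custom SLiM model with a different selected mutation type would need to be accommodated.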
303305

304306
```
305-
$ python make_training_features.py -h
306-
usage: make_training_features.py [-h] [--threads THREADS] [-m MISSINGNESS]
307+
$ timesweeper condense -h
308+
usage: timesweeper condense [-h] [--threads THREADS] [-m MISSINGNESS]
307309
{yaml,cli} ...
308310
309-
Creates training data from simulated merged vcfs after process_vcfs.py has
311+
Creates training data from simulated merged vcfs after timesweeper process has
310312
been run.
311313
312314
positional arguments:
@@ -320,8 +322,8 @@ optional arguments:
320322
parameter of a binomial distribution for randomly
321323
removing known values.
322324
323-
$ python make_training_features.py cli -h
324-
usage: make_training_features.py cli [-h] [-w WORK_DIR] -s SAMP_SIZES
325+
$ timesweeper condense cli -h
326+
usage: timesweeper condense cli [-h] [-w WORK_DIR] -s SAMP_SIZES
325327
[SAMP_SIZES ...]
326328
327329
optional arguments:
@@ -335,8 +337,8 @@ optional arguments:
335337
Used to index VCF data from earliest to latest
336338
sampling points.
337339
338-
$ python make_training_features.py yaml -h
339-
usage: make_training_features.py yaml [-h] YAML CONFIG
340+
$ timesweeper condense yaml -h
341+
usage: timesweeper condense yaml [-h] YAML CONFIG
340342
341343
positional arguments:
342344
YAML CONFIG YAML config file with all cli options defined.
@@ -350,8 +352,8 @@ optional arguments:
350352
Timesweeper's neural network architecture is a shallow 1D CNN implemented in Keras2 with a TensorFlow backend that trains extremely fast on CPUs with very little RAM needed. Assuming all previous steps were run, it can be trained and evaluated on hold-out test data with a single-line invocation.
351353

352354
```
353-
$ python nets.py -h
354-
usage: nets.py [-h] [-n EXPERIMENT_NAME] {yaml,cli} ...
355+
$ timesweeper train -h
356+
usage: timesweeper train [-h] [-n EXPERIMENT_NAME] {yaml,cli} ...
355357
356358
Handler script for neural network training and prediction for TimeSweeper
357359
Package. Will train two models: one for the series of timepoints generated
@@ -366,8 +368,8 @@ optional arguments:
366368
Identifier for the experiment used to generate the
367369
data. Optional, but helpful in differentiating runs.
368370
369-
$ python nets.py cli -h
370-
usage: nets.py cli [-h] [-w WORK_DIR]
371+
$ timesweeper train cli -h
372+
usage: timesweeper train cli [-h] [-w WORK_DIR]
371373
372374
optional arguments:
373375
-h, --help show this help message and exit
@@ -376,8 +378,8 @@ optional arguments:
376378
Should contain pickled training data from simulated vcfs processed using
377379
process_vcf.py.
378380
379-
$ python nets.py yaml -h
380-
usage: nets.py yaml [-h] YAML CONFIG
381+
$ timesweeper train yaml -h
382+
usage: timesweeper train yaml [-h] YAML CONFIG
381383
382384
positional arguments:
383385
YAML CONFIG YAML config file with all cli options defined.
@@ -406,8 +408,8 @@ Timesweeper will optionally run frequency increment test if the generation time
406408
Timesweeper also has a `--benchmark` flag for testing accuracy on simulated data. It searches the input data for the mutation type identifier flags, allowing detection accuracy to be benchmarked on data that has a ground truth.
407409

408410
```
409-
$ python timesweeper.py -h
410-
usage: timesweeper.py [-h] -i INPUT_VCF [--benchmark] --aft-model AFT_MODEL
411+
$ timesweeper detect -h
412+
usage: timesweeper detect [-h] -i INPUT_VCF [--benchmark] --aft-model AFT_MODEL
411413
{yaml,cli} ...
412414
413415
Module for iterating across windows in a time-series vcf file and predicting
@@ -431,8 +433,8 @@ optional arguments:
431433
Path to Keras2-style saved model to load for aft
432434
prediction.
433435
434-
$ python timesweeper.py cli -h
435-
usage: timesweeper.py cli [-h] -s SAMP_SIZES [SAMP_SIZES ...] [-w WORKING_DIR]
436+
$ timesweeper detect cli -h
437+
usage: timesweeper detect cli [-h] -s SAMP_SIZES [SAMP_SIZES ...] [-w WORKING_DIR]
436438
[--years-sampled YEARS_SAMPLED [YEARS_SAMPLED ...]]
437439
[--gen-time GEN_TIME]
438440
@@ -453,8 +455,8 @@ optional arguments:
453455
Similarly to years_sampled, only used for FIT
454456
calculation and is optional.
455457
456-
$ python timesweeper.py yaml -h
457-
usage: timesweeper.py yaml [-h] YAML CONFIG
458+
$ timesweeper detect yaml -h
459+
usage: timesweeper detect yaml [-h] YAML CONFIG
458460
459461
positional arguments:
460462
YAML CONFIG YAML config file with all cli options defined.
@@ -482,18 +484,18 @@ conda activate blinx
482484
cd timesweeper
483485
484486
#Simulate training data
485-
python simulate_custom.py yaml example_config.yaml
487+
timesweeper sim_custom yaml example_config.yaml
486488
487489
#Process VCFs
488-
python process_vcfs.py yaml example_config.yaml
490+
timesweeper process yaml example_config.yaml
489491
490492
#Assume foo.vcf has a missingness of 0.05 and create pickle file
491-
python make_training_features.py -m 0.05 yaml example_config.yaml
493+
timesweeper condense -m 0.05 yaml example_config.yaml
492494
493495
#Train network
494-
python nets.py -n example_ts_run yaml example_config.yaml
496+
timesweeper train -n example_ts_run yaml example_config.yaml
495497
496498
#Predict on input VCF
497-
python timesweeper.py -i foo.vcf --aft-model ts_experiment/trained_models/example_ts_run_Timesweeper_aft
499+
timesweeper detect -i foo.vcf --aft-model ts_experiment/trained_models/example_ts_run_Timesweeper_aft
498500
```
499501
