Mimick

Linked-read sequence simulator

Originally known as XENIA from the VISOR project, Mimick is a simulator for linked-read FASTQ data. Mimick lets you simulate an arbitrary number of haplotypes, set the overall coverage and molecule coverage, and choose which kind of linked reads to produce.

Supported Linked-Read Types

10X
Haplotagging
stLFR
TELLseq

Simulation parameters

output FASTQ format
overall coverage depth
average molecule length
molecule coverage / reads per molecule
molecules per barcode (barcode convolution)
proportion of singletons (unlinked barcodes)
standard Illumina read characteristics (e.g. read length, insert size)
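To illustrate how these parameters interact, here is a minimal back-of-the-envelope sketch in Python. It is not Mimick's internal calculation: the function name `estimate_library_size`, its arguments, and the assumption that molecule coverage is expressed as a fraction of each molecule covered by paired-end reads are all hypothetical and chosen only for illustration.

```python
import math


def estimate_library_size(genome_size, coverage, molecule_length,
                          molecule_coverage, read_length=150):
    """Rough counts for a paired-end linked-read library.

    Assumptions (illustrative only, not Mimick's actual formulas):
    - coverage is the overall sequencing depth (e.g. 30 for 30X)
    - molecule_coverage is the fraction of each molecule covered by reads
    - every read pair contributes 2 * read_length bases
    """
    bases_per_pair = 2 * read_length
    # Read pairs needed to reach the requested overall depth
    total_pairs = math.ceil(genome_size * coverage / bases_per_pair)
    # Read pairs drawn from one molecule of average length (at least one)
    pairs_per_molecule = max(
        1, round(molecule_length * molecule_coverage / bases_per_pair)
    )
    # Molecules required to supply all of the read pairs
    molecules = math.ceil(total_pairs / pairs_per_molecule)
    return {
        "total_pairs": total_pairs,
        "pairs_per_molecule": pairs_per_molecule,
        "molecules": molecules,
    }


if __name__ == "__main__":
    # 1 Mb genome, 30X depth, 50 kb molecules, 0.2X molecule coverage
    print(estimate_library_size(1_000_000, 30, 50_000, 0.2))
```

The point of the sketch is that overall coverage fixes the total number of reads, while molecule length and molecule coverage decide how those reads are divided among molecules. Lowering molecule coverage at a fixed overall depth therefore means more molecules, each with fewer reads.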

Standout Features

Other than the fun name and logo, Mimick is an improvement over existing linked-read simulators in multiple ways:

It's the only simulator (that we are aware of) that isn't tied to the 10X linked-read chemistry discontinued in 2019. Instead, it is generalized for existing options, both in terms of data formats and the simulation process itself.
Circular DNA support. Yay prokaryotes!
Mimick provides more parameters for realistic linked-read library simulation, namely the proportion of singletons and molecule coverage. Both characteristics strongly affect the performance of a linked-read library.
As of version 2.0, Mimick uses a barcode-first simulation approach, which allows barcodes to be shared across chromosomes/contigs and haplotypes. This kind of barcode sharing is common in real linked-read libraries, but existing simulators don't capture it (e.g. XENIA only allowed barcode sharing within a chromosome within a haplotype). The documentation explains this in more detail.
As of version 3.0 (upcoming), it supports multi-sample simulation from one FASTA and one VCF as input. Sample haplotypes are made by applying the SNP and indel variants from the VCF to the contigs in the FASTA.
It's fast. The Julia version (v3+) is a significant speedup for single-sample simulation, and the multi-sample simulation is parallelized across samples.
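The idea of building a sample haplotype by applying SNP and indel variants to a reference contig can be sketched as follows. This is a simplified Python illustration, not Mimick's implementation: the function `apply_variants` and the `(position, ref, alt)` tuple layout are hypothetical, positions are assumed to be 1-based as in VCF, and variants are assumed not to overlap.

```python
def apply_variants(sequence, variants):
    """Return a haplotype made by applying variants to a contig sequence.

    variants: iterable of (pos, ref, alt) tuples, where pos is 1-based
    (VCF convention) and ref/alt are allele strings. Assumes the variants
    do not overlap one another (an illustrative simplification).
    """
    haplotype = sequence
    # Apply from the rightmost variant to the leftmost so that indels
    # never shift the coordinates of variants still waiting to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        if haplotype[start:end] != ref:
            raise ValueError(
                f"REF allele {ref!r} does not match the sequence at position {pos}"
            )
        haplotype = haplotype[:start] + alt + haplotype[end:]
    return haplotype


if __name__ == "__main__":
    contig = "ACGTACGTAC"
    sample_variants = [
        (2, "C", "T"),    # SNP
        (5, "A", "AGG"),  # insertion
        (8, "TA", "T"),   # deletion
    ]
    print(apply_variants(contig, sample_variants))  # prints ATGTAGGCGTC
```

Working from right to left is a design choice that avoids tracking a running coordinate offset. A real pipeline would also need to handle per-sample genotypes and phasing, which this sketch leaves out.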

Authors

@pdimens (Mimick)

@davidebolo1993 (VISOR)

Note

Why name it "mimick"? Well, this software mimics linked-read data, and I have an affinity for naming software after fictional monsters. "Mimick" (with a "k") is the old-English spelling of the word, which leaves "mimic" available for some other bioinformatician to use for a less farcical reason. Despite the lore of mimics being deadly traps, this software is anything but, I promise.
