🧬 FastaFS

License: GPL-2.0

FastaFS lets you mount compressed FASTA archives as a virtual filesystem, enabling instant random access without preprocessing, indexing, or duplication.


❓ Why FastaFS?

Working with large FASTA files is inefficient and error-prone:

  • Requires auxiliary files (.fai, .dict)
  • Random access needs preprocessing or indexing
  • Tools expect flat files, not compressed archives
  • Storage is duplicated across pipelines
  • Data and metadata can get out of sync

FastaFS solves this by turning compressed FASTA archives into a mountable filesystem.


🔥 Key Features

  • ⚡ Near-native performance – optimized C++ backend with minimal overhead
  • 🧠 No on-demand preprocessing required – skip indexing and loading into memory
  • 📂 Works with existing tools – use grep, awk, samtools, etc.
  • 💾 Efficient storage – no duplicate FASTA files or temporary extraction
  • 🔌 Mount as a filesystem – interact like regular files
  • 🔄 Preserves compatibility – fully compatible with existing FASTA-based workflows and tooling
  • 🎯 Selective decompression – only access the regions you need

🚀 Quick Start

# Clone
git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs

# Build
./build-release.sh
make check

# Cache + mount
./fastafs cache reference ./reference.fa
./fastafs mount reference /mnt/genome

# Use like normal files
ls /mnt/genome
head /mnt/genome/chr1.fa

🧠 How it works

FastaFS introduces a virtual filesystem layer using FUSE.

When mounted:

  • FASTA and metadata files are generated on-the-fly
  • Only requested regions are decompressed
  • .fa, .fai, .dict, and .2bit stay perfectly in sync

➡️ No temporary files. No duplication. No indexing overhead.
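
To make the index metadata concrete: a .fai file, in the samtools faidx convention, holds one tab-separated line per sequence with its name, length, byte offset of the first base, bases per line, and bytes per line. The sketch below derives those columns by scanning a plain FASTA file with awk. It only illustrates the kind of information a mount keeps in sync with the sequence data; it is not how FastaFS computes it, and `fai_from_fasta` is a hypothetical helper name. It assumes ASCII content, Unix line endings, and a constant line width within each record.

```shell
# Hypothetical helper (not part of FastaFS): print samtools-style .fai columns
# for a plain FASTA file: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
# Assumes ASCII, Unix newlines, and uniform line width within each record.
fai_from_fasta() {
    awk '
        function emit() {
            if (name != "") {
                printf "%s\t%d\t%d\t%d\t%d\n", name, len, start, bases, width
            }
        }
        /^>/ {
            emit()
            name = substr($1, 2)        # header text up to the first whitespace
            offset += length($0) + 1    # header line plus its newline
            start = offset              # first base begins right after the header
            len = 0; bases = 0; width = 0
            next
        }
        {
            if (bases == 0) {           # first sequence line fixes the line layout
                bases = length($0)
                width = bases + 1       # +1 for the newline byte
            }
            len += length($0)
            offset += length($0) + 1
        }
        END { emit() }
    ' "$1"
}

# Tiny demonstration on a throwaway file.
demo=$(mktemp)
printf '>chr1\nACGTACGT\nACGT\n>chr2\nGGCC\n' > "$demo"
fai_from_fasta "$demo"
rm -f "$demo"
```

For the demonstration file this prints `chr1 12 6 8 9` and `chr2 4 26 4 5` (tab-separated). With flat files, an index like this has to be regenerated whenever the FASTA changes; with a mount, the index and the sequence are served from the same archive.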


🧪 Use Cases

  • Large-scale genomics pipelines
  • HPC environments with limited I/O bandwidth
  • Streaming access to reference genomes
  • Toolchains requiring standard FASTA input
  • Reproducible workflows

📄 File Format Specification

https://github.com/yhoogstrate/fastafs/blob/master/doc/FASTAFS-FORMAT-SPECIFICATION.md


🔗 Links

  • FastaFS on bio.tools: https://bio.tools/fastafs
  • Zstandard seekable compression format: https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md


🧰 Installation

Dependencies

  • libboost (unit testing)
  • libopenssl / libssl
  • libfuse
  • zlib / libzstd
  • C++ compiler (C++14+)
  • cmake or meson + ninja

Debian / Ubuntu

sudo apt install git build-essential cmake libboost-dev libssl-dev \
libboost-test-dev libboost-system-dev libboost-filesystem-dev \
zlib1g-dev libzstd-dev libfuse-dev

git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs

RHEL / CentOS / Fedora

sudo yum install git cmake gcc-c++ boost-devel openssl-devel \
libzstd-devel zlib-devel fuse-devel

git clone https://github.com/yhoogstrate/fastafs.git
cd fastafs

Compile (recommended)

cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/usr/local .
make -j $(nproc)
sudo make install

Without root:

cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX="$HOME/.local" .
make -j $(nproc)
make install
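
After a non-root install, the shell may not find the binary yet. The sketch below assumes the install step places `fastafs` in `bin/` under the chosen prefix (an assumption about the project's install rules, not something stated here) and adds that directory to `PATH` once. `path_prepend` is a hypothetical helper name.

```shell
# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed location of the binary for a prefix of $HOME/.local.
path_prepend "$HOME/.local/bin"
export PATH

# Report whether the shell can now resolve the command.
command -v fastafs || echo "fastafs not found on PATH" >&2
```

Placing the first two statements in a shell startup file such as `~/.profile` would make the change persistent.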

βš™οΈ Usage

Add FASTA to cache

fastafs cache test ./test.fa

Or from 2bit:

fastafs cache test ./test.2bit

List cached datasets

fastafs list

Mount archive

fastafs mount hg19 /mnt/fastafs/hg19
ls /mnt/fastafs/hg19
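
For scripted use, it can help to pair every mount with an unmount. The wrapper below is a minimal sketch, not part of FastaFS: `with_fastafs_mount` is a hypothetical name, and it assumes that `fastafs mount` returns once the mount is usable and that `fusermount -u` (the standard FUSE unmount helper) detaches it on your system.

```shell
# Hypothetical wrapper (not part of FastaFS): mount a cached archive, run a
# command against it, then unmount again whether or not the command succeeded.
# Assumes `fastafs mount` returns once the mount is ready and that
# `fusermount -u` detaches a FUSE mount on this system.
with_fastafs_mount() {
    wfm_name=$1
    wfm_dir=$2
    shift 2
    if ! command -v fastafs >/dev/null 2>&1; then
        echo "fastafs not found on PATH" >&2
        return 127
    fi
    mkdir -p "$wfm_dir" || return 1
    fastafs mount "$wfm_name" "$wfm_dir" || return 1
    wfm_status=0
    "$@" || wfm_status=$?            # remember the exit status of the command
    fusermount -u "$wfm_dir"         # always detach the mount afterwards
    return "$wfm_status"
}

# Example (commented out so that pasting this block mounts nothing):
# with_fastafs_mount hg19 /mnt/fastafs/hg19 ls /mnt/fastafs/hg19
```

The wrapper returns the exit status of the wrapped command, so it can be used inside pipelines that stop on failure.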

Inspect sequences

fastafs info

Running mounts

fastafs ps

Mount via fstab

mount.fastafs#/path/to/file.fastafs /mnt/fastafs fuse auto,allow_other 0 0
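
The entry above follows the usual six-field fstab layout: source, mount point, filesystem type, options, dump flag, and fsck order. As a small sketch, the hypothetical helper below (not part of FastaFS) prints such an entry for a given archive and mount point so it can be reviewed before being added to `/etc/fstab`.

```shell
# Hypothetical helper: print an fstab entry in the form shown above.
# Arguments: path to the .fastafs archive, then the mount point.
fastafs_fstab_entry() {
    printf 'mount.fastafs#%s %s fuse auto,allow_other 0 0\n' "$1" "$2"
}

fastafs_fstab_entry /path/to/file.fastafs /mnt/fastafs
```

Because fstab fields are separated by whitespace, paths containing spaces would need to be escaped as `\040`. Once the entry is in place, `sudo mount /mnt/fastafs` should activate it without a reboot, assuming the `mount.fastafs` helper is installed where the system can find it.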

📚 Citation

If you use FastaFS in your research, please cite:

Hoogstrate, Y., Jenster, G.W. & van de Werken, H.J.G.
FASTAFS: file system virtualisation of random access compressed FASTA files.
BMC Bioinformatics 22, 535 (2021).
https://doi.org/10.1186/s12859-021-04455-3


🤝 Contributing

Contributions are welcome!

  • Open an issue
  • Submit a pull request

Format code with:

make tidy

💡 Final note

FastaFS does not replace FASTA or TwoBit; it enhances them by making them easier to use, more efficient, and seamlessly integrated into existing workflows.

About

toolkit for file system virtualisation of random access compressed FASTA, FAI, DICT & TWOBIT files
