- SeqKit v0.13.0
seqkit
: fix a rare FASTA/Q parser bug. #127seqkit seq
: output sequence or quality in single line when-s/--seq
or-q/--qual
is on. #132- New features and improvements by @bsipos. #130
- Added
scat
, a subcommand for real-time robust concatenation of fastx files. - Rewrote the parser behind the
sana
subcommand, now it supports robust parsing of fasta file as well. - Added a "toolbox" feature to the
bam
subcommand (-T
), which is a collection of filters acting on streams of BAM records configured through a YAML string (see the docs for more). - Added the
SEQKIT_THREADS
environmental variable to override the default number of threads.
- SeqKit v0.12.1
seqkit bam
: add colorised and pretty printed output, by @bsipos. #110seqkit locate/grep
: fix bug of-m
, when query contains letters not in subject sequences. #124seqkit split2
: new flag-l/--by-length
for splitting into chunks of N bases.seqkit fx2tab
:seqkit seq
: new flag-k/--color
: colorize sequences.
- SeqKit v0.12.0
seqkit
:- fix checking input file existence.
- new global flag
--infile-list
for long list of input files, if given, they are appended to files from cli arguments.
seqkit faidx
: supporting "truncated" (no ending newline charactor) file.seqkit seq
:- do not force switching on
-g
when using-m/-M
. - show recommendation if flag
-t/--seq-type
is not DNA/RNA when computing complement sequence. #103
- do not force switching on
seqkit translate
: supporting multiple frames. #96seqkit grep/locate
:- add detection and warning for space existing in search pattern/sequence.
- speed improvement (2X) for
-m/--max-mismatch
. shenwei356/bwt/issues/3
seqkit locate
:- new flag
-M/--hide-matched
for hiding matched sequences. #98 - new flag
-r/--use-regexp
for explicitly using regular expression, so improve speed of defaultindex
operation. And you have to switch this on if using regexp now. #101 - new flag
-F/--use-fmi
for improving search speed for lots of sequence patterns.
- new flag
seqkit rename
: making IDs unique across multiple files, and can write into multiple files. #100seqkit sample
: fix stdin checking for flag-2
. #102.seqkit rename/split/split2
: fix detection of existed outdir.split split
: fix bug ofseqkit split -i -2
and parallizing it.seqkit version
: checking update is optional (-u
).
- SeqKit v0.11.0.
seqkit
: fix hanging when reading from truncated gzip file.- new commands:
seqkit amplicon
: retrieve amplicon (or specific region around it) via primer(s).
- new commands by @bsipos:
seqkit watch
: monitoring and online histograms of sequence features.seqkit sana
: sanitize broken single line fastq files.seqkit fish
: look for short sequences in larger sequences using local alignment.seqkit bam
: monitoring and online histograms of BAM record features.
seqkit grep/locate
: reduce memory occupation when using flag-m/--max-mismatch
.seqkit seq
: fix panic of computing complement sequence for long sequences containing illegal letters without flag-v
on. #84
- SeqKit v0.10.2
seqkit
: fix bug of parsing sequence ID delimited by tab (\t
). #78seqkit grep
: better logic of--delete-matched
.seqkit common/rmdup/split
: use xxhash to replace MD5 when comparing with sequence, discard flag-m/--md5
.seqkit stats
: new flag-b/--basename
for outputting basename instead of full path.
- SeqKit v0.10.1
seqkit fx2tab
: new option-q/--avg-qual
for outputting average read quality. #60seqkit grep/locate
: fix support ofX
when using-d/--degenerate
. #61seqkit translate
:- new flag
-M/--init-codon-as-M
to translate initial codon at beginning to 'M'. #62 - translates
---
to-
for aligned DNA/RNA, flag-X
needed. #63 - supports codons containing ambiguous bases, e.g.,
GGN->G
,ATH->I
. #64 - new flag
-l/--list-transl-table
to show details of translate table N - new flag
-l/--list-transl-table-with-amb-codons
to show details of translate table N (including ambigugous codons)
- new flag
seqkit split/split2
, fix bug of ignoring-O
when reading from stdin.
- SeqKit v0.10.0
seqkit
: report error when input is directory.- new command
seqkit mutate
: edit sequence (point mutation, insertion, deletion).
- SeqKit v0.9.3
seqkit stats
: fix panic for empty file. #57seqkit translate
: add flag-x/--allow-unknown-codon
to translate unknown codon toX
.
- SeqKit v0.9.2
seqkit
: stricter checking for value of global flag-t/--seq-type
.seqkit sliding
: fix bug for flag-g/--greedy
. #54seqkit translate
: fix bug for frame < 0. #55seqkit seq
: add TAB to default blank characters (flag-G/--gap-letters
), and fix filter result when using flag-g/--remove-gaps
along with-m/--min-len
or-M/--max-len
- SeqKit v0.9.1
- SeqKit v0.9.0
seqkit
: better handle of empty file, no error message shown. #36- new subcommand
seqkit split2
: split sequences into files by size/parts (FASTA, PE/SE FASTQ). #35 - new subcommand
seqkit translate
: translate DNA/RNA to protein sequence. #28 seqkit sort
: fix bug when using-2 -i
, and add support for sorting in natural order. #39seqkit grep
andseqkit locate
: add experimental support of mismatch when searching subsequences. #14seqkit stats
: add stats of Q20 and Q30 for FASTQ. #45
- SeqKit v0.8.1
seqkit
: do not callpigz
orgzip
for decompressing gzipped file any more. But you can still utilizepigz
orgzip
bypigz -d -c seqs.fq.gz | seqkit xxx
.seqkit subseq
: fix bug of missing quality when using--gtf
or--bed
seqkit stats
: parallelize counting files, it's much faster for lots of small files, especially for files on SSD
- SeqKit v0.8.0
seqkit
, stricter FASTA/Q format requirement, i.e., must starting with>
or@
.seqkit
, fix output format for FASTQ files containing zero-length records, yes this happens.seqkit
, add amino acid codeO
(pyrrolysine) andU
(selenocysteine).seqkit replace
, add flag--nr-width
to fill leading 0s for{nr}
, useful for preparing sequence submission (">strain_00001 XX", ">strain_00002 XX").seqkit subseq
, require BED file to be tab-delimited.
- SeqKit v0.7.2
seqkit tab2fx
: fix a concurrency bug that occurs in low proprobability when only 1-column data provided.seqkit stats
: add quartiles of sequence lengthseqkit faidx
: add support for retrieving subsequence using seq ID and region, which is similar with "samtools faidx" but has some extra features
- SeqKit v0.7.1
seqkit convert
: fix bug of read quality containing only 3 or less values. shenwei356/bio/issues/3seqkit stats
: add option-T/--tabular
to output in machine-friendly tabular format. #23seqkit common
: increase speed and decrease memory occupation, and add some notes.- fix some typos. #22
- suggestion: please install pigz to gain better parsing performance for gzipped data.
- SeqKit v0.7.0
- add new command
convert
for converting FASTQ quality encoding between Sanger, Solexa and Illumina. Thanks suggestion from @cviner ( #18). usage & example. - add new command
range
for printing FASTA/Q records in a range (start:end). #19. usage & example. - add new command
concate
for concatenating sequences with same ID from multiple files. usage & example.
- add new command
- SeqKit v0.6.0
- SeqKit v0.5.5
- Increasing speed of reading
.gz
file by utilizinggzip
(1.3X), it would be much faster if you installedpigz
(2X). - Fixing colorful output in Windows
seqkit locate
: add flag--gtf
and--bed
to output GTF/BED6 format, so the result can be used inseqkit subseq
.seqkit subseq
: fix bug of--bed
, add checking coordinate.
- Increasing speed of reading
- SeqKit v0.5.4
seqkit subseq --gtf
, add flag--gtf-tag
to set tag that's outputted as sequence comment- fix
seqkit split
andseqkit sample
: forget not to wrap sequence and quality in output for FASTQ format - compile with go1.8.1
- SeqKit v0.5.3
seqkit grep
: fix bug when usingseqkit grep -r -f patternfile
: all records will be retrived due to failing to discarding the blank pattern (""
). #11
- SeqKit v0.5.2
seqkit stats -a
andseqkit seq -g -G
: change default gap letters from '- ' to '- .'seqkit subseq
: fix bug of range overflow when using-d/--down-stream
or-u/--up-stream
for retieving subseq using BED (--beb
) or GTF (--gtf
) file.seqkit locate
: add flag-G/--non-greedy
, non-greedy mode, faster but may miss motifs overlaping with others.
- SeqKit v0.5.1
seqkit restart
: fix bug of flag parsing
- SeqKit v0.5.0
- new command
seqkit restart
, for resetting start position for circular genome. seqkit sliding
: add flag-g/--greedy
, exporting last subsequences even shorter than windows size.seqkit seq
:- add flag
-m/--min-len
and-M/--max-len
to filter sequences by length. - rename flag
-G/--gap-letter
to-G/--gap-letters
.
- add flag
seqkit stat
:- renamed to
seqkit stats
, don't worry, old name is still available as an alias. - add new flag
-a/all
, for all statistics, includingsum_gap
,N50
, andL50
.
- renamed to
- new command
- SeqKit v0.4.5
seqkit seq
: fix bug of failing to reverse quality of FASTQ sequence
- SeqKit v0.4.4
seqkit locate
: fix bug of missing regular-expression motifs containing non-DNA characters (e.g.,ACT.{6,7}CGG
) from motif file (-f
).- compiled with go v1.8.
- SeqKit v0.4.3
- fix bug of
seqkit stat
:min_len
always be0
in versions: v0.4.0, v0.4.1, v0.4.2
- fix bug of
- SeqKit v0.4.2
- fix header information of
seqkit subseq
when restriving up- and down-steam sequences using GTF/BED file.
- fix header information of
- SeqKit v0.4.1
- enchancement: remove redudant regions for
seqkit locate
.
- enchancement: remove redudant regions for
- SeqKit v0.4.0
- fix bug of
seqkit locate
, e.g, only find two locations (1-4
,7-10
, missing4-7
) ofACGA
inACGACGACGA
. - better output of
seqkit stat
for empty file.
- fix bug of
- SeqKit v0.3.9
- fix bug of region selection for blank sequences. affected commands include
seqkit subseq --region
,seqkit grep --region
,seqkit split --by-region
. - compile with go1.8beta1.
- fix bug of region selection for blank sequences. affected commands include
- SeqKit v0.3.8.1
- enhancement and bugfix of
seqkit common
: two or more same files allowed, fix log information of number of extracted sequences in the first file.
- enhancement and bugfix of
- SeqKit v0.3.8
- enhancement of
seqkit common
: better handling of files containing replicated sequences
- enhancement of
- SeqKit v0.3.7
- fix bug in
seqkit split --by-id
when sequence ID contains invalid characters for system path. - add more flags validation for
seqkit replace
. - enhancement: raise error when key pattern matches multiple targes in cases of replacing with key-value files and more controls are added.
- changes: do not wrap sequence and quality in output for FASTQ format.
- fix bug in
- SeqKit v0.3.6
- add new feature for
seqkit grep
: new flag-R
(--region
) for specifying sequence region for searching.
- add new feature for
- SeqKit v0.3.5
- fig bug of
seqkit grep
: flag-i
(--ignore-case
) did not work when not using regular expression
- fig bug of
- SeqKit v0.3.4.1
- improve performance of reading (~10%) and writing (100%) gzip-compressed file
by using
github.com/klauspost/pgzip
package - add citation
- improve performance of reading (~10%) and writing (100%) gzip-compressed file
by using
- SeqKit v0.3.4
- bugfix:
seq
wrongly handles only the first one sequence file when multiple files given - new feature:
fx2tab
can output alphabet letters of a sequence by flag-a
(--alphabet
) - new feature: new flag
-K
(--keep-key
) forreplace
, when replacing with key-value file, one can choose keeping the key as value or not.
- bugfix:
- SeqKit v0.3.3
- fix bug of
seqkit replace
, wrongly starting from 2 when using{nr}
in-r
(--replacement
) - new feature:
seqkit replace
supports replacement symbols{nr}
(record number) and{kv}
(corresponding value of the key ($1) by key-value file)
- fix bug of
- SeqKit v0.3.2
- fix bug of
seqkit split
, error when target file is in a directory. - improve performance of
seqkit spliding
for big sequences, and output last part even if it's shorter than window sze, output of FASTQ is also supported.
- fix bug of
- SeqKit v0.3.1.1
- compile with go1.7rc5, with higher performance and smaller size of binary file
- SeqKit v0.3.1
- improve speed of
seqkit locate
- improve speed of
- SeqKit v0.3.0
- use fork of github.com/brentp/xopen, using
zcat
for speedup of .gz file reading on *nix systems. - improve speed of parsing sequence ID when creating FASTA index
- reduce memory usage of
seqkit subseq --gtf
- fix bug of
seqkit subseq
when using flag--id-ncbi
- fix bug of
seqkit split
, outdir error - fix bug of
seqkit seq -p
, last base is wrongly failed to convert when sequence length is odd. - add "sum_len" result for output of
seqkit stat
- use fork of github.com/brentp/xopen, using
- seqkit v0.2.9
- fix minor bug of
seqkit split
andseqkit shuffle
, header name error due to improper use of pointer - add option
-O (--out-dir)
toseqkit split
- fix minor bug of
- seqkit v0.2.8
- improve speed of parsing sequence ID, not using regular expression for default
--id-regexp
- improve speed of record outputing for small-size sequences
- fix minor bug:
seqkit seq
for blank record - update benchmark result
- improve speed of parsing sequence ID, not using regular expression for default
- seqkit v0.2.7
- reduce memory usage by optimize the outputing of sequences.
detail: using
BufferedByteSliceWrapper
to resuse bytes.Buffer. - reduce memory usage and improve speed by using custom buffered
reading mechanism, instead of using standard library
bufio
, which is slow for large genome sequence. - discard strategy of "buffer" and "chunk" of FASTA/Q records, just parse records one by one.
- delete global flags
-c (--chunk-size)
and-b (--buffer-size)
. - add function testing scripts
- reduce memory usage by optimize the outputing of sequences.
detail: using
- seqkit v0.2.6
- fix bug of
seqkit subseq
: Inplace subseq method leaded to wrong result
- fix bug of
- seqkit v0.2.5.1
- fix a bug of
seqkit subseq
: chromesome name was not be converting to lower case when using--gtf
or--bed
- fix a bug of
- seqkit v0.2.5
- fix a serious bug brought in
v0.2.3
, using unsafe method to convertstring
to[]byte
- add awk-like built-in variable of record number (
{NR}
) forseqkit replace
- fix a serious bug brought in
- seqkit v0.2.4.1
- fix several bugs from library
bio
, affected situations:- Locating patterns in sequences by pattern FASTA file:
seqkit locate -f
- Reading FASTQ file with record of which the quality starts with
+
- Locating patterns in sequences by pattern FASTA file:
- add command
version
- fix several bugs from library
- seqkit v0.2.4
- add subcommand
head
- add subcommand
- seqkit v0.2.3
- reduce memory occupation by avoid copy data when convert
string
to[]byte
- speedup reverse-complement by avoid repeatly calling functions
- reduce memory occupation by avoid copy data when convert
- seqkit v0.2.2
- reduce memory occupation of subcommands that use FASTA index
- seqkit v0.2.1
- improve performance of outputing.
- fix bug of
seqkit seq -g
for FASTA fromat - some other minor fix of code and docs
- update benchmark results
- seqkit v0.2.0
- reduce memory usage of writing output
- fix bug of
subseq
,shuffle
,sort
when reading from stdin - reduce memory usage of
faidx
- make validating sequences an optional option in
seq
command, it saves some time.
- seqkit v0.1.9
- using custom FASTA index file extension:
.seqkit.fai
- reducing memory usage of
sample --number --two-pass
- change default CPU number to 2 for multi-cpus computer, and 1 for single-CPU computer
- using custom FASTA index file extension:
- seqkit v0.1.8
- add subcommand
rename
to rename duplicated IDs - add subcommand
faidx
to create FASTA index file - utilize faidx to improve performance of
subseq
shuffle
,sort
and split support two-pass mode (by flag-2
) with faidx to reduce memory usage.- document update
- add subcommand
- seqkit v0.1.7
- add support for (multi-line) FASTQ format
- update document, add technical details
- rename subcommands
fa2tab
andtab2fa
tofx2tab
andtab2fx
- add subcommand
fq2fa
- add column "seq_format" to
stat
- add global flag
-b
(--bufer-size
) - little change of flag in
subseq
and some other commands
- seqkit v0.1.6
- add subcommand
replace
- add subcommand
- seqkit v0.1.5.2
- fix bug of
grep
, when not using flag-r
, flag-i
will not take effect.
- fix bug of
- seqkit v0.1.5.1
- fix result of
seqkit sample -n
- fix benchmark script
- fix result of
- seqkit v0.1.5
- add global flag
--id-ncbi
- add flag
-d
(--dup-seqs-file
) and-D
(--dup-num-file
) for subcommandrmdup
- make using MD5 as an optional flag
-m
(--md5
) in subcommandrmdup
andcommon
- fix file name suffix of
seqkit split
result - minor modification of
sliding
output
- add global flag
- seqkit v0.1.4.1
- change alignment of
stat
output - preciser CPUs number control
- change alignment of
- seqkit v0.1.4
- add subcommand
sort
- improve subcommand
subseq
: supporting of getting subsequences by GTF and BED files - change name format of
sliding
result - prettier output of
stat
- add subcommand
- seqkit v0.1.3.1
- Performance improvement by reducing time of cleaning spaces
- Document update
- seqkit v0.1.3
- Further performance improvement
- Rename sub command
extract
togrep
- Change default value of flag
--threads
back CPU number of current device, change default value of flag--chunk-size
back 10000 sequences. - Update benchmark
- seqkit v0.1.2
- Add flag
--dna2rna
and--rna2dna
to subcommandseq
.
- Add flag
- seqkit v0.1.1
- 5.5X speedup of FASTA file parsing by avoid using regular expression to remove spaces (detail ) and using slice indexing instead of map to validate letters (detail)
- Change default value of global flag
-- thread
to 1. Since most of the subcommands are I/O intensive, For computation intensive jobs, like extract and locate, you may set a bigger value. - Change default value of global flag
--chunk-size
to 100. - Add subcommand
stat
- Fix bug of failing to automatically detect alphabet when only one record in file.
- seqkit v0.1
- first release of seqkit