
Update cutadapt to 4.4 #365

Open: pyup-bot (contributor) wants to merge 1 commit into master.

This PR updates cutadapt from 2.4 to 4.4.

Changelog

4.3
-----------------

* :pr:`663`: Cutadapt became significantly faster due to an added runtime
heuristic that avoids running the full alignment algorithm if it can be
proven that it cannot succeed. Thanks to rhpvorderman for this great
improvement!
* :issue:`665`: 5' adapters did not allow partial matches in the beginning
when the :ref:`rightmost <rightmost>` adapter-search parameter was used.
* :issue:`662`: Fixed assertion error when ``--discard-untrimmed`` was used
together with ``--json`` and demultiplexing.
* :issue:`674`: When reading 3' adapters from an external file, they can now
all be anchored by using the syntax ``-a file$:adapters.fasta`` (note the
``$`` in ``file$:``).
* :issue:`669`: The ``--rename`` option now understands the ``\t`` escape
sequence and will insert a tab character in its place. This is useful when
transferring FASTQ header comments to SAM tags.
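The changelog does not describe how the heuristic from :pr:`663` works. One standard way to prove that an alignment cannot succeed is the pigeonhole principle: if an adapter matches with at most *k* errors, then at least one of *k* + 1 non-overlapping pieces of the adapter must occur in the read without any error. The sketch below illustrates only that general idea, assumes full-length matches (partial matches at the read ends are ignored), and is not Cutadapt's implementation; ``split_into_pieces`` and ``could_match`` are hypothetical names.

```python
def split_into_pieces(adapter: str, max_errors: int) -> list[str]:
    """Split the adapter into max_errors + 1 contiguous, non-overlapping pieces."""
    n = max_errors + 1
    size, rest = divmod(len(adapter), n)
    pieces = []
    start = 0
    for i in range(n):
        # Spread the remainder over the first `rest` pieces
        end = start + size + (1 if i < rest else 0)
        pieces.append(adapter[start:end])
        start = end
    return pieces


def could_match(adapter: str, read: str, max_errors: int) -> bool:
    """Return False only if a full-length match with <= max_errors errors is impossible.

    Each substitution, insertion or deletion can destroy at most one piece,
    so with max_errors edits at least one piece survives unchanged.
    """
    if max_errors + 1 > len(adapter):
        return True  # pieces would be empty; the filter cannot rule anything out
    return any(piece in read for piece in split_into_pieces(adapter, max_errors))
```

Only reads for which ``could_match`` returns ``True`` would need the (expensive) full alignment; a ``False`` result is a proof that no full-length match exists.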

4.2
-----------------

* :issue:`654`: When determining the error rate for a partial match of an
adapter with ``N`` wildcards, the number of non-N bases was not computed
correctly, which could lead to matches not being found.
* :issue:`546`: Automatically replace ``I`` in adapter sequences with ``N``.
``I`` is used to encode inosine, which matches any base. Contributed by peterjc.
* :issue:`528`: Cutadapt should no longer hang in multicore mode when an error
was raised in a worker process (for example, when an incorrectly formatted
FASTQ file was encountered).
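The two adapter-related changes in 4.2 can be illustrated with a small sketch: inosine ``I`` is replaced with ``N``, and the number of errors allowed for a (partial) match is derived from the count of non-``N`` bases instead of the raw length. The helper names and the truncation to an integer are assumptions made for illustration, not Cutadapt's code.

```python
def normalize_adapter(sequence: str) -> str:
    """Upper-case the adapter and replace inosine (I), which matches any base, with N."""
    return sequence.upper().replace("I", "N")


def allowed_errors(matched_part: str, max_error_rate: float) -> int:
    """Number of errors allowed for the aligned part of the adapter.

    N wildcards match anything, so only the non-N bases count towards the
    length the error rate is applied to. Truncating with int() is an
    assumption of this sketch.
    """
    non_n_bases = sum(1 for base in matched_part if base != "N")
    return int(non_n_bases * max_error_rate)
```

For example, ``normalize_adapter("acgtIIIIacgt")`` yields ``"ACGTNNNNACGT"``.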

4.1
-----------------

* :issue:`624`: You can now combine reading adapter sequences from an external file
with the search parameter notation. For example,
``-a "file:adapters.fasta;min_overlap=5"`` sets the minimum overlap to 5 for all
adapters in ``adapters.fasta``.
* :issue:`361`: When reading 5' adapters from an external file, they can now
all be anchored by using the syntax ``-g ^file:adapters.fasta``
(note the ``^`` before ``file:``).
* :issue:`254`: Finding the *rightmost* 5' adapter occurrence is now supported by using the
``rightmost`` search parameter (the default is to find the leftmost occurrence).
* :issue:`615`: Fix linked adapter statistics for the 5' and 3' ends not
being correctly reported separately.
* :issue:`616`: Report correct number of quality-trimmed bases when
both ``-q`` and ``--nextseq-trim`` are used.
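For a 5' adapter, everything up to and including the adapter occurrence is removed, so picking the leftmost or the rightmost occurrence changes the result when the adapter occurs more than once. The following sketch handles only error-free occurrences and uses plain ``str.find``/``str.rfind`` instead of Cutadapt's alignment; ``trim_5prime`` is a hypothetical helper.

```python
def trim_5prime(read: str, adapter: str, rightmost: bool = False) -> str:
    """Remove an exactly matching 5' adapter and everything before it.

    By default the leftmost occurrence is used; with rightmost=True the last
    occurrence is used instead (mirroring the ``rightmost`` search parameter
    in spirit only). Reads without an occurrence are returned unchanged.
    """
    position = read.rfind(adapter) if rightmost else read.find(adapter)
    if position == -1:
        return read
    return read[position + len(adapter):]
```

With the read ``ACGTAAAACGTCCCC`` and adapter ``ACGT``, the leftmost search keeps ``AAAACGTCCCC`` while the rightmost search keeps only ``CCCC``.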

4.0
-----------------

* :issue:`604`, :pr:`608`: The :ref:`alignment algorithm was tweaked <algorithm-indel-scores>`
to penalize indels more and to more accurately pick the leftmost adapter
occurrence if there are multiple. This will normally affect very few
reads, but should generally lead to fewer surprising results in cases
where it matters. Because this changes trimming results, it was appropriate
to bump the major version to 4.
* :issue:`607`: Print an error when an output file was specified
multiple times (for example, for ``--untrimmed-output`` and
``--too-short-output``). Sending output from different filters to
the same file is not supported at the moment.
* :issue:`603`: When ``-e`` was used with an absolute number of errors
and there were ``N`` wildcards in the sequence, the actual number of
allowed errors was too low.
* Speed up quality trimming (both ``-q`` and ``--nextseq-trim``) somewhat.
* Python 3.6 is no longer supported as it is end-of-life.
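Cutadapt's documentation describes its 3' quality trimming (``-q``) as the algorithm used by BWA: subtract the cutoff from all qualities, compute partial sums from each index to the end of the read, and cut where the sum is minimal. The following is a plain-Python sketch of that idea, assuming Phred+33-encoded qualities; ``quality_trim_index`` is a hypothetical name, and the code reflects neither the speed-up mentioned above nor the ``--nextseq-trim`` variant.

```python
def quality_trim_index(qualities: str, cutoff: int, base: int = 33) -> int:
    """Return the index at which to cut the read for 3' quality trimming.

    Walking from the 3' end, accumulate (cutoff - quality) and remember the
    position where this running sum is largest, which is where the partial
    sum of (quality - cutoff) is smallest. Stop once the running sum drops
    below zero, as BWA does.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut
```

The trimmed read is then ``sequence[:quality_trim_index(qualities, cutoff)]``.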

3.7
-----------------

* :issue:`600`: Fixed ``{match_sequence}`` placeholder not working when
renaming paired-end reads.

3.6
---------------------

* :issue:`437`: Add ``{match_sequence}`` to the placeholders that ``--rename``
accepts. This makes it possible to add the sequence matching an adapter (including
errors) to the read header. An empty string is inserted if there is no match.
* :issue:`589`: Windows wheels are now available on PyPI. That is,
``pip install`` will no longer attempt to compile things, but just install
a pre-compiled binary.
* :issue:`592`: Clarify in documentation and error messages that anchored
adapters need to match in full and that therefore setting an explicit
minimum overlap (``min_overlap=``, ``o=``) for them is not possible.
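A ``--rename`` template behaves like a format string whose placeholders are filled in per read. The sketch below imitates that with Python's ``str.format``; ``render_name`` is hypothetical, and apart from ``{match_sequence}`` the placeholder names used here (``{id}``, ``{comment}``) are assumptions for illustration. As described above, an empty string is inserted when no adapter matched.

```python
from typing import Optional


def render_name(template: str, read_id: str, comment: str, match_sequence: Optional[str]) -> str:
    """Expand a rename template (hypothetical helper, not Cutadapt's code)."""
    return template.format(
        id=read_id,
        comment=comment,
        # No adapter match: insert an empty string
        match_sequence=match_sequence if match_sequence is not None else "",
    )
```

For example, ``render_name("{id} {comment} match={match_sequence}", "read1", "1:N:0:1", "AGATCGGAAG")`` gives ``"read1 1:N:0:1 match=AGATCGGAAG"``.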

3.5
-----------------

* :issue:`555`: Add support for dumping statistics in JSON format using ``--json``.
* :issue:`541`: Add a "Read fate breakdown" section heading to the report, and also
add statistics for reads discarded because of ``--discard-untrimmed`` and
``--discard-trimmed``. With this, the numbers in that section should add up to 100%.
* Add option ``-Q``, which makes it possible to specify a quality-trimming threshold for R2 that is
different from the one for R1.
* :issue:`567`: Add ``noindels`` adapter-trimming parameter. You can now write
``-a "ADAPTER;noindels"`` to disallow indels for a single adapter only.
* :issue:`570`: Fix ``--pair-adapters`` not finding some pairs when reads contain
more than one adapter.
* :issue:`524`: Fix a memory leak when using ``--info-file`` with multiple cores.
* :issue:`559`: Fix adjacent base statistics not being shown for linked adapters.

3.4
-----------------

* :issue:`481`: An experimental single-file Windows executable of Cutadapt
is `available for download on the GitHub "releases"
page <https://github.com/marcelm/cutadapt/releases>`_.
* :issue:`517`: Report the correct sequence in the info file if the read was reverse-complemented.
* :issue:`517`: Added a column to the info file that shows whether the read was
reverse-complemented (if ``--revcomp`` was used).
* :issue:`320`: Fix (again) "Too many open files" when demultiplexing.

3.3
-----------------

* :issue:`504`: Fix a crash on Windows.
* :issue:`490`: When ``--rename`` is used with ``--revcomp``, disable adding the
``rc`` suffix to reads that were reverse-complemented.
* Also, there is now a ``{rc}`` template variable for the ``--rename`` option, which
is replaced with "rc" if the read was reverse-complemented (and the empty string if not).
* :issue:`512`: Fix issue :issue:`128` once more (the “Reads written” figure in the report
incorrectly included both trimmed and untrimmed reads if ``--untrimmed-output`` was used).
* :issue:`515`: The report is now sent to stderr if any output file is
written to stdout.

3.2
-----------------

* :issue:`437`: Implement a ``--rename`` option for :ref:`flexible read
name modifications <read-renaming>` such as moving a barcode sequence
into the read name.
* :issue:`503`: The index for demultiplexing is now created a lot faster
(within seconds instead of minutes) when allowing indels.
* :issue:`499`: Fix combinatorial demultiplexing not working when using
multiple cores.

3.1
-----------------

* :issue:`443`: With ``--action=retain``, it is now possible to trim reads while
leaving the adapter sequence itself in the read. That is, only the sequence
before (for 5’ adapters) or after (for 3’ adapters) is removed. With linked
adapters, both adapters are retained.
* :issue:`495`: Running with multiple cores did not work on macOS with Python 3.8+.
To prevent problems like these in the future, automated testing has been extended
to also run on macOS.
* :issue:`482`: Print statistics for ``--discard-casava`` and ``--max-ee`` in the
report.
* :issue:`497`: The changelog for 3.0 previously forgot to mention that the following
options, which were deprecated in version 2.0, have now been removed, and
using them will lead to an error: ``--format``, ``--colorspace``, ``-c``, ``-d``,
``--double-encode``, ``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``,
``--bwa``, ``--no-zero-cap``. This frees up some single-character options,
allowing them to be re-purposed for future Cutadapt features.
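The difference between default trimming and ``--action=retain`` comes down to which end of the adapter match is used as the cut point. A minimal sketch, assuming the adapter match has already been located as the half-open interval ``[start, stop)`` in the read; ``apply_action`` is a hypothetical helper that only shows the slicing, not Cutadapt's code.

```python
def apply_action(read: str, start: int, stop: int, adapter_end: str, action: str = "trim") -> str:
    """Return the read after handling an adapter match at read[start:stop].

    adapter_end is "3prime" or "5prime". With action="trim" the adapter is
    removed together with the sequence beyond it; with action="retain" the
    adapter itself is kept and only the sequence beyond it is removed.
    """
    if action not in ("trim", "retain"):
        raise ValueError(f"unsupported action: {action}")
    if adapter_end == "3prime":
        # The sequence after (3' of) the adapter is always removed
        return read[:start] if action == "trim" else read[:stop]
    if adapter_end == "5prime":
        # The sequence before (5' of) the adapter is always removed
        return read[stop:] if action == "trim" else read[start:]
    raise ValueError(f"unknown adapter end: {adapter_end}")
```

For the read ``CCCCAGATCGGTTTT`` with a 3' adapter at ``[4, 11)``, trimming keeps ``CCCC`` and retaining keeps ``CCCCAGATCGG``.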

3.0
-----------------

* Demultiplexing on multiple cores is now supported. This was the last feature that
only ran single-threaded.
* :issue:`478`: Demultiplexing now always generates all possible output files.
* :issue:`358`: You can now use ``-e`` also :ref:`to specify the maximum number of
errors <error-tolerance>` (instead of the maximum error rate). For example, write
``-e 2`` to allow two errors over a full-length adapter match.
* :pr:`486`: Trimming many anchored adapters (for example when demultiplexing)
is now faster by using an index even when indels are allowed. Previously, Cutadapt
would only be able to build an index with ``--no-indels``.
* :issue:`469`: Cutadapt did not run under Python 3.8 on recent macOS versions.
* :issue:`425`: Change the default compression level for ``.gz`` output files from 6
to 5. This reduces the time used for compression by about 50% while increasing file
size by less than 10%. To get the old behavior, use ``--compression-level=6``.
If you use Cutadapt to create intermediate files that are deleted anyway,
consider also using the even faster option ``-Z`` (same as ``--compression-level=1``).
* :pr:`485`: Fix an issue where, under some circumstances (in particular when trimming a
5' adapter that had a mismatch in its last nucleotide(s)), the adapter sequence was not
trimmed entirely from the read. Since fixing this required changing the
alignment algorithm slightly, this is a backwards-incompatible change.
* Fix the report not including the number of reads that are too long, too short
or have too many ``N``. (This unintentionally disappeared in a previous version.)
* :issue:`487`: When demultiplexing, the reported number of written pairs was
always zero.
* :issue:`497`: The following options, which were deprecated in version 2.0, have
been removed, and using them will lead to an error:
``--format``, ``--colorspace``, ``-c``, ``-d``, ``--double-encode``,
``-t``, ``--trim-primer``, ``--strip-f3``, ``--maq``, ``--bwa``, ``--no-zero-cap``.
This frees up some single-character options,
allowing them to be re-purposed for future Cutadapt features.
* Ensure Cutadapt runs under Python 3.9.
* Drop support for Python 3.5.
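The speed/size trade-off of the gzip compression level can be explored with Python's standard ``gzip`` module, whose ``compresslevel`` parameter takes the usual zlib levels 1 to 9. This sketch is not Cutadapt's output code; ``write_gzipped`` and the synthetic FASTQ records are hypothetical, and actual sizes depend on the data.

```python
import gzip
import os
import tempfile


def write_gzipped(path: str, text: str, level: int) -> int:
    """Write text gzip-compressed at the given level; return the file size in bytes."""
    with gzip.open(path, "wt", compresslevel=level) as handle:
        handle.write(text)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Synthetic FASTQ records, used only for a quick size comparison
    record = "@read{}\nACGTACGTAGATCGGAAGAGC\n+\nIIIIIIIIIIIIIIIIIIIII\n"
    text = "".join(record.format(i) for i in range(10_000))
    with tempfile.TemporaryDirectory() as tmp:
        for level in (1, 5, 6):
            size = write_gzipped(os.path.join(tmp, f"out{level}.fastq.gz"), text, level)
            print(f"level {level}: {size} bytes")
```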

2.10
------------------

* Fixed a performance regression introduced in version 2.9.
* :pr:`449`: ``--action=`` could not be used with ``--pair-adapters``.
Fix contributed by wlokhorst.
* :issue:`450`: ``--untrimmed-output``, ``--too-short-output`` and ``--too-long-output`` can
now be written interleaved.
* :issue:`453`: Fix problem that ``N`` wildcards in adapters did not match ``N`` characters
in the read. ``N`` characters now match any character in the read, independent of whether
``--match-read-wildcards`` is used or not.
* With ``--action=lowercase``/``mask``, print which sequences would have been
removed in the “Overview of removed sequences” statistics. Previously, it
would show that no sequences have been removed.
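The fixed ``N`` behaviour can be expressed as a per-position comparison: an ``N`` in the adapter matches any read character, including ``N``, whereas an ``N`` in the read matches everything only when read wildcards are enabled. This is a simplified sketch (equal-length strings, no alignment or indels) with hypothetical helper names; treating read ``N`` as a wildcard under ``match_read_wildcards`` is an assumption based on the ``--match-read-wildcards`` option name.

```python
def bases_match(adapter_char: str, read_char: str, match_read_wildcards: bool = False) -> bool:
    """Compare one adapter position with one read position."""
    if adapter_char == "N":
        return True  # adapter wildcard: matches anything, including an N in the read
    if match_read_wildcards and read_char == "N":
        return True
    return adapter_char == read_char


def count_mismatches(adapter: str, read_segment: str, match_read_wildcards: bool = False) -> int:
    """Count mismatches between two equal-length strings (no indels)."""
    if len(adapter) != len(read_segment):
        raise ValueError("sequences must have the same length")
    return sum(
        not bases_match(a, r, match_read_wildcards) for a, r in zip(adapter, read_segment)
    )
```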

2.9
-----------------

* :issue:`441`: Add a ``--max-ee`` (or ``--max-expected-errors``) option
for filtering reads whose number of expected errors exceeds the given
threshold. The idea comes from
`Edgar et al. (2015) <https://academic.oup.com/bioinformatics/article/31/21/3476/194979>`_.
* :issue:`438`: The info file now contains the ``rc`` suffix (preceded by a space) that is added to
the names of reverse-complemented reads (with ``--revcomp``).
* :issue:`448`: ``.bz2`` and ``.xz`` output wasn’t possible in multi-core mode.
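The expected number of errors of a read is the sum of the per-base error probabilities implied by its quality values, 10^(-Q/10), as in Edgar et al. (2015). A minimal sketch of the ``--max-ee`` filter, assuming Phred+33-encoded qualities; ``expected_errors`` and ``passes_max_ee`` are hypothetical names.

```python
def expected_errors(qualities: str, base: int = 33) -> float:
    """Sum of per-base error probabilities 10 ** (-Q / 10) for a Phred-encoded string."""
    return sum(10 ** (-(ord(char) - base) / 10) for char in qualities)


def passes_max_ee(qualities: str, max_ee: float, base: int = 33) -> bool:
    """Keep the read only if its expected number of errors does not exceed max_ee."""
    return expected_errors(qualities, base) <= max_ee
```

Five bases of quality 10 (``+`` in Phred+33) give about 0.5 expected errors, so such a read would be discarded with a threshold of 0.1.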

2.8
-----------------

* :issue:`220`: With option ``--revcomp``, Cutadapt now searches both the read
and its reverse complement for adapters. The version that matches best is
kept. This can be used to “normalize” strandedness.
* :issue:`430`: ``--action=lowercase`` now works with linked adapters.
* :issue:`431`: Info files can now be written even for linked adapters.
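With ``--revcomp``, both the read and its reverse complement are searched and the better-matching version is kept. The sketch below replaces Cutadapt's alignment with exact substring search and a hypothetical score (adapter length if found, otherwise 0); ``choose_orientation`` is a hypothetical name, and keeping the original orientation on ties is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def choose_orientation(read: str, adapter: str) -> tuple[str, bool]:
    """Return (sequence to keep, whether it was reverse-complemented)."""

    def score(sequence: str) -> int:
        # Stand-in for an alignment score
        return len(adapter) if adapter in sequence else 0

    rc = reverse_complement(read)
    if score(rc) > score(read):
        return rc, True
    return read, False
```

For the read ``GTTTCCC`` and adapter ``AAAC``, only the reverse complement ``GGGAAAC`` contains the adapter, so that version is kept.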

2.7
-----------------

* :issue:`427`: Multicore is now supported even when using ``--info-file``,
``--rest-file`` or ``--wildcard-file``. The only remaining feature that
still does not work with multicore is now demultiplexing.
* :issue:`290`: When running on a single core, Cutadapt no longer spawns
external ``pigz`` processes for writing gzip-compressed files. This is a first
step towards ensuring that ``--cores=n`` uses at most *n* CPU
cores.
* This release adds support for Python 3.8.

2.6
-----------------

* :issue:`395`: Do not show animated progress when ``--quiet`` is used.
* :issue:`399`: When two adapters align to a read equally well (in terms
of the number of matches), prefer the alignment that has fewer errors.
* :issue:`401`: Give priority to adapters given earlier on the command
line. Previously, the priority was: All 3' adapters, all 5' adapters,
all anywhere adapters. In rare cases this could lead to different results.
* :issue:`404`: Fix an issue preventing Cutadapt from being used on Windows.
* This release no longer supports Python 3.4 (which has reached end of life).
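The two selection rules from 2.6 can be combined into one sort key: more matches win, ties are broken by fewer errors, and remaining ties by the order in which adapters were given on the command line. ``Match`` and ``best_match`` are hypothetical stand-ins rather than Cutadapt's internal classes, and the exact ordering of the criteria is an assumption based on the entries above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    adapter_name: str
    command_line_index: int  # position of the adapter on the command line
    matches: int             # number of matching bases in the alignment
    errors: int              # number of errors in the alignment


def best_match(candidates: list[Match]) -> Optional[Match]:
    """Pick the best match: most matches, then fewest errors, then earliest adapter."""
    if not candidates:
        return None
    return min(candidates, key=lambda m: (-m.matches, m.errors, m.command_line_index))
```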

2.5
-----------------

* :issue:`391`: Multicore is now supported even when using
``--untrimmed-output``, ``--too-short-output``, ``--too-long-output``
or the corresponding ``...-paired-output`` options.
* :issue:`393`: Using ``--info-file`` no longer crashes when processing
paired-end data. However, the info file itself will only contain results
for R1.
* :issue:`394`: Options ``-e``/``--no-indels``/``-O`` were ignored for
linked adapters.
* :issue:`320`: When a “Too many open files” error occurs during
demultiplexing, Cutadapt can now automatically raise the limit and
re-try if the limit is a “soft” limit.

pyup-bot mentioned this pull request on Apr 29, 2023.