Skip to content

Conversation

@manulera
Copy link
Collaborator

@manulera manulera commented Nov 11, 2025

Same as #483, but rebased. Please continue developing on this one @BjornFJohansson

Hi @BjornFJohansson look at the last commit where I fixed the looped function, now it passes the tests. I created the draft PR so we can discuss here.

I like the changes to assembly2, I think they make things clearer, and the overriding of the PCR assembly function makes a lot of sense.

I wonder if this bit from assembly2 could be turned into a function (strands_anneal or something), or some way to test for reverse-complementarily:

        seq_u = loc_u.extract(f_u).seq
        seq_v = loc_v.extract(f_v).seq
        # instead of testing for identity we test if seq_u and seq_v anneal
        anneal = all(basepair_dict.get(x, y) for x, y in zip(str(seq_u), str(seq_v)))
        if not anneal:

…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict
2. new Dseq.__init__ w same arguments as before, but data is now stored in Bio.Seq.Seq._data
3. altered Dseq.quick classmethod
4. watson, crick and ovhg are methods decorated with @Property
5. New method to_blunt_string with returns a the string of the watson strand if the underlying Dseq object was blunt.
6. Old __getitem__ replaced
7. New __repr__ method
8. new looped method
9. new __add__ method
… imports at the top. Some tests involved strands that did not anneal prefectly, these have been corrected.
…ytestrings

2. user method that removes U and leaves an empty site.

3. cast_to_ds_right, cast_to_ds_left methods, these are *not* fill_in methods as they do not rely on a polymerase.

4. New melt method, useful for USER cloning etc..

5. reimplemented apply_cut method
…e x and y has meaning in the new Dseq implementation. (line 1074)

2. The expected result in test_pcr_assembly_uracil should be AUUAggccggTTOO.
3. Removed numbers at start and end of some sequenses. This could be discussed.
4. Four instances of FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.
fuction dsbreaks is called from pydna.alphabet in __init__

simplified code overall, fuction get_parts from pydna.alphabet used in several places

simpler looped method using get_parts and __add__

improved error message from __add__
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a major refactoring of pydna's DNA sequence representation system, implementing a new "dsIUPAC" alphabet (dscode) to better handle double-stranded DNA with overhangs, single-stranded regions, and USER enzyme treatment. The changes enable new molecular cloning techniques like USER cloning while maintaining backward compatibility.

Key changes:

  • New alphabet system with dscode symbols representing base pairs and single-stranded regions
  • Refactored Dseq class with improved internal representation and new methods for DNA manipulation
  • Enhanced support for sticky ends, melting, and enzymatic treatments (USER, T4, mung bean nuclease)

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 27 comments.

Show a summary per file
File Description
src/pydna/alphabet.py New module defining dscode alphabet with base pair dictionaries and translation tables
src/pydna/dseq.py Major refactoring of Dseq class with new internal representation and manipulation methods
src/pydna/utils.py Added anneal_from_left function and updated complement logic
src/pydna/assembly2.py Updated assembly logic to use new Dseq methods (cast_to_ds_, exo1_)
src/pydna/amplify.py Improved primer annealing detection using new alphabet system
src/pydna/dseqrecord.py Updated looped() method to handle features properly with sticky ends
tests/test_new.py New test file for dscode representations
tests/test_USERcloning.py Complete rewrite for USER enzyme cloning
tests/test_module_dseq.py Extensive test updates for new Dseq behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@manulera manulera mentioned this pull request Nov 18, 2025
@manulera
Copy link
Collaborator Author

manulera commented Nov 18, 2025

Hi @BjornFJohansson, below my review of the PR with the things that I think should be fixed before merging. I have also added some changes myself in a PR to this branch (#489). Feel free to cherry-pick the ones you agree with or just merge it.

  • docstrings: some docstrings tests are not passing
  • documentation building fails: when the other tests pass, I can have a look and fix this myself.
  • GitHub Copilot comments: it has made some good suggestions, see if you agree with some and you can add those.
  • alphabet.py:
    • I think it is better to use a normal class instead of a namedtuple. With the class, you get property auto-completion and type hints, but not with the namedtuple. Also, I don't think people are familiar with namedtuples (I did not even know they existed). I have included this in a PR.
    • Some functions are untested and what they do is not documented. It would be good to provide minimal input-output examples in the docstring and/or unit tests.
      • dsbreaks
      • regex_ss_melt_factory
      • regex_ds_melt_factory
  • assembly2:
    • Removed FIXMEs related to the Sanity check on line 770 in assembly2. I haved fixed those in the PR.
  • test_new:
    • Should these go into the test_module_dseq.py file?
  • dseq.py:
    • issue with melt as discussed in Usage of apply_cut in melt #487.
    • mw: should this handle U characters now? Also, this can probably use a dictionary from biopython rather than hardcoding the values. Can be made into a separate issue.
    • find: given the new functionality, it would be good to have some unit tests for circular sequences where the substring spans the origin.
    • translate: we discussed that this method could be removed and maybe rely on BioPython's translate method instead. It can be a good occasion for this, but it can go into a separate issue.
    • Usage of get_parts: it may make sense to create a class method for dseq, so that instead of having to call get_parts(self._data.decode("ascii")), one can call self.get_parts() or a similar name. That also will make the method more discoverable, since it will be in the Dseq class documentation.
    • shed_ss_dna: this is undocumented and untested, it would be good to explain what it does and provide some examples.

@codecov
Copy link

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 87.22003% with 97 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/pydna/dseq.py 80.25% 64 Missing and 13 partials ⚠️
src/pydna/design.py 81.57% 7 Missing and 7 partials ⚠️
src/pydna/alphabet.py 99.03% 1 Missing and 1 partial ⚠️
src/pydna/dseqrecord.py 91.66% 2 Missing ⚠️
src/pydna/assembly2.py 96.15% 0 Missing and 1 partial ⚠️
src/pydna/ligate.py 0.00% 0 Missing and 1 partial ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
- Coverage   93.67%   92.57%   -1.10%     
==========================================
  Files          40       41       +1     
  Lines        4740     5183     +443     
  Branches      669      723      +54     
==========================================
+ Hits         4440     4798     +358     
- Misses        243      306      +63     
- Partials       57       79      +22     
Files with missing lines Coverage Δ
src/pydna/__init__.py 59.52% <100.00%> (+11.03%) ⬆️
src/pydna/amplify.py 98.62% <100.00%> (-0.02%) ⬇️
src/pydna/opencloning_models.py 98.31% <ø> (ø)
src/pydna/seq.py 74.54% <100.00%> (-1.53%) ⬇️
src/pydna/seqrecord.py 86.69% <ø> (ø)
src/pydna/utils.py 88.84% <100.00%> (+0.04%) ⬆️
src/pydna/assembly2.py 96.39% <96.15%> (-0.30%) ⬇️
src/pydna/ligate.py 82.85% <0.00%> (ø)
src/pydna/alphabet.py 99.03% <99.03%> (ø)
src/pydna/dseqrecord.py 92.80% <91.66%> (-0.43%) ⬇️
... and 2 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants