New dseq manu bjorn rebased #484
base: master
…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict
2. New Dseq.__init__ with the same arguments as before, but data is now stored in Bio.Seq.Seq._data 3. Altered Dseq.quick classmethod 4. watson, crick and ovhg are methods decorated with @property 5. New method to_blunt_string, which returns the string of the watson strand if the underlying Dseq object is blunt. 6. Old __getitem__ replaced 7. New __repr__ method 8. New looped method 9. New __add__ method
… imports at the top. Some tests involved strands that did not anneal perfectly; these have been corrected.
…ytestrings 2. user method that removes U and leaves an empty site. 3. cast_to_ds_right and cast_to_ds_left methods; these are *not* fill_in methods, as they do not rely on a polymerase. 4. New melt method, useful for USER cloning etc. 5. Reimplemented apply_cut method
… utils. This should fix U in primers
…XME indicating a large change in behaviour.
…e x and y has meaning in the new Dseq implementation. (line 1074) 2. The expected result in test_pcr_assembly_uracil should be AUUAggccggTTOO. 3. Removed numbers at the start and end of some sequences. This could be discussed. 4. Four instances of FIXME: The assert below fails in the sanity check on line 770 in assembly2, but gives the expected result.
…he check for internal splits in init
Function dsbreaks is called from pydna.alphabet in __init__; simplified code overall; function get_parts from pydna.alphabet used in several places; simpler looped method using get_parts and __add__; improved error message from __add__
Pull Request Overview
This PR introduces a major refactoring of pydna's DNA sequence representation system, implementing a new "dsIUPAC" alphabet (dscode) to better handle double-stranded DNA with overhangs, single-stranded regions, and USER enzyme treatment. The changes enable new molecular cloning techniques like USER cloning while maintaining backward compatibility.
Key changes:
- New alphabet system with dscode symbols representing base pairs and single-stranded regions
- Refactored Dseq class with improved internal representation and new methods for DNA manipulation
- Enhanced support for sticky ends, melting, and enzymatic treatments (USER, T4, mung bean nuclease)
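As a rough illustration of the idea behind a single-string duplex alphabet, the sketch below stores both strands in one string and derives each strand through a property. The symbol table and the `ToyDuplex` class are hypothetical and chosen only for this example; they are not pydna's actual dscode table or its `Dseq` API.

```python
# Hypothetical toy alphabet: each symbol maps to (top base, bottom base).
# "-" marks an empty position. This is NOT pydna's real dscode table.
TOY_CODE = {
    "G": ("G", "C"), "A": ("A", "T"), "T": ("T", "A"), "C": ("C", "G"),  # paired
    "g": ("G", "-"), "a": ("A", "-"), "t": ("T", "-"), "c": ("C", "-"),  # top strand only
    "1": ("-", "G"), "2": ("-", "A"), "3": ("-", "T"), "4": ("-", "C"),  # bottom strand only
}


class ToyDuplex:
    """Toy double-stranded sequence stored as one string of symbols."""

    def __init__(self, data: str):
        unknown = set(data) - set(TOY_CODE)
        if unknown:
            raise ValueError(f"unknown symbols: {sorted(unknown)}")
        self._data = data

    @property
    def watson(self) -> str:
        # Top strand, 5'->3'; only terminal gaps are trimmed in this toy.
        return "".join(TOY_CODE[s][0] for s in self._data).strip("-")

    @property
    def crick(self) -> str:
        # Bottom strand, reported 5'->3', so the stored order is reversed.
        return "".join(TOY_CODE[s][1] for s in reversed(self._data)).strip("-")


duplex = ToyDuplex("gaTTC12")  # 5' overhangs on both ends
print(duplex.watson)  # GATTC
print(duplex.crick)   # AGGAA
```

Because overhangs are encoded in the string itself, no separate overhang integer is needed in this toy; it could be derived from the leading symbols instead.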
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 27 comments.
Show a summary per file
| File | Description |
|---|---|
| src/pydna/alphabet.py | New module defining dscode alphabet with base pair dictionaries and translation tables |
| src/pydna/dseq.py | Major refactoring of Dseq class with new internal representation and manipulation methods |
| src/pydna/utils.py | Added anneal_from_left function and updated complement logic |
| src/pydna/assembly2.py | Updated assembly logic to use new Dseq methods (cast_to_ds_, exo1_) |
| src/pydna/amplify.py | Improved primer annealing detection using new alphabet system |
| src/pydna/dseqrecord.py | Updated looped() method to handle features properly with sticky ends |
| tests/test_new.py | New test file for dscode representations |
| tests/test_USERcloning.py | Complete rewrite for USER enzyme cloning |
| tests/test_module_dseq.py | Extensive test updates for new Dseq behavior |
Hi @BjornFJohansson, below is my review of the PR, with the things that I think should be fixed before merging. I have also added some changes myself in a PR to this branch (#489). Feel free to cherry-pick the ones you agree with or just merge it.
Put string in ValueError. Co-authored-by: Copilot <[email protected]>
remove unused import Co-authored-by: Copilot <[email protected]>
removed unused import Co-authored-by: Copilot <[email protected]>
removed unused import Co-authored-by: Copilot <[email protected]>
removed commented-out code. Co-authored-by: Copilot <[email protected]>
remove unused import Co-authored-by: Copilot <[email protected]>
Codecov Report
❌ Patch coverage is …
Coverage Diff
| Metric | master | #484 | +/- |
|---|---|---|---|
| Coverage | 93.67% | 92.57% | -1.10% |
| Files | 40 | 41 | +1 |
| Lines | 4740 | 5183 | +443 |
| Branches | 669 | 723 | +54 |
| Hits | 4440 | 4798 | +358 |
| Misses | 243 | 306 | +63 |
| Partials | 57 | 79 | +22 |
…ot finding suitable A-T pairs across the junction.
string. Added a Dseq.getparts method. Changed all calls to the getparts function to use the method. removed transcribe, translate methods (now in pydna.seq.Seq)
Same as #483, but rebased. Please continue developing on this one @BjornFJohansson
Hi @BjornFJohansson, look at the last commit, where I fixed the `looped` function; it now passes the tests. I created the draft PR so we can discuss here.
I like the changes to assembly2. I think they make things clearer, and the overriding of the PCR assembly function makes a lot of sense.
I wonder if this bit from assembly2 could be turned into a function (strands_anneal or something), or some way to test for reverse-complementarity: