Assembly result is longer than expected when using the demo data #1

Open
FankangMeng opened this issue Sep 25, 2023 · 2 comments


@FankangMeng

Description:
I've been running the de novo assembly method on the demo data. However, every attempt produces a result that is significantly longer than the original plasmid, and the results differ from run to run.

I have also checked the demo results you provided, and they are also longer than expected. For example, the result for barcode 1 should be 7,604 bp, but the de novo assembly gives 15,633 bp.

Does anyone have a solution or suggestions for this issue? I'd greatly appreciate any help!

@veghp
Member

veghp commented Sep 25, 2023

Thanks for trying this out! When a de novo assembly is attempted, Canu, the assembler, tries to identify whether the sequence is duplicated (see marbl/canu#1939 and the trim or suggestCircular parameters in the fasta header: https://github.com/Edinburgh-Genome-Foundry/Sequeduct_demo/blob/main/results_example/dir4_assembly/n3_assembly_trimmed/barcode01_denovo.fasta#L1).
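A minimal Python sketch of reading those header fields, assuming the header is made of whitespace-separated `key=value` tokens such as `suggestCircular=yes` and `trim=START-END` (worth checking against the linked file and your Canu version). The function names are hypothetical, and the trim coordinates are assumed to be 0-based and end-exclusive.

```python
def parse_canu_header(header: str) -> dict:
    """Split a FASTA header into its contig ID and key=value fields (assumed Canu-style)."""
    tokens = header.lstrip(">").split()
    fields = {"id": tokens[0]}
    for token in tokens[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def apply_trim(sequence: str, fields: dict) -> str:
    """Return the sub-sequence given by trim=START-END when the contig is flagged circular.

    ASSUMPTION: trim coordinates are 0-based and end-exclusive (Python slice semantics).
    """
    if fields.get("suggestCircular") != "yes" or "trim" not in fields:
        return sequence  # not flagged, so leave the contig untouched
    start, end = (int(x) for x in fields["trim"].split("-"))
    return sequence[start:end]


# Hypothetical header and toy sequence, for illustration only
header = ">tig00000001 len=20 class=contig suggestCircular=yes trim=0-10"
print(apply_trim("ACGTACGTTT" * 2, parse_canu_header(header)))  # ACGTACGTTT
```

If Canu did not flag the contig, the sequence comes back unchanged, which is exactly the case where a manual check is needed.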

This automatic identification is not always successful. This "duplication" happens when reads are derived from a circular sequence; putting together these reads results in an assembled sequence twice the length of the original.
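One quick way to estimate the original length of such a doubled contig, offered as a sketch under stated assumptions rather than anything Sequeduct or Canu does: take the first k bases of the assembly and find where they next occur. If the contig is the circular sequence written out roughly twice, that offset approximates the true length. The function name and default `k` are hypothetical; exact matching assumes the seed is free of sequencing errors, and a low-complexity seed can match too early, so treat the result as a hint.

```python
from typing import Optional


def estimate_unit_length(assembly: str, k: int = 31) -> Optional[int]:
    """Estimate the circular unit length of a possibly doubled contig.

    Finds the next exact occurrence of the contig's first k-mer. If the contig
    is the circular sequence written out roughly twice, that offset is about the
    original length. Returns None if the contig is too short or the k-mer never recurs.
    """
    if len(assembly) < 2 * k:
        return None
    seed = assembly[:k]
    offset = assembly.find(seed, 1)  # skip the seed's own position at 0
    return offset if offset != -1 else None


# Toy example: a 24-base "plasmid" written out twice
unit = "GATTACACGTTGCAAGCTTCCGGA"
print(estimate_unit_length(unit * 2, k=5))  # 24
```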

Try aligning the first half of your assembly against the second half to see whether that's the case (the roughly doubled length suggests it is).
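As a fast, alignment-free stand-in for that check (a swapped-in technique, not an actual alignment), the sketch below compares the sets of k-mers in the two halves. A Jaccard score near 1 is consistent with a duplicated circular sequence, and because k-mer sets ignore position, it tolerates the split point not falling exactly on the repeat boundary. For a definitive answer, a real pairwise alignment (for example with BLAST or minimap2) is the better tool. The function names and default `k` are hypothetical choices.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def halves_kmer_similarity(assembly: str, k: int = 15) -> float:
    """Jaccard similarity of the k-mer sets of the first and second halves (0.0 to 1.0)."""
    mid = len(assembly) // 2
    first, second = kmer_set(assembly[:mid], k), kmer_set(assembly[mid:], k)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)


# Toy examples: a duplicated sequence versus one with unrelated halves
print(halves_kmer_similarity("GATTACACGTTGCAAGCTTCCGGA" * 2, k=5))  # 1.0
print(halves_kmer_similarity("A" * 10 + "C" * 10, k=5))  # 0.0
```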

@FankangMeng
Author

Thank you. It's very helpful!
