Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodonOptimize replaces a codon with non-synonymous codon #86

Open
dadddda opened this issue Sep 6, 2024 · 2 comments
Open

CodonOptimize replaces a codon with non-synonymous codon #86

dadddda opened this issue Sep 6, 2024 · 2 comments

Comments

@dadddda
Copy link

dadddda commented Sep 6, 2024

Optimizing a gene coding sequence with a custom codon table using the DnaChisel version 3.2.11 gives unexpected output.

This is the snippet from the Python code used:

problem = DnaOptimizationProblem(
  sequence=str(sequence.seq).upper(),
  objectives=[CodonOptimize(
    method="use_best_codon",
    codon_usage_table=codon_usage_dict
  )]
)

problem.resolve_constraints()
problem.optimize_with_report(target=f"report_{index}.zip")

This is the dict of a custom codon table:

{'M': {'ATG': 1.0}, 'I': {'ATT': 0.566, 'ATC': 0.1841, 'ATA': 0.2499}, 'A': {'GCA': 0.3201, 'GCT': 0.4355, 'GCC': 0.1597, 'GCG': 0.0848}, 'S': {'TCA': 0.3316, 'TCT': 0.2708, 'AGT': 0.1814, 'AGC': 0.1026, 'TCC': 0.0569, 'TCG': 0.0566}, 'Q': {'CAG': 0.2336, 'CAA': 0.7664}, 'D': {'GAT': 0.6968, 'GAC': 0.3032}, 'V': {'GTC': 0.1422, 'GTT': 0.452, 'GTA': 0.249, 'GTG': 0.1569}, 'C': {'TGC': 0.3024, 'TGT': 0.6976}, 'Y': {'TAT': 0.6334, 'TAC': 0.3666}, 'T': {'ACT': 0.4045, 'ACA': 0.3398, 'ACC': 0.1949, 'ACG': 0.0607}, 'L': {'TTG': 0.1893, 'TTA': 0.3168, 'CTC': 0.0684, 'CTT': 0.2769, 'CTG': 0.0578, 'CTA': 0.0908}, 'N': {'AAT': 0.7284, 'AAC': 0.2716}, 'F': {'TTT': 0.8298, 'TTC': 0.1702}, 'E': {'GAA': 0.7289, 'GAG': 0.2711}, 'K': {'AAA': 0.7403, 'AAG': 0.2597}, 'G': {'GGT': 0.3882, 'GGC': 0.2088, 'GGG': 0.059, 'GGA': 0.344}, 'P': {'CCA': 0.4706, 'CCT': 0.3382, 'CCG': 0.0914, 'CCC': 0.0997}, 'R': {'CGC': 0.0747, 'CGA': 0.2287, 'CGT': 0.1656, 'AGA': 0.4013, 'CGG': 0.038, 'AGG': 0.0916}, 'H': {'CAT': 0.7248, 'CAC': 0.2752}, 'W': {'TGG': 1.0}, '*': {'TAA': 0.5642, 'TAG': 0.1453, 'TGA': 0.2906}}

This is the input sequence:

ATGAACAATACGATTGAAACCATTCTTGCTCATCGCTCTATCCGAAAATTCACCGCAGTTCCTATTACTGATGAACAAAGACAAACCATCATTCAAGCAGGTTTAGCTGCGTCTTCTTCTAGTATGCTTCAAGTCGTCTCAATCGTTCGAGTGACTGACTCTGAAAAGCGTAACGAATTGGCTCAATTTGCTGGTAACCAAGCTTATGTTGAAAGTGCGGCTGAGTTCTTAGTGTTTTGTATTGATTATCAGCGCCATGCAACCATCAATCCTGATGTACAGGCAGACTTTACAGAACTAACTCTGATTGGAGCAGTAGATTCTGGAATCATGGCACAAAACTGCTTGCTTGCAGCCGAGTCTATGGGATTAGGTGGCGTATATATTGGAGGACTAAGGAATAGCGCAGCTCAAGTTGATGAGCTATTGGGCTTACCGGAAAATAGCGCGGTGTTGTTTGGTATGTGCTTAGGGCATCCCGATCAAAATCCCGAAGTAAAGCCACGCCTACCTGCACATGTGGTTGTTCATGAAAATCAATACCAAGAGCTAAATTTAGATGATATTCAGAGCTACGATCAAACTATGCAAGCGTATTATGCGAGCCGTACAAGCAATCAAAAACTGAGTACATGGTCGCAAGAAGTCACTGGGAAGCTTGCTGGTGAGTCGCGACCTCATATTCTGCCGTACTTGAACAGTAAGGGGCTAGCAAAACGCTAA

This is the output sequence:

ATGAAAAATACTATTGAAACTATTCCAGCTCATCCATCAATGCCAAAATTAACTGCTGTTCCAATTACTGATGAACAAAGACAAACTATGATTCAAGCTGGTTTAGCTGCTTCATCATCAAGAATGCCACAAGTTGTTTCAATGGTTCCAGTTACTGAATCAGAAAATCCAAAAGAATTTGCTCAATTTGCTGGTAAACAAGCTTATGTTGAAAGAGCTGCTGATTTATTAGTTTTTTGTATTGATTATCATCCACATGCTACTATGAATCCAGATGTTCATGCTGAATTTACTGAACCAACTCCAATTGGTGCTGTTGATTCAGGTATGATGGCTCAAAAATGGTTTCCAGCTGCTGATTCAATGGGTTTAGGTGGTGTTTATATTGGTGGTCCAAGAAATAGAGCTGCTCAAGTTGATGATCCATTTGGTTTACCAGAAAATAGAGCTGTTTTTTTTGGTATGTGGTTAGGTCATCCAGATCAAAATCCAGAAGTTAATCCACCACCACCAGCTCATGTTGTTGTTCATGAAAATCAATAACAAGATCCAAATTTAGATGATATTCATAGATAAGATCAAACTATGCAAGCTTATTATGCTAGACCAACTAGAAATCAAAAACCAAGAACTTGGTCACAAGAAGTTACTGGTAATCCAGCTGGTGATTCACCACCACATATTCCACCATAATTTAAAAGAAATGGTCCAGCTAAACCATAA

By observing codons in the output sequence, e.g. the first codon that comes after the start codon, it is replaced by another amino acid coding codon AAC -> AAA instead of it's synonym. From the given codon table AAC should have been replaced by AAT according to this: {..., 'N': {'AAT': 0.7284, 'AAC': 0.2716}, ...}.

Either this is a bug or I am missing something.

@Zulko
Copy link
Member

Zulko commented Sep 7, 2024

This is not a bug but this is an important caveat in DnaChisel: the CodonOptimize() objective should always be used with a EnforceTranslation constraint. In your example you should add constraints=[EnforceTranslation()] to your DnaOptimizationProblem. This is because objectives never constrain the sequence, only constraints do. This is shown in the Readme and the examples, and there is a warning in the docs (Warning: always use with an EnforceTranslation constraint) but I would support adding more warnings everywhere.

@dadddda
Copy link
Author

dadddda commented Sep 7, 2024

Thanks for your reply. I was mainly using this link for guidance and the warning is definitely missing here: https://edinburgh-genome-foundry.github.io/DnaChisel/ref/builtin_specifications.html. I think it would be good to add that warning for every appropriate specification description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants