Request to add feature to generate consensus sequences #105

Open
Cactusolo opened this issue Jun 18, 2019 · 3 comments

Cactusolo commented Jun 18, 2019

Hi,

I would like to request two more features for the pxconsq program in phyx:

  1. Add a flag (or option) that lets the user choose the symbol used for gaps, such as "-" or "N", or choose to have no gaps in the consensus sequence at all (like the strict majority-rule consensus in Geneious; see the examples below).

  2. Let the user supply a consensus threshold value. In an alignment, the consensus base reported for a given column depends on this threshold. See the explanation here.
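
To make the two requests concrete, here is a minimal Python sketch of one possible reading of a threshold consensus with a configurable gap symbol. It is not how pxconsq or Geneious is implemented. The function name `threshold_consensus` and the parameters `threshold` and `gap_symbol` are hypothetical, and the rules it uses (gaps are ignored when counting, tied bases are taken together, and a column with no A/C/G/T emits the gap symbol) are assumptions made for illustration.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def threshold_consensus(seqs, threshold=0.5, gap_symbol="-"):
    """Hypothetical threshold consensus (illustration only).

    threshold  -- fraction of the A/C/G/T characters in a column that the
                  chosen set of bases must cover, in (0, 1].
    gap_symbol -- emitted for columns with no A/C/G/T at all; use "N" to
                  mask them or "" to leave them out of the consensus.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    out = []
    for column in zip(*seqs):
        # Assumption: gaps and any non-ACGT characters are ignored.
        bases = [c.upper() for c in column if c.upper() in "ACGT"]
        if not bases:
            out.append(gap_symbol)
            continue

        total = len(bases)
        ranked = sorted(Counter(bases).items(), key=lambda kv: -kv[1])
        chosen, covered, i = set(), 0, 0
        # Add bases from most to least frequent until the threshold is met.
        while i < len(ranked) and covered / total < threshold:
            count = ranked[i][1]
            # Bases tied at the same count are taken together.
            while i < len(ranked) and ranked[i][1] == count:
                chosen.add(ranked[i][0])
                covered += count
                i += 1
        out.append(IUPAC[frozenset(chosen)])
    return "".join(out)


if __name__ == "__main__":
    toy = ["ACGT-", "ACGA-", "ATGC-", "A-TG-"]
    print(threshold_consensus(toy, threshold=0.5))                 # ACGN-
    print(threshold_consensus(toy, threshold=1.0))                 # AYKN-
    print(threshold_consensus(toy, threshold=0.5, gap_symbol=""))  # ACGN
```

In this sketch, a threshold of 1.0 reports every base seen in a column as an ambiguity code, while lower thresholds keep only the most frequent bases, which gives fewer ambiguity codes.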

I think these features are useful when assembling target enrichment data, where people want a consensus sequence for each gene as a reference. With them, the direct pxconsq output would meet that need (the current output contains too many Ns).

For example, I used one gene, "g4471", from Angiosperms353_targetSequences.fasta to generate a strict consensus sequence.

  • The alignment looks like this (I added quotes "" to escape markdown formatting):

cat genes_mafft/g353_alignment/g4471_mafft.fasta

">AJFN-4471"
"aatgttatacaggatgaagagaaactgaatactgcaaactccgattggatgcggaaatac
aaaggctcaagtaagcttatgctccaacctaggagcaccgaggaggtttcacagatactt
aaatattgtaattcgagacatcttgctgttgtcgtatgcgaagcaggatgcatattggaa
aacttgatttcattcctagataatgaaggatttattatgccgttagatttgggtgcaaaa
gggagttgtcaaattggtggaaatgtttcaacaaatgctgggggtttgcgccttgtccgt
tatggatcacttcacgggaacgtacttggtctcgaagctgtttta---gcaaatggtact
gttgttgacatgcttgggactttacgaaaagataatactgggtatgacctgaagcacttg
tttataggaagtgaaggatctttgggattgataactaagatttccatacttacccctcca
aagttatcttcagtaaatctagcttttcttgcttgtaaagattattacagttgccagaaa
cttctatttgaagccaagaggaaacttggggaaattttgtctgcatttgagtttctggat
gctcaatcactggatctggtcctgaaacatctagaaggtgctcggaatccattacctccc
tcac---tacacaacttctatattctgattgagacaacaggcagtgatga------atct
aatgac------------------------------------------------------
"------------------------------------------------------------"
...SKIP...
">TVSH-4471"
"------------------------------------------------------------"
"---------------------------------------------gtttctcagattctt"
"aaatattgtaactccagaaacttggctgttgttgtatgtgaagctgggtgcatattggaa
aatataatgtcattcctggacaatgaaggatttattatgccactagacttaggtgcaaaa
gggagttgccagattggtggaaatgtttcaactaatgctggaggtttgcgtcttgttcgc
tatggatcgcttcatggaagtgtacttggtatggaagctgttcta---gcagatggtact
gtacttgacatgcttaagaccttgcgcaaagataatactggctatgatttgaaacatctg
tttataggaagtgaaggttccttgggcattgttactaagatttcaatacttaccccacca
aagttgtcttcagtaaatgtggcttttcttgcttgcaaagactatatcagctgccagaaa
ttgctgcaggaggcaaaaaggaagcttggggagattttatctgcatttgaatttatggat
gtccagtctatgaatttggttttaaaacacatggaaggtgcacgaaatccacta---cca
tcat---tgcataacttttatgttttgattgagacaacaggcagtgatga------atct
tctgacaaacaaaaactggaagcatttcttcttggctccatggagaatgaattgatatct
gatggtgttcttgcacaagacataaaccaagcatcatctttttggcttctacgtgagggt"
">VUSY-4471
aaagtaattcaggatgaagagagactgcttactgcaaatatggattggatgcggaaatac
aaaggctcaagtaagcttctgctccaacctaggagcactgaggaggtttcgcagattctt
aaatactgtaattccagatgcctggctgttgttgtatgtgaggcaggatgcatattggaa
aacctggtttctttccttgataatgaaggatttatcatgccactagacttgggtgcaaaa
ggaagctgccaaattggtggaaatgtctcaactaatgctggtgggttgcgcttggtccgt
tatggatcacttcatgggaatgtacttggtcttgaagctgtttta---gcaaatggtacc
gtgcttgacattcttggaactttacgcaaagacaatactggatatgacttaaagcatttg
tttataggaagtgaaggatccttgggaattgtgactaaggtctccatacttacccctccg
aagctatcatcggtgaatctagcttttcttgcttgtaaagattatttcagctgccagaat
cttctattggaagccaagaggaagcttggggaaattctatctgcatttgaatttttggat
agccactcaatggatctggttctgaatcatctagaaggtgctcgaaatccattacctccc
tcaa---tgcacaacttttatgttctgattgagacaacagggagtgatga------atcc
tatgacaaagagaagcttgaggccttcctacttcattcaatggaaggtggtttgatatct
gatggtgttcttgcacaagacataaatcaagcatcatcattttggcggattcgtgaggga"
">XFJG-4471
aatgttattcaagatgaagataggttgctggctgcaaatgtggattggatggggaaatat
aaaggttctagccagcttttgctcttgccaaaaactactgaagaggtgtctaaaattctc
caatactgcaattccaggcgcttggctgttgtcatttgcgaagctgg---------tgac
aacctaaattcattcttagcaaatgaagggtttataatgccacttgatttgggagcaaaa
ggaagctgtcaaattggtggaaacatatcaacaaatgctggaggtttgcacttcatacgt
tacggatcactgcatggaaatattcttggccttgaagttgtctta---gctaatggaact
gttcttgatatgcttactactttacgtaaagacaatacaggatatgacttgaagcattta
ttcattggaagtgaaggtacattgggcattgtcacgaaggtctcaatactcacgcctcct
aagctagtatcaaataacatcgcgtttcttgcttgtaaagacttttcaagttgtcagaaa
ttactattggaggccaagagaggcttaggcgatgttatttctgcatttgaatttatggat
agccattctatggatatggttttaaatcacttagagggcgtccgcaaccctttacctcca
tcat---tatacaatttttatgttcttattgagacaaccagtagcgatga------atca
tatgacaaagctaagcttgaagccttcttgttaagttacatggaagatggtctcatatca
gatggtgttatagctcaggacatgaaccaagcttcttctttttggcgaatccgcgagggt"

  • If I use pxconsq, the output consensus looks like this:

pxconsq -s genes_mafft/g353_alignment/g4471_mafft.fasta

">consensus
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGSRAVYRTNYTNGGYMTNGARGYWGTYHYRNNNSCHRAYGGNRHNVTNVTBGAYATKNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNKSHHTKRMHNTGGYHHTRHVHHAHHTRGANGGHSYNMRNRAYCCHBTRNNNBYHKYRNNNNNNNNNAAHTTYTATRTYBTRATYGAGACVACNNNNRGYRVHGANNNNNNNWCNHHTGAYNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"

  • The 50% majority-rule consensus sequence generated by Geneious, by contrast, looks like this:

cat g353_consensus/g4471.fasta

">g4471_mafft_consensus_sequence
AATGTRATTCAGGATGAAGABAGACTGBHDRCTGCAAATACRGATTGGATGCGTAAATACAAAGGCTCAAGTAAGCTTYTGCTCCAACCTAGGAGCACTGARGAGGTTTCTCAGATTCTTAAATACTGTAATTCYAGACGCTTGGCTGTTGTTGTATGTGAAGCAGGATGCATATTGGAAAATYTGGTTTCTTTCCTGGAYAATSAAGGATTTATTATGCCACTDGACTTRGGTGCAAAAGGAAGCTGCCAAATTGGTGGAAATGTTTCAACTAATGCTGGTGGTTTGCGCYTTGTCCGTTATGGATCACTTCATGGAAATGTACTTGGTCTTGAAGCTGTTTTAGCAAATGGTACTGTGCTTGACATGCTTGGGACTTTACGYAAAGATAATACTGGRTATGACTTGAAGCATTTGTTTATAGGAAGTGAAGGATCMTTGGGAATTGTMACTAAGGTTTCMATACTTACYCCTCCRAAGCTATCTTCAGTWAATSTWGCTTTTCTTGCWTGTAAAGATTATTTCAGCTGCCAGAAACTTCTATTGGAAGCCAAGAGGAARCTTGGRGAGATTCTMTCTGCATTTGAATTTTTGGATARCCADTCAATGGATYTGGTTCTGAATCATTTAGAAGGTGTTCGRAATCCATTACCTCCMTCAMTGCACAACTTTTATGTTCTGATTGAGACAACAGGCAGTGATGAATCTTATGACAAAGAGAAGCTTGAAGCYTTCCTACTTCGCTCAATGGAAGGTGGTTTGATATCTGATGGTGTTATTGCACAAGACATAAACCAAGCATCATCATTTTGGCGWATWCGTGAGGGT"
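
One rough way to put a number on "too many Ns" is to count how many positions in each consensus are unambiguous, partially ambiguous, or fully ambiguous. The helper below is only a sketch, and the name `ambiguity_summary` is hypothetical. It assumes the sequence has already been read into a plain string with the header line and quotes removed.

```python
from collections import Counter


def ambiguity_summary(seq):
    """Count unambiguous bases, partial IUPAC codes, and Ns in a sequence."""
    counts = Counter(seq.upper())
    return {
        "length": len(seq),
        # Plain A/C/G/T calls.
        "unambiguous": sum(counts[b] for b in "ACGT"),
        # Two- and three-base IUPAC ambiguity codes.
        "partial": sum(counts[b] for b in "RYSWKMBDHV"),
        # Fully ambiguous positions.
        "N": counts["N"],
    }


if __name__ == "__main__":
    print(ambiguity_summary("NNACGTRYN-"))
```

Running it on both the pxconsq output and the Geneious output would give a side-by-side comparison of how much of each consensus is usable as a reference.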

Please let me know if you have questions.

Thanks!

Miao

@josephwb (Member)

Kewl. Thanks for putting this together.

@josephwb (Member)

Hey @Cactusolo can you send me the complete file so I can match these expectations exactly? phylo dot jwb at gmail dot com

@Cactusolo (Author)

@josephwb done.

josephwb self-assigned this Sep 13, 2019