train.py error - Expected 2D array, got 1D array instead #24
Comments
I believe this happens because none of the sequences are in the 100K nt length bin. Is that correct? Are you using full-length genomes? If so, I would modify the length bins you are using via the
Yes, I used the complete sequences. I must have missed that I should bin them. Where can I find this info? Should I just split the chromosomes into 100k pieces?
No, you don't need to bin the sequences; plasclass does that. But one of the bins is empty, which is why it is giving an error. I'm not sure why that bin is empty. What are the lengths of the 11 sequences? Are you able to share the fasta file?
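To answer the question about sequence lengths, something like the following minimal sketch could be used. It is plain Python and does not use plasclass; the function name `fasta_lengths` and the toy records are made up for illustration, and it assumes ordinary FASTA text with `>` header lines.

```python
def fasta_lengths(fasta_text):
    """Return {record_id: sequence_length} for FASTA-formatted text."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence line: add its length to the current record.
            lengths[current] += len(line)
    return lengths


# Toy input (hypothetical records, not the sequences from this issue).
example = ">seq1 demo record\nACGT\nACG\n>seq2\nAC\n"
for name, n in fasta_lengths(example).items():
    print(name, n)
```

For a real file, the text could be read with `open(path).read()` and passed to the same function.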
Yes - I can share them - they are public. I've attached them. genome:
plasmids:
@Rhinogradentia you have no plasmids longer than 10K nt, so there are no positive sequences for your classifier to train on at that length. If you are only interested in training on the plasmids in your reference file, you don't need to classify any sequences > 10 kb as plasmids, so you shouldn't train a model for that length bin. You can define the length bins you need using -l. I'm not sure what use case you are trying to train a model for; the normal use case would use a very large database of training sequences.
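A quick way to see which lengths a training set can cover is to count how many sequences reach each candidate length. The sketch below does only that: it does not reproduce plasclass's own binning logic, which this thread does not describe. The function name `count_at_least` and the toy lengths are hypothetical; the thresholds are the values mentioned in this thread.

```python
def count_at_least(lengths, thresholds):
    """Return {threshold: number of sequences with length >= threshold}."""
    return {t: sum(1 for n in lengths if n >= t) for t in thresholds}


# Hypothetical plasmid lengths, all shorter than 10K nt.
toy_plasmid_lengths = [1500, 6000, 9000]
# Candidate lengths taken from the values discussed in this thread.
thresholds = [1000, 5000, 10000, 100000]

counts = count_at_least(toy_plasmid_lengths, thresholds)
for t in thresholds:
    note = "  <- no sequence reaches this length" if counts[t] == 0 else ""
    print(t, counts[t], note)
```

Running the same count on both the plasmid and the genome lengths shows at a glance whether either class has nothing to offer at a given length.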
@dpellow, thank you for this clarification. The database your model is trained on does not contain yeasts (or at least I could not find any), which I'm interested in. Therefore, my idea was to train a new model based on a known plasmid. This may be the wrong approach.
You can try it, but I would try to create a larger database for training. Are you just trying to find the specific sequences you listed in your fasta file in a sample?
At least a very similar one. I already tried other approaches, and there are some contigs which might be plasmids, but I'm not very convinced right now. I will try to build a larger db/set of sequences and try again. Thank you for being so helpful.
OK, you can try with -l 1000,5000,10000 and see if it works before trying a bigger database.
Also, did you try to just use plasclass without training it? Did it work?
Yes, there were some results, but they couldn't be circularized. Then again, I'm not even sure my plasmid is circular (they can be linear in yeasts). I will try what you suggested. I'm really thankful for your support. Thanks!
Hi,
another question.
The tool was installed via conda on Python 3.7.
I have the following error when running train.py:
The fasta files contain NCBI sequences: 4 in the plasmid file and 7 in the genome file. There are no empty lines, and all sequences come from a single species.
What might be the reason for this error and what can I do to solve it?
Thank you in advance.
Best,
Nadine