train.py error - Expected 2D array, got 1D array instead #24

Rhinogradentia · 2025-03-17T17:51:48Z

Hi,

another question.
The tool was installed via conda on python 3.7.

I have the following error when running train.py:

(plassClass) /PlasClass$ train.py -p plasmids.fasta -c genome.fasta -o train/ -n 25
Starting PlasClass training
Getting reference lengths
Sampling 96 fragments for length 1000
Getting k-mer frequencies
Learning classifier
Saving classifier
Sampling 9 fragments for length 10000
Getting k-mer frequencies
Learning classifier
Saving classifier
Sampling 0 fragments for length 100000
Getting k-mer frequencies
Learning classifier
Traceback (most recent call last):
  File "/home/<user>/miniconda3_new/envs/plassClass/bin/train.py", line 197, in <module>
    main(args)
  File "/home/<user>/miniconda3_new/envs/plassClass/bin/train.py", line 193, in main
    train(plasfile,chromfile,outdir,num_procs,ks,lens)
  File "/home/<user>/miniconda3_new/envs/plassClass/bin/train.py", line 172, in train
    scaler = StandardScaler().fit(data)
  File "/home/<user>/miniconda3_new/envs/plassClass/lib/python3.7/site-packages/sklearn/preprocessing/data.py", line 639, in fit
    return self.partial_fit(X, y)
  File "/home/<user>/miniconda3_new/envs/plassClass/lib/python3.7/site-packages/sklearn/preprocessing/data.py", line 663, in partial_fit
    force_all_finite='allow-nan')
  File "/home/<user>/miniconda3_new/envs/plassClass/lib/python3.7/site-packages/sklearn/utils/validation.py", line 521, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The FASTA files contain NCBI sequences: 4 in the plasmid file and 7 in the genome file. There are no empty lines, and all sequences come from a single species.

What might be the reason for this error and what can I do to solve it?

Thank you in advance.
Best,
Nadine


dpellow · 2025-03-17T18:33:03Z

I believe this happens because none of the sequences fall in the 100K nt length bin. Is that correct? Are you using full-length genomes?

If so, I would modify the length bins you are using via the -l parameter.
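
For context on the error message itself: the scaler expects a 2D table with one row per sample. Below is a minimal sketch of why zero sampled fragments would show up as a 1D `array=[]`, assuming train.py collects one row of k-mer frequencies per sampled fragment (an assumption, not checked against the PlasClass source). The `shape` helper is hypothetical and only mimics how an array library infers dimensions from nested lists.

```python
def shape(nested):
    """Return the dimensions of a regular nested list, the way an array library would infer them."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break  # an empty list has no inner rows to inspect
        nested = nested[0]
    return tuple(dims)


# Two fragments, three k-mer frequencies each: a proper 2D table.
rows_with_fragments = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
# "Sampling 0 fragments": no rows at all, so no second dimension can be inferred.
rows_without_fragments = []

print(shape(rows_with_fragments))     # (2, 3)
print(shape(rows_without_fragments))  # (0,)
```

An empty bin therefore looks one-dimensional to the scaler, which matches the `array=[]` in the traceback.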

Rhinogradentia · 2025-03-17T19:42:26Z

Yes, I used the complete sequences. I must have missed that I should bin them. Where can I find this info? Should I just split the chromosomes into 100k pieces?
Thank you.
Best,
Nadine

dpellow · 2025-03-17T21:18:44Z

No, you don't need to bin the sequences; plasclass does that. But one of the bins is empty, which is why it is giving an error. I'm not sure why that bin is empty. What are the lengths of the 11 sequences? Are you able to share the fasta file?
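
If it helps to check the lengths quickly, here is a small standard-library sketch that reports the length of each record in a FASTA file. The function name `fasta_lengths` and the toy input are made up for illustration; this is not part of PlasClass.

```python
import io


def fasta_lengths(handle):
    """Yield (record_id, sequence_length) for each record in a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name = fields[0] if fields else ""  # ID is the first word of the header
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Toy input for illustration only; not the sequences from this issue.
example = io.StringIO(">seq1 toy record\nACGTAC\nGT\n>seq2\nACG\n")
lengths = list(fasta_lengths(example))
print(lengths)  # [('seq1', 8), ('seq2', 3)]
```

For a real file, pass an open handle instead of the `StringIO` object, e.g. `with open("plasmids.fasta") as fh: print(list(fasta_lengths(fh)))`.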

Rhinogradentia · 2025-03-17T21:30:36Z

Yes - I can share them - they are public. I've attached them.

sequences.zip

genome:

NC_006077.1 Kluyveromyces lactis mitochondrion, complete genome 40291
NC_006042.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome F complete sequence 2602197
NC_006041.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome E complete sequence 2234072
NC_006040.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome D complete sequence 1715506
NC_006039.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome C complete sequence 1753957
NC_006038.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome B complete sequence 1320834
NC_006037.1 Kluyveromyces lactis strain NRRL Y-1140 chromosome A complete sequence 1062590

plasmids:

M11815.1 Plasmid pGKL1 from killer yeast (K.lactis), complete 8876
X01095.1 Yeast DNA killer plasmid pGKL1 8874
X01096.1 Yeast DNA killer plasmid pGKL2 left terminal region 793
X01097.1 Yeast DNA killer plasmid pGKL2 right terminal region 1317

dpellow · 2025-03-24T14:41:26Z

@Rhinogradentia, you have no plasmids longer than 10K nt, so there are no positive sequences for your classifier to train on at the 100K length. If you are only interested in training on the plasmids in your reference file, you don't need to classify any sequences > 10 kb as plasmids, so you shouldn't train a model for that length bin. You can define the length bins you need using -l.
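
As a rough illustration of that reasoning, the sketch below keeps only the length bins up to and including the first one that covers the longest plasmid, and formats them for -l. This is a rule of thumb that happens to agree with the training log (the 1000 and 10000 bins trained, the 100000 bin failed); it is not PlasClass's own logic, and `bins_to_train` is a hypothetical helper.

```python
def bins_to_train(candidate_bins, longest_plasmid):
    """Keep bins up to and including the first one that covers the longest plasmid."""
    kept = []
    for length in sorted(candidate_bins):
        kept.append(length)
        if length >= longest_plasmid:
            break  # longer bins would have no plasmid sequence to sample from
    return kept


# Plasmid lengths as listed in this issue.
plasmid_lengths = [8876, 8874, 793, 1317]
# Bins that appear in the training log in this issue.
log_bins = [1000, 10000, 100000]

kept = bins_to_train(log_bins, max(plasmid_lengths))
print("-l " + ",".join(str(b) for b in kept))  # -l 1000,10000
```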

I'm not sure what use case you are trying to train a model for - the normal use case would use a very large database of training sequences.

Rhinogradentia · 2025-03-24T15:02:40Z

@dpellow, thank you for this clarification. The database your model is trained on does not contain yeasts (or at least I could not find any), which I'm interested in. Therefore, my idea was to train a new model based on a known plasmid. This may be the wrong approach.

dpellow · 2025-03-24T15:31:11Z

You can try it, but I would try to create a larger database for training. Are you just trying to find the specific sequences you listed in your fasta file in a sample?

Rhinogradentia · 2025-03-24T18:13:20Z

At least a very similar one - I already tried other approaches, and there are some contigs which might be plasmids, but I'm not very convinced right now. I will try to build a larger db/set of sequences and try again. Thank you for being so helpful

dpellow · 2025-03-24T18:18:59Z

OK, you can try with -l 1000,5000,10000 and see if it works before trying a bigger database.

dpellow · 2025-03-24T18:19:20Z

Also, did you try just using plasclass without training it? Did it work?

Rhinogradentia · 2025-03-24T18:31:06Z

Yes, there were some results, but they couldn't be circularized. Then again, I'm not even sure my plasmid is circular (they can be linear in yeasts). I will try what you suggested. I'm really thankful for your support. Thanks!
