HDBSCAN error for large dataset #20

Open
sandraTriebel opened this issue Oct 17, 2022 · 1 comment

Comments

@sandraTriebel
Member

sandraTriebel commented Oct 17, 2022

Got this error message while running ViralClust with SARS-CoV-2 alpha genomes (152,307 non-redundant seqs).

Traceback (most recent call last):
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 663, in <module>
    perform_clustering()
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 604, in perform_clustering
    virusClusterer.determine_profile(multiPool)
  File "/home/nu76fet/programs/viralclust/bin/hdbscan_virus.py", line 267, in determine_profile
    allProfiles = p.map(self.profile, self.d_sequences.items())
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/nu76fet/programs/viralclust/conda/hdbscan-9fec0a1dfe235db7d7c78f1a0bba3ac9/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
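The last frame points at the limit: in this Python 3.7 environment, `multiprocessing.connection.Connection._send_bytes` prefixes each pickled message with a signed 32-bit length header via `struct.pack("!i", n)`, so any single message larger than 2,147,483,647 bytes (about 2 GiB) raises exactly this `struct.error`. The minimal sketch below reproduces that check in isolation and adds a rough size estimate. The ~29,903 nt genome length and the assumption that one message carries every sequence are illustrative assumptions, not measurements from this run.

```python
import struct

# In the Python 3.7 multiprocessing shown in the traceback, every pickled message
# is prefixed with a 4-byte signed big-endian length: struct.pack("!i", n).
INT32_MAX = 2**31 - 1  # 2,147,483,647 bytes, roughly 2 GiB


def header_fits(n_bytes: int) -> bool:
    """Return True if a message of n_bytes can be described by a '!i' header."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


# Rough estimate (assumptions: ~29,903 nt per genome, one byte per nucleotide,
# pickle overhead ignored, and all sequences ending up in a single message).
N_SEQS = 152_307
GENOME_LEN = 29_903
est_bytes = N_SEQS * GENOME_LEN

print(header_fits(INT32_MAX))      # True
print(header_fits(INT32_MAX + 1))  # False: the struct.error from the traceback
print(est_bytes, est_bytes > INT32_MAX)
```

Under these assumptions the raw sequence data alone is roughly twice the size a single message can describe.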

@klamkiew
Collaborator

klamkiew commented Apr 7, 2023

I am not entirely sure whether this is related to #21 at all, but I have encountered issues with large data sets (>100k non-redundant genomes) as well. Yours looks like a weird multiprocessing issue, to be honest, so I think this is a whole other topic :/
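One plausible reason a single multiprocessing message grows that large: `p.map(self.profile, self.d_sequences.items())` passes a bound method, and `Pool.map` pickles the function together with every chunk of work it sends to a worker. Pickling a bound method also pickles the instance it is bound to, so each task may carry the entire `d_sequences` dictionary. The sketch below compares pickled sizes using a hypothetical `Clusterer` class and a hypothetical module-level `profile` function; neither is ViralClust's actual code, and the sequences are toy data.

```python
import pickle


class Clusterer:
    """Hypothetical stand-in for the clusterer object in hdbscan_virus.py."""

    def __init__(self, d_sequences):
        self.d_sequences = d_sequences  # every genome lives on the instance

    def profile(self, item):
        header, seq = item
        return header, len(seq)  # placeholder for the real profiling work


def profile(item):
    """Hypothetical module-level worker; pickled by reference (module + name)."""
    header, seq = item
    return header, len(seq)


# 100 distinct toy sequences of ~4,000 characters each (distinct objects, so
# pickle cannot deduplicate them through its memo).
seqs = {f"seq{i}": "ACGT" * 1000 + str(i) for i in range(100)}
clusterer = Clusterer(seqs)

bound_size = len(pickle.dumps(clusterer.profile))  # drags the whole instance along
func_size = len(pickle.dumps(profile))             # only a module/name reference

print(bound_size, func_size)
```

If this is the cause, a module-level worker called as something like `p.map(profile, self.d_sequences.items(), chunksize=...)` would keep each message to one chunk of sequences, with any settings the worker needs passed explicitly. This has not been tested against ViralClust itself.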
