Step 10. Mapping Samples - excessive execution time? #909

Closed
MicroSeq opened this issue Nov 21, 2024 · 8 comments

@MicroSeq

MicroSeq commented Nov 21, 2024

Hello,

I ran the SqueezeMeta 1.7.0beta8 workflow on 98 MAGs using the -extbins flag. The sqm_counter read-mapping step appears to be taking an excessively long time: about 5 hours per pair of read files (~20 million reads) using 64 threads on a fairly new AMD EPYC HPC cluster. I have 52 samples, so this will take a very long time if the trend continues. Any troubleshooting suggestions would be appreciated, unless this is expected behaviour?

For comparison, I used the same short-read set with SQM 1.6.3 and this command:

SqueezeMeta.pl -p SqueezeCoassemblyNF_eukaryotes --euk -taxbinmode "s+c" -m coassembly -b 50 -c 500 -binners "maxbin,metabat2,concoct"

There, the mapping step took about 10 hours out of ~3 days for the entire workflow.

@MicroSeq
Author

Relatedly, my job was preempted on our HPC, and progress does not seem to be preserved during this step, so it restarted at sample 1.

@fpusan
Collaborator

fpusan commented Nov 29, 2024

Is that the exact command you used? Note that you need to add -t 64 for SqueezeMeta to actually use the 64 threads.

@MicroSeq
Author

MicroSeq commented Nov 29, 2024

> Is that the exact command you used? Note that you need to add -t 64 for SqueezeMeta to actually use the 64 threads.

Hey, this is the exact command:

SqueezeMeta.pl -p SqueezeExtBins -m coassembly -extbins /dRepGroupsPB-Final/MAGs \
  -f /Squeeze -b 25 -s samples.tsv -t $SLURM_CPUS_PER_TASK --restart

The output shows the job being distributed across all the threads; it just seems to take a very long time for each sample. I am now at sample 30 of 54 after 4 days spent on step 10 alone.

@MicroSeq
Author

MicroSeq commented Dec 2, 2024

As an update, the workflow did complete, but Step 10 took about 5 days, compared with about 10 hours previously in v1.6.3 when no external bins were provided. All other things should have been the same, I believe.

@jtamames
Owner

jtamames commented Dec 3, 2024

Hum... this is weird, because providing external bins should not affect step 10; external bins are not used there. Could this behavior be related to your system, for instance heavy I/O load?
Best,
J

@MicroSeq
Author

MicroSeq commented Dec 3, 2024

> Hum... this is weird, because providing external bins should not interfere with step 10. External bins are not used there. Any chance this could be a behavior related to your system? For instance lots of I/O load? Best, J

It's possible. There have been some issues with the file storage system on our HPC cluster, which caused a write error that previously killed this step a couple of days in.

However, doesn't the mapping step use the contigs from the external bins as its reference, since there is no assembly step?

@fpusan
Collaborator

fpusan commented Dec 19, 2024

Yes, exactly. Using external bins should have no influence on mapping time; that depends only on the number of contigs and the number of reads.
It could be that the file system was slower in the second run, hence the excessive time in sqm_counter, but it's hard to tell. If the file system was causing the difference between the two runs, you should see lower CPU usage in the second run than in the first. Not sure if you can check that...
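One way to check this on a SLURM cluster, where job accounting is enabled, is to compare CPU efficiency between the two runs: CPU time actually consumed divided by (allocated cores × wall time). A step that spends much of its time waiting on a slow filesystem accumulates wall time but little CPU time, so its efficiency should drop. The sketch below is a minimal, hypothetical helper (not part of SqueezeMeta or SLURM), assuming the input lines come from `sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS --parsable2 --noheader`.

```python
from collections.abc import Iterable, Iterator


def slurm_time_to_seconds(value: str) -> float:
    """Convert a SLURM duration like '1-02:03:04', '02:03:04' or '05:06.789' to seconds."""
    days = 0
    if "-" in value:
        # Optional leading 'D-' day prefix.
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = value.split(":")
    if len(parts) == 3:
        hours, minutes, seconds = int(parts[0]), int(parts[1]), float(parts[2])
    elif len(parts) == 2:
        # TotalCPU is often reported as MM:SS.mmm for short steps.
        hours, minutes, seconds = 0, int(parts[0]), float(parts[1])
    else:
        raise ValueError(f"unrecognised SLURM duration: {value!r}")
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed: str, total_cpu: str, alloc_cpus: int) -> float:
    """Fraction of allocated core-time spent on the CPU (1.0 means every core was busy)."""
    wall = slurm_time_to_seconds(elapsed)
    if wall == 0 or alloc_cpus == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / (wall * alloc_cpus)


def report(sacct_lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Yield (job_id, efficiency) for each well-formed '|'-separated sacct line."""
    for line in sacct_lines:
        fields = line.strip().split("|")
        # Skip malformed rows and rows without an AllocCPUS value.
        if len(fields) != 4 or not fields[3].isdigit():
            continue
        job_id, elapsed, total_cpu, alloc = fields
        yield job_id, cpu_efficiency(elapsed, total_cpu, int(alloc))
```

For example, a step that ran for one day on 64 cores and used 32 days of CPU time gives `cpu_efficiency("1-00:00:00", "32-00:00:00", 64) == 0.5`. sacct prints one row per job step (e.g. the `.batch` step), so compare the same row across runs. A much lower efficiency in the slow run would be consistent with an I/O bottleneck; similar values would point elsewhere.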

@fpusan
Copy link
Collaborator

fpusan commented Feb 18, 2025

Closing due to lack of activity, but let us know if you have any new insights here.

fpusan closed this as completed Feb 18, 2025