BUSCO problem: 100% Missing BUSCOs #139

Open
ghost opened this issue Apr 28, 2020 · 1 comment

ghost commented Apr 28, 2020

Dear Developers,

We are trying to run a BUSCO analysis on our assembly on a CentOS 7 machine. The QUAST analysis itself runs successfully, but we get 100% missing BUSCOs, even when we assess the reference genome, which normally scores ~95%-98%. The augustus.log shows the following message many times:

/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
/home/mgabriel/.quast/augustus3.2.3/bin/augustus: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/mgabriel/.quast/augustus3.2.3/bin/augustus)
...
...
...
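
The messages above say that the system libstdc++.so.6 does not export the symbol versions that the bundled augustus binary was built against. A minimal sketch of how one might confirm this, assuming the log text and the raw bytes of a candidate library are already loaded; the function names are hypothetical and the approach is roughly what `strings libstdc++.so.6 | grep GLIBCXX` would show:

```python
import re

# Matches the dynamic-linker complaint seen in augustus.log, e.g.
#   ... version `GLIBCXX_3.4.20' not found (required by ...)
NOT_FOUND = re.compile(r"version `(GLIBCXX_[0-9.]+)' not found")

# Matches GLIBCXX version tags embedded in a libstdc++ binary.
TAG = re.compile(rb"GLIBCXX_[0-9]+(?:\.[0-9]+)+")


def required_versions(log_text):
    """Return the set of GLIBCXX versions the log reports as not found."""
    return set(NOT_FOUND.findall(log_text))


def provided_versions(lib_bytes):
    """Return the set of GLIBCXX version tags present in the library bytes."""
    return {tag.decode("ascii") for tag in TAG.findall(lib_bytes)}


def missing_versions(log_text, lib_bytes):
    """Return versions requested in the log but absent from the library."""
    return sorted(required_versions(log_text) - provided_versions(lib_bytes))
```

Reading augustus.log as text and /lib64/libstdc++.so.6 in binary mode, then passing both to `missing_versions`, should list the versions that library lacks. An empty list for a different libstdc++.so.6 would suggest that copy is new enough for this binary.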

After the assessment finishes, the run_assembly.log in the busco_stats directory contains the following:

[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370P9T.faa.2 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370ARO.faa.1 for reading
...
...
...
9 of 181 task(s) completed at 04/28/2020 14:27:25
[hmmsearch]     109 of 181 task(s) completed at 04/28/2020 14:27:26
[hmmsearch]     181 of 181 task(s) completed at 04/28/2020 14:27:26
Results:
C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:303
0 Complete BUSCOs (C)
0 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
303 Missing BUSCOs (M)
303 Total BUSCO groups searched
BUSCO did not find any match. Do not forget to check the file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/augustus.log to exclude a problem regarding Augustus
[bash]  rm: cannot remove '/home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/temp_GCA_007989325-1_vir160_genomic_1730718755': No such file or directory
BUSCO analysis done with WARNING(s). Total running time: 1649.43617296 seconds
Results written in /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/

ADS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370ER5.faa.1 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370AJP.faa.2 for reading
[hmmsearch]     Error: Failed to open sequence file /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/extracted_proteins/EOG09370VTP.faa.1 for reading
...
...
...
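
For anyone scripting a check across several runs, the one-line summary in the log above (`C:...[S:...,D:...],F:...,M:...,n:...`) can be parsed directly. This is a small sketch based only on the format shown in this log; the function name is hypothetical and other BUSCO versions may print the summary differently:

```python
import re

# Pattern for the summary line as it appears in the log above, e.g.
#   C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:303
SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a summary line into percentages (floats) and the group count n (int)."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in: %r" % line)
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result
```

A result with `M` equal to 100.0 is a hint that gene prediction failed outright, as happened here, and not that the assembly is genuinely missing every BUSCO group.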

Here is also the run_assembly.log as it looked before reaching its final form:

****************** Start a BUSCO 3.0.2 analysis, current time: 04/28/2020 19:12:11 ******************
Configuration loaded from /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/config.ini
Init tools...
Check dependencies...
Check input file...
To reproduce this run: python /opt/quast-quast_5.0.2/quast.py -i /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/quast_corrected_input/GCA_007989325_1_vir160_genomic.fna -o GCA_007989325-1_vir160_genomic -l /home/mgabriel/.quast/busco/eukaryota/ -m genome -c 16 -t /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/ -sp fly --augustus_parameters ''''
Mode is: genome
The lineage dataset is: eukaryota_odb9 (eukaryota)
Delete the current result folder and start a new run
Temp directory is /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/
****** Phase 1 of 2, initial predictions ******
****** Step 1/3, current time: 04/28/2020 19:12:13 ******
Create blast database...
[makeblastdb]   Building a new DB, current time: 04/28/2020 19:12:13
[makeblastdb]   New DB name:   /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/tmp/GCA_007989325-1_vir160_genomic_1658005936
[makeblastdb]   New DB title:  /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/quast_corrected_input/GCA_007989325_1_vir160_genomic.fna
[makeblastdb]   Sequence type: Nucleotide
[makeblastdb]   Keep MBits: T
[makeblastdb]   Maximum file size: 1000000000B
[makeblastdb]   Adding sequences from FASTA; added 27 sequences in 2.76254 seconds.
[makeblastdb]   1 of 1 task(s) completed at 04/28/2020 19:12:16
Running tblastn, writing output to /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/blast_output/tblastn_GCA_007989325-1_vir160_genomic.tsv...
[tblastn]       1 of 1 task(s) completed at 04/28/2020 19:14:48
****** Step 2/3, current time: 04/28/2020 19:14:48 ******
Maximum number of candidate contig per BUSCO limited to: 3
Getting coordinates for candidate regions...
Pre-Augustus scaffold extraction...
Running Augustus prediction using fly as species:
Additional parameters for Augustus are '':
[augustus]      Please find all logs related to Augustus errors here: /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/augustus_output/augustus.log
[augustus]      135 of 337 task(s) completed at 04/28/2020 19:14:51
[augustus]      337 of 337 task(s) completed at 04/28/2020 19:14:52
Extracting predicted proteins...
****** Step 3/3, current time: 04/28/2020 19:14:57 ******
Running HMMER to confirm orthology of predicted proteins:
Results:
C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:303
0 Complete BUSCOs (C)
0 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
303 Missing BUSCOs (M)
303 Total BUSCO groups searched
****** Phase 2 of 2, predictions using species specific training ******
****** Step 1/3, current time: 04/28/2020 19:14:57 ******
Extracting missing and fragmented buscos from the ancestral_variants file...
Running tblastn, writing output to /home/mgabriel/Downloads/data/dro_vir_READS/quast_results/busco_stats/run_GCA_007989325-1_vir160_genomic/blast_output/tblastn_GCA_007989325-1_vir160_genomic_missing_and_frag_rerun.tsv...

After some research, I found that others have run into the same problem when using BUSCO, and that it is fixed in the new BUSCO version (v4). On Ubuntu, we did not encounter this problem.

Do you have any suggestions on what we can do to fix this? Thank you in advance!

@sixvable

I have faced the same error. Using --debug mode, I found that the libstdc++.so.6 dynamic library used by augustus is the system library (located at /usr/lib64/libstdc++.so.6), which is old.
I think the code for BUSCO mode should be modified to use the conda-installed library (located at `conda/path/envs/envs-name/lib/libstdc++.so.6.*****`).
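
One way to try this idea without changing QUAST itself is to put the conda environment's lib directory first on `LD_LIBRARY_PATH` for the child process, so the dynamic linker finds the newer libstdc++.so.6 before the system one. This is only a sketch of that workaround, not a confirmed fix for QUAST; the helper name is hypothetical and the directory is whatever your own conda environment uses:

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first on LD_LIBRARY_PATH.

    base_env defaults to the current process environment; the input
    mapping is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the given directory takes priority over system locations.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching quast.py or augustus. Whether the conda copy of libstdc++.so.6 is otherwise compatible with the bundled augustus binary is something to verify on your own system.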
