Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find-motifs --contig failed to parse any bedmethyl records #349

Open
Ge0rges opened this issue Jan 17, 2025 · 5 comments
Open

find-motifs --contig failed to parse any bedmethyl records #349

Ge0rges opened this issue Jan 17, 2025 · 5 comments

Comments

@Ge0rges
Copy link

Ge0rges commented Jan 17, 2025

Hey @ArtRand I'm having some issue running finds-motifs after merge bed. I get:

_My script: Finding enriched motifs for contig contig_100009 in metagenome_assembly_
> loading references from "mags//metagenome_assembly.fna"  
> loaded 106135 sequence(s)                                    
> using tabix/bgzf reader                                 
> Error! failed to parse any bedmethyl records

When I look at the merged BED that is input to find-motifs I can do grep "contig_100009" all_samples.bed I get many lines of output such as:

contig_100009   5411    5412    a       1       +       5411    5412   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5413    5414    a       1       +       5413    5414   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5414    5415    a       1       +       5414    5415   255,0,0  1       100.00  1       0       0       0       0       0      0
contig_100009   5416    5417    a       1       +       5416    5417   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5417    5418    a       1       +       5417    5418   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5418    5419    a       1       +       5418    5419   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5419    5420    a       1       +       5419    5420   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5420    5421    m       1       +       5420    5421   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5420    5421    21839   1       +       5420    5421   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5421    5422    m       1       +       5421    5422   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5421    5422    21839   1       +       5421    5422   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5422    5423    a       1       +       5422    5423   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5423    5424    a       1       +       5423    5424   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5424    5425    m       1       +       5424    5425   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5424    5425    21839   1       +       5424    5425   255,0,0  1       0.00    0       1       0       0       0       0      contig_100009   5426    5427    a       1       +       5426    5427   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5427    5428    a       1       +       5427    5428   255,0,0  1       0.00    0       1       0       0       0       0      0

The script is doing:

modkit bedmethyl merge -o "$merged_bed" -g "$outdir/sizes.tsv" "$outdir"/*.bed.gz --threads "$num_threads" --io-threads 10 --log-filepath $merged_bed.log
bgzip -k --threads "$num_threads" "$merged_bed"
tabix -p bed "$merged_bed.gz"

then

  modkit find-motifs \
--min-coverage 5 \
-i "$merged_bed.gz" \
 -r "$genome" \
--contig "$contig" \
-o "$contig_motif_tsv" \
--log-filepath $contig_motif_tsv.log
--threads 20 || echo "Find-motifs failed for contig $contig."

In sizes.tsv if I do grep "contig_100009 sizes.tsv I get: contig_100009 5531. Finally, just as a sanity check I did find >contig_100009 in the metagenome_assembly.fna file.

Here's the log from merge bed, I removed a lot of similar contig not present statements and made sure my contig wasn't in there:

[src/logging.rs::60][2025-01-16 14:26:28][DEBUG] command line: modkit bedmethyl merge -o methylation_data//metagenome_assembly/all_samples.bed -g methylation_data//metagenome_assembly/sizes.tsv methylation_data//metagenome_assembly/barcode01.bed.gz methylation_data//metagenome_assembly/barcode02.bed.gz methylation_data//metagenome_assembly/barcode03.bed.gz methylation_data//metagenome_assembly/barcode05.bed.gz methylation_data//metagenome_assembly/barcode06.bed.gz methylation_data//metagenome_assembly/barcode07.bed.gz methylation_data//metagenome_assembly/barcode08.bed.gz methylation_data//metagenome_assembly/barcode09.bed.gz methylation_data//metagenome_assembly/barcode10.bed.gz methylation_data//metagenome_assembly/barcode11.bed.gz methylation_data//metagenome_assembly/barcode12.bed.gz methylation_data//metagenome_assembly/barcode13.bed.gz methylation_data//metagenome_assembly/barcode14.bed.gz --threads 20 --io-threads 10 --log-filepath methylation_data//metagenome_assembly/all_samples.bed.log
[src/bedmethyl_util/subcommand.rs::239][2025-01-16 14:26:30][INFO] Collected 104643 contigs from 13 readers
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10005 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10006 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100064 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_1001 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100162 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10040 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10058 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100660 is not present in tabix headers and will be skipped
.
.
.
.
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9936 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9946 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99490 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9957 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99635 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99794 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99911 is not present in tabix headers and will be skipped
[src/command_utils.rs::295][2025-01-16 14:26:30][INFO] calculated chunk size: 30, interval size 100000, processing 3000000 positions concurrently
[src/interval_chunks.rs::512][2025-01-16 14:26:30][DEBUG] there are 104643 contig(s) to work on (104643 parts)
[src/interval_chunks.rs::547][2025-01-16 20:15:55][DEBUG] no more records to process

The log from find-motifs:

[src/logging.rs::60][2025-01-16 21:26:33][DEBUG] command line: modkit find-motifs --min-coverage 5 -i methylation_data//metagenome_assembly/all_samples.bed.g$
[src/find_motifs/mod.rs::971][2025-01-16 21:26:33][INFO] loading references from "mags//metagenome_assembly.fna"
[src/find_motifs/mod.rs::1002][2025-01-16 21:26:40][INFO] loaded 106135 sequence(s)
[src/find_motifs/mod.rs::1132][2025-01-16 21:26:40][INFO] using tabix/bgzf reader
@Ge0rges
Copy link
Author

Ge0rges commented Jan 17, 2025

Looking at this after posting it, could it be this is just because the coverage is too low? Does that track with the error being printed? I read it as there's any issue with the bedmethyl rather than no data was found (and the non-normal exit strengthens that interpretation for me). I want to assume other contigs have very low coverage but I've only gone through ~70 out of 101K.

@ArtRand ArtRand added the needs-attention Requires follow up from developers label Jan 23, 2025
@Ge0rges
Copy link
Author

Ge0rges commented Jan 28, 2025

@ArtRand would it be useful for me to share this with you, do you happen to have an FTP server set up by any chance? I can try to truncate to a subset of contigs otherwise.

@ArtRand
Copy link
Contributor

ArtRand commented Jan 31, 2025

@Ge0rges On my radar. I'll look asap - sorry about the delay.

@ArtRand
Copy link
Contributor

ArtRand commented Jan 31, 2025

For others looking at this issue.

Looking at this after posting it, could it be this is just because the coverage is too low? Does that track with the error being printed?

Correct. If there aren't any bedMethyl records with enough coverage you'll see this error.

@Ge0rges
Copy link
Author

Ge0rges commented Feb 1, 2025

Got it, I think that's likely the issue. Any chance we could have that output a different error message when coverage is the restriction filtering out bases? Rather than an error reading the file.

@ArtRand ArtRand removed the needs-attention Requires follow up from developers label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants