Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gtars uniwig trying to create coordinates larger than chr #74

Open
ljmills opened this issue Jan 27, 2025 · 6 comments
Open

gtars uniwig trying to create coordinates larger than chr #74

ljmills opened this issue Jan 27, 2025 · 6 comments
Labels
bug Something isn't working uniwig

Comments

@ljmills
Copy link

ljmills commented Jan 27, 2025

I am using gtars uniwig as part of the PEPATAC pipeline line and I am consistently getting this error across all of my ATAC-seq samples. It seems to happen on chromosomes that have alignment near the end of the chromosomes. I have checked the alignments and I don't have reads aligned off the ends of these chromosomes so I think it is an issue with gtars.

Produce signal tracks (01-26 17:22:21) elapsed: 1118.0 TIME

Target to produce: /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_exact_shift.bw,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_smooth_shift.bw,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_shift.bed

gtars uniwig -f /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/ljmills/shared/bin/pepatac/refgeneFolder/alias/canFam4Y/fasta/default/canFam4Y.chrom.sizes -m 0 -s 1 -t bam -y bw -p 16 -u shift -l /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_exact --bamscale 100227120.0 (3214339)
Begin bam processing workflow...

Merging all bigwig files...
Error opening file: No such file or directory (os error 2)
Error opening file: No such file or directory (os error 2)
Error opening file: No such file or directory (os error 2)
Error writing to BigWig file: Invalid bed graph: `50000` is greater than the chromosome (chrUn_JAAHUQ010000633v1) length (28993)
FINISHED

Command completed. Elapsed time: 0:00:49. Running peak memory: 5.237GB.
PID: 3214339; Command: gtars; Return code: 0; Memory used: 0.137GB

gtars uniwig -f /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/ljmills/shared/bin/pepatac/refgeneFolder/alias/canFam4Y/fasta/default/canFam4Y.chrom.sizes -m 25 -s 1 -t bam -y bw -p 16 -u shift -l /home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_smooth --bamscale 100227120.0 (3245409)

Begin bam processing workflow...
Merging all bigwig files...
Error opening file: No such file or directory (os error 2)
Error opening file: No such file or directory (os error 2)
Error opening file: No such file or directory (os error 2)
Error writing to BigWig file: Invalid bed graph: `123600000` is greater than the chromosome (chr1) length (123556469)
FINISHED

Command completed. Elapsed time: 0:00:40. Running peak memory: 5.237GB.
PID: 3245409; Command: gtars; Return code: 0; Memory used: 0.012GB

@donaldcampbelljr donaldcampbelljr added uniwig bug Something isn't working labels Jan 27, 2025
@donaldcampbelljr
Copy link
Member

Thank you @ljmills for posting and for sending some example files.

I am unable to get the exact same issue as above. I am seeing, what I believe to be, a separate issue in the problem files you sent over.

For testChr1.bam where chrom.sizes for chr1 are such:
chr1 123556469

Emulating gtars command from PEPATAC:
./target/release/gtars uniwig --file "/home/drc/Downloads/gtars_74_issue/input/testchr1.bam" -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 100227120.0 -l /home/drc/Downloads/gtars_74_issue/output/test

I do not get a completion. The program simply hangs during bigwig creation.

I'm seeing the intermediate lines being written as such:

chr1	0	123554979	0
chr1	123554979	123555028	0.00000000997734
chr1	123555028	123555030	0.00000001995468
chr1	123555030	123555051	0.00000000997734
chr1	123555051	123555079	0.00000001995468
chr1	123555079	123555102	0.00000000997734
chr1	123555102	123555120	0
chr1	123555120	123555122	0.00000000997734
chr1	123555122	123555163	0.00000001995468
chr1	123555163	123555171	0.000000029932018
chr1	123555171	123555173	0.00000001995468
chr1	123555173	123555214	0.00000000997734
chr1	123555214	123555291	0
chr1	123555291	123555342	0.00000000997734
chr1	123555342	123555394	0
chr1	123555394	123555428	0.00000001995468
chr1	123555428	123555432	0.000000029932018
chr1	123555432	123555445	0.00000003990936
chr1	123555445	123555479	0.00000001995468
chr1	123555479	123555483	0.00000000997734
chr1	123555483	123555529	0
chr1	123555529	123555549	0.00000000997734
chr1	123555549	123555550	0.00000001995468
chr1	123555550	123555573	0.000000029932018
chr1	123555573	123555580	0.00000003990936
chr1	123555580	123555600	0.000000029932018
chr1	123555600	123555601	0.00000001995468
chr1	123555601	123555675	0.00000000997734
chr1	123555675	123555700	0
chr1	123555700	123555727	0.00000000997734
chr1	123555727	123555751	0.00000001995468
chr1	123555751	123555778	0.00000000997734
chr1	123555778	123555824	0
chr1	123555824	123555853	0.00000000997734
chr1	123555853	123555875	0.00000001995468
chr1	123555875	123555904	0.00000000997734
chr1	123555904	123555996	0
chr1	123555996	123556047	0.00000000997734
chr1	123556047	123556111	0
chr1	123556111	123556135	0.00000000997734
chr1	123556135	123556162	0.00000001995468
chr1	123556162	123556186	0.00000000997734
chr1	123556186	123556341	0
chr1	123556341	123556392	0.00000000997734
chr1	123556392	123556428	0
chr1	123556428	123556429	0.00000001995468

The counts are not extending past the chrom size in this particular example (123556469).

However, I noticed that the 4th column's numbers are very small and that if I simply change the --bamscale number to something smaller, the process will no longer hang and will complete just fine.

Again, I'm having difficulties reproducing the exact same error.

If the files are hanging during bigwig creation, a temp solution could be to manipulate the pepatac.py lines (this occurs in a few places! - 4 total) :
https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1551

If one lowers the scaling factor by a few orders of magnitude, it will allow the bigwig conversion to complete.

@ljmills
Copy link
Author

ljmills commented Jan 28, 2025 via email

@donaldcampbelljr
Copy link
Member

Excellent. Thank you for providing these.

I was able to reproduce the issue. It is related to the smoothing argument -m 25 in the pipeline. It seems as though anything greater than 6 will cause upper bound of the smoothing to be beyond the chromosome size (for Chr1 in the above example).

This may be a regression as I thought we had fixed this in the past. I will need to investigate and issue a new gtars release, etc.

Some suggested solutions for the near future:

  1. Wait for next gtars release. This will probably be within the next week or so.

  2. Regarding the PEPATAC pipeline (which I also maintain), I believe the produced bigwig files are not used downstream, only the BED file. Therefore, a temporary solution would be to change the smoothing factor in your local PEPATAC for the produced bigwig:
    https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1541

  3. You can pull and use the previous version of PEPATAC: https://github.com/databio/pepatac/releases/tag/v0.11.3 It does not use gtars uniwig and instead relies on a python implementation (it is slower).

@ljmills
Copy link
Author

ljmills commented Jan 28, 2025 via email

@donaldcampbelljr
Copy link
Member

Some troubleshooting notes:

I confirmed that the bedGraph lines being reported are not extended beyond the chromosome during uniwig counting, regardless of the smooth size (-m 25 in the below example).

chr1	123556341	123556392	0.00000000997734

chr1	123556392	123556428	0

chr1	123556428	123556429	0.00000001995468

Merging all bigwig files...
Error writing to BigWig file: Invalid bed graph: `123600000` is greater than the chromosome (chr1) length (123556469)
FINISHED

But we still get an error when writing the merged bigwig file.

Checking the intermediate bigwig file (before merging), I used bigTools to convert it to a bedGraph and see that the bedGraph also does not contain regions beyond the chromsize:

chr1	123556341	123556392	9.97734e-9
chr1	123556392	123556428	0.0
chr1	123556428	123556429	1.995468e-8

The final step before it errors is is the merging of any bigwigs:

let threshold = 0.0; // default
let adjust = Some(0.0); // default
let clip = Some(100000000.0); // arbitrary but large because we don't want to clip
let (iter, chrom_map) = get_merged_vals(bigwigs, 10, threshold, adjust, clip)?;
let outb = BigWigWrite::create_file(combined_bw_file_name, chrom_map)?;
let runtime = if num_threads == 1 {
runtime::Builder::new_current_thread().build().unwrap()
} else {
runtime::Builder::new_multi_thread()
.worker_threads(num_threads as usize)
.build()
.unwrap()
};
let all_values = ChromGroupReadImpl {
iter: Box::new(iter),
};
//println!("WRITING COMBINED BW FILE: {}", combined_bw_file_name.clone());
// outb.write(all_values, runtime)?;
match outb.write(all_values, runtime) {
Ok(_) => {
eprintln!("Successfully wrote file: {}", final_file_path);
}
Err(err) => {
eprintln!("Error writing to BigWig file: {}", err);
// Delete the partially written file
std::fs::remove_file(final_file_path).unwrap_or_else(|e| {
eprintln!("Error deleting file: {}", e);
});
}
}

It is not clear why this step would be affected by the smooth_size parameter nor why it could change the merged bigwig in such a way that would cause a region to extend over the chrom_sizes.

@donaldcampbelljr
Copy link
Member

Ok, I believe I've figured out where this is happening. It is during the merging step when all of the individual chromosome bigwigs are merged. Due to a combination of region length and associated values, the merging step can produce a value that extends beyond the chromosome length and then fail. I will submit a bug for discussion in the Bigtools.

Some notes for reference:

Individually produced bw file and then manually attempt to merge using bigtools bigwigmerge

large bamscale, smooth = 25, fails bigwigmerge
drc@databio:~/GITHUB/gtars/gtars$ ./target/release/gtars uniwig -f /home/drc/Downloads/gtars_74_issue/input/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y_2chrs.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 100227120.0 -l /home/drc/Downloads/gtars_74_issue/output/test
Begin bam processing workflow...
Merging all bigwig files...
max_val: 0.000000798187159034569
Here are starts ends values 123556200  123556341  0
Here are starts ends values 123556341  123556392  0.00000000997734
Here are starts ends values 123556392  123556428  0
Here are starts ends values 123556428  123556429  0.00000001995468
GET MERGED VALS
FINISHED
small bamscale, smooth = 25, passes bigwigmerge
drc@databio:~/GITHUB/gtars/gtars$ ./target/release/gtars uniwig -f /home/drc/Downloads/gtars_74_issue/input/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y_2chrs.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 1.0 -l /home/drc/Downloads/gtars_74_issue/output/test
Begin bam processing workflow...
Merging all bigwig files...
max_val: 80
Here are starts ends values 123556200  123556341  0
Here are starts ends values 123556341  123556392  1
Here are starts ends values 123556392  123556428  0
Here are starts ends values 123556428  123556429  2
GET MERGED VALS
FINISHED
large bamscale, smooth = 5, passes bigwigmerge
drc@databio:~/GITHUB/gtars/gtars$ ./target/release/gtars uniwig -f /home/drc/Downloads/gtars_74_issue/input/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y_2chrs.chrom.sizes -m 5 -s 1 -t bam -y bw -p 6 -u shift --bamscale 100227120.0 -l /home/drc/Downloads/gtars_74_issue/output/test
Begin bam processing workflow...
Merging all bigwig files...
max_val: 0.0000003192748749825114
Here are starts ends values 123556200  123556361  0
Here are starts ends values 123556361  123556372  0.00000000997734
Here are starts ends values 123556372  123556448  0
Here are starts ends values 123556448  123556449  0.00000001995468
Here are starts ends values 123556449  123556459  0.000000029932018
Here are starts ends values 123556459  123556460  0.00000000997734
GET MERGED VALS
FINISHED

Checking output after merge call (fn get_merged_vals):

large bamscale, smooth = 25, fails bigwigmerge
drc@databio:~/GITHUB/gtars/gtars$ ./target/release/gtars uniwig -f /home/drc/Downloads/gtars_74_issue/input/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y_2chrs.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 100227120.0 -l /home/drc/Downloads/gtars_74_issue/output/test
Begin bam processing workflow...
Merging all bigwig files...
GET MERGED VALS
here is res: ChromInfo { name: "chr1", length: 123556469, id: 0 }
here is last value: Value { start: 123550000, end: 123600000, value: 9.97734e-9 }
FINISHED
small bamscale, smooth = 25, passes bigwigmerge
drc@databio:~/GITHUB/gtars/gtars$ ./target/release/gtars uniwig -f /home/drc/Downloads/gtars_74_issue/input/UMGC-Mod-084-1s-MN24-d1-D_S44_sort_dedup.bam -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y_2chrs.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 1.0 -l /home/drc/Downloads/gtars_74_issue/output/test
Begin bam processing workflow...
Merging all bigwig files...
GET MERGED VALS
here is res: ChromInfo { name: "chr1", length: 123556469, id: 0 }
here is last value: Value { start: 123556428, end: 123556429, value: 2.0 }
FINISHED

So, depending on the region length and the values, the merging step can cause a start and end to extend beyond the chrom sizes in the provided chrom map.

Happening here:

let (iter, chrom_map) = get_merged_vals(bigwigs, 10, threshold, adjust, clip)?;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working uniwig
Projects
None yet
Development

No branches or pull requests

2 participants