gtars uniwig trying to create coordinates larger than chr #74
Thank you @ljmills for posting and for sending some example files. I am unable to reproduce the exact same issue as above; I am seeing what I believe to be a separate issue in the problem files you sent over. Emulating the gtars command from PEPATAC, I do not get a completion: the program simply hangs during bigwig creation. The intermediate lines being written look reasonable, and the counts do not extend past the chrom size in this particular example (123556469). However, I noticed that the 4th column's numbers are very small, and that if I simply change the --bamscale number to something smaller, the process no longer hangs and completes just fine. Again, I'm having difficulties reproducing the exact same error. If the files are hanging during bigwig creation, a temporary solution could be to manipulate the pepatac.py scaling lines: lowering the scaling factor by a few orders of magnitude allows the bigwig conversion to complete.
The PEPATAC pipeline sets the bamscale number based on the data, so I don't want to change it manually. You should be able to download the full bam file and bai using the links below. If you still can't recreate the error, then I will ask the PEPATAC people about the bamscale value. Also, I am not getting a hang on my end: most of the chromosomes process fine, only a couple per sample fail, and they aren't consistent across samples. Chr1 failed in this sample but not in all samples.
Thanks,
-Lauren
full bam file:
https://drive.google.com/file/d/1_OiOP3ADLcX7LpTWGI-UJfxUEAEBd7_h/view?usp=sharing
full bai file:
https://drive.google.com/file/d/1BILsI5ltkY1XDGGlpRjGiO4k20PGkHq5/view?usp=sharing
On Mon, Jan 27, 2025 at 1:12 PM Donald Campbell ***@***.***> wrote:
Thank you @ljmills <https://github.com/ljmills> for posting and for
sending some example files.
I am unable to get the exact same issue as above. I am seeing, what I
believe to be, a separate issue in the problem files you sent over.
For testChr1.bam where chrom.sizes for chr1 are such:
chr1 123556469
Emulating gtars command from PEPATAC:
```
./target/release/gtars uniwig --file "/home/drc/Downloads/gtars_74_issue/input/testchr1.bam" -c /home/drc/Downloads/gtars_74_issue/input/canFam4Y.chrom.sizes -m 25 -s 1 -t bam -y bw -p 6 -u shift --bamscale 100227120.0 -l /home/drc/Downloads/gtars_74_issue/output/test
```
I do not get a completion. *The program simply hangs during bigwig
creation.*
I'm seeing the intermediate lines being written as such:
```
chr1 0 123554979 0
chr1 123554979 123555028 0.00000000997734
chr1 123555028 123555030 0.00000001995468
chr1 123555030 123555051 0.00000000997734
chr1 123555051 123555079 0.00000001995468
chr1 123555079 123555102 0.00000000997734
chr1 123555102 123555120 0
chr1 123555120 123555122 0.00000000997734
chr1 123555122 123555163 0.00000001995468
chr1 123555163 123555171 0.000000029932018
chr1 123555171 123555173 0.00000001995468
chr1 123555173 123555214 0.00000000997734
chr1 123555214 123555291 0
chr1 123555291 123555342 0.00000000997734
chr1 123555342 123555394 0
chr1 123555394 123555428 0.00000001995468
chr1 123555428 123555432 0.000000029932018
chr1 123555432 123555445 0.00000003990936
chr1 123555445 123555479 0.00000001995468
chr1 123555479 123555483 0.00000000997734
chr1 123555483 123555529 0
chr1 123555529 123555549 0.00000000997734
chr1 123555549 123555550 0.00000001995468
chr1 123555550 123555573 0.000000029932018
chr1 123555573 123555580 0.00000003990936
chr1 123555580 123555600 0.000000029932018
chr1 123555600 123555601 0.00000001995468
chr1 123555601 123555675 0.00000000997734
chr1 123555675 123555700 0
chr1 123555700 123555727 0.00000000997734
chr1 123555727 123555751 0.00000001995468
chr1 123555751 123555778 0.00000000997734
chr1 123555778 123555824 0
chr1 123555824 123555853 0.00000000997734
chr1 123555853 123555875 0.00000001995468
chr1 123555875 123555904 0.00000000997734
chr1 123555904 123555996 0
chr1 123555996 123556047 0.00000000997734
chr1 123556047 123556111 0
chr1 123556111 123556135 0.00000000997734
chr1 123556135 123556162 0.00000001995468
chr1 123556162 123556186 0.00000000997734
chr1 123556186 123556341 0
chr1 123556341 123556392 0.00000000997734
chr1 123556392 123556428 0
chr1 123556428 123556429 0.00000001995468
```
The counts are not extending past the chrom size in this particular
example (123556469).
However, I noticed that the 4th column's numbers are very small and that
if I simply change the --bamscale number to something smaller, the
process will no longer hang and will complete just fine.
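As a quick sanity check on those 4th-column values (my own arithmetic, not something run in the thread), the smallest nonzero value in the excerpt is consistent with a single read's worth of signal divided by the --bamscale factor:

```python
# Back-of-the-envelope check (assumption: one count divided by --bamscale
# produces the smallest nonzero bedGraph value in the excerpt above).
bamscale = 100227120.0
unit = 1.0 / bamscale
print(f"{unit:.14f}")  # ~0.00000000997734, matching the 4th column
```

Values this close to zero leave very little float precision to work with, which is presumably why a smaller --bamscale behaves better.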
Again, I'm having difficulties reproducing the exact same error.
If the files are hanging during bigwig creation, a temporary solution could be to manipulate the pepatac.py lines (this occurs in four places):
https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1551
Lowering the scaling factor by a few orders of magnitude will allow the bigwig conversion to complete.
Excellent. Thank you for providing these. I was able to reproduce the issue. It is related to the smoothing argument -m 25 in the pipeline. This may be a regression, as I thought we had fixed this in the past. I will need to investigate and issue a new gtars release, etc. Some suggested solutions for the near future are listed in the quoted reply below.
Thanks Donald. I think I will edit my pepatac.py for now, and I look forward to the update to gtars/PEPATAC.
-L
On Tue, Jan 28, 2025 at 11:39 AM Donald Campbell ***@***.***> wrote:
Excellent. Thank you for providing these.
I was able to reproduce the issue. It is related to the smoothing argument -m 25 in the pipeline. It seems as though anything greater than 6 will cause the upper bound of the smoothing to extend beyond the chromosome size (for chr1 in the above example).
This may be a regression as I thought we had fixed this in the past. I
will need to investigate and issue a new gtars release, etc.
Some suggested solutions for the near future:
1. Wait for the next gtars release. This will probably be within the next week or so.
2. Regarding the PEPATAC pipeline (which I also maintain), I believe the produced bigwig files are not used downstream, *only* the BED file. Therefore, a temporary solution would be to change the smoothing factor in your local PEPATAC for the produced bigwig: https://github.com/databio/pepatac/blob/0d1aad08910aa3e7be1f410071a5c4e595884272/pipelines/pepatac.py#L1541
3. You can pull and use the previous version of PEPATAC: https://github.com/databio/pepatac/releases/tag/v0.11.3 It does not use gtars uniwig and instead relies on a Python implementation (it is slower).
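The smoothing boundary condition described above can be sketched as a clamped window. This is a hypothetical illustration in Python, not gtars' actual Rust implementation: the key point is that without the min()/max() clamps, a half-width like -m 25 near the end of chr1 pushes the window's upper bound past the chromosome size.

```python
def smoothing_window(pos, half_width, chrom_size):
    """Clamp a +/- half_width smoothing window around pos to [0, chrom_size].

    Hypothetical sketch of the boundary fix; not gtars code.
    """
    return max(0, pos - half_width), min(chrom_size, pos + half_width)

# A position 9 bp from the end of chr1 (size 123556469) with -m 25:
print(smoothing_window(123556460, 25, 123556469))  # (123556435, 123556469)
```

Without the `min()` clamp the second value would be 123556485, i.e. 16 bp past the chromosome end, which matches the symptom described above.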
Some troubleshooting notes: I confirmed that the bedGraph lines being reported are not extended beyond the chromosome during uniwig counting, regardless of the smooth size (-m 25 in the example above).
But we still get an error when writing the merged bigwig file. Checking the intermediate bigwig file (before merging), I used bigtools to convert it to a bedGraph and saw that the bedGraph also does not contain regions beyond the chromsize.
The final step before it errors is the merging of any bigwigs (lines 912 to 944 in 5ce8434).
It is not clear why this step would be affected by the smooth_size parameter, nor why it could change the merged bigwig in such a way that would cause a region to extend past the chrom_sizes.
Ok, I believe I've figured out where this is happening. It is during the merging step, when all of the individual chromosome bigwigs are merged. Due to a combination of region length and associated values, the merging step can produce a region that extends beyond the chromosome length and then fail. I will submit a bug for discussion in Bigtools. Some notes for reference. I produced the bw files individually and then manually attempted to merge them using bigtools bigwigmerge:
- large bamscale, smooth = 25: fails bigwigmerge
- small bamscale, smooth = 25: passes bigwigmerge
- large bamscale, smooth = 5: passes bigwigmerge

Checking output after the merge call:
- large bamscale, smooth = 25: fails bigwigmerge
- small bamscale, smooth = 25: passes bigwigmerge

So, depending on the region length and the values, the merging step can cause a start and end to extend beyond the chrom sizes in the provided chrom map. This happens here: line 915 in 5ce8434.
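If the merge can emit regions past the chromosome end, a defensive post-merge clamp would mask the symptom while the upstream bug is discussed in Bigtools. The helper below is hypothetical, my own sketch rather than gtars or Bigtools code, operating on (start, end, value) tuples:

```python
def clamp_to_chrom(intervals, chrom_size):
    """Drop or truncate (start, end, value) intervals extending past chrom_size.

    Hypothetical post-merge guard; not part of gtars or Bigtools.
    """
    clamped = []
    for start, end, value in intervals:
        if start >= chrom_size:
            continue  # whole region lies beyond the chromosome; drop it
        clamped.append((start, min(end, chrom_size), value))
    return clamped

# chr1 in the example is 123556469 bp; a merged region running past that
# would be truncated back to the chromosome end:
print(clamp_to_chrom([(123556428, 123556500, 2e-8)], 123556469))
```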
I am using gtars uniwig as part of the PEPATAC pipeline, and I am consistently getting this error across all of my ATAC-seq samples. It seems to happen on chromosomes that have alignments near the end of the chromosome. I have checked the alignments and I don't have reads aligned off the ends of these chromosomes, so I think it is an issue with gtars.
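A claim like this (no reads aligned off the chromosome end) can be double-checked by comparing each alignment's end coordinate against the chrom.sizes entry. A minimal, hypothetical helper is sketched below; with pysam, the end coordinates would come from each read's `reference_end`:

```python
def alignments_past_end(read_ends, chrom_size):
    """Return alignment end coordinates that extend past the chromosome.

    Hypothetical helper; read_ends would typically be collected from a BAM,
    e.g. [r.reference_end for r in bam.fetch("chr1")] via pysam.
    """
    return [end for end in read_ends if end > chrom_size]

# chr1 in the canFam4Y example above is 123556469 bp:
print(alignments_past_end([123556429, 123556469, 123556470], 123556469))  # [123556470]
```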
```
Produce signal tracks (01-26 17:22:21) elapsed: 1118.0 TIME
Target to produce:
/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_exact_shift.bw
,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y/UMGC-Mod-084-1s-MN24-d1-D_S44_smooth_shift.bw
,/home/ljmills/ljmills/Modiano/ATAC/pepatac_canFam4Y/project84/results_pipeline/UMGC-Mod-084-1s-MN24-d1-D_S44/aligned_canFam4Y_exact/UMGC-Mod-084-1s-MN24-d1-D_S44_shift.bed
Command completed. Elapsed time: 0:00:49. Running peak memory: 5.237GB.
PID: 3214339; Command: gtars; Return code: 0; Memory used: 0.137GB
Command completed. Elapsed time: 0:00:40. Running peak memory: 5.237GB.
PID: 3245409; Command: gtars; Return code: 0; Memory used: 0.012GB
```