Some Samples Encounter ForkProcess Empty Issues with No Output Using run_pangenome_aware_deepvariant #926
Comments
Hi @EEEdyeah, can you please run it on the entire chr6 to see if the issue persists? |
@kishwarshafin Hi, I will try; it's still running. In the meantime, I found that when I reran the same command (chr6:28000000-35000000), some of the previously failed samples ran successfully. This suggests that the same command can produce different results, which makes me question the stability of the previously successful runs. |
@EEEdyeah are you running on a system that pauses processes? It seems that in your run call_variants was paused and the queue did not receive anything for 180 seconds, which is why it got killed. Can you try setting num cpus to 0 from the command line and see if it still gets killed? |
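To illustrate the failure mode described above, here is a minimal sketch of a consumer that waits on a `multiprocessing.Queue` with a timeout, the same call pattern as `output_queue.get(timeout=180)` in the traceback. The names `consume_with_timeout`, `delay_s`, and `timeout_s` are hypothetical, and a timer thread stands in for the stalled producer to keep the example portable; this is not DeepVariant's code.

```python
import multiprocessing as mp
import queue
import threading


def consume_with_timeout(delay_s, timeout_s):
    """Return the item if it arrives within timeout_s seconds, else None.

    A timer thread stands in for a producer that is paused or throttled
    for delay_s seconds before it sends anything (an assumption made for
    illustration only).
    """
    out_q = mp.Queue()
    producer = threading.Timer(delay_s, out_q.put, args=("batch-0",))
    producer.start()
    try:
        # Same pattern as the traceback: block until an item arrives,
        # and raise queue.Empty if nothing shows up in time.
        return out_q.get(timeout=timeout_s)
    except queue.Empty:
        return None
    finally:
        producer.cancel()  # no-op if the timer has already fired
        producer.join()


if __name__ == "__main__":
    print(consume_with_timeout(delay_s=0.1, timeout_s=2.0))
    print(consume_with_timeout(delay_s=1.0, timeout_s=0.2))
```

In the second call the producer is slower than the timeout, so the consumer gives up even though the producer would eventually have delivered. That matches a scheduler pausing or starving the producing side for longer than the timeout.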
@kishwarshafin Sorry for the late reply. I’m not entirely sure what caused the issue, but I think I’ve found a solution. Running each job on a separate node seems to prevent the error from occurring. |
Hi @kishwarshafin. I am facing the same problem as this issue.
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
Describe the issue:
I am trying to reproduce the tutorial for the RNA-Seq use case from the following link: https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-rnaseq-case-study.md
However, I have encountered the same error as the one described in this issue.
Setup
singularity exec -B "$(pwd):$(pwd)" /home/carlos_menor/Documents/deepvariant_test/deepvariant_1.8.0.sif run_deepvariant --model_type=WES --customized_model=/home/carlos_menor/Documents/deepvariant_test/rnaseq/model/model.ckpt --ref=/home/carlos_menor/Documents/deepvariant_test/rnaseq/reference/GRCh38_no_alt_analysis_set.fasta --reads=/home/carlos_menor/Documents/deepvariant_test/rnaseq/data/hg005_gm26107.mrna.grch38.bam --output_vcf=/home/carlos_menor/Documents/deepvariant_test/rnaseq/output/HG005.output.vcf.gz --num_shards=$(nproc) --regions=/home/carlos_menor/Documents/deepvariant_test/rnaseq/data/chr20_CDS_3x.bed --make_examples_extra_args="split_skip_reads=true,channel_list='BASE_CHANNELS'" --intermediate_results_dir /home/carlos_menor/Documents/deepvariant_test/rnaseq/output/intermediate_results_dir
Process ForkProcess-1:
Does the quick start test work on your system?
Any additional context:
Furthermore, I had to rename the model file "model.ckpt.example_info.json" to "example_info.json", because run_deepvariant did not find this file in the model directory path. I attach the log files for both executions, Singularity and Docker. |
@CarlosMenFer what system are you working on? It seems like the data processor is not sending any data to the queue for 180 seconds, so it is getting killed. Does your system throttle or pause workers while they are running? |
@CarlosMenFer I noticed that there's an error raised a bit before the queue starts going:
It looks like |
Thank you for your answers. @kishwarshafin These are my system specifications:
I have tried both the Docker version and the Singularity version (from your repository). I see the same problem when I run the same process on a server. @lucasbrambrink How can I check that the checkpoint I trained is being called with the same parameters? I used the model files that I downloaded from this page:
Furthermore, for DeepVariant 1.8.0, I had to rename model.ckpt.example_info.json to example_info.json, because the software failed and required it. What do you mean by converting it to a saved model? Thank you in advance. Best regards |
You are trying to run v1.8.0 with a 1.4.0 model, which won't work. We moved the framework from Slim to Keras in version 1.6.0, so the older models are no longer compatible. Please use 1.4.0 for your use case for now. |
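As a rough pre-flight check for the mismatch described above, the sketch below encodes only the rule stated in this thread: a model trained before 1.6.0 and a binary at 1.6.0 or later (or the reverse) are on different frameworks. The helper names `parse_version` and `same_framework` are hypothetical and not part of DeepVariant. Passing this check is a necessary condition at best, since matching the exact release is still the safer choice.

```python
# Assumption: the only rule encoded here is the one stated in this thread,
# namely that the framework changed from Slim to Keras in version 1.6.0.
KERAS_CUTOVER = (1, 6, 0)


def parse_version(text):
    """Turn a version string such as '1.8.0' into the tuple (1, 8, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def same_framework(model_version, binary_version):
    """True when model and binary fall on the same side of the 1.6.0 cutover."""
    model_is_keras = parse_version(model_version) >= KERAS_CUTOVER
    binary_is_keras = parse_version(binary_version) >= KERAS_CUTOVER
    return model_is_keras == binary_is_keras


if __name__ == "__main__":
    # The combination reported in this thread: 1.4.0 model, 1.8.0 binary.
    print(same_framework("1.4.0", "1.8.0"))
```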
@kishwarshafin Thank you for your response. For now, I am using version 1.5.0. Is it compatible with the 1.4.0 model, or should I change to DeepVariant 1.4.0? Do you plan to provide an RNA-Seq model for newer versions of the software? Best regards |
I think it's best to change to 1.4.0 so you get consistent results. I will bring up the RNA-seq question to the team. |
Hi @CarlosMenFer, after discussing with the team, the current state is that we need to update several components of the RNA-seq support for a new release. Even though there is some work in progress, it's difficult to give an exact timeline for when it will be released. Once we have all the pieces in place, a new RNA-seq model will be provided with a DeepVariant release, but it is unlikely to be with version 1.9.0. |
Hi @kishwarshafin, thank you for your support. I understand; we will wait for the next version of the software that is compatible with the RNA-Seq model. In the meantime, we will work with version 1.4.0. I have a question about make_examples_extra_args. According to the RNA-Seq tutorial, I have to provide the option channel_list='BASE_CHANNELS'. However, when I try this with 1.4.0:
And if I change this part to channels='BASE_CHANNELS'.
How should I tune this parameter? |
Please follow this doc: https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-rnaseq-case-study.md and do not change BIN_VERSION="1.4.0" |
Thank you for the information @kishwarshafin. I followed the tutorial, using version 1.4.0. This is the error I obtained:
BIN_VERSION="1.4.0"
And if I change "channel_list" to "channels", I obtain this error:
|
@CarlosMenFer apologies - it looks like the case study contained a change that it should not have. I have updated the case study, correcting the command for v1.4.0:
BIN_VERSION="1.4.0"
sudo docker run \
  -v "$(pwd):$(pwd)" \
  -w $(pwd) \
  google/deepvariant:"${BIN_VERSION}" \
  run_deepvariant \
    --model_type=WES \
    --customized_model=model/model.ckpt \
    --ref=reference/GRCh38_no_alt_analysis_set.fasta \
    --reads=data/hg005_gm26107.mrna.grch38.bam \
    --output_vcf=output/HG005.output.vcf.gz \
    --num_shards=$(nproc) \
    --regions=data/chr20_CDS_3x.bed \
    --make_examples_extra_args="split_skip_reads=true,channels=''" \
    --intermediate_results_dir output/intermediate_results_dir
Can you give this a try and let me know if you run into any issues? |
I tried both the tutorial dataset and some of my own data, and it worked. I will keep an eye out for an update on the RNA-Seq model, as I would like to use the most up-to-date version of this software.
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
yes
Describe the issue:
(A clear and concise description of what the issue is.)
I was running deepvariant_pangenome_aware_deepvariant on vg Giraffe-mapped BAM files. However, some of the samples encountered a Process ForkProcess issue. The run did not report an error, did not terminate properly, and produced no output files.
Setup
Operating system: slurm
DeepVariant version: 1.8.0
Installation method (Docker, built from source, etc.): singularity pull
Type of data: (sequencing instrument, reference genome, anything special that is unlike the case studies?)
Illumina human 30x WGS, vg Giraffe-mapped HPRC
Steps to reproduce:
Command:
singularity exec -B /path/:/path/ /path/deepvariant_pangenome_aware_deepvariant-1.8.0.sif /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
  --model_type=WGS \
  --ref=/path/HPRC.GRCh38.reordered.fa \
  --reads=/path/$sample_name.surject.GRCh38.sorted.dedup.lefted.realigned.bam \
  --num_shards=4 \
  --sample_name_reads=$sample_name \
  --output_vcf /path/$sample_name.deepvariant.vcf.gz \
  --output_gvcf /path/$sample_name.deepvariant.gvcf.gz \
  --pangenome /path/HPRC_graph.gbz \
  --sample_name_pangenome HPRC \
  --regions chr6:28000000-35000000 \
  --disable_small_model \
  --intermediate_results_dir /path/dpvariant
Error trace: (if applicable)
The logs indicate the program was running normally until encountering the following issues:
2025-01-18 22:43:10.537301: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1';
2025-01-18 22:43:10.537341: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
I0118 22:43:10.537735 47448200671232 call_variants.py:918] call_variants: env = {'BASH_FUNC_module()': '() { eval
/usr/bin/modulecmd bash $*
\n}', 'SHI0118 22:43:10.659484 47448200671232 call_variants.py:785] Total 1 writing processes started.
I0118 22:43:10.661774 47448200671232 call_variants.py:796] Use saved model: True
I0118 22:43:10.665955 47448200671232 dv_utils.py:325] From /path/dpvariant/make_examples_pangenome_aware_dv.t
I0118 22:43:21.476414 47448200671232 dv_utils.py:325] From /opt/models/pangenome_aware_deepvariant/wgs/example_info.json: Shape of input examples: [200,
I0118 22:43:21.476675 47448200671232 call_variants.py:814] example_shape: [200, 221, 7]
Process ForkProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/tmp/Bazel.runfiles_yqt9b630/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 551, in post_processing
item = output_queue.get(timeout=180)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 114, in get
raise Empty
_queue.Empty
I0118 22:46:46.215257 47448200671232 call_variants.py:891] Predicted 1024 examples in 1 batches [19.962 sec per 100].
I0118 23:42:47.613373 47448200671232 call_variants.py:967] Complete: call_variants.
Does the quick start test work on your system?
Yes, the quick start test works, and most of the samples finish normally.
Any additional context:
Initially, I thought the issue was caused by the small model, so I added the --disable_small_model parameter. While this allowed some samples to run successfully, the same issue persists for other samples.
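Because the affected runs report no error and leave no output files, one way to find them across many samples is to scan each log for the traceback shown above. The following is a minimal sketch, assuming each sample's log is available as text. The function name `has_queue_timeout` is hypothetical, and the two markers are copied from the error trace in this issue.

```python
import re

# Markers copied from the traceback in this issue.
FORK_FAILURE = re.compile(r"^Process ForkProcess-\d+:", re.MULTILINE)
QUEUE_EMPTY = re.compile(r"^_queue\.Empty\s*$", re.MULTILINE)


def has_queue_timeout(log_text):
    """True if the log shows a ForkProcess traceback followed by _queue.Empty."""
    fork = FORK_FAILURE.search(log_text)
    if fork is None:
        return False
    # Only look for the exception line after the ForkProcess header.
    return QUEUE_EMPTY.search(log_text, fork.end()) is not None


if __name__ == "__main__":
    example_log = (
        "Process ForkProcess-1:\n"
        "Traceback (most recent call last):\n"
        "    raise Empty\n"
        "_queue.Empty\n"
    )
    print(has_queue_timeout(example_log))
```

Samples whose logs match can then be resubmitted, for example one job per node as suggested earlier in this thread.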