Some Samples Encounter ForkProcess Empty Issues with No Output Using run_pangenome_aware_deepvariant #926

Closed
EEEdyeah opened this issue Jan 19, 2025 · 17 comments

Comments

@EEEdyeah

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
yes
Describe the issue:
(A clear and concise description of what the issue is.)
I was running deepvariant_pangenome_aware_deepvariant on vg Giraffe-mapped BAM files. However, some of the samples encountered a Process ForkProcess issue. It didn’t throw an error, didn’t terminate properly, and produced no output files.
Setup

  • Operating system: slurm

  • DeepVariant version: 1.8.0

  • Installation method (Docker, built from source, etc.): singularity pull

  • Type of data: (sequencing instrument, reference genome, anything special that is unlike the case studies?)
    Illumina human 30x WGS, vg Giraffe-mapped HPRC
    Steps to reproduce:

  • Command:
    singularity exec -B /path/:/path/ /path/deepvariant_pangenome_aware_deepvariant-1.8.0.sif /opt/deepvariant/bin/run_pangenome_aware_deepvariant \
      --model_type=WGS \
      --ref=/path/HPRC.GRCh38.reordered.fa \
      --reads=/path/$sample_name.surject.GRCh38.sorted.dedup.lefted.realigned.bam \
      --num_shards=4 \
      --sample_name_reads=$sample_name \
      --output_vcf /path/$sample_name.deepvariant.vcf.gz \
      --output_gvcf /path/$sample_name.deepvariant.gvcf.gz \
      --pangenome /path/HPRC_graph.gbz \
      --sample_name_pangenome HPRC \
      --regions chr6:28000000-35000000 \
      --disable_small_model \
      --intermediate_results_dir /path/dpvariant

  • Error trace: (if applicable)
    The logs indicate the program was running normally until encountering the following issues:
    2025-01-18 22:43:10.537301: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1';
    2025-01-18 22:43:10.537341: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
    I0118 22:43:10.537735 47448200671232 call_variants.py:918] call_variants: env = {'BASH_FUNC_module()': '() { eval /usr/bin/modulecmd bash $*\n}', 'SH
    I0118 22:43:10.659484 47448200671232 call_variants.py:785] Total 1 writing processes started.
    I0118 22:43:10.661774 47448200671232 call_variants.py:796] Use saved model: True
    I0118 22:43:10.665955 47448200671232 dv_utils.py:325] From /path/dpvariant/make_examples_pangenome_aware_dv.t
    I0118 22:43:21.476414 47448200671232 dv_utils.py:325] From /opt/models/pangenome_aware_deepvariant/wgs/example_info.json: Shape of input examples: [200,
    I0118 22:43:21.476675 47448200671232 call_variants.py:814] example_shape: [200, 221, 7]
    Process ForkProcess-1:
    Traceback (most recent call last):
    File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
    File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    File "/tmp/Bazel.runfiles_yqt9b630/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 551, in post_processing
    item = output_queue.get(timeout=180)
    File "/usr/lib/python3.10/multiprocessing/queues.py", line 114, in get
    raise Empty
    _queue.Empty
    I0118 22:46:46.215257 47448200671232 call_variants.py:891] Predicted 1024 examples in 1 batches [19.962 sec per 100].
    I0118 23:42:47.613373 47448200671232 call_variants.py:967] Complete: call_variants.

Does the quick start test work on your system?
Yes, the quick start test works, and most of the samples finish normally.

Any additional context:
Initially, I thought the issue was caused by the small model, so I added the --disable_small_model parameter. While this allowed some samples to run successfully, the same issue persists for other samples.
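
A quick check on the timestamps in the log above may help explain the traceback. The sketch below only redoes arithmetic on values copied from that log: the `example_shape` line at 22:43:21, the `Predicted 1024 examples` line at 22:46:46, the reported rate of 19.962 sec per 100, and the `timeout=180` in the traceback. The helper name `seconds_between` is made up for illustration. Treat it as a rough consistency check, since the first interval also includes model loading.

```python
from datetime import datetime

QUEUE_TIMEOUT_S = 180  # from `output_queue.get(timeout=180)` in the traceback


def seconds_between(start, end, fmt="%H:%M:%S.%f"):
    """Elapsed seconds between two same-day log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Time from the `example_shape` log line to the first `Predicted ...` line.
first_batch_s = seconds_between("22:43:21.476675", "22:46:46.215257")

# The same quantity estimated from the logged rate: 19.962 s per 100 examples.
rate_estimate_s = 19.962 * 1024 / 100

if __name__ == "__main__":
    print(f"elapsed before first batch: {first_batch_s:.1f} s")
    print(f"estimate from logged rate:  {rate_estimate_s:.1f} s")
    print(f"exceeds queue timeout:      {first_batch_s > QUEUE_TIMEOUT_S}")
```

Both figures come out a little above 200 seconds, which is longer than the 180-second wait, so a slow first batch on a busy node could plausibly be enough to trigger `_queue.Empty`.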

@kishwarshafin
Collaborator

Hi @EEEdyeah , can you please run it on entire chr6 to see if the issue persists?

@EEEdyeah
Author

@kishwarshafin Hi, I will try that; it's still running. In the meantime, I found that when I reran the same command (chr6:28000000-35000000), some of the previously failed samples ran successfully. This suggests that the same command can produce different results, which makes me question the stability of the previously successful runs.

@kishwarshafin
Collaborator

@EEEdyeah are you running on a system that pauses processes? It seems that in your run, call_variants was paused and the queue did not receive anything for 180 seconds, which is why it got killed. Can you try setting num cpus to 0 from the command line and see if it still gets killed?
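
For anyone unfamiliar with the failure mode described here, this is a minimal sketch of a consumer that waits on a `multiprocessing.Queue` with a timeout. It is not DeepVariant's code: `consume`, the `None` sentinel, and the short timeout are assumptions made for illustration. It only shows that `get(timeout=...)` raises `queue.Empty` when nothing arrives in time, which is the exception at the bottom of the traceback.

```python
import multiprocessing
import queue


def consume(output_queue, timeout_s):
    """Collect items until a None sentinel arrives or the wait times out."""
    items = []
    while True:
        try:
            item = output_queue.get(timeout=timeout_s)
        except queue.Empty:
            # The producer sent nothing for timeout_s seconds.
            return items, "timed_out"
        if item is None:
            # Hypothetical sentinel meaning "producer is done".
            return items, "finished"
        items.append(item)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put("batch-0")
    # No sentinel is ever sent, imitating a producer that stalls.
    print(consume(q, timeout_s=0.5))
```

If the traceback's worker behaves like an uncaught version of this loop, a stalled or throttled producer would end the writer process while the main process keeps running, which would match a run that logs "Complete" but writes no output.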

@EEEdyeah
Author

@kishwarshafin Sorry for the late reply. I’m not entirely sure what caused the issue, but I think I’ve found a solution. Running each job on a separate node seems to prevent the error from occurring.

@CarlosMenFer

CarlosMenFer commented Feb 28, 2025

Hi @kishwarshafin. I am facing the same problem as this issue.

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:
yes

Describe the issue:

I am trying to reproduce the tutorial for the RNA-Seq use case from the following link: https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-rnaseq-case-study.md

However, I have encountered the same error as the one described in this issue.

Setup

  • Operating system: Ubuntu 24.04.2 LTS

  • DeepVariant version: 1.8.0

  • Installation method (Docker, built from source, etc.): Docker and singularity

  • Type of data: (sequencing instrument, reference genome, anything special that is unlike the case studies?): The dataset provided in the RNA-Seq tutorial.

  • Command:
    docker run -v "$(pwd):$(pwd)" -w "$(pwd)" google/deepvariant:"1.8.0" run_deepvariant --model_type=WES --customized_model=model/model.ckpt --ref=reference/GRCh38_no_alt_analysis_set.fasta --reads=data/hg005_gm26107.mrna.grch38.bam --output_vcf=output/HG005.output.vcf.gz --num_shards=$(nproc) --regions=data/chr20_CDS_3x.bed --make_examples_extra_args="split_skip_reads=true,channel_list='BASE_CHANNELS'" --intermediate_results_dir output/intermediate_results_dir

singularity exec -B "$(pwd):$(pwd)" /home/carlos_menor/Documents/deepvariant_test/deepvariant_1.8.0.sif run_deepvariant --model_type=WES --customized_model=/home/carlos_menor/Documents/deepvariant_test/rnaseq/model/model.ckpt --ref=/home/carlos_menor/Documents/deepvariant_test/rnaseq/reference/GRCh38_no_alt_analysis_set.fasta --reads=/home/carlos_menor/Documents/deepvariant_test/rnaseq/data/hg005_gm26107.mrna.grch38.bam --output_vcf=/home/carlos_menor/Documents/deepvariant_test/rnaseq/output/HG005.output.vcf.gz --num_shards=$(nproc) --regions=/home/carlos_menor/Documents/deepvariant_test/rnaseq/data/chr20_CDS_3x.bed --make_examples_extra_args="split_skip_reads=true,channel_list='BASE_CHANNELS'" --intermediate_results_dir /home/carlos_menor/Documents/deepvariant_test/rnaseq/output/intermediate_results_dir

  • Error trace: (if applicable):

Process ForkProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/tmp/Bazel.runfiles_rh256k9z/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 551, in post_processing
item = output_queue.get(timeout=180)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 114, in get
raise Empty
_queue.Empty

Does the quick start test work on your system?
Yes, the quick start test works.

Any additional context:
Using version 1.5.0, the RNA-Seq test works, and so do the samples I am working with. However, versions 1.6.0 and 1.6.1 do not work.

Furthermore, I had to change the name of the model file "model.ckpt.example_info.json" to "example_info.json", because run_deepvariant did not find this file in the model directory path.

I attach the log files for both executions, singularity and docker.

log_rnaseq_docker.txt
log_rnaseq_singularity.txt

@kishwarshafin
Collaborator

@CarlosMenFer what system are you working on? It seems like the data processor is not sending any data to the queue for 180 seconds, so it's getting killed. Does your system throttle or pause workers while they are running?

@lucasbrambrink
Collaborator

@CarlosMenFer I noticed that there's an error raised a bit before the queue starts going:

Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 976, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 952, in main
    call_variants(
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 801, in call_variants
    example_shape, model = load_model_and_check_shape(
  File "/tmp/Bazel.runfiles_rh256k9z/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 707, in load_model_and_check_shape
    model.load_weights(checkpoint_path).expect_partial()
  File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 1047, in assert_consumed
    raise AssertionError(
AssertionError: Some objects had attributes which were not restored: 
    <tf.Variable 'conv2d/kernel:0' shape=(3, 3, 6, 32) dtype=float32, numpy=...

It looks like model.load_weights(checkpoint_path).expect_partial() is failing. Can you confirm that the checkpoint you trained is being called with the same parameters? Alternatively, you could try converting it to a saved model to see if that helps.

@CarlosMenFer

CarlosMenFer commented Mar 3, 2025

Thank you for your answers.

@kishwarshafin These are my system specifications:

  • Hardware Model: HP HP ZBook Studio 16 inch G10 Mobile Workstation PC
  • Processor: 13th Gen Intel® Core™ i7-13800H × 20
  • Memory: 64.0 GiB
  • Operating System: Ubuntu 24.04.2 LTS

I have tried both the Docker version and the Singularity version (from your repository).

I see the same problem when I run the same process on a server.

@lucasbrambrink How can I check that the checkpoint I trained is being called with the same parameters? I used the model files that I downloaded from this page:

Furthermore, for DeepVariant 1.8.0, I had to rename model.ckpt.example_info.json to example_info.json, because the software failed without it.

What do you mean by converting it to a saved model?

Thank you in advance.

Best regards
Carlos

@kishwarshafin
Collaborator

@CarlosMenFer ,

You are trying to run v1.8.0 with a 1.4.0 model, which won't work. We moved the framework from Slim to Keras in version 1.6.0, so the older models are no longer compatible. Please use 1.4.0 for your use case for now.
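
The rule of thumb in this reply can be written down as a small check. This is only a sketch: `parse_version` and `same_framework_era` are hypothetical helpers, and the single cutoff at 1.6.0 is taken from the comment above. Going by that comment, being on the same side of the cutoff is required for a checkpoint to load at all; it does not guarantee identical results across versions.

```python
FRAMEWORK_SWITCH = (1, 6, 0)  # Slim -> Keras, per the comment above


def parse_version(version):
    """Turn a string such as '1.8.0' into a comparable tuple (1, 8, 0)."""
    return tuple(int(part) for part in version.split("."))


def same_framework_era(binary_version, model_version):
    """True if the binary and the model fall on the same side of the switch."""
    binary_is_keras = parse_version(binary_version) >= FRAMEWORK_SWITCH
    model_is_keras = parse_version(model_version) >= FRAMEWORK_SWITCH
    return binary_is_keras == model_is_keras


if __name__ == "__main__":
    for binary, model in [("1.8.0", "1.4.0"), ("1.5.0", "1.4.0"), ("1.4.0", "1.4.0")]:
        print(binary, model, same_framework_era(binary, model))
```

For the combination in this thread, a 1.8.0 binary with the 1.4.0 RNA-seq model, the check returns `False`.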

@CarlosMenFer

@kishwarshafin Thank you for your response.

For now, I am using version 1.5.0. Is it compatible with the 1.4.0 model, or should I change to DeepVariant 1.4.0?

Do you plan to provide an RNA-Seq model for newer versions of the software?

Bests
Carlos

@kishwarshafin
Collaborator

I think it's best to change to 1.4.0 so you get consistent results. I will bring up the RNA-seq question to the team.

@kishwarshafin
Collaborator

Hi @CarlosMenFer, after discussing with the team: we need to update several components of the RNA-seq support before a new release. Although some of this work is in progress, it is difficult to give an exact timeline for when it will be released. Once everything is in place, a new RNA-seq model will be provided with a DeepVariant release, but it is unlikely to be version 1.9.0.

@CarlosMenFer

Hi @kishwarshafin, thank you for your support. I understand; we will wait for the next version of the software that is compatible with the RNA-Seq model.

In the meantime, we will work with version 1.4.0. I have a question about ‘make_examples_extra_args’.

According to the RNA-Seq tutorial, I have to provide the option channel_list='BASE_CHANNELS'. However, when I try this with 1.4.0, I get:

FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?

And if I change this part to channels='BASE_CHANNELS', I get:

E0307 08:41:05.125913 131366803224384 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size

How should I tune this parameter?

@kishwarshafin
Collaborator

Please follow this doc: https://github.com/google/deepvariant/blob/r1.8/docs/deepvariant-rnaseq-case-study.md and do not change BIN_VERSION="1.4.0"

@CarlosMenFer

CarlosMenFer commented Mar 10, 2025

Thank you for the information @kishwarshafin .

I followed the tutorial, using version 1.4.0. This is the error I got:

BIN_VERSION="1.4.0"
sudo docker run -v "$(pwd):$(pwd)" -w "$(pwd)" google/deepvariant:"${BIN_VERSION}" run_deepvariant --model_type=WES --customized_model=model/model.ckpt --ref=reference/GRCh38_no_alt_analysis_set.fasta --reads=data/hg005_gm26107.mrna.grch38.bam --output_vcf=output/HG005.output.vcf.gz --num_shards=$(nproc) --regions=data/chr20_CDS_3x.bed --make_examples_extra_args="split_skip_reads=true,channel_list='BASE_CHANNELS'" --intermediate_results_dir output/intermediate_results_dir

I0310 07:25:14.063454 130780876670784 run_deepvariant.py:338] Creating a directory for intermediate results in output/intermediate_results_dir
I0310 07:25:14.063800 130780876670784 run_deepvariant.py:359] You set --customized_model. Instead of using the default model for WES, `call_variants` step will load model/model.ckpt* instead.

***** Intermediate results will be written to output/intermediate_results_dir in docker. ****


***** Running the command:*****
time seq 0 19 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "reference/GRCh38_no_alt_analysis_set.fasta" --reads "data/hg005_gm26107.mrna.grch38.bam" --examples "output/intermediate_results_dir/make_examples.tfrecord@20.gz" --channel_list 'BASE_CHANNELS' --channels "insert_size" --regions "data/chr20_CDS_3x.bed" --split_skip_reads --task {}

FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
FATAL Flags parsing error: Unknown command line flag 'channel_list'. Did you mean: channels ?
Pass --helpshort or --helpfull to see help on flags.
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref reference/GRCh38_no_alt_analysis_set.fasta --reads data/hg005_gm26107.mrna.grch38.bam --examples output/intermediate_results_dir/make_examples.tfrecord@20.gz --channel_list BASE_CHANNELS --channels insert_size --regions data/chr20_CDS_3x.bed --split_skip_reads --task 0

And if I change "channel_list" to "channels", I get this error:

I0310 07:28:18.076153 132269689579328 run_deepvariant.py:342] Re-using the directory for intermediate results in output/intermediate_results_dir
I0310 07:28:18.076461 132269689579328 run_deepvariant.py:359] You set --customized_model. Instead of using the default model for WES, `call_variants` step will load model/model.ckpt* instead.

Warning: --channels is previously set to insert_size, now to 'BASE_CHANNELS'.

***** Intermediate results will be written to output/intermediate_results_dir in docker. ****


***** Running the command:*****
time seq 0 19 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "reference/GRCh38_no_alt_analysis_set.fasta" --reads "data/hg005_gm26107.mrna.grch38.bam" --examples "output/intermediate_results_dir/make_examples.tfrecord@20.gz" --channels 'BASE_CHANNELS' --regions "data/chr20_CDS_3x.bed" --split_skip_reads --task {}

I0310 07:28:23.193606 126099256416064 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.200210 126099256416064 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.435532 127415574579008 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.442096 127415574579008 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.552639 139739300931392 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.559232 139739300931392 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.403039 127049448355648 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.409634 127049448355648 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.537462 140133164533568 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.544063 140133164533568 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.424960 126071827236672 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.431614 126071827236672 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.575697 133410013574976 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.582631 133410013574976 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
I0310 07:28:23.778682 140408307177280 genomics_reader.py:222] Reading data/hg005_gm26107.mrna.grch38.bam with NativeSamReader
E0310 07:28:23.786323 140408307177280 errors.py:61] Channel "BASE_CHANNELS" is not one of the available opt channels: read_mapping_percent, avg_base_quality, identity, gap_compressed_identity, gc_content, is_homopolymer, homopolymer_weighted, blank, insert_size
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref reference/GRCh38_no_alt_analysis_set.fasta --reads data/hg005_gm26107.mrna.grch38.bam --examples output/intermediate_results_dir/make_examples.tfrecord@20.gz --channels BASE_CHANNELS --regions data/chr20_CDS_3x.bed --split_skip_reads --task 0

kishwarshafin reopened this Mar 10, 2025
@danielecook
Collaborator

@CarlosMenFer apologies - it looks like the case study contained a change that it should not have.

I have updated the case study, correcting the command for v1.4.0:

BIN_VERSION="1.4.0"

sudo docker run \
  -v "$(pwd):$(pwd)" \
  -w $(pwd) \
  google/deepvariant:"${BIN_VERSION}" \
  run_deepvariant \
    --model_type=WES \
    --customized_model=model/model.ckpt \
    --ref=reference/GRCh38_no_alt_analysis_set.fasta \
    --reads=data/hg005_gm26107.mrna.grch38.bam \
    --output_vcf=output/HG005.output.vcf.gz \
    --num_shards=$(nproc) \
    --regions=data/chr20_CDS_3x.bed \
    --make_examples_extra_args="split_skip_reads=true,channels=''" \
    --intermediate_results_dir output/intermediate_results_dir

Can you give this a try and let me know if you run into any issues?
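
The logs earlier in the thread suggest how `--make_examples_extra_args` is expanded: each comma-separated `key=value` pair becomes a `make_examples` flag, and a pair that names an existing default (here `channels`, which is set to `insert_size` in the 1.4.0 log) replaces it. The sketch below imitates that behaviour so the effect of `channels=''` is easier to see. `expand_extra_args` is a hypothetical helper, not the `run_deepvariant` implementation, and it ignores edge cases such as values that contain commas.

```python
def expand_extra_args(extra_args, defaults=None):
    """Expand 'k1=v1,k2=v2' into a flag list; extra args override defaults."""
    flags = dict(defaults or {})
    for pair in extra_args.split(","):
        key, _, value = pair.partition("=")
        flags[key.strip()] = value.strip()

    argv = []
    for key, value in flags.items():
        if value.lower() == "true":
            # Boolean switches are emitted as a bare flag.
            argv.append(f"--{key}")
        else:
            argv.extend([f"--{key}", value])
    return argv


if __name__ == "__main__":
    defaults = {"channels": "insert_size"}  # default seen in the 1.4.0 log
    print(expand_extra_args("split_skip_reads=true,channels=''", defaults))
```

Under these assumptions the corrected command yields `--channels ''` in place of `--channels insert_size`. The shell that runs the printed command then presumably strips the quotes, as it did for `BASE_CHANNELS` in the failed job line, so no optional channel is requested.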

@CarlosMenFer

@danielecook

I tried both the tutorial dataset and some of my own data, and it worked.

I will keep an eye out for an update on the RNA-Seq model, as I would like to use the most up-to-date version of this software.
