Pipeline fails with "Cannot open file x.tsv for writing. Too many open files" #158

imnuvi · 2025-02-28T21:05:51Z

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux 8.8 (Ootpa)

Workflow Version

v3.0.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run software_packages/wf-single-cell \
    --expected_cells 10000 \
    --fastq gene_expression/library_2_reads_merged \
    --kit 3prime:v3 \
    --ref_genome_dir data/RefGenome/ \
    -profile standard \
    --out_dir runs/run5

Workflow Execution - CLI Execution Profile

singularity

What happened?

Setup:

kit: 3prime:v3
executor: singularity
environment: computing cluster
RAM: 128GB
cores: 12

Current Behaviour
I tried running the EPI2ME single-cell workflow on our sequencing data, and the pipeline fails at the cat_tags_by_chrom step. The command uses awk, which seems to open a large number of output files and exceeds the system's open-file limit.
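
For anyone hitting the same error, a rough way to confirm the diagnosis is to count how many distinct values the split column holds and compare that number with the soft open-file limit. The sketch below is not part of the workflow. It assumes the tags files are tab-separated with a header row and that the column is named `chr`; the function names and the `reserved` allowance for already-open handles are hypothetical.

```python
import csv
import resource


def count_distinct_keys(tsv_paths, column="chr"):
    """Count distinct values of `column` across headered TSV files.

    The column name "chr" is an assumption about the tags files.
    """
    keys = set()
    for path in tsv_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                keys.add(row[column])
    return len(keys)


def would_exceed_limit(n_outputs, soft_limit=None, reserved=4):
    """Return True if keeping `n_outputs` files open at once would pass the limit.

    `reserved` is a rough allowance for stdin/stdout/stderr and the input file.
    """
    if soft_limit is None:
        # Soft limit on open file descriptors for the current process (Unix only).
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return n_outputs + reserved > soft_limit
```

With a genome reference the count should be on the order of the number of chromosomes and contigs. A count in the tens of thousands suggests the column holds something other than chromosome names.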

Expected Behaviour
The pipeline runs without errors and produces the results matrix.

Relevant log output

executor >  local (128)
[ee/7dc26d] fastcat (1)                    | 1 of 1 ✔
[9d/89cc49] parse_kit_metadata (1)         | 1 of 1 ✔
[24/e1d96e] pipeline:getVersions           | 1 of 1 ✔
[5c/1fec01] pipeline:getParams             | 1 of 1 ✔
[3e/925f97] pip…e:preprocess:call_paftools | 1 of 1 ✔
[af/04a6bb] pip…rocess:build_minimap_index | 1 of 1 ✔
[0a/3568bc] pip…cess:call_adapter_scan (5) | 11 of 11 ✔
[c8/4b9212] pip…ummarize_adapter_table (1) | 1 of 1 ✔
[c0/bf5b0f] pip…s_bams:split_gtf_by_chroms | 1 of 1 ✔
[f0/1a15a2] pip…ams:generate_whitelist (1) | 1 of 1 ✔
[b4/4de3de] pip…_bams:assign_barcodes (11) | 11 of 11 ✔
[f9/5bcfab] pip…:merge_and_publish_tsv (1) | 1 of 1 ✔
[99/ddf8c2] pip…bams:cat_tags_by_chrom (1) | 1 of 1, failed: 1 ✘
[0d/ae8f94] pip…rocess_bams:merge_bams (1) | 1 of 1 ✔
[61/453413] pip…rocess_bams:stringtie (46) | 47 of 47 ✔
[22/f19512] pip…lign_to_transcriptome (47) | 47 of 47 ✔
[-        ] pip…ocess_bams:assign_features -
[-        ] pip…process_bams:create_matrix -
[-        ] pip…rocess_bams:process_matrix -
[-        ] pip…s_bams:merge_transcriptome -
[-        ] pip…e:process_bams:pack_images | 0 of 1
Plus 5 more processes waiting for tasks…
ERROR ~ Error executing process > 'pipeline:process_bams:cat_tags_by_chrom (1)'

Caused by:
  Process `pipeline:process_bams:cat_tags_by_chrom (1)` terminated with an error exit status (2)


Command executed:

  mkdir chr_tags
  # Find the chr column number
  files=(tags/*)
  chr_col=$(awk -v RS=' ' '/chr/{print NR; exit}' "${files[0]}")

  # merge the tags TSVs, keep header from first file and split entries by chromosome
  awk -F'       ' -v chr_col=$chr_col 'FNR==1{hdr=$0; next}     {if (!seen[$chr_col]++)         print hdr>"chr_tags/"$chr_col".tsv";         print>"chr_tags/"$chr_col".tsv"}' tags/*

Command exit status:
  2

Command output:
  (empty)

Command error:
  awk: cannot open "chr_tags/ENST00000575475.2.tsv" for output (Too many open files)

Application activity log entry

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

nrhorner · 2025-03-03T10:23:56Z

Hi @imnuvi

It appears that in the cat_tags_by_chrom process, the files are being aggregated by transcript ID rather than by chromosome.

This is strange because the tags files should not contain any transcript info at this point.

Is it possible that you have supplied a transcriptome sequence instead of a genome sequence?

What is the content of the following file?

<ref_genome_dir>/fasta/genome.fa
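
A quick way to answer that question without opening the whole file is to read the first batch of FASTA headers and see whether they look like transcript records. This is only a heuristic sketch: it assumes Ensembl-style naming (transcript IDs starting with `ENST`, or a `cdna` field in the header), and the function names are hypothetical.

```python
from itertools import islice


def fasta_headers(path, limit=1000):
    """Return up to `limit` FASTA header lines, without the leading '>'."""
    with open(path) as handle:
        headers = (line[1:].strip() for line in handle if line.startswith(">"))
        return list(islice(headers, limit))


def looks_like_transcriptome(headers):
    """Heuristic: most records are Ensembl transcripts (ENST IDs or a 'cdna' field)."""
    if not headers:
        return False
    hits = 0
    for header in headers:
        fields = header.split()
        if fields and (fields[0].startswith("ENST") or "cdna" in fields[1:]):
            hits += 1
    return hits / len(headers) > 0.5
```

A genome FASTA would normally have a small number of records named after chromosomes or contigs, so `looks_like_transcriptome(fasta_headers("genome.fa"))` returning `True` is a strong hint that the wrong file was supplied.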

imnuvi · 2025-03-03T19:27:58Z

Hi @nrhorner,

Thanks for your reply!
You could be right about the transcriptome.

Here are the first few lines of <ref_genome_dir>/fasta/genome.fa:

`>ENST00000448914.1 cdna chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]
ACTGGGGGATACG
>ENST00000631435.1 cdna chromosome:GRCh38:CHR_HSCHR7_2_CTG6:142847306:142847317:1 gene:ENSG00000282253.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1 description:T cell receptor beta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12158]
GGGACAGGGGGC
>ENST00000632684.1 cdna chromosome:GRCh38:7:142786213:142786224:1 gene:ENSG00000282431.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRBD1 description:T cell receptor beta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12158]
GGGACAGGGGGC
>ENST00000434970.2 cdna chromosome:GRCh38:14:22439007:22439015:1 gene:ENSG00000237235.2 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD2 description:T cell receptor delta diversity 2 [Source:HGNC Symbol;Acc:HGNC:12255]
CCTTCCTAC
>ENST00000415118.1 cdna chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254]
GAAATAGT
>ENST00000633010.1 cdna chromosome:GRCh38:CHR_HSCHR14_3_CTG1:105895279:105895294:-1 gene:ENSG00000282274.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD4-17 description:immunoglobulin heavy diversity 4-17 [Source:HGNC Symbol;Acc:HGNC:5503]
TGACTACGGTGACTAC
>ENST00000632968.1 cdna chromosome:GRCh38:CHR_HSCHR14_3_CTG1:105891962:105891978:-1 gene:ENSG00000282592.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD1-20 description:immunoglobulin heavy diversity 1-20 [Source:HGNC Symbol;Acc:HGNC:5484]
GGTATAACTGGAACGAC
>ENST00000603693.1 cdna chromosome:GRCh38:15:21011451:21011469:-1 gene:ENSG00000270451.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD4OR15-4B description:immunoglobulin heavy diversity 4/OR15-4B (non-functional) [Source:HGNC Symbol;Acc:HGNC:5507]
TGACTATGGTGCTAACTAC
>ENST00000452198.1 cdna chromosome:GRCh38:14:105881539:105881556:-1 gene:ENSG00000225825.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD6-25 description:immunoglobulin heavy diversity 6-25 [Source:HGNC Symbol;Acc:HGNC:5516]
GGGTATAGCAGCGGCTAC
>ENST00000632609.1 cdna chromosome:GRCh38:CHR_HSCHR14_3_CTG1:105905268:105905298:-1 gene:ENSG00000282373.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD3-10 description:immunoglobulin heavy diversity 3-10 [Source:HGNC Symbol;Acc:HGNC:5495]
GTATTACTATGGTTCGGGGAGTTATTATAAC`

nrhorner · 2025-03-03T20:29:00Z

Hi @imnuvi

This is the issue. That file should be a genomic DNA sequence.

imnuvi · 2025-03-05T17:17:49Z

Hi @nrhorner,

I ran the pipeline with the full genome sequence, and all steps ran except the process_matrix step. The error is attached below.

executor > local (225)
[c6/bd05c1] fastcat (1) | 1 of 1 ✔
[e5/6bb04a] parse_kit_metadata (1) | 1 of 1 ✔
[29/aa8e5f] pipeline:getVersions | 1 of 1 ✔
[99/f688b2] pipeline:getParams | 1 of 1 ✔
[3b/2cbd25] pip…e:preprocess:call_paftools | 1 of 1 ✔
[9f/ede70b] pip…rocess:build_minimap_index | 1 of 1 ✔
[be/e4cc24] pip…cess:call_adapter_scan (6) | 11 of 11 ✔
[3e/c6fcd3] pip…ummarize_adapter_table (1) | 1 of 1 ✔
[b0/18d39a] pip…s_bams:split_gtf_by_chroms | 1 of 1 ✔
[9a/8d9722] pip…ams:generate_whitelist (1) | 1 of 1 ✔
[0f/89d8fc] pip…s_bams:assign_barcodes (6) | 11 of 11 ✔
[b4/6f15bd] pip…:merge_and_publish_tsv (1) | 1 of 1 ✔
[12/3f40da] pip…bams:cat_tags_by_chrom (1) | 1 of 1 ✔
[42/df7c6a] pip…rocess_bams:merge_bams (1) | 1 of 1 ✔
[6f/077c91] pip…rocess_bams:stringtie (46) | 47 of 47 ✔
[37/845e70] pip…lign_to_transcriptome (47) | 47 of 47 ✔
[d5/03da76] pip…_bams:assign_features (10) | 45 of 45 ✔
[12/97f316] pip…ss_bams:create_matrix (45) | 45 of 45 ✔
[f1/eedee3] pip…ss_bams:process_matrix (1) | 1 of 2, failed: 1
[e6/28b9ea] pip…ms:merge_transcriptome (1) | 1 of 1 ✔
[38/b74138] pip…ombine_final_tag_files (1) | 1 of 1 ✔
[50/8c8bb2] pip…e:process_bams:tag_bam (1) | 0 of 1
[d5/f702b0] pip…ms:umi_gene_saturation (1) | 1 of 1 ✔
[a2/5e4f55] pip…ocess_bams:pack_images (1) | 1 of 1 ✔
Plus 2 more processes waiting for tasks…
ERROR ~ Error executing process > 'pipeline:process_bams:process_matrix (1)'

Caused by:
Process pipeline:process_bams:process_matrix (1) terminated with an error exit status (1)

Command executed:

export NUMBA_NUM_THREADS=1
workflow-glue process_matrix inputs/matrix*.hdf --feature gene --raw "89c43cbf223240f7a4c162d939ddbd7d.gene_raw_feature_bc_matrix" --processed "89c43cbf223240f7a4c162d939ddbd7d.gene_processed_feature_bc_matrix" --per_cell_mito "89c43cbf223240f7a4c162d939ddbd7d.gene_expression_mito_per_cell.tsv" --per_cell_expr "89c43cbf223240f7a4c162d939ddbd7d.gene_expression_mean_per_cell.tsv" --umap_tsv "89c43cbf223240f7a4c162d939ddbd7d.gene_expression_umap_REPEAT.tsv" --enable_filtering --min_features 200 --min_cells 3 --max_mito 20 --mito_prefixes MT- --norm_count 10000 --enable_umap --replicates 3

Command exit status:
1

Command output:
(empty)

Command error:
[18:14:26 - workflow_glue] Bootstrapping CLI.
/home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit()
/home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit()
/home/epi2melabs/conda/lib/python3.8/site-packages/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit()
Traceback (most recent call last):
  File "/home/nuvi/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
    cli()
  File "/home/nuvi/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 66, in cli
    components = get_components(allowed_components=[sys.argv[1]])
  File "/home/nuvi/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 29, in get_components
    mod = importlib.import_module(f"{_package_name}.{name}")
  File "/home/epi2melabs/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/nuvi/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/process_matrix.py", line 8, in <module>
    import umap
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/umap/__init__.py", line 2, in <module>
    from .umap_ import UMAP
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/umap/umap_.py", line 41, in <module>
    from umap.layouts import (
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/umap/layouts.py", line 40, in <module>
    def rdist(x, y):
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/numba/core/decorators.py", line 234, in wrapper
    disp.enable_caching()
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/home/epi2melabs/conda/lib/python3.8/site-packages/numba/core/caching.py", line 337, in __init__
    raise RuntimeError("cannot cache function %r: no locator available "
RuntimeError: cannot cache function 'rdist': no locator available for file '/home/epi2melabs/conda/lib/python3.8/site-packages/umap/layouts.py'

nrhorner · 2025-03-05T17:29:46Z

Hi @imnuvi, please see the troubleshooting section of the README that relates to this.

Thanks

nrhorner · 2025-04-16T16:28:49Z

@imnuvi In v3.1.0 the numba cache is set to the process directory, which should fix this issue.
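
For readers on an older version, or who hit the same "no locator available" error elsewhere, the general idea can be sketched as follows. This is not the workflow's actual code: it only illustrates pointing numba's `NUMBA_CACHE_DIR` environment variable at a writable directory before anything imports numba. The helper name and the `.numba_cache` folder name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def use_writable_numba_cache(preferred=None):
    """Point NUMBA_CACHE_DIR at a directory this process can write to.

    Tries `preferred` (default: ./.numba_cache in the current working
    directory) and falls back to a fresh temporary directory.
    """
    cache_dir = Path(preferred) if preferred else Path.cwd() / ".numba_cache"
    try:
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Probe that files can actually be created there.
        with tempfile.TemporaryFile(dir=cache_dir):
            pass
    except OSError:
        cache_dir = Path(tempfile.mkdtemp(prefix="numba_cache_"))
    os.environ["NUMBA_CACHE_DIR"] = str(cache_dir)
    return cache_dir


# Call this before the first `import umap` or `import numba`,
# because numba reads the variable when it is imported.
```

The site-packages directory inside a read-only container image is not writable, which is why caching next to `umap/layouts.py` fails. A directory under the task's working directory avoids that.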

nrhorner · 2025-05-12T11:34:23Z

@imnuvi Did you get round to trying out the new version?

imnuvi · 2025-05-16T20:09:02Z

Hi @nrhorner I haven't tried it out yet. Will try it out and update.

nrhorner mentioned this issue Mar 3, 2025

Matrix generation steps not running on the workflow #157

Open
