Pycistopic Temporary Fragment file cannot be found #458

Open
yrsong001 opened this issue Sep 4, 2024 · 1 comment

yrsong001 commented Sep 4, 2024

Describe the bug
Hi! I am following the pycisTopic tutorial at https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html. Running export_pseudobulk raises the error ValueError: Fragment file ./temp/age_2y_s1/BCell1.fragments.tsv.gz does not exist. I believe this file is supposed to be generated during the process itself. Could you help with debugging? Thank you!

To Reproduce

fragments_dict = {'age_2y_s1': '/proj/liulab/users/yrsong/aging/Dataset_Creation/run_cellranger_atac/1-ATAC/outs/fragments.tsv.gz',
                 'age_2y_s2': './run_cellranger_atac/2-ATAC/outs/fragments.tsv.gz',
                 'age_1y_s1': './3-ATAC/outs/fragments.tsv.gz',
                 'age_1y_s2': './4-ATAC/outs/fragments.tsv.gz',
                 'age_3m_s1': './5-ATAC/outs/fragments.tsv.gz',
                 'age_3m_s2': './6-ATAC/outs/fragments.tsv.gz'}

from pycisTopic.pseudobulk_peak_calling import *
bw_paths, bed_paths = export_pseudobulk(
    input_data = cell_data,
    variable = "celltype",
    sample_id_col = "sample_id",
    chromsizes = chromsizes,
    bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
    bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
    path_to_fragments = fragments_dict,
    n_cpu = 10,
    normalize_bigwig = True,
    temp_dir = "./temp", 
    split_pattern = None
)
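
As a quick sanity check before calling `export_pseudobulk`, the sketch below resolves each entry of a `fragments_dict`-style mapping to an absolute path and reports the samples whose file does not exist. The helper name `check_fragment_paths` is hypothetical and not part of pycisTopic; it only assumes the dictionary maps sample names to file paths, as above.

```python
import os

def check_fragment_paths(fragments_dict, base_dir=None):
    """Return (resolved, missing).

    resolved: sample -> absolute path
    missing:  list of samples whose fragment file is not on disk
    Hypothetical helper for illustration; not a pycisTopic function.
    """
    base_dir = base_dir or os.getcwd()
    resolved, missing = {}, []
    for sample, path in fragments_dict.items():
        # os.path.join leaves absolute paths untouched and anchors
        # relative ones at base_dir.
        abs_path = os.path.abspath(os.path.join(base_dir, path))
        resolved[sample] = abs_path
        if not os.path.isfile(abs_path):
            missing.append(sample)
    return resolved, missing
```

The dictionary above mixes one absolute path with several relative ones, and relative paths (including `temp_dir = "./temp"`) are resolved against the notebook's current working directory, so passing the `resolved` mapping instead of the original removes one source of ambiguity.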

**Error output.**


bw_paths, bed_paths = export_pseudobulk(
    input_data = cell_data,
    variable = "celltype",
    sample_id_col = "sample_id",
    chromsizes = chromsizes,
    bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
    bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
    path_to_fragments = fragments_dict,
    n_cpu = 10,
    normalize_bigwig = True,
    temp_dir = "./temp", # /work/users/y/r/yrsong/vsc31305/
    split_pattern = None
)
2024-09-04 00:04:14,953 cisTopic     INFO     Splitting fragments by cell type.


ValueError                                Traceback (most recent call last)
Cell In[12], line 8
      6 ray.shutdown()
      7 from pycisTopic.pseudobulk_peak_calling import *
----> 8 bw_paths, bed_paths = export_pseudobulk(
      9     input_data = cell_data,
     10     variable = "celltype",
     11     sample_id_col = "sample_id",
     12     chromsizes = chromsizes,
     13     bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
     14     bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
     15     path_to_fragments = fragments_dict,
     16     n_cpu = 10,
     17     normalize_bigwig = True,
     18     temp_dir = "./temp", 
     19     split_pattern = None
     20 )

File /proj/liulab/users/yrsong/aging/Dataset_Creation/SCENIC_plus_Analysis/scplus_pipeline/Snakemake/config/pycisTopic/src/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
    159 # For each sample, get fragments for each cell type
    161 log.info("Splitting fragments by cell type.")
--> 162 split_fragment_files_by_cell_type(
    163     sample_to_fragment_file = path_to_fragments,
    164     path_to_temp_folder = temp_dir,
    165     path_to_output_folder = bed_path,
    166     sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes,
    167     chromsizes = chromsizes_dict,
    168     n_cpu = n_cpu,
    169     verbose = False,
    170     clear_temp_folder = True
    171 )
    173 bed_paths = {}
    174 for cell_type in cell_data[variable].unique():

File ~/.conda/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:92, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder)
     90 path_to_fragment_file = os.path.join(path_to_temp_folder, sample, f"{cell_type_sanitized}.fragments.tsv.gz")
     91 if not os.path.exists(path_to_fragment_file):
---> 92     raise ValueError(f"Fragment file {path_to_fragment_file} does not exist.")
     93 if cell_type_sanitized not in cell_type_to_fragment_files:
     94     cell_type_to_fragment_files[cell_type_sanitized] = []


**ValueError: Fragment file ./temp/age_2y_s1/BCell1.fragments.tsv.gz does not exist.**
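
The traceback shows that the path being checked is built as `os.path.join(path_to_temp_folder, sample, f"{cell_type_sanitized}.fragments.tsv.gz")`. If the temp folder is still present after the failure, a minimal sketch like the following can list which of those per-sample, per-cell-type files are absent. `find_missing_temp_files` is a hypothetical helper, and it assumes the cell type names passed in are already sanitized the way the library does it (the sanitization rule itself is not visible in the traceback).

```python
import os

def find_missing_temp_files(temp_dir, sample_to_cell_types):
    """List expected per-cell-type fragment files that are not on disk.

    sample_to_cell_types: sample name -> iterable of (sanitized) cell type names.
    Hypothetical helper; mirrors only the path layout shown in the traceback.
    """
    missing = []
    for sample, cell_types in sample_to_cell_types.items():
        for cell_type in cell_types:
            path = os.path.join(
                temp_dir, sample, f"{cell_type}.fragments.tsv.gz"
            )
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

Whether a single file or every file for a sample is missing narrows down where to look: one missing cell type points at that cell type's barcodes, while a whole missing sample points at that sample's fragment file or its name.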

Version (please complete the following information):

  • Python: 3.11
  • SCENIC+: 1.0a1

SeppeDeWinter (Collaborator) commented

Hi @yrsong001

This seems similar to these two issues: #360, #314.

Can you check whether the proposed solutions work for you?

All the best,

Seppe
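
One thing that can be checked independently of the linked issues (whose proposed solutions are not reproduced in this thread) is whether the barcodes in `cell_data` actually occur in the fragment file. If none of a cell type's barcodes match, it is plausible, though not confirmed here, that no per-cell-type file gets written. The sketch assumes the standard Cell Ranger ATAC fragment layout (tab-separated, barcode in the fourth column, `#` comment lines at the top); `barcode_overlap` is a hypothetical helper and not part of pycisTopic.

```python
import gzip

def barcode_overlap(fragments_path, barcodes, max_lines=1_000_000):
    """Return the subset of `barcodes` found in the first `max_lines`
    lines of a gzipped fragments file.

    Hypothetical diagnostic helper; assumes the barcode is column 4.
    """
    wanted = set(barcodes)
    seen = set()
    with gzip.open(fragments_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i >= max_lines:
                break
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4:
                seen.add(fields[3])
    return wanted & seen
```

Calling it with something like `barcode_overlap(fragments_dict["age_2y_s1"], cell_data.index)` and getting an empty set would suggest a naming mismatch between the two (for example an extra sample suffix on the barcodes), in which case the `split_pattern = None` argument in the call above may be worth a second look. Because only the first `max_lines` lines are scanned, a partial overlap is not by itself a sign of a problem.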
