error in gmmtest #10

Open
JannesSP opened this issue Oct 7, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@JannesSP

JannesSP commented Oct 7, 2022

2022-10-07 12:37:26,831 WARNING Default min depth set to 6 to match window size 3
2022-10-07 12:37:26,836 INFO Running gmmtest in 3-comp GMM (uniform outliers) mode with 1 control datasets and 1 treatment datasets
2022-10-07 12:37:26,839 INFO 1 genes to be processed on 1 workers
Traceback (most recent call last):
File "/home/yi98suv/anaconda3/envs/yanocomp/bin/yanocomp", line 8, in <module>
sys.exit(cli())
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass
return cmd(dynamic_dataclass(cls_name, bases=bases, **cli_kwargs))
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 333, in gmm_test
res, sm_preds = parallel_test(opts)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 210, in parallel_test
res, sm_preds = test_chunk(
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 155, in test_chunk
chrom, strand = load_gene_attrs(gene_id, cntrl_h5)
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/io.py", line 291, in load_gene_attrs
chrom = g.attrs['chrom']
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 60, in __getitem__
attr = h5a.open(self._id, self._e(name))
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5a.pyx", line 77, in h5py.h5a.open
KeyError: "Can't open attribute (can't locate attribute: 'chrom')"

Any idea how to fix this or where the error is coming from?

@mparker2
Member

mparker2 commented Oct 7, 2022

Hi @JannesSP,

It looks like there is a problem with the format of the HDF5 file: for some reason, no chromosome attribute ('chrom') was saved for the gene. Could you describe how you ran yanocomp prep and yanocomp gmmtest? I noticed that you are only testing one gene. Is this a viral RNA?

BW
Matt

@JannesSP
Author

JannesSP commented Oct 7, 2022

Hey @mparker2,

Exactly, I am testing it on viral RNA, so there is only one "chromosome", which I mapped my reads to and ran nanopolish eventalign on.
I ran yanocomp prep with a command like this:
yanocomp prep -e nanopolish_eventalign.tsv -h yanocomp_prep.hdf5 -p 12
For yanocomp gmmtest I currently have only one control and one test sample.

Kind regards,
Jannes

@JannesSP
Author

Hey @mparker2

Found the problem:
The reference FASTA file I downloaded from a database had a '/' character in its header, and that header shows up as the contig name in the nanopolish eventalign .tsv.
h5py interprets '/' as a group separator when yanocomp prep creates the HDF5 file (datasets, groups etc.).
As a result, several unwanted groups were created in the HDF5 file, and gmmtest could not find the 'chrom' attribute.
Maybe you could catch this error, replace the character in the code, or tell users to check their FASTA headers (the gene_id) for this character.

Kind regards,
Jannes
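
As an illustration of the check suggested above, here is a minimal sketch (not yanocomp code) that scans a nanopolish eventalign TSV for contig names containing '/'. It assumes the file is tab-separated with a header row that includes a column named contig; the helper name find_slash_contigs is hypothetical.

```python
import csv


def find_slash_contigs(tsv_path, column="contig"):
    """Return the sorted unique values in `column` that contain '/'.

    Assumption: the file is tab-separated and has a header row, as
    nanopolish eventalign output does. Hypothetical helper, not part
    of yanocomp.
    """
    bad = set()
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            name = row.get(column) or ""
            if "/" in name:
                bad.add(name)
    return sorted(bad)
```

If the returned list is not empty, those names would likely be split into nested HDF5 groups by yanocomp prep, as described in this comment.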

@mparker2
Member

Aha! Well done for figuring that out. Sorry I didn't get any time to help...
I should really sanitise the strings used to create all attributes better. Will mark this as a bug to get around to! Many thanks for reporting it.

Matt

mparker2 added the bug label on Oct 13, 2022
@kwonej0617

kwonej0617 commented May 14, 2023

Hi @mparker2
Because I got the same error message as @JannesSP, I checked whether the header of my FASTA file includes "/", which caused the error in that case.

Here is the first line of my FASTA file. There is no "/" at the very top of the file.
image

However, in the middle of the file I found lines that include "/". Could these cause the error? If so, how can I modify my FASTA file so that yanocomp gmmtest runs without the error?
image

I am looking forward to hearing from you.
Thank you!
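
One possible way to modify the file, assuming the cause is the same as in the earlier report and that the lines in the middle of the file are header lines (starting with ">"), is sketched below. It copies the FASTA file and replaces '/' in header lines only. The function name sanitise_fasta_headers and the "_" replacement are hypothetical choices, not something yanocomp provides.

```python
def sanitise_fasta_headers(in_path, out_path, replacement="_"):
    """Copy a FASTA file, replacing '/' in header lines only.

    Sequence lines are written unchanged. Returns the number of
    header lines that were modified. Hypothetical helper, not part
    of yanocomp.
    """
    changed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Header lines in FASTA start with '>'.
            if line.startswith(">") and "/" in line:
                line = line.replace("/", replacement)
                changed += 1
            dst.write(line)
    return changed
```

Because the contig names in the eventalign .tsv come from the reference, the reads would presumably need to be re-aligned and nanopolish eventalign re-run against the renamed reference before running yanocomp prep again.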
