About kallisto files #14

Open
wososa opened this issue Nov 16, 2019 · 12 comments

Comments

@wososa

wososa commented Nov 16, 2019

Hi Dr. Zhang,

I am trying to run DARTS with the following command:

Darts_DNN build_feature -i bayes_infer/A5SS.darts_bht.flat.txt -c ~/.darts/DNN/v0.1.0/trainedParam/A5SS-trainedParam-EncodeRoadmap.h5 -e Sample_WT_kallisto Sample_KD_kallisto -o A5SS_data.h5 --t A5SS

I got the following error message:
2019-11-16 10:14:12,982 - Darts_DNN.build_feature - INFO - convert tx to gene TPM
Traceback (most recent call last):
...skip...
KeyError: 'ENST00000631435'

Does this mean that I am using the wrong files (or wrong version of gene annotation) from kallisto?

Files in the kallisto folder (based on Ensembl v96):
abundance.h5 abundance.tsv run_info.json
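
One way to test the annotation-version hypothesis is to compare the transcript IDs that kallisto quantified with the IDs of the annotation the tool was built on. The following is only a minimal sketch: it relies on the standard `target_id` column of kallisto's `abundance.tsv`, while `annotation_ids` is a hypothetical collection of transcript IDs that you would have to extract from the annotation yourself. How Darts_DNN maps transcripts to genes internally is not shown in this thread, so the comparison is an assumption about what triggers the `KeyError`.

```python
import csv


def read_kallisto_ids(abundance_tsv):
    """Return the set of transcript IDs in the target_id column of abundance.tsv."""
    with open(abundance_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"] for row in reader}


def strip_version(transcript_id):
    """Drop everything after the first dot, e.g. an Ensembl version suffix."""
    return transcript_id.split(".", 1)[0]


def missing_from_annotation(abundance_tsv, annotation_ids):
    """List quantified transcript IDs (version stripped) absent from annotation_ids.

    annotation_ids is a hypothetical input: any iterable of transcript IDs
    taken from the annotation you want to compare against.
    """
    quantified = {strip_version(t) for t in read_kallisto_ids(abundance_tsv)}
    known = {strip_version(t) for t in annotation_ids}
    return sorted(quantified - known)
```

If `missing_from_annotation("Sample_WT_kallisto/abundance.tsv", annotation_ids)` returns a long list, the kallisto index and the expected annotation probably come from different releases.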

Thanks,
Woody

@zj-zhang
Collaborator

zj-zhang commented Nov 18, 2019 via email

@wososa
Author

wososa commented Nov 18, 2019

Hi Dr. Zhang,

Thanks for your quick reply. I proceeded with gencode v19, installed the Python module "tables", and got the following error:

`
/Darts/RBP_tpm.txt
.. read sequence feature
Traceback (most recent call last):
  File "/anaconda3/envs/darts/bin/Darts_DNN", line 4, in <module>
    __import__('pkg_resources').run_script('Darts-DNN==0.1.0', 'Darts_DNN')
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1460, in run_script
    exec(script_code, namespace, namespace)
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/EGG-INFO/scripts/Darts_DNN", line 192, in <module>
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/EGG-INFO/scripts/Darts_DNN", line 49, in main
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/Darts_DNN/Darts_build_feature.py", line 157, in parser
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/Darts_DNN/Darts_build_feature.py", line 98, in make_single_table
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/Darts_DNN/utils.py", line 326, in read_sequence_feature
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/pandas/io/pytables.py", line 377, in read_hdf
    raise ValueError('No dataset in HDF5 file.')
ValueError: No dataset in HDF5 file.
`

RBP_tpm.txt was generated successfully, but the .h5 output file was not. Could you elaborate on this error?

Thanks,
Woody

@zj-zhang
Collaborator

zj-zhang commented Nov 19, 2019

@wososa Please use predict directly, without build_feature. build_feature is a legacy sub-command that uses more disk space and will be removed in the future. Please follow the usage example here, in case it helps to pinpoint further issues:
https://darts-dnn.readthedocs.io/en/latest/#using-predict
I have updated the README.md to avoid future confusion.

@wososa
Author

wososa commented Nov 19, 2019

@zj-zhang Thanks for your reply. build_feature is needed to produce RBP_tpm.txt, right? It seems that I need the RBP_tpm.txt file to run the predict sub-command.

@zj-zhang
Collaborator

@wososa Not necessarily, actually. For example, you can run predict directly like so:

Darts_DNN predict -i darts_flat/Sp_out.txt \
-o darts_pred.txt \
-e kallisto/Day5_rep1/,kallisto/Day5_rep2/,kallisto/Day5_rep3/ kallisto/No_Dox_rep1/,kallisto/No_Dox_rep2/,kallisto/No_Dox_rep3/

This is also shown in the help message, which you can print by running Darts_DNN predict with the -h option:

$ Darts_DNN predict -h
usage: Darts_DNN predict [-h] -i INPUT -o OUTPUT [-t {SE,A5SS,A3SS,RI}]
                         [-e EXPR [EXPR ...]] [-m MODEL]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT              Input feature file (*.h5) or Darts_BHT output (*.txt)
  -o OUTPUT             Output filename
  -t {SE,A5SS,A3SS,RI}  Optional, default SE: specify the alternative splicing
                        event type. SE: skipped exons, A3SS: alternative 3
                        splice sites, A5SS: alternative 5 splice sites, RI:
                        retained introns
  -e EXPR [EXPR ...]    Optional, required if input is Darts_BHT output;
                        Folder path for Kallisto expression files; e.g '-e
                        Ctrl_rep1,Ctrl_rep2 KD_rep1,KD_rep2'
  -m MODEL              Optional, default using current version model in user
                        home directory: Filepath for a specific model
                        parameter file
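
The `-e` format described in the help text (replicates of one group joined by commas, the two groups separated by a space) can be assembled programmatically. The following is a minimal sketch rather than part of Darts_DNN: the helper names are hypothetical, the thread does not say which group should come first, and checking for all three kallisto files is an assumption, since it is not stated here which of them Darts_DNN actually reads.

```python
import os

# Files produced by a kallisto run, as listed earlier in this thread.
KALLISTO_FILES = ("abundance.h5", "abundance.tsv", "run_info.json")


def missing_kallisto_files(folder):
    """Return the expected kallisto output files that are absent from folder."""
    return [name for name in KALLISTO_FILES
            if not os.path.isfile(os.path.join(folder, name))]


def build_expr_args(group_a, group_b):
    """Build the two values for -e: one comma-joined string per group."""
    for folder in list(group_a) + list(group_b):
        missing = missing_kallisto_files(folder)
        if missing:
            raise IOError("%s is missing %s" % (folder, ", ".join(missing)))
    return [",".join(group_a), ",".join(group_b)]


def build_predict_command(bht_flat, output, group_a, group_b, event_type="SE"):
    """Assemble the argument list for Darts_DNN predict; nothing is executed here."""
    return (["Darts_DNN", "predict", "-i", bht_flat, "-o", output,
             "-t", event_type, "-e"] + build_expr_args(group_a, group_b))
```

The returned list can be passed to `subprocess.run` as is; building a list instead of a shell string avoids quoting problems with folder names.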

Hope this helps.

@zj-zhang
Collaborator

In fact, in case it is useful for others, let me add that using predict directly is currently the recommended way to use Darts_DNN :) Thanks again @wososa

@wososa
Author

wososa commented Nov 19, 2019

I can understand now. Thanks!

@wososa
Author

wososa commented Dec 11, 2019

@zj-zhang My A5SS.darts_bht.flat.txt has 5,084 records, but the A5SS_pred.txt file only has 36 records. Any idea why many of the records are lost during the Darts_DNN predict step?

@zj-zhang
Collaborator

@wososa Most likely it's because most of the A5SS events in your file do not have pre-compiled cis-sequence features. Could you check the ID overlap between A5SS.darts_bht.flat.txt and $HOME/.darts/DNN/v0.1.0/cisFeature/A5SS.norm.txt.gz?
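
A minimal sketch of such an overlap check follows. It assumes that both files are tab-delimited, start with a header line, and carry the event ID in the first column; none of this is confirmed in the thread, so adjust `has_header` and the column handling after looking at the actual files. The function names are illustrative only.

```python
import gzip


def read_first_column(path, has_header=True):
    """Collect the first tab-separated field of each line; .gz files are decompressed."""
    opener = gzip.open if path.endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        if has_header:
            next(handle, None)  # skip the assumed header line
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                ids.add(line.split("\t", 1)[0])
    return ids


def id_overlap(bht_flat, cis_feature):
    """Return (number of BHT IDs, number of cis-feature IDs, number shared)."""
    bht_ids = read_first_column(bht_flat)
    cis_ids = read_first_column(cis_feature)
    return len(bht_ids), len(cis_ids), len(bht_ids & cis_ids)
```

If the third number is close to the number of records in the prediction output, missing cis-sequence features would explain the dropped events.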

@wososa
Author

wososa commented Dec 12, 2019

@zj-zhang Thanks for your quick reply. If the number of overlapping events is small, does that mean my A5SS events are new relative to the gencode annotation? I probably can't process the large amount of RNA-seq data used in DARTS-DNN to re-generate the features.

@zj-zhang
Collaborator

Yes, if the number of overlapping events is small, the A5SS events are likely novel events specific to your RNA-seq data. The sequence features were compiled by @zcpan. If that's indeed the case, I will open a new issue so we can keep better track of it.

@astulaaa

I am not sure what went wrong, but it appears that Darts_DNN is not recognizing the input directory supplied with the -e parameter.
I ran Darts_DNN as suggested:
Darts_DNN predict -i A5SS.darts_bht.flat.converted_hg19.txt -e /Genotypes/tmp/DARTS_RNA/RealRun/CHR17Run/kallisto/output_KU/ -o predA5SS.txt -t A5SS

constructing in-memory feature matrix
Traceback (most recent call last):
  File "/anaconda3/envs/darts/bin/Darts_DNN", line 4, in <module>
    __import__('pkg_resources').run_script('Darts-DNN==0.1.0', 'Darts_DNN')
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/EGG-INFO/scripts/Darts_DNN", line 192, in <module>
  File "/anaconda3/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/EGG-INFO/scripts/Darts_DNN", line 44, in main
  File "/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/Darts_DNN/Darts_pred.py", line 103, in parser
  File "/envs/darts/lib/python2.7/site-packages/Darts_DNN-0.1.0-py2.7.egg/Darts_DNN/utils.py", line 285, in construct_training_data_from_label
Exception: this file is not found: /Genotypes/tmp/DARTS_RNA/RealRun/CHR17Run/kallisto_Fasta/output_KU

Any suggestions on how to sort this out?
It would be really helpful if a standardized liftover (hg38 -> hg19) procedure and standardized pred file generation could be added to the manual. kallisto appears to have run without errors and all three output files were produced (abundance.h5, abundance.tsv, run_info.json), so why was this input not suitable? Could the error be originating from the A5SS.darts_bht.flat.converted_hg19.txt file by any chance?
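
The path in the exception (ending in kallisto_Fasta/output_KU) differs from the one passed to -e (ending in kallisto/output_KU/), so it may be worth checking how each folder resolves on disk before digging into the converted BHT file. Below is a minimal diagnostic sketch with hypothetical helper names. It only inspects the file system and makes no assumption about what Darts_DNN does with the folders.

```python
import os


def expand_expr_values(expr_values):
    """Split each -e value on commas, giving one list of folders per group."""
    return [[p for p in value.split(",") if p] for value in expr_values]


def describe_folder(path):
    """Report how a folder path resolves (symlinks followed) and what it contains."""
    resolved = os.path.realpath(path)
    is_dir = os.path.isdir(resolved)
    return {
        "given": path,
        "resolved": resolved,
        "is_dir": is_dir,
        "contents": sorted(os.listdir(resolved)) if is_dir else [],
    }


def check_expr_values(expr_values):
    """Describe every folder in every group passed to -e."""
    return [[describe_folder(p) for p in group]
            for group in expand_expr_values(expr_values)]
```

Calling `check_expr_values` with the exact strings typed after -e shows, per folder, whether it exists and which files it holds.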

3 participants