FastaRecordReader for huge fasta files #31

jmabuin · 2019-03-11T09:19:41Z

Hi,

I have a question about the FastaRecordReader class (data-algorithms-book/src/main/java/org/dataalgorithms/chap24/mapreduce/FastaRecordReader.java).

I have been trying to use it for large genomes (FASTA files much larger than an HDFS block, e.g. ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_genomic.fna.gz), but I am getting wrong sequences.

Is it possible that using these classes from Spark with the newAPIHadoopFile method does not work for very large files? Or am I missing something?

Regards, and thank you very much for your time.

Jose M. Abuin

mahmoudparsian · 2019-06-03T04:47:17Z

Hello Jose,
I will look into this and test it with your input.
Thanks,
best regards,
Mahmoud
