dorado demux can't read custom barcodes fasta file, failed to extract sequences #1250

Open
PhilliVanilli opened this issue Feb 12, 2025 · 5 comments
Labels
barcode Issues related to barcoding documentation Improvements or additions to documentation

Comments

@PhilliVanilli

PhilliVanilli commented Feb 12, 2025

Issue Report

Please describe the issue:

Please provide a clear and concise description of the issue you are seeing and the result you expect.

Dorado demux 0.9 can't read my custom barcode file: it outputs 'Failed to extract sequences'. Dorado returns the same error if there is a typo in the filename, so I assume Dorado can't find the file. If I use dorado 0.6 or 0.7 with the same command, it works perfectly.
Dorado basecaller also works fine. I tried a toml and fasta file with the same barcodes from a friend's computer where dorado demux works, but got the same error. So there is nothing wrong with the command, the custom files, or the paths. I tried dorado 0.9.1 but hit the same issue. Could it be related to my system, i.e. does unpacking create links to certain files on my system? I also tried unpacking on our server (Ubuntu 18) and on another Ubuntu 22 laptop and had the same issue.
All folders and files have rwx permissions
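
Since the report hinges on whether dorado cannot find the file or finds it and rejects it, one way to separate the two cases is to inspect the path directly before running dorado. The following is a minimal sketch in Python and is not part of dorado; `describe_path` is a hypothetical helper name introduced here for illustration.

```python
import os


def describe_path(path):
    """Collect basic facts about a path (hypothetical helper, not a dorado API)."""
    is_file = os.path.isfile(path)
    return {
        "exists": os.path.exists(path),        # False for a typo or a broken symlink
        "is_file": is_file,                    # False for a directory
        "is_symlink": os.path.islink(path),    # True if the path itself is a link
        "resolved": os.path.realpath(path),    # where any symlinks finally point
        "readable": os.access(path, os.R_OK),  # read permission for the current user
        "size": os.path.getsize(path) if is_file else None,
    }
```

If every field looks normal for the barcode file (it exists, is a regular readable file, and has a non-zero size), the path itself is unlikely to be the cause, and the file content is the next thing to examine.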

error message:

[2025-02-05 12:21:03.514] [info] Running: "demux" "--kit-name" "CUST" "-o" "/mnt/Data_SSD/test/demultiplexed" "--emit-fastq" "--barcode-both-ends" "--barcode-arrangement" "/home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/barcode_arrs_cust_dorado.toml" "--barcode-sequences" "/home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/barcodes_cust.fastq" "/mnt/Data_SSD/test/fastq"
[2025-02-05 12:21:03.514] [info] num input files: 1
[2025-02-05 12:21:03.528] [error] Failed to extract sequences from '/home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/barcodes_cust.fastq'.

Steps to reproduce the issue:

Please list any steps to reproduce the issue.
Download the tar.gz file, unpack it, run dorado download --model, then try to demux a fastq file.

Run environment:

  • Dorado version: 0.9.0, downloaded the tar.gz file and unpacked
  • Dorado command: /home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/dorado demux --kit-name custom_barcodes -o /mnt/Data_SSD/test/demultiplexed --barcode-arrangement /home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/barcode_arrs_cust_dorado.toml --barcode-sequences /home/pselhorst/metatropics_dev/dorado-0.9.0-linux-x64/bin/barcodes_cust.fasta --emit-fastq --barcode-both-ends /mnt/Data_SSD/test/fastq
  • Operating system: Ubuntu
  • Hardware (CPUs, Memory, GPUs): RTX2070
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): fastq folder with one fastq file
  • Source data location (on device or networked drive - NFS, etc.): on same device as dorado
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

  • Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
@malton-ont
Collaborator

Hi @PhilliVanilli,

There appears to be a typo in the docs - the custom barcode file must be in FASTA format. This is correctly shown in the example, but the format name has been mis-stated.

@malton-ont malton-ont added documentation Improvements or additions to documentation barcode Issues related to barcoding labels Feb 13, 2025
@PhilliVanilli
Author

Hi @malton-ont,

Thanks so much for your quick reply. The custom barcode file is in fasta format (only its name says fastq). I've also tried other barcode files that are fasta in both name and format (which is why the dorado command above says fasta); they work on other people's computers but somehow not on mine. To me there seems to be something deeper going on with v0.9, as the same files and command work for dorado 0.6/0.7/0.8 but suddenly stop working for 0.9. I feel it has something to do with permissions or something else on our computers (they are personal computers, not managed by our institute), but I'm not sure what (even chmod -R 777 doesn't help). I'm a bit afraid that otherwise we will be stuck with dorado 0.8 forever :-)

Thanks
p

@malton-ont
Collaborator

@PhilliVanilli,

Dorado changed its custom file parsing between 0.8.3 and 0.9.0, so I suspect there's something in there that is triggering this - the new parser is possibly stricter than the old one. Things to check regarding your sequences file:

  • It's in fasta format
>Barcode_01
ACGT
  • No space between the > and the name
  • No empty lines
  • Sequence does not contain both T and U bases

You should also check that the sequences file doesn't contain Windows-style line-endings since you're running on Linux.
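
The checks listed above can be run mechanically. The following is a minimal sketch in Python that tests only those points: a `>` header on the first line, no space after `>`, no empty lines, no record mixing T and U, and no carriage returns. It is not dorado's parser and may be looser or stricter than the real validation in 0.9.0; `check_fasta` is a hypothetical name introduced here.

```python
def check_fasta(data: bytes) -> list:
    """Return a list of problems found in raw FASTA bytes (empty list if none).

    Assumption: this covers only the checks named in the thread,
    not dorado's actual parsing rules.
    """
    problems = []
    if b"\r" in data:
        problems.append("carriage return found (Windows-style line endings)")

    lines = data.decode("utf-8", errors="replace").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is normal
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header, so this is not FASTA")

    name, bases = None, set()

    def close_record():
        # A record may span several lines, so T/U mixing is checked per record.
        if name is not None and {"T", "U"} <= bases:
            problems.append(f"record {name!r}: sequence contains both T and U")

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r")
        if not line.strip():
            problems.append(f"line {number}: empty line")
        elif line.startswith(">"):
            close_record()
            name, bases = line[1:], set()
            if not name or name[0].isspace():
                problems.append(f"line {number}: space (or nothing) after '>'")
        else:
            bases.update(line.upper())
    close_record()
    return problems
```

To use it, read the sequences file in binary mode so that line endings are not translated, for example `check_fasta(open(path, "rb").read())`, where `path` is the file passed to `--barcode-sequences`. An empty result only means none of the listed problems were found.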

@PhilliVanilli
Author

@malton-ont

The file is in fasta format and works with all three previous dorado versions, which also required fasta. I also tried a barcode file from another computer where dorado 0.9 accepts it. So the issue can't be related to the actual file. The file has never been on a Windows computer, and I checked the line endings because that is indeed often an issue.

@malton-ont
Collaborator

@PhilliVanilli,

As I said, the fasta validation has been tightened up in 0.9.0. Are you able to share the file?

If the same file works on another machine with the same version of dorado, then this isn't a dorado issue - it would have to be something in your system, and you will need to talk to your IT support.
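
Before concluding that the system is at fault, it may help to confirm that the file on both machines really is byte-for-byte the same, since a copy can be altered in transit (for example by an editor or a transfer tool). A minimal sketch in Python using the standard `hashlib` module; `sha256_of` is a hypothetical helper name:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this on the sequences file and the toml file on each machine and comparing the digests shows whether the inputs are identical. If the digests match and the dorado versions match, the difference has to lie outside the files themselves.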


2 participants