ValueError upon unexpected scan title format #354

MNTsnowman · 2024-07-07T13:45:20Z

Hi Casanovo

This is the first time I'm attempting to use Casanovo. I have tried to follow your guide at: https://casanovo.readthedocs.io/en/latest/getting_started.html

I'm getting the error below. I'm wondering if it could have something to do with the headers of the scans in the mzML files. If this sounds like a possibility, could you please provide the command line settings you are using to generate the mzML files, and how you name and structure the header?

 D:\...\De Novo>casanovo sequence -m WorkDir\casanovo_massivekb.ckpt -c WorkDir\casanovo_config.yaml Data\mzML\14-2-NM_S4-A1_1_9156.mzML
WARNING: Dataloader multiprocessing is currently not supported on Windows or MacOS; using only a single thread.
Seed set to 454
INFO: Casanovo version 4.2.1
INFO: Sequencing peptides from:
INFO:   Data\mzML\14-2-NM_S4-A1_1_9156.mzML
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
INFO: Reading 1 files...
Data\mzML\14-2-NM_S4-A1_1_9156.mzML: 100%|█████████████████████████████████| 27193/27193 [00:32<00:00, 835.91spectra/s]
WARNING: Skipped 25714 spectra with invalid precursor info
Traceback (most recent call last):
  File "C:\Users\...\casanovo_env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\...\casanovo_env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\casanovo_env\Scripts\casanovo.exe\__main__.py", line 7, in <module>
  File "C:\Users\...\casanovo_env\lib\site-packages\rich_click\rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "C:\Users\...\casanovo_env\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\...\casanovo_env\lib\site-packages\rich_click\rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "C:\Users\...\casanovo_env\lib\site-packages\click\core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\...\casanovo_env\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\...\casanovo_env\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\...\casanovo_env\lib\site-packages\casanovo\casanovo.py", line 143, in sequence
    runner.predict(peak_path, output)
  File "C:\Users\...\casanovo_env\lib\site-packages\casanovo\denovo\model_runner.py", line 160, in predict
    test_index = self._get_index(peak_path, False, "")
  File "C:\Users\...\casanovo_env\lib\site-packages\casanovo\denovo\model_runner.py", line 394, in _get_index
    return Index(index_fname, filenames, valid_charge=valid_charge)
  File "C:\Users\...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 104, in __init__
    self.add_file(ms_file)
  File "C:\Users\...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 195, in add_file
    metadata = self._assemble_metadata(parser)
  File "C:\Users\...\casanovo_env\lib\site-packages\depthcharge\data\hdf5.py", line 173, in _assemble_metadata
    metadata["scan_id"] = parser.scan_id
ValueError: could not broadcast input array from shape (0,) into shape (25714,)


bittremieux · 2024-07-07T14:57:01Z

I suspect that all of the spectra were skipped:

WARNING: Skipped 25714 spectra with invalid precursor info

You already indicated that you suspected something wrong with the scan headers. Did you modify them in some way?

Normally standard mzML files produced by MSConvert, ThermoRawFileParser, etc. should all work. We do not edit the mzML files or the headers in there at all.

MNTsnowman · 2024-07-07T19:11:43Z

Hi @bittremieux

Yes, I suspect the headers because my data originates from a timsTOF with the IM engaged. I don't think the IM is to blame, as it is handled in the conversion (see command below). Given that the data is from a timsTOF, I do not think ThermoRawFileParser is used at all.

For info, the CMD command I use to generate the mzML files is something along the lines of this: "C:\Users...\ProteoWizard 3.0.23167.44089af 64-bit\msconvert.exe" --combineIonMobilitySpectra --filter "peakPicking vendor msLevel=1-" --filter "scanSumming precursorTol=0.05 scanTimeTol=5 ionMobilityTol=0.1 sumMs1=0" --filter "titleMaker ... File:"""^<SourcePath^>""", NativeID:"""^<Id^>""""

So given that it skips all the scans and states that the precursor info is invalid, I was wondering what settings you use to generate the scan title, in other words, what the "titleMaker" part of your conversion command is. I hope this makes sense. Also, please let me know if you have other suggestions for what could be wrong. :)

bittremieux · 2024-07-07T19:46:14Z

I have limited hands-on experience with timsTOF conversion to mzML, so I don't know how the titleMaker filter should be used. But I'd be surprised if that's the problem. I suspect something about the IM actually.

Can you share the mzML file here to have a look at?

MNTsnowman · 2024-07-07T21:41:59Z

Unfortunately, I'm unable to share a file here. If you have an email address where we could continue the conversation, maybe we could figure something out.

Alternatively I could try to compare the headers of your demo data with my data.

bittremieux · 2024-07-08T06:33:19Z

You can email me at [email protected].

bittremieux · 2024-08-22T10:50:33Z

Ok, the issue is that the scan titles in your mzML file are in the format merged=XX frame=XX scanStart=XXX scanEnd=XXX, whereas the DepthCharge parser expects a single scan number indicated by scan=XXX. The scan=XXX format is fine when working with Thermo data, which is what we've mostly been doing so far. But of course not all scan titles are formatted that way, and PASEF is an even more special case.
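To illustrate the mismatch, here is a minimal Python sketch of a more tolerant title parser. It is not the DepthCharge implementation and not necessarily what #369 does. The helper name `parse_scan_id`, the choice to fall back to the full title as an opaque identifier, and the numeric values in the example titles are all assumptions made for illustration.

```python
import re

# Matches a standalone "scan=<number>" token. Tokens such as
# "scanStart=100" do not match, because "=" must directly follow "scan".
_SCAN_TOKEN = re.compile(r"(?:^|\s)scan=(\d+)(?:\s|$)")


def parse_scan_id(title: str) -> str:
    """Return a string identifier for a spectrum title (hypothetical helper).

    If the title contains a "scan=<number>" token, return the number as a
    string. Otherwise return the whole title unchanged, so that titles in
    other formats still yield one identifier per spectrum.
    """
    title = title.strip()
    if not title:
        raise ValueError("empty scan title")
    match = _SCAN_TOKEN.search(title)
    if match:
        return match.group(1)
    return title


# Thermo-style title: the scan number is extracted.
print(parse_scan_id("controllerType=0 controllerNumber=1 scan=15"))
# Merged timsTOF-style title (made-up values): kept as an opaque identifier.
print(parse_scan_id("merged=3 frame=12 scanStart=100 scanEnd=125"))
```

Returning a string rather than an integer is a design choice in this sketch: it means that a title without a single scan number still produces exactly one identifier per spectrum, instead of none.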

The good news is that this should be resolved by the pending DepthCharge upgrade (#350). Until that is fully integrated, I'll keep this issue open so that we can double-check that it gets fixed.

As a workaround for now, is it possible to modify the titleMaker filter? Alternatively, converting to MGF should also work, because for MGF we don't try to extract scan information from the spectrum title.
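Before changing the conversion settings, it may help to check which spectra in a file lack a scan=XXX token. The following is a rough Python sketch using only the standard library. It assumes the relevant string is the `id` attribute of each `<spectrum>` element; if the title is stored elsewhere in your files, the same idea applies with a different lookup. The embedded sample is a hand-written stand-in, not real data, and `spectrum_ids` is a hypothetical helper name.

```python
import io
import re
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for an mzML file (not real data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum index="0" id="controllerType=0 controllerNumber=1 scan=15" defaultArrayLength="0"/>
      <spectrum index="1" id="merged=3 frame=12 scanStart=100 scanEnd=125" defaultArrayLength="0"/>
    </spectrumList>
  </run>
</mzML>
"""


def spectrum_ids(source):
    """Yield the id attribute of every <spectrum> element in the stream."""
    for _, elem in ET.iterparse(source, events=("end",)):
        # Drop the XML namespace prefix, e.g. "{http://...}spectrum".
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            yield elem.get("id") or ""
            elem.clear()  # free memory when streaming large files


ids = list(spectrum_ids(io.BytesIO(SAMPLE)))
# Collect the ids that have no standalone "scan=<number>" token.
without_scan = [i for i in ids if not re.search(r"(?:^|\s)scan=\d+", i)]
print(f"{len(without_scan)} of {len(ids)} spectra have no scan= token")
```

For a real file, pass an open binary file handle (for example `open(path, "rb")`, where `path` is your own mzML file) instead of the in-memory sample.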

MNTsnowman · 2024-08-23T12:00:24Z

Hi @bittremieux

Thanks a lot for getting back to me and keeping me up to date. :)

I think the solution is in the header, so for now I will leave it and wait for your update or the solution in #369. If I may add one suggestion to the process, it is this: please read the documentation for the "titleMaker" filter and its syntax in msconvert (https://proteowizard.sourceforge.io/tools/msconvert.html). It could be advantageous for you to define a title format that supports timsTOF and similar equipment with those options in mind. The result should be a command resembling the one I showed above. The bonus is that you can then add that example command to your README, where others can find the information as well.

Regarding MGF files: yes, it's an option, and it worked when I tested it. However, from MGF files I am unable to estimate intensities and thus abundances.

Thanks a lot for the tool and keep up the good work. I'll be keeping an eye on it. :)

bittremieux added the bug Something isn't working label Jul 7, 2024

bittremieux linked a pull request Aug 22, 2024 that will close this issue

Flexible format for scan titles #369

Draft

bittremieux changed the title ~~ValueError: could not broadcast input array from shape (0,) into shape (25714,)~~ ValueError upon unexpected scan title format Aug 22, 2024


bittremieux mentioned this issue Sep 17, 2024

.d to mgf conversion for de novo #50

Closed
