-
Hi again Dave! I'm hoping someone else might have input on question 1. For #2...
We do generally advocate that raw data should always be included whenever possible. This can indeed produce very large individual files, but both NWB-supported APIs have data tools for handling such operations. The DANDI Archive also has very high-performance upload/download operations, as well as streaming functionality that allows you to interact with any file on the archive as if it were on your system, returning only the portions of data you need on an 'as-needed' basis. I've personally used all of these tools to handle individual files as big as 372 GB and datasets in the tens of TB (which DANDI hosts completely free of charge), so if you have specific questions on how to do certain things for this please don't hesitate to reach out at any time.

Are you thinking of using Python or MATLAB to perform your conversion? Both APIs support data tools to manage this quantity of data: one example would be the PyNWB tutorial on iterative data write. This loads only a small but specifiable amount of data into memory at any one point in time, and iteratively writes the entire series of data to the NWBFile in that manner. I also ask because, if you use Python, we've already implemented some automated tools for these sorts of advanced data-engineering tasks over on the NWB Conversion Tools project. Specific to TIFF stacks, one should be able to simply construct a little script like so:

```python
from datetime import datetime

from dateutil import tz
from nwb_conversion_tools import TiffImagingInterface

# Change the file_path to the location on your system
file_path = "my_tiff_file.tif"  # point to a single tiff file
sampling_frequency = 30.0  # replace with the sampling rate (in Hz) of your images

interface = TiffImagingInterface(file_path=file_path, sampling_frequency=sampling_frequency)

# Extract what metadata we can from the source files
metadata = interface.get_metadata()

# session_start_time is required for conversion. If it cannot be inferred
# automatically from the source files you must supply one.
session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=tz.gettz("US/Pacific"))
metadata["NWBFile"] = dict(session_start_time=session_start_time)

# Choose a path for saving the nwb file and run the conversion
save_path = "./saved_file.nwb"  # change to wherever you want the NWB file written
interface.run_conversion(save_path=save_path, metadata=metadata)
```

which will technically write them as a two-photon series, but we'd be happy to extend that to whatever the conclusion of #1 happens to be. (We should also allow a list of tiff files to be written to the same series; I'll bring that up over there right now.)

Cheers,
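P.S. For reference, here is a minimal sketch of the iterative-write approach mentioned above, assuming the raw frames are streamed through PyNWB's DataChunkIterator into a plain TimeSeries with gzip compression. The frame generator, frame shape, rate, and names are placeholders, not a definitive implementation:

```python
from datetime import datetime

import numpy as np
from dateutil import tz
from hdmf.data_utils import DataChunkIterator
from pynwb import H5DataIO, NWBFile, NWBHDF5IO, TimeSeries


def frame_generator(num_frames=10, height=512, width=512):
    """Yield one frame at a time; in practice, read frames from your TIFF files here."""
    for _ in range(num_frames):
        yield np.zeros((height, width), dtype=np.uint16)  # placeholder frame


# Wrap the generator so only a small buffer of frames is in memory at any time
data = DataChunkIterator(data=frame_generator(), maxshape=(None, 512, 512), dtype=np.dtype("uint16"))

nwbfile = NWBFile(
    session_description="mesoscale imaging session",  # placeholder
    identifier="example-session",                      # placeholder
    session_start_time=datetime(2020, 1, 1, 12, 30, tzinfo=tz.gettz("US/Pacific")),
)
nwbfile.add_acquisition(
    TimeSeries(
        name="RawImaging",
        data=H5DataIO(data, compression="gzip"),  # lossless compression on write
        unit="a.u.",
        rate=30.0,  # placeholder frame rate in Hz
    )
)

with NWBHDF5IO("iterative_write_example.nwb", "w") as io:
    io.write(nwbfile)
```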
-
Hi again Cody! Thank you for the detailed response. I am definitely going to share the raw data; preprocessed data may follow, but that is a work in progress. I have no doubt DANDI can handle these types of things, I am more concerned about our local infrastructure. In any case I will try to make the files. I am going to use Python, so the package you detailed looks really useful. I'll give it a shot and keep an eye on the issue you raised about loading/writing multiple tiffs. I have also found a calcium-imaging-specific tutorial in PyNWB that I think I can mimic, with some editing, to meet the requirements (https://pynwb.readthedocs.io/en/stable/tutorials/domain/ophys.html). I can mock things up with TwoPhotonSeries and pivot if there is a better solution. Dave
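P.S. For reference, a rough mock-up of the kind of TwoPhotonSeries script from the ophys tutorial I have in mind; the device, wavelengths, indicator, frame rate, and identifiers below are placeholders, not our actual acquisition parameters:

```python
from datetime import datetime

import numpy as np
from dateutil import tz
from pynwb import NWBFile, NWBHDF5IO
from pynwb.ophys import OpticalChannel, TwoPhotonSeries

nwbfile = NWBFile(
    session_description="dual-wavelength mesoscale imaging (mocked as two-photon)",
    identifier="sub-01_ses-01_run-01",  # placeholder identifier
    session_start_time=datetime(2020, 1, 1, 12, 30, tzinfo=tz.gettz("US/Pacific")),
)

device = nwbfile.create_device(name="Mesoscope", description="widefield imaging system")
optical_channel = OpticalChannel(
    name="OpticalChannel",
    description="signal-sensitive channel",
    emission_lambda=520.0,  # placeholder, in nm
)
imaging_plane = nwbfile.create_imaging_plane(
    name="ImagingPlane",
    optical_channel=optical_channel,
    imaging_rate=30.0,          # placeholder frame rate in Hz
    description="cortical surface",
    device=device,
    excitation_lambda=470.0,    # placeholder, in nm
    indicator="GCaMP6f",        # placeholder
    location="cortex",
)

data = np.zeros((10, 512, 512), dtype=np.uint16)  # placeholder frames (time, y, x)
series = TwoPhotonSeries(
    name="RawImaging",
    data=data,
    imaging_plane=imaging_plane,
    rate=30.0,
    unit="a.u.",
)
nwbfile.add_acquisition(series)

with NWBHDF5IO("ophys_mockup.nwb", "w") as io:
    io.write(nwbfile)
```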
-
Hey again @DaveOC90, it looks like what we'll have to do for the one-photon data is make a new NWB data type for it. This work began a couple of years ago but fell through the cracks: NeurodataWithoutBorders/nwb-schema#283. I'll be taking it over now to try to get this done once and for all. It would be a great help if you could take a look at what they proposed back in the day and see if it works for you. Also, not required but it would be enormously helpful to this process if you would be willing to share a single session of the one-photon data with me while I work on this, and possibly meet to go over any questions I might have?
-
Hey Dave, I'm finishing up work on the schema now, and I'd love to go over it with you one-on-one! Whenever you have the time, feel free to schedule something automatically via my Calendly. Looking forward to meeting in person,
-
Hello Cody, We're also acquiring mesoscale imaging data and would like to store raw ScanImage tif files in a remote file repository, losslessly compressed. We would also like the ability to stream small numbers of frames in case we ever need to run quality control or test new ROI detection algorithms on the data. I understand that NWB allows one to store the raw files in a compressed form and allows random access of individual files (correct me if I'm wrong here); however, it's not clear to me whether it is possible to stream individual frames from a compressed tif with this API. Perhaps you could clarify for us?
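(For reference, a minimal sketch of the kind of lazy, per-frame access we're hoping for, assuming the raw frames have already been converted into an NWB acquisition; the file name and series name are placeholders. The same slicing pattern would presumably apply when streaming from a remote archive, as mentioned earlier in the thread.)

```python
from pynwb import NWBHDF5IO

# Open the file without loading the imaging data into memory
with NWBHDF5IO("converted_session.nwb", "r") as io:
    nwbfile = io.read()
    series = nwbfile.acquisition["RawImaging"]  # placeholder name
    # The underlying HDF5 dataset is chunked and (optionally) compressed,
    # so slicing reads and decompresses only the requested frames from disk
    first_ten_frames = series.data[0:10]
```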
-
Hi All,
I would like to convert dual-wavelength, one-photon mesoscale calcium imaging data into NWB. I have two questions in particular:
1. I do not know what class to use for the data. It seemed like an ImageSeries or OpticalSeries object might be best for these, but I realize these are non-specific. Perhaps I could create a one-photon class that inherits from the two-photon one? I am not familiar with two-photon data, so I don't know how appropriate this would be.
2. I also have concerns about how to aggregate the data into a single session file, as we have upwards of 36 GB of imaging data per session, and in some cases triple that. Does NWB generally advocate for really large files (tens of GBs)? I think the computers we have may struggle with this, and it may make transferring the data more difficult.
Some data details below:
Data Description
The data are currently in NIfTI format, in a BIDS-like organization, that is to say, organized by subject ID, session, and run (afaik there is no BIDS standard for this type of data yet). The data are 2D x time. We have at least 6 runs/scans per session, and around 31 sessions. The 10-minute runs can yield ~5.5-6 GB of data, so they are usually split into three tif files when output (we then converted to nii). A further complication is that they were acquired using a dual-wavelength protocol, which means signal-sensitive frames were interleaved with signal-insensitive frames. We have split each tif file into two separate nii files. So for now we have 6 image files (as nii) per run, at least 6 runs per session (in some cases > 12), and 31 sessions across 23 subjects.
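(For reference, the wavelength split is essentially frame de-interleaving along the time axis; a toy NumPy illustration with placeholder shapes and ordering is below.)

```python
import numpy as np

# Placeholder stack: (time, y, x) with the two wavelengths interleaved frame-by-frame
interleaved = np.zeros((1000, 256, 256), dtype=np.uint16)

# Even-indexed frames are signal-sensitive, odd-indexed frames are signal-insensitive
# (the actual ordering depends on the acquisition protocol)
signal_sensitive = interleaved[0::2]
signal_insensitive = interleaved[1::2]
```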
Thanks for any help you can offer.
Dave