Using nemos to classify #311

vkonan · 2025-02-14T21:44:14Z

My original question was posted here and has evolved into the following question:

The experimental setup: wildtype (n=7) and knockout (n=8) animals were presented with 33 odors, and the neural responses (GCaMP6m) in the olfactory bulb were recorded using 2-photon microscopy. My goal is to predict whether a trace from the calcium imaging recording (either deconvolved spike timestamps or raw calcium traces) comes from a wildtype animal or from a knockout animal. I know there are classifiers in scikit-learn that may do this, but I'm using the data I have as an exercise to learn nemos better and to understand what it can and cannot do.

My data structure is as follows:
One list of NumPy arrays per genotype. Each array holds the data from one animal (e.g. data_wt[0] is one animal) and has the shape <ROI x odors x time (in frames)>.

For example:
len(data_wt) = 8
data_wt[0].shape = (125, 33, 85)

len(data_ko) = 7
data_ko[0].shape = (176, 33, 85)

The number of ROIs differs from animal to animal, but the number of odors and the time dimension are the same. Data are missing for some of the odors (not all animals received all the odors). Contributors have already described how I should organize my data (see above link) so that it is compatible with nemos.

What is unclear is whether or not I can use nemos to analyze my data in the way I described earlier.

Each array is a different size because each animal's 2-photon recording yields a different number of ROIs (the first animals in the lists have 125 and 176 ROIs in wt and ko, respectively). Does each array have to be the same size, or can I supply nemos with differently sized arrays? If they need to be the same size, would the solution be to pad the arrays with NaNs up to the maximum number of ROIs?
What I want to accomplish with nemos is to use odors as a feature of the ROI activity (either deconvolved or raw calcium signal, as in the demo Guillaume referenced) and predict whether the activity comes from WT or from KO. I understand that nemos isn't a classifier, but since a GLM can model a binomial outcome, can I not use the logit link function and do what I described?

BalzaniEdoardo (Maintainer) · 2025-02-19T19:39:17Z

Hello!
If I understood correctly, you want to combine ROIs from different animals and genotypes to get a prediction. In that case, padding plus concatenation would be equivalent to matching ROIs across subjects, which doesn't really make sense because the signals come from different neurons.
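To make the padding issue concrete, here is a minimal sketch with synthetic data. The shapes follow the example in the question; the random arrays and the helper `pad_rois` are hypothetical. Padding does give arrays of equal size, but the ROI index has no shared meaning across animals: reordering the ROIs of one animal leaves the recording unchanged yet produces a different padded array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two animals: (n_rois, n_odors, n_frames).
animal_a = rng.normal(size=(125, 33, 85))
animal_b = rng.normal(size=(176, 33, 85))


def pad_rois(x, n_max):
    """Pad the ROI axis with NaNs up to n_max rows (hypothetical helper)."""
    pad = n_max - x.shape[0]
    return np.pad(x, ((0, pad), (0, 0), (0, 0)), constant_values=np.nan)


n_max = max(animal_a.shape[0], animal_b.shape[0])
padded_a = pad_rois(animal_a, n_max)
padded_b = pad_rois(animal_b, n_max)

# Same shape now, so the arrays can be stacked ...
stacked = np.stack([padded_a, padded_b])  # (2, 176, 33, 85)

# ... but row i of animal A and row i of animal B are unrelated neurons.
# Shuffling the ROI order of animal A is the same recording, yet the padded
# array (and any feature vector flattened from it) changes.
shuffled_a = pad_rois(animal_a[rng.permutation(animal_a.shape[0])], n_max)
same_features = np.array_equal(padded_a, shuffled_a, equal_nan=True)
```

Here `same_features` comes out false, which is the point: a classifier trained on the flattened padded arrays would depend on an arbitrary ROI ordering.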

Mapping things properly would mean something like assuming that the activity of each subject lies on some sort of common manifold. Projecting each subject onto that manifold would create more comparable features. At that point, the hard part is not the classification but finding an appropriate manifold.

As for the logit link, we would need a different observation model (Bernoulli, i.e. a Binomial with a single trial, which covers a two-class outcome). We don't have it yet, but we are planning to add it. If you get to the point of applying a logistic regression, you can use scikit-learn, and I can also help you set up the model design once you have the abstract features I was talking about.
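For reference, a GLM with a Bernoulli observation model and a logit link is logistic regression. Below is a minimal NumPy sketch fit by gradient descent on synthetic data. It is not nemos code: the function `fit_bernoulli_glm`, its default parameters, and the toy features are assumptions for illustration only. In practice scikit-learn's `LogisticRegression` fits the same kind of model.

```python
import numpy as np


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_bernoulli_glm(X, y, lr=0.1, n_iter=2000, l2=1e-2):
    """Fit weights w and intercept b by gradient descent on the mean
    Bernoulli negative log-likelihood plus a small ridge penalty on w."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                    # predicted P(y = 1)
        grad_w = X.T @ (p - y) / n_samples + l2 * w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = np.random.default_rng(0)

# Toy data: 15 samples, 3 features each; label 1 = knockout, 0 = wildtype.
y = np.array([0] * 8 + [1] * 7)
X = rng.normal(size=(15, 3))
X[y == 1, 0] += 3.0                               # make feature 0 informative

w, b = fit_bernoulli_glm(X, y)
p_ko = sigmoid(X @ w + b)                         # P(knockout) per sample
accuracy = np.mean((p_ko > 0.5) == y)             # training accuracy only
```

Each row of `X` stands for one animal described by a fixed-length feature vector, which is exactly what the raw recordings do not provide until the ROIs are mapped to shared features.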

BalzaniEdoardo (Maintainer) · 2025-02-19T19:48:46Z

Actually, I know Pedro (@pedroherrerovidal), who works on the olfactory bulb and has done a lot of cross-animal alignment; see this. He might have a better understanding of this problem.

Thanks in advance Pedro :)

pedroherrerovidal · 2025-03-25T06:11:07Z

Hi @BalzaniEdoardo and @vkonan,

If I understood the question correctly, the final goal is to determine whether signals coming from one animal belong to one of two underlying classes. If that is the case, I agree with Edo, and we can think about performing this task in a shared feature space. Before that, here is how I conceptualize the problem. At a high level, you can think of your recordings as features that define a space in which the genotype (the target class/label) can be separated. Given known recordings and their associated classes, you want to train a model that predicts the class of new animals.

As a brute-force approach, one could take all neurons and time bins as features to train a classifier, but this wouldn't work because (i) the features (neurons, time bins and conditions) are not matched across animals (different feature spaces), and (ii) it would overlook known statistics of the signal (temporal correlations, conditioning on the odor presented, different scales).

A slightly more complex approach would be to engineer features from the original signal and train the classifier on those. Examples are the amplitude or synchrony of the signal, which live in a shared feature space. However, we know these will change depending on the odor and the animal, so we want to normalize (condition) based on signal and animal. Note that normalizing/conditioning is critical for the classifier to work well, since it puts the features on a shared scale. Still, this approach would discard, or at least not model explicitly, the variability in the data.
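As a rough sketch of the engineered-feature idea, the snippet below reduces each animal to one mean response amplitude per odor and z-scores it within the animal, so every animal yields a vector of the same length regardless of its ROI count. The function name `odor_amplitude_features`, the choice of mean amplitude as the feature, and the use of NaN to mark odors an animal did not receive are all assumptions for illustration, and the data are synthetic.

```python
import numpy as np


def odor_amplitude_features(animal):
    """Map one animal's (n_rois, n_odors, n_frames) array to a length-n_odors
    vector of mean response amplitudes, z-scored within the animal.
    NaN entries are treated as missing and ignored."""
    n_odors = animal.shape[1]
    valid = ~np.isnan(animal)
    sums = np.where(valid, animal, 0.0).sum(axis=(0, 2))
    counts = valid.sum(axis=(0, 2))

    # Mean over ROIs and frames; odors with no data stay NaN.
    per_odor = np.full(n_odors, np.nan)
    has_data = counts > 0
    per_odor[has_data] = sums[has_data] / counts[has_data]

    # Normalize within the animal so scales are comparable across animals.
    mu = np.nanmean(per_odor)
    sd = np.nanstd(per_odor)
    return (per_odor - mu) / sd


rng = np.random.default_rng(0)
animals = [rng.normal(size=(125, 33, 85)), rng.normal(size=(176, 33, 85))]
animals[1][:, 5, :] = np.nan          # pretend animal 1 never received odor 5

# One row per animal, one column per odor: a shared feature space.
features = np.stack([odor_amplitude_features(a) for a in animals])
```

Missing odors stay NaN in `features`, so they would need to be imputed or dropped before training a classifier on these rows.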

As Edo suggested, a more principled way of addressing this problem is to define a shared feature space (conditioned on animals and conditions) using latent space models that capture temporal dynamics (i.e. take the temporal correlation of the signal into account), and then to train a classifier in this space (a shared feature space with the same scale). Marginalizing over animal, condition/odor and temporal variability should give the best genotype classification.
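To give a feel for what a shared space could look like, here is a deliberately simple stand-in: per-animal PCA followed by an orthogonal Procrustes rotation onto a reference animal. This is not the latent dynamical modeling described above and not the method from the linked work; it ignores temporal dynamics, assumes no missing odors, and uses synthetic data generated from a common low-dimensional signal. The function names and the choice of `k` are hypothetical.

```python
import numpy as np


def latent_trajectories(animal, k):
    """Project (n_rois, n_odors, n_frames) activity onto its top-k principal
    components. Returns (n_odors * n_frames, k), independent of n_rois."""
    X = animal.reshape(animal.shape[0], -1)
    X = X - X.mean(axis=1, keepdims=True)        # center each ROI
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * S[:k]                      # equals X.T @ U[:, :k]


def procrustes_rotation(Z, Z_ref):
    """Orthogonal matrix R minimizing ||Z @ R - Z_ref|| (Frobenius norm)."""
    U, _, Vt = np.linalg.svd(Z.T @ Z_ref)
    return U @ Vt


rng = np.random.default_rng(0)
k, n_odors, n_frames = 3, 33, 85

# Synthetic animals: different ROI counts, same underlying k-dim signal.
shared = rng.normal(size=(k, n_odors * n_frames))
animals = []
for n_rois in (125, 176):
    mixing = rng.normal(size=(n_rois, k))
    noise = 0.1 * rng.normal(size=(n_rois, n_odors * n_frames))
    animals.append((mixing @ shared + noise).reshape(n_rois, n_odors, n_frames))

Z = [latent_trajectories(a, k) for a in animals]

# Rotate animal 1's latent space onto animal 0's (the reference).
R = procrustes_rotation(Z[1], Z[0])
aligned = Z[1] @ R

err_before = np.linalg.norm(Z[1] - Z[0])
err_after = np.linalg.norm(aligned - Z[0])
```

The rotation can only reduce the mismatch to the reference (`err_after <= err_before`), but a pure rotation cannot fix scale differences between animals, which is one reason the more expressive models in the references below exist.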

My work is already mentioned here, and it could be a good reference on how to align spaces across animals. One would then have to decode genotype, which could be done hierarchically or by training an ML model on top. Different models make different assumptions, and it is good to understand which one fits your goal best. For some other references, you can look into:
• https://www.nature.com/articles/s41586-023-06714-0
• https://www.nature.com/articles/s41593-019-0555-4
• https://arxiv.org/pdf/2202.06159
• https://www.nature.com/articles/s41586-023-06031-6
• https://a-antoniades.github.io/Neuroformer_web/
• https://arxiv.org/abs/2106.12570
• https://www.biorxiv.org/content/10.1101/2020.01.27.922062v3.full.pdf

BalzaniEdoardo (Maintainer) · 2025-04-10T20:58:41Z

@vkonan we added a Binomial observation model in the development branch, if you want to try that out!
