This package provides an implementation of non-negative matrix factorization
(NMF) with performance comparable to randomized PCA on sparse data, e.g. data
arising from scRNA-seq.
The key advances are: (1) the use of amortized inference, which allows the model to be fit on a GPU using minibatch SGD; (2) support for sparse matrices, which allows the entire data matrix to be stored in GPU memory.
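To see why sparse storage matters here, the back-of-the-envelope sketch below compares dense and sparse memory footprints for a matrix of shape (593844, 16002) that is 99% sparse. It assumes 32-bit values, 32-bit indices, a CSR layout, and exactly 1% nonzero entries; these are illustrative assumptions, not statements about how the package stores data internally.

```python
def dense_bytes(n_rows, n_cols, value_bytes=4):
    """Bytes needed to store every entry of the matrix explicitly."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, n_cols, density, value_bytes=4, index_bytes=4):
    """Bytes for a CSR layout: values, column indices, and row pointers."""
    nnz = round(n_rows * n_cols * density)
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


# Shape taken from the example data set; density of 0.01 is an assumption
n_rows, n_cols = 593844, 16002
dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, n_cols, density=0.01)
print(f"dense: {dense / 1e9:.1f} GB, CSR: {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense matrix needs roughly 38 GB, while the sparse representation needs under 1 GB, which is small enough for the memory of a typical GPU.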
pip install git+https://www.github.com/aksarkar/anmf.git#egg=anmf
import anmf
import scipy.sparse as ss
import torch.utils.data as td
# This matrix has shape (593844, 16002) and is 99% sparse
x = ss.load_npz('immune-cell-census.npz')
# This automatically moves the data to the GPU if one is available
data = anmf.ExprDataset(x)
loader = td.DataLoader(data, batch_size=32, collate_fn=data.collate_fn)
model = anmf.ANMF(input_dim=x.shape[1], hidden_dim=128, latent_dim=10)
model.fit(loader, max_epochs=10)
# In amortized inference, we learn a function from data to loadings
l = model.loadings(x)
f = model.factors
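
The idea behind amortized inference can be sketched without any dependencies. The snippet below is not the package's implementation: the one-layer encoder, the softplus link, the Poisson loss, and all names (`encode`, `poisson_nll`, `weights`, `bias`) are hypothetical choices made for illustration, and no training is shown. The point is only that loadings for a new row come from one pass through a shared function rather than from a separate per-row optimization.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z)); maps any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def encode(x_row, weights, bias):
    """Hypothetical one-layer encoder: one positive loading per latent dimension."""
    return [
        softplus(sum(w * v for w, v in zip(w_k, x_row)) + b_k)
        for w_k, b_k in zip(weights, bias)
    ]


def poisson_nll(x_row, loadings, factors):
    """Negative Poisson log-likelihood of one row, up to the log(x!) constant."""
    nll = 0.0
    for j, x_j in enumerate(x_row):
        # Rate for entry j is the inner product of loadings and column j of factors
        rate = sum(l_k * f_k[j] for l_k, f_k in zip(loadings, factors))
        nll += rate - x_j * math.log(rate)
    return nll


rng = random.Random(0)
input_dim, latent_dim = 6, 2

# Untrained, randomly initialized parameters (assumed shapes, for illustration only)
weights = [[rng.gauss(0, 0.1) for _ in range(input_dim)] for _ in range(latent_dim)]
bias = [0.0] * latent_dim
factors = [[rng.random() + 0.1 for _ in range(input_dim)] for _ in range(latent_dim)]

# Loadings for a previously unseen row come from a single encoder pass
x_new = [3, 0, 0, 1, 0, 2]
l_new = encode(x_new, weights, bias)
loss = poisson_nll(x_new, l_new, factors)
```

In a real fit, the encoder parameters and the factors would be updated together by minibatch SGD on this kind of loss, so the per-row cost of inference stays constant as the number of rows grows.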