Collective Matrix Factorization in Python.
Collective Matrix Factorization (CMF) is a machine learning method that decomposes two matrices $X$ and $Y$ into three matrices $U$, $V$, and $Z$ such that

$$X \approx f(UV^T), \qquad Y \approx f(VZ^T),$$

where $f$ is either the identity or sigmoid function.
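To make the shared factor concrete, here is a minimal NumPy sketch of the two reconstructions. It does not use PyCMF; the helper names `sigmoid` and `reconstruct` and the `link` argument are assumptions made up for illustration.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct(U, V, Z, link="identity"):
    """Return (U V^T, V Z^T) passed through the chosen link function."""
    f = sigmoid if link == "sigmoid" else (lambda a: a)
    return f(U @ V.T), f(V @ Z.T)

rng = np.random.default_rng(0)
n, m, l, k = 5, 4, 1, 2          # rows of X, shared dimension, columns of Y, components
U = rng.normal(size=(n, k))
V = rng.normal(size=(m, k))      # V appears in both reconstructions
Z = rng.normal(size=(l, k))

X_hat, Y_hat = reconstruct(U, V, Z, link="sigmoid")
print(X_hat.shape, Y_hat.shape)  # (5, 4) (4, 1)
```

Because the same matrix enters both products, the columns of the first matrix and the rows of the second must index the same entities.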
CMF decomposes complex and multiple relationships into a small number of components, and can provide valuable insights into your data. Relationships between
- words, documents, and sentiment
- people, movies, genres, and ratings
- items, categories, people, and sales
and many more can all be handled with this simple framework. See Use Cases for more details.
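As a rough illustration of the first example, the sketch below arranges words, documents, and sentiment into two matrices that share the document axis. The toy corpus and labels are made up, and the orientation (words by documents, then documents by labels) is an assumption chosen so that the columns of the first matrix line up with the rows of the second, as in the usage example where X is 5×4 and Y is 4×1.

```python
import numpy as np

docs = ["good great film", "bad awful film", "great fun"]  # toy corpus (made up)
sentiment = [1.0, 0.0, 1.0]                                # one label per document (made up)

vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: i for i, w in enumerate(vocab)}

# X: words x documents count matrix
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[word_index[w], j] += 1

# Y: documents x 1 sentiment matrix; its rows line up with the columns of X
Y = np.array(sentiment).reshape(-1, 1)

print(X.shape, Y.shape)  # (6, 3) (3, 1)
```

Matrices built this way could then be passed to a CMF model in place of the random matrices in the usage example.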
PyCMF implements a scikit-learn-like interface (full compatibility with scikit-learn is in progress):
>>> import numpy as np
>>> import pycmf
>>> X = np.abs(np.random.randn(5, 4)); Y = np.abs(np.random.randn(4, 1))
>>> model = pycmf.CMF(n_components=4)
>>> U, V, Z = model.fit_transform(X, Y)
>>> np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
0.00010788067541423165
>>> np.linalg.norm(Y - V @ Z.T) / np.linalg.norm(Y)
1.2829730942643831e-05
$ pip install git+https://github.com/smn-ailab/PyCMF
NumPy and Cython must be installed in advance.
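One way to confirm that both prerequisites are present before running the install command is a small standard-library check. The helper name `missing_modules` is hypothetical; `numpy` and `Cython` are the import names of the two packages.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Check the two prerequisites by their import names.
missing = missing_modules(["numpy", "Cython"])
if missing:
    print("Install these before PyCMF:", ", ".join(missing))
else:
    print("Prerequisites found.")
```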
- Support for both dense and sparse matrices
- Support for linear and sigmoid transformations
- Non-negativity constraints on the components (useful in use cases like topic modeling)
- Stochastic estimation of the gradient and Hessian for the Newton solver
- Visualizing topics and importances (see CMF.print_topic_terms)
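To illustrate what a non-negativity constraint on the components means, here is a minimal NumPy sketch that fits both relations with the identity link using projected gradient descent. This is not PyCMF's solver (the list above mentions a Newton solver); the function names, the unweighted loss, and the step size are assumptions for illustration only.

```python
import numpy as np

def cmf_loss(X, Y, U, V, Z):
    """Sum of squared reconstruction errors for both relations (identity link)."""
    return 0.5 * np.sum((X - U @ V.T) ** 2) + 0.5 * np.sum((Y - V @ Z.T) ** 2)

def projected_gradient_step(X, Y, U, V, Z, lr=0.005):
    """One gradient step on U, V, Z followed by clipping at zero."""
    Rx = U @ V.T - X            # residual of the first relation
    Ry = V @ Z.T - Y            # residual of the second relation
    grad_U = Rx @ V
    grad_V = Rx.T @ U + Ry @ Z  # V receives gradient from both relations
    grad_Z = Ry.T @ V
    # Clipping at zero projects each factor back onto the non-negative orthant.
    return (np.maximum(U - lr * grad_U, 0.0),
            np.maximum(V - lr * grad_V, 0.0),
            np.maximum(Z - lr * grad_Z, 0.0))

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(5, 4)))
Y = np.abs(rng.normal(size=(4, 1)))
k = 2
U, V, Z = rng.random((5, k)), rng.random((4, k)), rng.random((1, k))

initial_loss = cmf_loss(X, Y, U, V, Z)
for _ in range(2000):
    U, V, Z = projected_gradient_step(X, Y, U, V, Z)
final_loss = cmf_loss(X, Y, U, V, Z)

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Non-negative factors are often easier to read as additive parts, which is why the constraint suits topic modeling.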
See the docstrings for more details on how to configure CMF.
See samples for working examples. Possible use cases include:
Suppose you want to explore your data with topic modeling, but also want to use supervision signals such as toxicity or sentiment. With CMF, you can extract topics that are relevant to classifying the texts.
Many prediction tasks involve relations between multiple entities. Movie rating prediction is a good example: common entities include users, movies, genres and actors. CMF can be used to model these relations and predict unobserved edges.
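Predicting unobserved edges from fitted components can be sketched by hand with NumPy. The example below assumes the identity link and uses made-up factors for 3 users and 4 movies; `predict_missing` is a hypothetical helper, not part of PyCMF.

```python
import numpy as np

def predict_missing(U, V, observed_mask):
    """Score every (row, column) pair with U V^T and return the unobserved ones.

    Returns a list of (row, col, score) tuples sorted by descending score.
    """
    scores = U @ V.T
    rows, cols = np.nonzero(~observed_mask)
    triples = zip(rows.tolist(), cols.tolist(), scores[rows, cols].tolist())
    return sorted(triples, key=lambda t: -t[2])

# Hypothetical factors: 3 users and 4 movies with 2 components each.
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0], [0.5, 0.5]])

observed = np.ones((3, 4), dtype=bool)
observed[0, 1] = False   # user 0 has not rated movie 1
observed[2, 0] = False   # user 2 has not rated movie 0

print(predict_missing(U, V, observed))  # [(2, 0, 2.0), (0, 1, 0.0)]
```

With a sigmoid link, the same scores would be passed through the logistic function before ranking.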
This project is licensed under the MIT License. See the LICENSE file for details.
- Improve performance
- Add support for weight matrices on relations
- Add support for predicting using obtained components
- Full compatibility with sklearn