PyCMF

Collective Matrix Factorization in Python.

Collective Matrix Factorization is a machine learning method that decomposes two matrices $X$ and $Y$ into three matrices $U$, $V$, and $Z$ such that

$$X \approx f(UV^\top), \qquad Y \approx f(VZ^\top)$$

where $f$ is either the identity or sigmoid function.
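The sketch below spells out what the factorization means in NumPy: the factor V is shared, so its rows index both the columns of X and the rows of Y. It assumes a squared-error loss for both relations and a hypothetical trade-off weight `alpha`, loosely following Singh & Gordon (2008); PyCMF's actual losses and parameter names may differ, so check its docstrings.

```python
import numpy as np


def identity(a):
    return a


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def cmf_objective(X, Y, U, V, Z, f_x=identity, f_y=identity, alpha=0.5):
    """Weighted squared error of X ~ f_x(U V^T) and Y ~ f_y(V Z^T).

    V is shared: its rows index the columns of X and the rows of Y.
    `alpha` is a hypothetical trade-off weight between the two relations.
    """
    loss_x = np.sum((X - f_x(U @ V.T)) ** 2)
    loss_y = np.sum((Y - f_y(V @ Z.T)) ** 2)
    return alpha * loss_x + (1.0 - alpha) * loss_y


rng = np.random.default_rng(0)
U = rng.random((5, 2))  # 5 row entities of X, 2 components
V = rng.random((4, 2))  # 4 shared entities, 2 components
Z = rng.random((3, 2))  # 3 column entities of Y, 2 components

# Data that these factors explain exactly: linear link for X, sigmoid link for Y.
X = U @ V.T             # shape (5, 4)
Y = sigmoid(V @ Z.T)    # shape (4, 3), entries in (0, 1)

print(cmf_objective(X, Y, U, V, Z, f_y=sigmoid))        # exact factors
print(cmf_objective(X, Y, U, V + 0.1, Z, f_y=sigmoid))  # perturbed shared factor
```

Perturbing only V changes both reconstructions at once, which is what ties the two relations together.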

Why Use CMF?

CMF decomposes complex, interconnected relationships into a small number of components and can provide valuable insights into your data. Relationships between

words, documents, and sentiment
people, movies, genres, and ratings
items, categories, people, and sales

and many more can all be handled with this simple framework. See Use Cases for more details.

Usage

PyCMF implements a scikit-learn-like interface (full compatibility with scikit-learn is in progress):

>>> import numpy as np
>>> import pycmf
>>> X = np.abs(np.random.randn(5, 4)); Y = np.abs(np.random.randn(4, 1))
>>> model = pycmf.CMF(n_components=4)
>>> U, V, Z = model.fit_transform(X, Y)
>>> np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
0.00010788067541423165
>>> np.linalg.norm(Y - V @ Z.T) / np.linalg.norm(Y)
1.2829730942643831e-05

Getting Started

$ pip install git+https://github.com/smn-ailab/PyCMF

NumPy and Cython must be installed in advance.

Features

Support for both dense and sparse matrices
Support for linear and sigmoid transformations
Non-negativity constraints on the components (useful in use cases like topic modeling)
Stochastic estimation of the gradient and Hessian for the Newton solver
Visualizing topics and importances (see CMF.print_topic_terms)
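The list above mentions stochastic estimates of the gradient and Hessian for the Newton solver. As a rough illustration only, and not PyCMF's actual solver, the sketch below takes subsampled Newton steps for a single column of Y, assuming a sigmoid link with cross-entropy loss and holding the shared factor V fixed (an alternating-update setup that is itself an assumption). The helper names and the small ridge term `reg` are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def cross_entropy(V, y, z, eps=1e-12):
    """Full-data loss of y ~ sigmoid(V @ z)."""
    p = sigmoid(V @ z)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def subsampled_newton_step(V, y, z, batch_size, rng, reg=1e-3):
    """One Newton update of z using a gradient and Hessian estimated on a random batch.

    Batch sums are rescaled by n / batch_size so they estimate the full-data
    quantities; `reg` is a small L2 penalty that keeps the Hessian invertible.
    """
    n = V.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)
    Vb, yb = V[idx], y[idx]
    p = sigmoid(Vb @ z)
    scale = n / batch_size
    grad = scale * Vb.T @ (p - yb) + reg * z
    weights = p * (1 - p)  # diagonal of the cross-entropy Hessian weights
    hess = scale * Vb.T @ (Vb * weights[:, None]) + reg * np.eye(z.size)
    return z - np.linalg.solve(hess, grad)


# Toy problem: V stands in for fixed shared factors, y for one binary column of Y.
rng = np.random.default_rng(0)
V = rng.standard_normal((200, 3))
z_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(V @ z_true)).astype(float)

z = np.zeros(3)
initial_loss = cross_entropy(V, y, z)
for _ in range(10):
    z = subsampled_newton_step(V, y, z, batch_size=100, rng=rng)
final_loss = cross_entropy(V, y, z)
print(initial_loss, final_loss)
```

Each step only touches half of the rows, which is the point of the stochastic estimate: the cost per step shrinks while the Newton direction stays informative.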

See the docstrings for more details on how to configure CMF.
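The non-negativity constraint can be illustrated with multiplicative updates in the style of Lee & Seung (2001), extended so that the shared factor V is fitted to both relations. This is a minimal sketch assuming the identity link, squared loss and a hypothetical weight `alpha`; it is not necessarily how PyCMF's solvers work.

```python
import numpy as np


def cmf_multiplicative(X, Y, n_components, alpha=0.5, n_iter=200, seed=0, eps=1e-10):
    """Non-negative CMF with the identity link via multiplicative updates.

    Minimizes alpha * ||X - U V^T||^2 + (1 - alpha) * ||Y - V Z^T||^2
    subject to U, V, Z >= 0. `alpha` is a hypothetical trade-off weight.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_components))
    V = rng.random((X.shape[1], n_components))
    Z = rng.random((Y.shape[1], n_components))
    losses = []
    for _ in range(n_iter):
        # U and Z each appear in only one relation, so alpha cancels out.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        Z *= (Y.T @ V) / (Z @ (V.T @ V) + eps)
        # V is shared, so its update mixes the statistics of both relations.
        num = alpha * (X.T @ U) + (1 - alpha) * (Y @ Z)
        den = V @ (alpha * (U.T @ U) + (1 - alpha) * (Z.T @ Z)) + eps
        V *= num / den
        losses.append(alpha * np.sum((X - U @ V.T) ** 2)
                      + (1 - alpha) * np.sum((Y - V @ Z.T) ** 2))
    return U, V, Z, losses


# Same shapes as the Usage example: X is 5 x 4, Y is 4 x 1.
rng = np.random.default_rng(42)
X = np.abs(rng.standard_normal((5, 4)))
Y = np.abs(rng.standard_normal((4, 1)))
U, V, Z, losses = cmf_multiplicative(X, Y, n_components=4)
print(losses[0], losses[-1])
```

Because every update multiplies by a ratio of non-negative terms, factors initialized positive stay non-negative without any projection step, and each block update does not increase the weighted loss.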

Use Cases

See the samples directory for working examples. Possible use cases include:

Topic modeling and text classification

Suppose you want to use topic modeling to explore your data, but also want to use supervision signals such as toxicity or sentiment. With CMF, you can extract topics that are relevant to classifying the texts.
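Because X is factored as `U @ V.T` and Y as `V @ Z.T` (see the Usage example), documents have to be the shared entity: the columns of X and the rows of Y. The sketch below shows one possible layout under that assumption, with a hypothetical toy corpus and hand-made term loadings standing in for a fitted U. The `top_terms` helper is an illustration, not `CMF.print_topic_terms`.

```python
import numpy as np

# Hypothetical toy corpus: 4 documents over a 5-word vocabulary.
vocab = ["good", "great", "bad", "awful", "movie"]
doc_term = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 1],
    [0, 0, 2, 1, 1],
    [0, 0, 1, 2, 1],
], dtype=float)
labels = np.array([[1.0], [1.0], [0.0], [0.0]])  # 1 = positive sentiment

# Documents are the shared entity: columns of X and rows of Y.
X = doc_term.T  # (n_terms, n_docs): U holds term loadings per topic
Y = labels      # (n_docs, n_labels): Z maps topics to labels
assert X.shape[1] == Y.shape[0]


def top_terms(U, vocab, n=2):
    """Return the n highest-weighted terms for each topic (column of U)."""
    return [[vocab[i] for i in np.argsort(-U[:, k])[:n]] for k in range(U.shape[1])]


# Hand-made loadings standing in for a fitted U (terms x 2 topics).
U = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.0, 0.9],
              [0.1, 0.7],
              [0.5, 0.5]])
print(top_terms(U, vocab))  # [['good', 'great'], ['bad', 'awful']]
```

In a supervised fit, Z would indicate which of these topics predict the label, which is what makes the extracted topics relevant to classification.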

Movie rating prediction

Many prediction tasks involve relations between multiple entities. Movie rating prediction is a good example: common entities include users, movies, genres and actors. CMF can be used to model these relations and predict unobserved edges.
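Prediction from fitted components is still on the TODO list, so one way to predict unobserved edges is to rebuild the matrix by hand. The sketch below assumes X is a users × movies rating matrix with NaN for missing ratings, so that X is approximated by `f(U @ V.T)` and V (movies × components) is also the factor shared with a movies × genres Y. The factor values are made up for illustration and are not the output of a PyCMF fit.

```python
import numpy as np

# Hypothetical fitted factors: 3 users, 4 movies, 2 components.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # users x components
V = np.array([[4.0, 1.0],
              [1.0, 4.0],
              [3.0, 3.0],
              [2.0, 0.0]])   # movies x components (shared with a movies x genres Y)

ratings = np.array([[4.0, np.nan, 3.0, 2.0],
                    [1.0, 4.0, np.nan, 0.0],
                    [np.nan, 5.0, 6.0, 2.0]])  # NaN marks unobserved edges


def predict_missing(ratings, U, V, link=lambda a: a):
    """Keep observed ratings and fill unobserved ones with link(U @ V.T)."""
    predicted = link(U @ V.T)
    return np.where(np.isnan(ratings), predicted, ratings)


print(predict_missing(ratings, U, V))
```

Passing a sigmoid as `link` would match a model fitted with the sigmoid transformation, for example when edges are binary rather than ratings.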

License

This project is licensed under the MIT License; see the LICENSE file for details.

References

Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13, 556–562.

Singh, A. P., & Gordon, G. J. (2008). Relational learning via collective matrix factorization. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), 650.

Wang, Y., Zhang, Y., & Zhou, B. (2017). Semi-supervised collective matrix factorization for topic detection and document clustering.

TODO

Improve performance
Add support for weight matrices on relations
Add support for predicting using obtained components
Full compatibility with sklearn
