Latent Dirichlet Allocation

Implement a variational inference algorithm for latent Dirichlet allocation (LDA). Train the model on a small subset of Wikipedia. Evaluate and visualize it with pyLDAvis.
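As a rough illustration of what variational inference for LDA involves, the sketch below runs the standard per-document update (topic responsibilities phi and the variational Dirichlet parameter gamma) in plain Python. It is not the code in lda/inference.py: the function names `digamma` and `e_step`, the toy `alpha` and `beta` values, and the stopping rule are all assumptions made for this example, and the topic-word matrix `beta` is held fixed.

```python
import math

def digamma(x):
    """Approximate the digamma function for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:               # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def e_step(word_ids, counts, alpha, beta, n_iter=50, tol=1e-6):
    """Hypothetical per-document variational update with beta held fixed."""
    n_topics = len(alpha)
    total = sum(counts)
    gamma = [a + total / n_topics for a in alpha]   # uniform initialisation
    phi = []
    for _ in range(n_iter):
        weights = [math.exp(digamma(g)) for g in gamma]
        new_gamma = list(alpha)
        phi = []
        for w, c in zip(word_ids, counts):
            # phi[n][k] is proportional to beta[k][w] * exp(digamma(gamma[k]))
            row = [beta[k][w] * weights[k] for k in range(n_topics)]
            norm = sum(row)
            row = [r / norm for r in row]
            phi.append(row)
            for k in range(n_topics):
                new_gamma[k] += c * row[k]          # gamma = alpha + sum_n count_n * phi_n
        change = max(abs(a - b) for a, b in zip(gamma, new_gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, phi

# Toy setup (assumed values): 2 topics over a 4-word vocabulary.
alpha = [0.1, 0.1]
beta = [[0.45, 0.45, 0.05, 0.05],
        [0.05, 0.05, 0.45, 0.45]]
# One document containing word 0 three times and word 1 twice.
gamma, phi = e_step([0, 1], [3, 2], alpha, beta)
print(gamma)
```

In a full training loop this step would alternate with an update of `beta` from the accumulated `phi` values across all documents.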

To reproduce, check the following scripts:

scripts/setup_anaconda_env.bash to build a suitable Anaconda environment.
scripts/00_setup.bash to download the Wikipedia dataset.
scripts/extractSmallSubset.bash to extract a subset of the dataset.
scripts/01_preprocess.bash to process the XML files and save the dictionary and word counts for each document.
scripts/02_training.bash to estimate the distribution parameters and save the
To visualize, run the Jupyter notebook with the same name and point it to the location of your trained model (by setting the path in the second cell). A small model is in
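The preprocessing step in the list above saves a dictionary and per-document word counts. A minimal sketch of that idea, using only the standard library, might look as follows. It does not reproduce scripts/01_preprocess.bash or the Dataset class: the helper names `tokenize`, `build_dictionary`, and `word_counts` are hypothetical, XML parsing is left out, and the two sample sentences stand in for real articles.

```python
import pickle
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(docs, min_count=1):
    """Map each sufficiently frequent word to an integer id (hypothetical helper)."""
    freq = Counter(tok for doc in docs for tok in tokenize(doc))
    vocab = sorted(word for word, count in freq.items() if count >= min_count)
    return {word: idx for idx, word in enumerate(vocab)}

def word_counts(doc, dictionary):
    """Return the document as a sorted list of (word_id, count) pairs."""
    counts = Counter(dictionary[tok] for tok in tokenize(doc) if tok in dictionary)
    return sorted(counts.items())

# Stand-in documents; the real pipeline would read text extracted from the XML dump.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
dictionary = build_dictionary(docs)
bows = [word_counts(doc, dictionary) for doc in docs]

# Serialize with pickle and read the data back.
blob = pickle.dumps({"dictionary": dictionary, "wordcounts": bows})
restored = pickle.loads(blob)
print(restored["wordcounts"][0])
```

Storing only (word_id, count) pairs keeps each document compact, which matters because the inference step needs nothing more than these counts.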

There are three relevant Python classes in the package lda.

Dataset in lda/dataset.py for all corpus preprocessing operations, as well as loading and saving datasets in Python's native serialization format, pickle.
LDA in lda/inference.py to perform the inference algorithm on a dataset.
GenMod in lda/generativeModel.py to sample from an LDA model given the hyperparameters.
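To make "sampling from an LDA model given the hyperparameters" concrete, here is a small sketch of the LDA generative process for a single document. It is not the GenMod class: the function names, the fixed document length, and the toy `alpha` and `beta` values are assumptions for illustration only.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_document(alpha, beta, length, rng):
    """Hypothetical helper: generate one document of `length` word ids."""
    theta = sample_dirichlet(alpha, rng)                # per-document topic proportions
    topic_ids = rng.choices(range(len(alpha)), weights=theta, k=length)
    word_ids = [
        rng.choices(range(len(beta[z])), weights=beta[z], k=1)[0]   # word drawn from its topic
        for z in topic_ids
    ]
    return theta, topic_ids, word_ids

# Toy hyperparameters (assumed values): 2 topics over a 4-word vocabulary.
alpha = [0.5, 0.5]
beta = [[0.45, 0.45, 0.05, 0.05],
        [0.05, 0.05, 0.45, 0.45]]
rng = random.Random(0)                                  # seeded for repeatability
theta, topic_ids, word_ids = sample_document(alpha, beta, length=8, rng=rng)
print(theta, topic_ids, word_ids)
```

The document length is passed in explicitly here; the textbook formulation draws it from a Poisson distribution instead.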

Latest commit: chrlen, "add example for saving and loading datasets", Feb 23, 2019 (e446785). History: 120 commits.

Name	Last commit message	Last commit date
lda	fix termLock size	Feb 21, 2019
log	add log for parallel training	Feb 22, 2019
plot	clean	Feb 23, 2019
scripts	add script to build latex and tar archive	Feb 23, 2019
tests	initial commit	Nov 30, 2018
tex	continue writing	Feb 23, 2019
.gitignore	add pyc files	Feb 16, 2019
README.md	Write minimal instructions for reproduction	Feb 19, 2019
example.py	add example for saving and loading datasets	Feb 23, 2019
generativeModel.ipynb	clean and rename	Feb 19, 2019
gensimLDAtrain.py	train small gensim model	Jan 29, 2019
gensimPyLDAvis.py	add gensimPyLDAvis.py	Feb 16, 2019
loadAndPyLDAvis.py	make path an argument	Feb 19, 2019
preprocess.py	finish preprocessing script	Feb 16, 2019
requirements.txt	formatting	Feb 16, 2019
setup.py	initial commit	Nov 30, 2018
training.py	add finish training.py	Feb 16, 2019
trainingParallel.py	add trainingParallel.py	Feb 20, 2019
visualize.ipynb	continue writing	Feb 23, 2019
