Implement a variational inference algorithm for Latent Dirichlet Allocation (the standard updates are sketched below).
Train the model on a small subset of Wikipedia.
Evaluate and visualize with pyLDAvis.
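
For background, the mean-field variational updates of Blei, Ng and Jordan (2003) for a single document are sketched below; the exact parameterization used in this code may differ. Here $\phi_{ni}$ is the variational responsibility of topic $i$ for the $n$-th word $w_n$, $\gamma$ is the document's variational Dirichlet parameter, $\alpha$ is the Dirichlet prior, $\beta_{i w_n}$ is the probability of word $w_n$ under topic $i$, and $\Psi$ is the digamma function:

$$
\phi_{ni} \propto \beta_{i w_n}\,\exp\!\Big(\Psi(\gamma_i) - \Psi\Big(\sum_{j}\gamma_j\Big)\Big),
\qquad
\gamma_i = \alpha_i + \sum_{n=1}^{N}\phi_{ni}.
$$

These two updates are iterated until $\gamma$ converges; the topic-word distributions $\beta$ are then re-estimated from the expected counts in an outer EM loop.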
To reproduce, check the following scripts:
- scripts/setup_anaconda_env.bash to build a suitable Anaconda environment.
- scripts/00_setup.bash to download the Wikipedia dataset.
- scripts/extractSmallSubset.bash to extract a subset of the dataset.
- scripts/01_preprocess.bash to process the XML files and save the dictionary and word counts for each document.
- scripts/02_training.bash to estimate the distribution parameters and save the trained model.
- To visualize, run the Jupyter notebook with the same name and point it to the location of your trained model (by setting the path in the second cell); a rough pyLDAvis call is sketched after this list. A small model is in
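
The notebook takes care of the visualization, but the pyLDAvis call itself roughly looks like the sketch below. The pickle path and the model attributes (topic_word_, doc_topic_, doc_lengths_, vocab_, term_frequency_) are illustrative assumptions, not the actual attribute names of the LDA class in this repository:

```python
# Hypothetical sketch: load a trained model and feed it to pyLDAvis.
# The pickle path and all model attributes below are placeholders.
import pickle

import pyLDAvis

with open("models/small_wiki_lda.pkl", "rb") as f:  # example path
    model = pickle.load(f)

# pyLDAvis.prepare expects row-normalized topic-word and document-topic matrices.
topic_term = model.topic_word_ / model.topic_word_.sum(axis=1, keepdims=True)
doc_topic = model.doc_topic_ / model.doc_topic_.sum(axis=1, keepdims=True)

vis = pyLDAvis.prepare(
    topic_term_dists=topic_term,
    doc_topic_dists=doc_topic,
    doc_lengths=model.doc_lengths_,
    vocab=model.vocab_,
    term_frequency=model.term_frequency_,
)
pyLDAvis.save_html(vis, "lda_vis.html")  # or pyLDAvis.display(vis) inside a notebook
```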
There are three relevant Python classes in the package **lda** (a usage sketch follows the list).
- Dataset in lda/dataset.py for all corpus preprocessing operations, as well as loading and saving datasets in Python's native serialization format, pickle.
- LDA in lda/inference.py to run the variational inference algorithm on a dataset.
- GenMod in lda/generativeModel.py to sample from an LDA model given the hyperparameters.
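
A minimal end-to-end sketch of how the three classes could fit together is shown below. The constructor arguments and method names (load, fit, sample_document, the topic_word_ attribute) are assumptions for illustration and may not match the actual signatures in the repository:

```python
# Hypothetical usage sketch; method and argument names are illustrative only.
from lda.dataset import Dataset
from lda.inference import LDA
from lda.generativeModel import GenMod

# Load a preprocessed corpus (dictionary + per-document word counts) from pickle.
dataset = Dataset.load("data/small_wiki.pkl")  # example path

# Estimate the variational parameters on the corpus.
model = LDA(num_topics=20, alpha=0.1)
model.fit(dataset)

# Sample a synthetic document from the generative model, given the hyperparameters.
gen = GenMod(alpha=0.1, beta=model.topic_word_)
document = gen.sample_document(length=100)
```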