Add wikilinks NN method for generating embeddings #47

Merged: 6 commits, May 18, 2021
115 changes: 89 additions & 26 deletions README.md
@@ -16,9 +16,9 @@
[![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![colab](https://img.shields.io/badge/%20-Open%20in%20Colab-097ABB.svg?logo=google-colab&color=097ABB&labelColor=525252)](https://colab.research.google.com/github/andrewtavis/wikirec)

### Recommendation engine framework based on Wikipedia data

**wikirec** is a framework that allows users to parse Wikipedia in any language for entries of a given type and then seamlessly generate recommendations for the given content. Recommendations are based on unsupervised natural language processing over article texts, with ratings being leveraged to weigh inputs and indicate preferences. The goal is for wikirec to both refine and deploy models that provide accurate content recommendations with only open-source data.

See the [documentation](https://wikirec.readthedocs.io/en/latest/) for a full outline of the package including models and data preparation.

@@ -212,24 +212,26 @@ tfidf_embeddings = model.gen_embeddings(
<p>
</details>

<details><summary><strong>WikilinkNN</strong></summary>
<p>

Based on this [Towards Data Science article](https://towardsdatascience.com/building-a-recommendation-system-using-neural-network-embeddings-1ef92e5c80c9), the wikilink neural network method makes the assumption that articles linked to the same Wikipedia articles will themselves be similar. A corpus of internal wikilinks per article is passed, and embeddings are then derived from these internal references.

```python
from wikirec import model

wikilink_embeddings = model.gen_embeddings(
    method="WikilinkNN",
    path_to_json="./enwiki_books.ndjson",  # json used instead of a corpus
    path_to_embedding_model="books_embedding_model.h5",
    embedding_size=75,
    epochs=20,
    verbose=True,
)
```

The [examples](https://github.com/andrewtavis/wikirec/tree/main/examples) directory has a copy of `books_embedding_model.h5` for testing purposes.
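Once embeddings are derived, recommendations amount to ranking articles by cosine similarity of their embedding vectors. A minimal sketch of that step follows; the embedding matrix here is random stand-in data rather than wikirec's actual output:

```python
import numpy as np

# Stand-in for a learned embedding matrix: one row per article
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 75))

# Normalize rows so dot products are cosine similarities
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim_matrix = normed @ normed.T

# Top 3 most similar articles to article 0, excluding the article itself
top = np.argsort(sim_matrix[0])[::-1][1:4]
```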

<p>
</details>

@@ -258,20 +260,20 @@ recs = model.recommend(

# Comparative Results [``](#contents) <a id="comparative-results"></a>

- TFIDF generally outperformed all other NLP methods in terms of providing what the user would expect, with the results being all the more striking considering its runtime is by far the shortest.
- The other strong performing NLP model is BERT, as it does the best job of providing novel but sensible recommendations.
- WikilinkNN also provides very sensible results, giving wikirec effective modeling options using different kinds of inputs.
- LDA with the second shortest runtime provides novel recommendations along with what is expected, but recommends things that seem out of place more often than BERT.
- Doc2vec performs very poorly in that most results are nonsense, and it further takes the longest to train.

See [examples/rec_books](https://github.com/andrewtavis/wikirec/blob/main/examples/rec_books.ipynb) and [examples/rec_movies](https://github.com/andrewtavis/wikirec/blob/main/examples/rec_movies.ipynb) for detailed demonstrations with model comparisons, as well as [examples/rec_ratings](https://github.com/andrewtavis/wikirec/blob/main/examples/rec_ratings.ipynb) for how to leverage user ratings. These notebooks can also be opened in [Google Colab](https://colab.research.google.com/github/andrewtavis/wikirec) for direct experimentation.

Samples of TFIDF and BERT book recommendations using cosine similarity follow:
Sample recommendations for single and multiple inputs are found in the following dropdowns:

<details><summary><strong>TFIDF</strong></summary>
<p>


```_output
Harry Potter and the Philosopher's Stone recommendations:
[['Harry Potter and the Chamber of Secrets', 0.5974588223913879],
['Harry Potter and the Deathly Hallows', 0.5803045645372675],
@@ -307,10 +309,16 @@ Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
['Mr. Bliss', 0.3219122094772891],
['Harry Potter and the Order of the Phoenix', 0.3160426316664049],
['The Magical Worlds of Harry Potter', 0.30770960167033506]]
```

<p>
</details>

<details><summary><strong>BERT</strong></summary>
<p>

```_output
Harry Potter and the Philosopher's Stone recommendations:
[['Harry Potter and the Prisoner of Azkaban', 0.8625375],
['Harry Potter and the Chamber of Secrets', 0.8557441],
['Harry Potter and the Half-Blood Prince', 0.8430752],
@@ -322,7 +330,7 @@ Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
['The Weirdstone of Brisingamen', 0.8035261],
['Harry Potter and the Cursed Child', 0.79987496]]
The Hobbit recommendations:
[['The Lord of the Rings', 0.8724792],
['Beast', 0.8283818],
['The Children of Húrin', 0.8261733],
@@ -334,7 +342,7 @@ Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
['The Amazing Maurice and His Educated Rodents', 0.8089799],
['Dark Lord of Derkholm', 0.8068354]]
Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
[['The Weirdstone of Brisingamen', 0.79162943],
['Harry Potter and the Prisoner of Azkaban', 0.7681779],
['A Wizard of Earthsea', 0.7566709],
@@ -350,7 +358,52 @@ Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
<p>
</details>

<details><summary><strong>WikilinkNN</strong></summary>
<p>

```_output
Harry Potter and the Philosopher's Stone recommendations:
[['Harry Potter and the Chamber of Secrets', 0.9697026],
['Harry Potter and the Goblet of Fire', 0.969065],
['Harry Potter and the Deathly Hallows', 0.9685888],
['Harry Potter and the Half-Blood Prince', 0.9635748],
['Harry Potter and the Prisoner of Azkaban', 0.9569129],
['Harry Potter and the Order of the Phoenix', 0.94091964],
['Harry Potter and the Cursed Child', 0.9358928],
['My Immortal (fan fiction)', 0.91195196],
['Eragon', 0.89236057],
['Quidditch Through the Ages', 0.8891448]]
The Hobbit recommendations:
[['The Lord of the Rings', 0.94245297],
['The Silmarillion', 0.9160445],
['Beren and Lúthien', 0.90604335],
['The Fall of Gondolin', 0.9044683],
['The Children of Húrin', 0.895282],
['The Book of Lost Tales', 0.89020956],
['The Road to Middle-Earth', 0.88268256],
["The Magician's Nephew", 0.8816683],
['The History of The Hobbit', 0.87789804],
['Farmer Giles of Ham', 0.87786204]]
Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
[['The Lord of the Rings', 0.8367433249950409],
['Harry Potter and the Deathly Hallows', 0.8294640183448792],
['The Children of Húrin', 0.8240831792354584],
['Harry Potter and the Prisoner of Azkaban', 0.8158660233020782],
['Harry Potter and the Goblet of Fire', 0.8150344789028168],
['Eragon', 0.8118217587471008],
['Harry Potter and the Chamber of Secrets', 0.8101150393486023],
['Fantastic Beasts and Where to Find Them', 0.8092647194862366],
['Watership Down', 0.8012698292732239],
['Harry Potter and the Half-Blood Prince', 0.7979166805744171]]
```

<p>
</details>

<details><summary><strong>Weighted Model Approach</strong></summary>
<p>

Better results can be achieved by combining TFIDF and BERT:
@@ -364,7 +417,7 @@ bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim_matrix
```_output
-- Weighted BERT and TFIDF --
Harry Potter and the Philosopher's Stone recommendations:
[['Harry Potter and the Chamber of Secrets', 0.7653442323224594],
['Harry Potter and the Half-Blood Prince', 0.7465576592959889],
['Harry Potter and the Goblet of Fire', 0.7381149146065132],
@@ -376,7 +429,7 @@ bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim_matrix
['The Ickabog', 0.6218310147923186],
['Fantastic Beasts and Where to Find Them', 0.6161251907593163]]
The Hobbit recommendations:
[['The History of The Hobbit', 0.78046806361336],
['The Lord of the Rings', 0.764041360399863],
['The Annotated Hobbit', 0.7444487700381719],
@@ -388,7 +441,7 @@ bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim_matrix
['J. R. R. Tolkien: A Biography', 0.6391232063030203],
['Tolkien: Maker of Middle-earth', 0.6309609890944725]]
Harry Potter and the Philosopher's Stone and The Hobbit recommendations:
[['Harry Potter and the Half-Blood Prince', 0.6018217616032179],
['Harry Potter and the Prisoner of Azkaban', 0.5989788027468591],
['The Magical Worlds of Harry Potter', 0.5909785871728664],
Expand All @@ -401,6 +454,16 @@ bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim
['Harry Potter and the Goblet of Fire', 0.5653645423523244]]
```
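The weighted combination itself is a simple convex blend of the two similarity matrices. As an illustrative sketch (the matrices and weights below are stand-in values, not wikirec's tuned settings):

```python
import numpy as np

# Hypothetical 2x2 similarity matrices for illustration
tfidf_sim_matrix = np.array([[1.0, 0.6], [0.6, 1.0]])
bert_sim_matrix = np.array([[1.0, 0.8], [0.8, 1.0]])

# Example weights; choosing them to sum to 1 keeps similarities in [0, 1]
tfidf_weight = 0.35
bert_weight = 1 - tfidf_weight

bert_tfidf_sim_matrix = tfidf_weight * tfidf_sim_matrix + bert_weight * bert_sim_matrix
```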

The WikilinkNN model can be combined with other models by subsetting its similarity matrix to the titles derived in the cleaning process:

```python
import numpy as np

wikilink_sims_copy = wikilink_sims.copy()
not_selected_idxs = [i for i in range(len(titles)) if i not in selected_idxs]

wikilink_sims_copy = np.delete(wikilink_sims_copy, not_selected_idxs, axis=0)
wikilink_sims_copy = np.delete(wikilink_sims_copy, not_selected_idxs, axis=1)
```

<p>
</details>

4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -24,7 +24,7 @@
author = "wikirec developers"

# The full version, including alpha/beta/rc tags
release = "0.2.2"


# -- General configuration ---------------------------------------------------
@@ -170,7 +170,7 @@
"wikirec Documentation",
author,
"wikirec",
"Recommendation engine framework based on Wikipedia data",
"Miscellaneous",
)
]
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -40,7 +40,7 @@
.. |colab| image:: https://img.shields.io/badge/%20-Open%20in%20Colab-097ABB.svg?logo=google-colab&color=097ABB&labelColor=525252
:target: https://colab.research.google.com/github/andrewtavis/wikirec

Recommendation engine framework based on Wikipedia data

Installation
------------
Binary file added examples/books_embedding_model.h5
Binary file not shown.