
Finding Mnemo Wiki

Finding Mnemo is a project dedicated to helping you learn new languages using mnemonics generated by AI models.

It relies on different methods to help you remember tricky words and ideas.

Finding Mnemo is currently in its prototype version and focuses on providing mnemonics to English speakers learning Mandarin words.


User Guide

This section will help you use the Finding Mnemo application.

Accessing the App

Streamlit Online Application

You can find a running version of the application on Streamlit Community Cloud.

This is a simple demo that you will be able to test online.

Local run

You can also set up your own version of the project locally using dedicated containers.

Deployment

The easiest way to deploy every service is through containers. The app requires a database (Redis), a backend (FastAPI), and a frontend (Streamlit). To deploy all of them, use the docker-compose.yaml file with the command:

docker compose up 

That should start three Docker containers, which you can check in the Docker UI or with docker ps.

Figure: Docker UI after running the compose up command

Functionalities

Currently, the app provides a single functionality: creating mnemonics that link Mandarin words to similar-sounding English words. This link takes the form of a silly phrase mentioning both the translation of the Mandarin word and the close-sounding English word.

Figure: Streamlit UI used to find a mnemonic for the word 苹果 (apple)


Technical Documentation

Three components are currently in development for this prototype:

  • Pairing component: finds an English word that sounds like a given Mandarin word.
  • Key-phrase component: generates a sentence mixing both paired words: the translation of the Mandarin word and its sound-alike English word.
  • Chaining component: finds a chain of words connecting the paired words.

Pairing

This section is dedicated to finding words from a particular language (English) that sound like words from another language (Mandarin). This can be linked to the concept of wordlikeness: "the extent to which a sound sequence is typical of words in a language".

To estimate a word's resemblance to another, we could use the Levenshtein distance on the IPA spelling of the words.

However, this distance is computationally heavy, and running it over a large number of words takes a prohibitive amount of time.
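As an illustration, here is a minimal pure-Python version of that distance applied to IPA strings (the IPA transcriptions below are illustrative assumptions, not taken from the project's data):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two IPA strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Illustrative IPA transcriptions: 苹果 (píngguǒ) vs. the English word "pink".
print(levenshtein("pʰiŋ.kwɔ", "pɪŋk"))
```

Computing this distance against every word of a large vocabulary at query time is exactly the cost the embedding model described below is meant to avoid.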

Instead, we learn a proxy model that produces an embedding space where distances (tests have been performed with cosine and Euclidean distance) approximate the Levenshtein distance.

This way, we can rely on neural search techniques: only a single word has to be processed at query time, and it is matched against an already-processed index of common words very efficiently.
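A rough sketch of the idea (the file names, dimensions, and helper below are assumptions, not the project's actual code): the candidate vocabulary is embedded once offline, and a query only needs one encoder pass plus a vectorized distance computation.

```python
import numpy as np

# Precomputed offline: one embedding per candidate English word (dimension assumed).
index_embeddings = np.load("english_ipa_embeddings.npy")        # shape (n_words, 128)
index_words = open("english_words.txt").read().splitlines()

def nearest_words(query_embedding: np.ndarray, k: int = 5):
    """Return the k candidate words closest to the query in Euclidean distance."""
    distances = np.linalg.norm(index_embeddings - query_embedding, axis=1)
    best = np.argsort(distances)[:k]
    return [(index_words[i], float(distances[i])) for i in best]
```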

Pairing model

This model relies on a transformer architecture applied to IPA characters.

We use it to encode words into the embedding space discussed above and then use the Euclidean distance to evaluate the phonetic distance between them.
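The exact architecture is not detailed here, but a toy PyTorch sketch of such an encoder could look as follows (vocabulary size, dimensions, and mean-pooling are assumptions):

```python
import torch
import torch.nn as nn

class IPAEncoder(nn.Module):
    """Toy character-level transformer encoder producing one embedding per word."""
    def __init__(self, n_ipa_chars: int = 128, dim: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(n_ipa_chars, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(dim, dim)

    def forward(self, ipa_ids: torch.Tensor) -> torch.Tensor:
        # ipa_ids: (batch, seq_len) integer ids of IPA characters
        hidden = self.encoder(self.embed(ipa_ids))
        return self.out(hidden.mean(dim=1))     # mean-pool over characters

encoder = IPAEncoder()
emb_a = encoder(torch.randint(0, 128, (1, 8)))
emb_b = encoder(torch.randint(0, 128, (1, 6)))
print(torch.dist(emb_a, emb_b))                 # Euclidean distance between embeddings
```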

Figure: parallel plot of the model's performance against some hyperparameters

Dataset

It is trained with a triplet loss, on a dataset generated using the original Levenshtein distance (weak supervision).

Triplet loss uses a margin value separating the anchor and positive words from the negative word. We use the knowledge of the exact distance to generate word triplets that fit this margin exactly or as closely as possible (i.e. the most relevant pairs of words).
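A minimal sketch of this objective using PyTorch's built-in triplet loss (the margin, batch size, and embedding dimension are illustrative):

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)   # margin value is illustrative

# Anchor / positive / negative embeddings produced by the pairing encoder;
# positives are phonetically close to the anchor (small Levenshtein distance on IPA),
# negatives are far, as labelled by the weakly supervised dataset.
anchor   = torch.randn(32, 128, requires_grad=True)
positive = torch.randn(32, 128, requires_grad=True)
negative = torch.randn(32, 128, requires_grad=True)

loss = triplet_loss(anchor, positive, negative)
loss.backward()
```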

Search Engine

The search engine module allows matching an embedding with a list of already indexed embeddings.

It is a pretty simple representation learning application of neural ranking.

The module is developed using the DocumentArray framework from JinaAI, a company specializing in neural search.
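A minimal sketch, assuming the legacy docarray 0.x DocumentArray.match API (the words and embeddings are random placeholders):

```python
import numpy as np
from docarray import Document, DocumentArray   # docarray 0.x / Jina-style API

# Index of candidate English words with precomputed phonetic embeddings.
index = DocumentArray(
    [Document(text=word, embedding=np.random.rand(128))
     for word in ["pink", "apple", "people"]]
)

# Query: a single embedded word to match against the index.
query = DocumentArray([Document(text="苹果", embedding=np.random.rand(128))])
query.match(index, metric="euclidean", limit=2)

for m in query[0].matches:
    print(m.text, m.scores["euclidean"].value)
```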

Key-Chain Linking

In order to find a connection between two words, the first approach is to link them via a knowledge graph: find a path in that graph that links both of these words.

Wikipedia roaming

We use Wikipedia as our knowledge graph: each page is a node, and links to other pages represent edges. (Wiktionary has been tried but seems less complete).

Figure: finding a path from "Finding Nemo" to "China" through the Wikipedia knowledge graph

The most complete approach using Wikipedia as our graph would be to find one of the shortest paths from one word to the other.
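A naive breadth-first search over the live link graph could look roughly like this (a simplified sketch against the public MediaWiki API, without pagination or error handling):

```python
import requests
from collections import deque

API = "https://en.wikipedia.org/w/api.php"

def outgoing_links(title: str) -> list:
    """Titles of pages linked from a Wikipedia page (first batch of links only)."""
    params = {"action": "query", "titles": title, "prop": "links",
              "pllimit": "max", "format": "json"}
    pages = requests.get(API, params=params).json()["query"]["pages"]
    return [l["title"] for page in pages.values() for l in page.get("links", [])]

def bfs_path(start: str, goal: str, max_depth: int = 2):
    """Breadth-first search for a path of page titles from start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) > max_depth:
            continue
        for nxt in outgoing_links(path[-1]):   # one API call per expanded page
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(bfs_path("Finding Nemo", "China"))       # may issue many API requests
```

Each node expansion costs one API call, which is what makes this exhaustive strategy slow in practice.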

Shortest path model

In order to reduce the number of calls we make to Wikipedia's API, we look for the best pages to explore. This can be achieved using the A* algorithm with some heuristic for distance estimation.

A machine learning model can be trained to predict how far a page is from another and then used inside A* as the heuristic.
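A sketch of how such a learned heuristic could plug into A* (the heuristic and neighbors callables are assumptions; neighbors could wrap the Wikipedia links call sketched above):

```python
import heapq

def astar_path(start: str, goal: str, heuristic, neighbors, max_expansions: int = 200):
    """A* over page titles; heuristic(page, goal) is the learned distance estimate."""
    frontier = [(heuristic(start, goal), 0, start, [start])]
    best_cost = {start: 0}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, cost, page, path = heapq.heappop(frontier)
        if page == goal:
            return path
        expansions += 1
        for nxt in neighbors(page):
            new_cost = cost + 1                              # one hop per link
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt, goal)   # f = g + h
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None

# `heuristic` would wrap the trained model's prediction of the remaining distance;
# `neighbors` would wrap the Wikipedia links lookup shown earlier.
```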

Both a Graph Neural Network and a standard ML approach have been tried.

However, API calls still make the process very slow.

Link prediction

A less effective but faster approach is to start from a finite graph (e.g. a graph of all English words in our vocabulary), find the node in that graph closest to our input word, and then compute the path within that graph to our target word, instead of roaming Wikipedia for the optimal path.
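A toy sketch of this restricted-graph approach with networkx (the graph, the words, and the embedding-based lookup are illustrative placeholders):

```python
import networkx as nx

# Toy stand-in for the pre-built vocabulary graph (built offline from Wikipedia links).
graph = nx.Graph([("Elephant", "Asia"), ("Asia", "China"), ("Elephant", "Ivory")])

def closest_node_by_embedding(g: nx.Graph, word: str) -> str:
    """Placeholder: in the real setting this would use the pairing embeddings
    to find the graph node closest to the input word."""
    return word if word in g else next(iter(g.nodes))

def mnemonic_chain(input_word: str, target_word: str) -> list:
    """Map the input word to its closest node in the fixed graph, then find a path."""
    entry_node = closest_node_by_embedding(graph, input_word)
    return nx.shortest_path(graph, source=entry_node, target=target_word)

print(mnemonic_chain("Elephant", "China"))   # ['Elephant', 'Asia', 'China']
```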

Figure: finding a path from "Elephant" to "China" through the restricted Wikipedia knowledge graph

Link prediction model

Key-Phrase Generation

Another approach to generating mnemonics is to embed the keywords in the same sentence: a key-phrase.

Text generation is a classic task, hyper-popularized thanks to the advent of LLMs. In our case, not only do we want to generate a sentence that makes sense, but we also need that sentence to contain certain words: our keywords.

Two approaches have been tried in that direction.

Constrained key-phrase generation

The approach that yielded the best results is called guided text generation.

In this approach, we use a regular text generation model and add a constraint during inference, forcing given words to appear in the predicted sentence.

The reason why it yields the best results is two-fold:

  1. We can use any language model we want, including the most performant ones.
  2. The keywords are guaranteed to appear by design, although their placement might be unfortunate.

In order to keep the application light, we use Flan-T5 (small) as our language model.
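A minimal sketch using Hugging Face transformers' constrained beam search via force_words_ids, assuming google/flan-t5-small and an illustrative prompt and keyword pair:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Keywords that must appear: the translation ("apple") and the sound-alike word ("pink").
force_words_ids = tokenizer(["apple", "pink"], add_special_tokens=False).input_ids

inputs = tokenizer("Write a short silly sentence about apples.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    force_words_ids=force_words_ids,   # constrained beam search forces these tokens
    num_beams=5,                       # constraints require beam search
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```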

Although far from ideal, this is the method currently used in the application: it occasionally makes sense and often gives absurd (but sometimes effective) results.

Supervised key-phrase generation

The alternative approach to generating text that contains keywords is to incentivize the model to do so during training, for example by providing training examples that use keywords as prompts and sentences containing these keywords as targets.

The key-to-text model from HuggingFace does exactly this.
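A minimal usage sketch, assuming the pipeline interface from the keytotext repository listed in the references (the model name and keywords are illustrative):

```python
from keytotext import pipeline

# Load a pretrained keywords-to-text model (name as documented in the keytotext repo).
nlp = pipeline("k2t-base")

# The model is encouraged, but not forced, to use both keywords in one sentence.
print(nlp(["apple", "pink"]))
```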

However, this is a soft constraint, and the model does not necessarily respect it; it merely has a higher chance of including the keywords than a raw language model.

Moreover, the quality of the language model can degrade during fine-tuning.

These are the two reasons why the first approach is preferred in this application.

Application

Figure: application flowchart

Streamlit is used as a simple UI tool to present the application. A live demo can be found on Streamlit Community Cloud.

FastAPI is used as the backend library. It is used in the Docker version of the application and could be used with a more developed UI framework.

Redis is used only in the Docker version of the application to store the embeddings of candidate words. It would be useful if the pool of candidates were to grow, but it can be replaced by in-memory data storage, as in the demo version.
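As an illustration of how the pieces fit together, here is a hypothetical FastAPI endpoint reading a cached embedding from Redis and delegating to the pairing search (the route, key layout, and search helper are assumptions, not the project's actual API):

```python
import numpy as np
import redis
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis(host="redis", port=6379)    # the Redis service from docker-compose

def nearest_words(embedding: np.ndarray) -> list:
    """Placeholder for the pairing search sketched in the Pairing section."""
    return ["pink"]

@app.get("/pair/{word}")
def pair(word: str):
    """Return the closest-sounding English candidates for a Mandarin word."""
    stored = cache.get(f"embedding:{word}")      # hypothetical key layout
    if stored is None:
        return {"error": "unknown word"}
    embedding = np.frombuffer(stored, dtype=np.float32)
    return {"word": word, "matches": nearest_words(embedding)}
```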


References

  • Richardson, T. W., Heisig, J. W. (2008). Remembering Simplified Hanzi 1: How Not to Forget the Meaning and Writing of Chinese Characters. United States: University of Hawaii Press.
  • Keytotext: https://github.com/gagan3012/keytotext