makeword.py implements an n-gram language model to generate words that (most likely) don't exist.
Words are generated character by character. The probability of generating a given character equals the relative frequency with which that character followed the previous n-1 generated characters in the training set. The first n-1 characters are generated in one chunk, drawn from a separate collection of (n-1)-grams that appear at the beginnings of words. By default, n = 4.
The default training set is an English dictionary of ~450,000 terms, with separate entries for each word's prefixed and suffixed forms, such as run and running.
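As a rough illustration of the mechanism described above, here is a minimal sketch that trains follower counts on a tiny word list and generates a word. The helper names, the END sentinel, and the training words are illustrative assumptions, not makeword.py's actual internals.

import random
from collections import Counter, defaultdict

N = 4      # default n-gram length
END = "$"  # assumed sentinel appended to words so generation can terminate

def train(words, n=N):
    """Count (n-1)-gram -> next-character frequencies, plus word-initial (n-1)-grams."""
    follows = defaultdict(Counter)
    starts = Counter()
    for word in words:
        word += END
        if len(word) < n:
            continue
        starts[word[:n - 1]] += 1
        for i in range(len(word) - n + 1):
            follows[word[i:i + n - 1]][word[i + n - 1]] += 1
    return follows, starts

def sample(counter):
    """Draw a key with probability equal to its relative frequency."""
    keys = list(counter)
    return random.choices(keys, weights=[counter[k] for k in keys])[0]

def generate(follows, starts, n=N, root=None, max_len=20):
    word = root or sample(starts)  # the first n-1 characters come in one chunk
    while len(word) < max_len:
        candidates = follows[word[-(n - 1):]]
        if not candidates:
            break  # no observed followers for this (n-1)-gram
        nxt = sample(candidates)
        if nxt == END:
            break  # the end-of-word sentinel was drawn
        word += nxt
    return word

follows, starts = train(["radical", "radiant", "radio", "gradient", "parade"])
print(generate(follows, starts, root="rad", max_len=12))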
$python3 makeword.py [FLAGS] [ROOT]
Both flags and root are optional; each modifies the behavior of makeword.py. A sketch of how these flags might be parsed follows the list below.
-c
- print the generation with green and red highlights denoting particularly expected or unexpected characters, respectively
- see COLOR CODING section below for details
-i
- print the relative frequency and relative 'expectedness' for each character generated
- see COLOR CODING section below for details
- implies -v
-m
- wait for user to hit ENTER to generate next character
- implies -v
-max NUM
- set maximum length of generated word to NUM
-min NUM
- set minimum length of generated word to NUM
-n NUM
- set n-gram length to NUM
- longer n-grams tend to yield more convincing fake words but are more likely to generate real words
-t NUM
- wait NUM seconds to generate each character
- implies -v
-v
- print in-progress generation every time a character is generated
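The flags above might be wired up with argparse roughly as follows. This is a hedged sketch of the documented interface (the defaults and help strings are placeholder assumptions), not makeword.py's actual argument handling.

import argparse

parser = argparse.ArgumentParser(description="Generate words that (most likely) don't exist.")
parser.add_argument("-c", action="store_true", help="color-code expected/unexpected characters")
parser.add_argument("-i", action="store_true", help="print per-character frequency stats (implies -v)")
parser.add_argument("-m", action="store_true", help="wait for ENTER between characters (implies -v)")
parser.add_argument("-max", type=int, default=20, help="maximum word length (placeholder default)")
parser.add_argument("-min", type=int, default=1, help="minimum word length (placeholder default)")
parser.add_argument("-n", type=int, default=4, help="n-gram length")
parser.add_argument("-t", type=float, help="seconds to wait between characters (implies -v)")
parser.add_argument("-v", action="store_true", help="print in-progress generation")
parser.add_argument("root", nargs="?", help="optional root string of length >= n-1")
args = parser.parse_args()

# -i, -m, and -t all imply -v
if args.i or args.m or args.t is not None:
    args.v = True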
The user can pass a string of length >= n-1 as the final command-line argument to serve as the "root" of the generation; makeword.py will build off this root to generate a word.
In this example, 'rad' is passed as the (optional) root.
$python3 makeword.py -max 12 -v rad
<Hit ENTER to generate word>
rad
radi
radic
radicr
radicro
radicrol
radicroli
radicrolis
radicrolis
If the -c flag is passed, characters may be highlighted green or red to denote 'expected' and 'unexpected' generations, respectively. A generated character's expectedness is determined by comparing the probability of its generation to the average probability across all characters that could have been generated given the last n-1 characters. The brightness of the highlight corresponds to the magnitude of the character's (un)expectedness.
For example, let n=4 and let 'xyz' be the last n-1 characters of an in-progress generation. If, in the training set, 'a' follows those characters 90% of the time and 'b' follows them 10% of the time, the average relative frequency in that state is 50%. 'a' is therefore 40 percentage points more likely than average, so if 'a' were generated, it would be highlighted bright green.
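A hedged sketch of how that comparison might be computed; the scoring function and the Counter layout are assumptions based on the description above, not the script's actual code.

from collections import Counter

def expectedness(follower_counts, char):
    """Difference between a character's relative frequency and the average
    relative frequency across all possible next characters."""
    total = sum(follower_counts.values())
    probs = {c: k / total for c, k in follower_counts.items()}
    average = sum(probs.values()) / len(probs)  # always 1 / number of candidates
    return probs[char] - average  # positive -> green, negative -> red; magnitude -> brightness

# The worked example: after 'xyz', 'a' follows 90% of the time and 'b' 10%.
counts = Counter({"a": 9, "b": 1})
print(expectedness(counts, "a"))  # 0.4  -> bright green
print(expectedness(counts, "b"))  # -0.4 -> bright red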