A tool for generating a vocabulary quiz.
Unlike existing generators, Word Quiz Generator produces "Write the definition" questions by extracting a sentence that contains a specific word/phrase from given source texts.
This package only provides a CLI tool and APIs. For the web application version of Word Quiz Generator, see word-quiz-generator-webapp.
To use Word Quiz Generator as a CLI tool, install it globally:
$ npm install -g word-quiz-generator
To use Word Quiz Generator from your program (i.e. through the APIs), you may want to install it locally with the --save flag instead of the -g flag:
$ npm install --save word-quiz-generator
$ word-quiz-generator <command> [<args>]
To see the details of each command, run it with --help, e.g. word-quiz-generator generate --help.
word-quiz-generator generate --material=<path> --sources=<paths> --sections --size [--instruction] [--lang] [--sentenceSeparator=<RegExp>] [--clauseRegExp=<RegExp>] [--wordRegExp=<RegExp>] [--wordBoundaryRegExp=<RegExp>] [--abbrRegExp=<RegExp>]
The generate command generates a quiz from the given material and sources and writes it to stdout.
- --material: A path to a material (phrase list) file. A material file should be in CSV format with section, phrase, and answer columns. See Material for more details.
- --sources: Comma-separated paths to the text files from which question sentences are taken. A source file should have reference information on its first line; the rest of the file is the body text. See Source for more details.
- --sections: A range of the material to include in the quiz, e.g. 1-10. The two numbers are the first and last section numbers, respectively.
- --size: The number of questions in the quiz.
- --instruction: An instruction message placed at the top of the quiz.
The following options determine how a word/phrase or sentence is extracted from a text. For English quizzes, they are set automatically and usually don't need to be overridden.
- --lang: The IETF language tag of the language in which the material is written. This determines how a word/phrase or sentence is extracted from a text. For more precise control over the extraction algorithm, override it with --sentenceSeparator, --clauseRegExp, --wordRegExp, --wordBoundaryRegExp, and/or --abbrRegExp. Default: en (English)
- --sentenceSeparator: A regular expression representing a sentence separator.
- --clauseRegExp: A regular expression representing a clause.
- --wordRegExp: A regular expression representing a word.
- --wordBoundaryRegExp: A regular expression representing a word boundary.
- --abbrRegExp: A regular expression representing an abbreviation mark.
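As a rough illustration of what a sentence separator does, the sketch below splits a text into sentences with a regular expression. The pattern used here (split after ., ! or ? followed by whitespace) is an assumption for this example, not the package's built-in default, and it would naively split after abbreviations such as "e.g.".

```javascript
// Hypothetical separator: split after ".", "!" or "?" when followed by whitespace.
// This is NOT the package's built-in default; it only illustrates the idea.
const SENTENCE_SEPARATOR = /(?<=[.!?])\s+/;

// Split a text into trimmed, non-empty sentences.
function splitSentences(text, separator = SENTENCE_SEPARATOR) {
  return text
    .split(separator)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const sentences = splitSentences(
  "Comets may go through a transition phase. Some become extinct! Why?"
);
// sentences: ["Comets may go through a transition phase.", "Some become extinct!", "Why?"]
```

Passing a different RegExp as the second argument plays the same role as overriding --sentenceSeparator.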
$ word-quiz-generator generate \
--material=/path/to/phrase-list.csv \
--sources=/path/to/source-texts \
--sections=1-10 \
--size=10
word-quiz-generator coverage --material=<path> --sources=<paths> [--lang] [--show-uncovered] [--sentenceSeparator=<RegExp>] [--clauseRegExp=<RegExp>] [--wordRegExp=<RegExp>] [--wordBoundaryRegExp=<RegExp>] [--abbrRegExp=<RegExp>]
The coverage command measures how many of the material's words/phrases are found in the given sources. It is useful for checking whether you have enough source texts.
- --show-uncovered: Show the words/phrases that are not covered.
Other options are the same as those of the generate command.
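The idea behind the coverage measurement can be sketched as follows: a phrase counts as covered if it appears as whole words in at least one source text. This is a simplified assumption, not the command's actual implementation; in particular, it ignores the | and ... marks and performs no lemmatization.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified coverage: a phrase is "covered" if it occurs as whole words
// (case-insensitively) in at least one text. Returns the covered ratio and
// the list of uncovered phrases (what --show-uncovered would print).
function measureCoverage(phrases, texts) {
  const uncovered = phrases.filter((phrase) => {
    const re = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "i");
    return !texts.some((text) => re.test(text));
  });
  const covered = phrases.length - uncovered.length;
  return {
    coverage: phrases.length === 0 ? 0 : covered / phrases.length,
    uncovered,
  };
}
```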
word-quiz-generator install [--silent] <package name> [... <package name>]
The install command installs the binary dependencies needed for preprocessing and lemmatizing under ${package's root}/vendor. The available packages are:
- CoreNLP: A suite of core NLP (natural language processing) tools by Stanford University.
- MeCab: Yet Another Part-of-Speech and Morphological Analyzer by Kyoto University and NTT.
The packages you need to install depend on which languages you want to use in your quizzes.
Language | Language Tag (--lang) | Required Package(s) |
---|---|---|
English | en | CoreNLP |
Old Japanese (古文, kobun) | ojp | MeCab |
word-quiz-generator make --src=<path> --lang=<lang>
word-quiz-generator make --src=<path> [--preprocessor=<path>] [--lemmatizer=<path>]
The make command generates preprocessed and lemmatized texts from the given sources.
- --src: Comma-separated paths to the files to be processed.
- --lang: The IETF language tag of the language in which the source texts are written. This tells the generator which built-in preprocessor and lemmatizer to use. To use custom ones instead, use the --preprocessor and --lemmatizer options.
- --preprocessor: Path to a custom preprocessor. A custom preprocessor must have, as its default export, an async function that returns the preprocessed text.
- --lemmatizer: Path to a custom lemmatizer. A custom lemmatizer must have, as its default export, an async function that returns the lemmatized text.
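A custom preprocessor along the lines of what the built-in English one does (removing soft hyphens and unnecessary line breaks) might look like the sketch below. The argument is assumed to be the original text as a string; check the package source for the exact signature. To use it with --preprocessor, the function would also need to be the module's default export (e.g. export default preprocess; in an ES module).

```javascript
// Hypothetical custom preprocessor. Assumption: it receives the raw text
// as a string and resolves to the preprocessed text.
async function preprocess(text) {
  return text
    .replace(/\r\n?/g, "\n")            // normalize line endings
    .replace(/\u00AD\n/g, "")           // re-join words hyphenated across a line break
    .replace(/\u00AD/g, "")             // drop any remaining soft hyphens (U+00AD)
    .replace(/([^\n])\n(?!\n)/g, "$1 ") // turn single line breaks inside a paragraph into spaces
    .replace(/[ \t]+/g, " ");           // collapse runs of spaces and tabs
}
```

Blank lines (paragraph breaks) are kept, since only a line break that is not followed by another one is joined.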
Create a new instance that can generate a quiz.
The available options are material, sources, lang, sentenceSeparator, clauseRegExp, wordRegExp, wordBoundaryRegExp, and abbrRegExp, all of which are the same as the corresponding generate options.
Initialize the instance. Initialization is not done in constructor() because this method loads and parses the material and source files, which is an asynchronous process.
You must call this method before generating any quizzes or questions.
Generate a quiz and return an array of questions.
A string representing which sections of the material should be on the quiz. For example, 5-7 means the quiz covers sections 5, 6, and 7.
A number representing the number of questions in the quiz. If it is larger than the number of phrases in the material, the generator creates as many questions as possible (i.e. as many as there are phrases in the material).
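The interaction between the sections range and the size can be sketched as follows: parse a range such as 5-7, keep the material rows in those sections, shuffle them, and take at most size of them. The row shape ({ section, phrase, answer }) and the function names are hypothetical; the package's own selection logic may differ.

```javascript
// Parse a range string such as "5-7" into inclusive bounds.
function parseSections(range) {
  const match = /^(\d+)-(\d+)$/.exec(range.trim());
  if (!match) throw new Error(`Invalid section range: ${range}`);
  const start = Number(match[1]);
  const end = Number(match[2]);
  return { start: Math.min(start, end), end: Math.max(start, end) };
}

// Pick up to `size` random rows whose section falls inside the range.
function pickPhrases(rows, range, size) {
  const { start, end } = parseSections(range);
  // filter() returns a new array, so shuffling below does not touch `rows`.
  const pool = rows.filter((r) => r.section >= start && r.section <= end);
  // Fisher-Yates shuffle so that every order is equally likely.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  // slice() stops at the end of the array, so a size larger than the pool
  // simply returns every matching row.
  return pool.slice(0, size);
}
```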
Generate a question using the given phrase. If no sentence containing the phrase is found, this method returns null.
phrase should be an object with the following properties:
- phrase: The phrase to be on the quiz.
- answer: The answer (usually the meaning of the phrase).
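A minimal sketch of building a question from such a phrase object is shown below. The makeQuestion name, its sentences and reference parameters, and the shape of the returned object are hypothetical; only the phrase/answer input and the null result when no sentence matches come from the description above. For simplicity it searches the display sentences directly instead of their lemmatized versions.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical question builder: find the first sentence containing the
// phrase as whole words and return the pieces a "Write the definition"
// question needs, or null when no sentence matches.
function makeQuestion({ phrase, answer }, sentences, reference) {
  const re = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "i");
  const sentence = sentences.find((s) => re.test(s));
  if (sentence === undefined) return null;
  return { phrase, answer, sentence, reference };
}
```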
Material is a collection of words/phrases.
A material file is a CSV file with the following columns:
<section number>,<phrase>,<answer>
A section number indicates which unit of the material a word belongs to.
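A minimal parser for this format might look like the following. parseMaterial is a hypothetical helper, not part of the package's API. Everything after the second comma is treated as the answer, so answers may contain commas but phrases may not, and quoted CSV fields are not handled.

```javascript
// Parse "<section number>,<phrase>,<answer>" lines into objects.
// Assumption: no quoted fields; the answer is everything after the 2nd comma.
function parseMaterial(csv) {
  return csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const first = line.indexOf(",");
      const second = line.indexOf(",", first + 1);
      if (first < 0 || second < 0) throw new Error(`Malformed line: ${line}`);
      return {
        section: Number(line.slice(0, first)),
        phrase: line.slice(first + 1, second).trim(),
        answer: line.slice(second + 1).trim(),
      };
    });
}
```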
A phrase represents an expression to be on the quiz. Some characters have special meanings:
- |: Indicates that the alternative expressions should be treated as a single item, e.g. color|colour.
- ...: Indicates that other words are inserted at that position, e.g. think ... of. This mark can be changed through the abbrRegExp option of the generate/coverage commands and the QuizGenerator class.
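One way to turn a phrase with these special characters into a search pattern is sketched below: split on |, escape each alternative, and let ... match a short run of intervening words. The limit of one to three inserted words is an arbitrary assumption for this example, not the package's actual rule, and phraseToRegExp is a hypothetical name.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive RegExp from a material phrase.
//   "color|colour" -> matches either spelling
//   "think ... of" -> "think", then 1 to gapWords words, then "of" (hypothetical limit)
function phraseToRegExp(phrase, gapWords = 3) {
  const alternatives = phrase.split("|").map((alt) =>
    alt
      .trim()
      .split(/\s*\.\.\.\s*/)
      .map((part) => escapeRegExp(part).replace(/\s+/g, "\\s+"))
      .join(`(?:\\s+\\S+){1,${gapWords}}\\s+`)
  );
  return new RegExp(`\\b(?:${alternatives.join("|")})\\b`, "i");
}
```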
Note that the phrase is searched against the lemmatized text.
This means that if you prepare lemmatized texts, you may have to write a seemingly odd phrase instead of the natural expression.
For example, when you are using the built-in English preprocessor and lemmatizer, you have to write:
- be go to instead of be going to: All words should be in the dictionary form.
- one's pleasure instead of My pleasure: All possessive pronouns should be one's.
- by oneself instead of by myself: All personal pronouns should be one or oneself.
- be a piece of cake instead of It's a piece of cake
An answer represents the meaning of the phrase.
A source is a text with reference information. Word Quiz Generator extracts a sentence from the body text of a source to create a question.
A source text file should have the reference on its first line and the body text on the remaining lines.
For example, a source file for De finibus bonorum et malorum by Marcus Tullius Cicero would look like this:
De finibus bonorum et malorum, Marcus Tullius Cicero
Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo.
Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos,
...
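Reading such a file comes down to splitting off the first line. The parseSource helper below is a hypothetical sketch of that split, not the package's own loader.

```javascript
// Split a source file into its reference (first line) and body (the rest).
// trim() also removes a trailing "\r" left by Windows line endings.
function parseSource(content) {
  const newline = content.indexOf("\n");
  if (newline < 0) return { reference: content.trim(), body: "" };
  return {
    reference: content.slice(0, newline).trim(),
    body: content.slice(newline + 1),
  };
}
```

In practice, the content would come from something like fs.readFileSync(path, "utf8").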
Word Quiz Generator works with plain text files as they are, but preprocessing and lemmatizing the original text improves the quality of the generated quiz.
In the same directory, the generator treats a file whose name is the original file name followed by .preprocessed as the preprocessed version of the text, and a file whose name is the original file name followed by .lemmatized as the lemmatized version.
~/sources/foo-bar.txt # original
~/sources/foo-bar.txt.preprocessed # preprocessed
~/sources/foo-bar.txt.lemmatized # lemmatized
If the preprocessed or lemmatized text is not found, the generator uses the original text instead.
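The lookup rule above can be sketched with Node's fs.existsSync. resolveSourceFiles is a hypothetical helper name; the package may resolve these files differently internally.

```javascript
const fs = require("fs");

// Resolve which files to use for a source: <file>.preprocessed and
// <file>.lemmatized next to the original, falling back to the original
// file when a variant does not exist.
function resolveSourceFiles(originalPath) {
  const pick = (suffix) =>
    fs.existsSync(originalPath + suffix) ? originalPath + suffix : originalPath;
  return {
    original: originalPath,
    preprocessed: pick(".preprocessed"),
    lemmatized: pick(".lemmatized"),
  };
}
```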
Preprocessed texts are used for lemmatization and for displaying questions. Preprocessing can improve the accuracy of lemmatization and/or the readability of question sentences. The built-in English preprocessor, for example, removes unnecessary line breaks and soft hyphens.
Lemmatized texts will be used in searching sentences. Lemmatization can improve the precision and recall of searching a sentence with a specific word/phrase.
For instance, imagine the generator is creating a question for 'go through'. If no processing is applied to the original text, sentences like
Comets may go through a transition phase as they come close to extinction.
(from Wikipedia, "Extinct comet")
can be picked up by the generator, while
If another sailor went through the bag, the odds were high the thief would tie the bag back
(from Wikipedia, "Thief knot")
will not.
To make such sentences recognizable to the generator, lemmatization is needed: it replaces each word with its lemma. The above sentence therefore becomes
if another sailor go through the bag, the odd be high the thief will tie the bag back
and the generator can now tell that the sentence contains the phrase "go through".
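To make the effect concrete, the toy sketch below lemmatizes with a tiny hand-written lookup table and then searches for "go through". The table is an assumption that covers only this example; the built-in English lemmatizer uses Stanford CoreNLP instead.

```javascript
// Tiny, hand-written lemma table covering only the example sentence.
// A real lemmatizer (e.g. CoreNLP) derives lemmas from a full analysis.
const LEMMAS = new Map([
  ["went", "go"],
  ["were", "be"],
  ["odds", "odd"],
  ["would", "will"],
]);

// Lowercase the text and replace each word with its lemma when the table has one.
function lemmatize(text) {
  return text
    .toLowerCase()
    .replace(/[a-z']+/g, (word) => LEMMAS.get(word) ?? word);
}

const sentence =
  "If another sailor went through the bag, the odds were high the thief would tie the bag back";
const lemmatized = lemmatize(sentence);
// lemmatized: "if another sailor go through the bag, the odd be high the thief will tie the bag back"
const containsPhrase = /\bgo through\b/.test(lemmatized); // true
```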
You can preprocess and lemmatize your texts from the CLI:
- Install the binary dependencies, which parse a text and convert each word into its lemma. 👉 See install.
word-quiz-generator install [package name]
- Run the make command. 👉 See make.
word-quiz-generator make --src=<path to sources> --lang=<language tag>
npm test
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request :D
MIT License (http://nodaguti.mit-license.org/)
- Stanford CoreNLP.
- Used for preprocessor and lemmatizer of English.
- GNU Collaborative International Dictionary of English, GNU Dico, GNU General Public License.
- Used for the test data (materials) of English.
- Wikipedia, Creative Commons Attribution-ShareAlike 3.0 Unported License.
- Used for the test data (sources) of English.
- UniDic for Early Middle Japanese (中古和文UniDic), Toshinobu Ogiso, UniDic License.
- Used for the preprocessor and lemmatizer of Old Japanese (ojp).
- Izayoi Nikki (十六夜日記), digitized by Taiju, retrieved from 日本古典文学テキスト, Public Domain.
- Used for the test data (sources) of Old Japanese.
- Wikisource (Japanese), Creative Commons Attribution-ShareAlike 3.0 Unported License.
- Used for the test data (sources) of Old Japanese.