English Dictionary

This is a minimally tested and incomplete parser for Webster's Unabridged English Dictionary. It works from the modified GCIDE XML, which categorizes content to make it easy to find and parse. I did a lot of research to find a machine-readable English dictionary for a project where I didn't want to rely on a third-party API (e.g. Wordnik).

Generate Simple JSON

From the project directory, run the following:

ruby parse.rb

This will generate a JSON file for each GCIDE XML file. Each object key is a unique word, and its value is an object containing the definitions (an array of objects, each with a definition, part of speech, field, and sequence). The files (excluding obsolete content) will contain ~99k unique words and ~160k definitions.
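
As a rough illustration of the shape described above, the following sketch looks up one word in a parsed dictionary and prints its definitions in sequence order. The JSON key names (`definitions`, `definition`, `part_of_speech`, `field`, `sequence`), the sample entry, and the `definitions_for` helper are all assumptions made for this example; check a generated file for the actual key names before relying on it.

```ruby
require "json"

# Hypothetical sample that mirrors the described shape. The key names and
# the placeholder definitions are assumptions, not actual parse.rb output.
SAMPLE_JSON = <<~JSON
  {
    "example": {
      "definitions": [
        { "definition": "Second placeholder definition.", "part_of_speech": "n.", "field": "Math.", "sequence": 2 },
        { "definition": "First placeholder definition.", "part_of_speech": "n.", "field": null, "sequence": 1 }
      ]
    }
  }
JSON

# Returns the definitions for a word ordered by sequence,
# or an empty array when the word is not in the dictionary.
def definitions_for(dictionary, word)
  entry = dictionary[word]
  return [] unless entry

  entry.fetch("definitions", []).sort_by { |d| d["sequence"].to_i }
end

dictionary = JSON.parse(SAMPLE_JSON)

definitions_for(dictionary, "example").each do |d|
  field = d["field"] ? " (#{d["field"]})" : ""
  puts "#{d["sequence"]}. [#{d["part_of_speech"]}]#{field} #{d["definition"]}"
end
```

To use this against real output, replace `SAMPLE_JSON` with the contents of one of the generated files, for example via `File.read`.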

Resources

GCIDE

After reviewing all the resources, I went with parsing the GCIDE XML first. The next best option seems to be the Wiktionary TSV.

http://rali.iro.umontreal.ca/GCIDE/ (the ZIP download is further down the page)

Wiktionary TSV

http://aautar.digital-radiation.com/wiktionary-db/wiktionary.E20121127.tsv.zip
http://semisignal.com/?p=5666 (TSV file linked to above and sample code)
https://github.com/boyers/asler/tree/master/scratch

Webster's Unabridged Dictionary (1913 - public domain)

Moby Word Lists

https://github.com/drichert/moby (Ruby parser for hyphenation, parts-of-speech, and thesaurus)

javierjulio/dictionary
