efskap/kindlewick

Kindlewick

This is a Go program to fetch Wiktionary page content from their API, (optionally) intersect it with a frequency wordlist (as the database is probably too big otherwise), and then produce an HTML file that, together with an .opf file, can be converted to mobi and used on your Kindle for in-book lookups.

Note: My target language (TL) is Finnish, so that’s what I had in mind when writing this program. Hopefully it’ll work out of the box for your TL too, but there’s always the possibility that it does something wonky with the inflection table. Fear not though, goquery is easy to work with!

Instructions

  1. Download the necessities

    1. Install Go. Probably 1.12.

    2. Download a frequency wordlist for your language from here if possible.

      Otherwise the file might be too big for kindlegen to handle, as it’s a 32-bit program. Finnish, with its 98184 lemmata, proved too big to process without a freq list, but your obscure language might be fine.
      If you can’t find one, just omit the -freqlist flag below.

    3. Download kindlegen for your platform. You’ll use this to convert the .opf + .html files into .mobi.

  2. Edit the metadata in dict.opf.

    1. Don’t forget to modify <DictionaryInLanguage>!
      Set it to the ISO 639-1 code from here.

    2. You can replace cover.png too, but it doesn’t matter much, as the dictionary won’t show up as a book by default.

  3. To generate dict.html, which dict.opf references, run this, with the name of the frequency list you downloaded instead of fi.txt:

    go run kindlewick.go -freqlist fi.txt

    If it’s still too big, you can just take the first 50k lines or whatever from the file (in bash/zsh/etc) like so:

    go run kindlewick.go -freqlist <(head -n 50000 fi.txt)

  4. Finally, generate the .mobi file and put it on your Kindle!

    kindlegen dict.opf -verbose -c2 -o my_dict.mobi

Q&A

How are inflections acquired?

Basically, it takes every span inside a table cell and, if the span consists of multiple words, keeps only the last one (olen odottanut → odottanut, since you can only look up a single word at a time on Kindle). It then filters out duplicates.

Why not consult the frequency list before downloading every single page?

Because frequency lists usually have inflected forms of words, and if you only see olen in the list you won’t know you have to download the lemma form olla. Ergo, download everything and keep entries where at least some form of the word shows up in the frequency list.

About

collects wiktionary definitions into the kindle format for in-book lookups
