Kindlewick

This is a Go program that fetches Wiktionary page content from the Wiktionary API, optionally intersects it with a frequency wordlist (the full database is probably too big otherwise), and then produces an HTML file that, together with an .opf file, can be converted to .mobi and used on your Kindle for in-book lookups.
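The fetching step might look roughly like the sketch below. It assumes the MediaWiki action API (`action=parse` with `formatversion=2`) on en.wiktionary.org; the host, the parameters, and the helper names `parseURL` and `pageHTML` are illustrative assumptions, not necessarily what kindlewick.go does. To stay runnable offline, `main` decodes a canned reply instead of calling `http.Get(parseURL(...))`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// parseURL builds a MediaWiki action=parse request for one page title.
// Assumption: illustrative endpoint and parameters; kindlewick.go may differ.
func parseURL(host, title string) string {
	q := url.Values{}
	q.Set("action", "parse")
	q.Set("page", title)
	q.Set("prop", "text")
	q.Set("format", "json")
	q.Set("formatversion", "2") // v2 returns parse.text as a plain string
	return "https://" + host + "/w/api.php?" + q.Encode()
}

// parseResponse mirrors only the part of the JSON reply we need.
type parseResponse struct {
	Parse struct {
		Title string `json:"title"`
		Text  string `json:"text"`
	} `json:"parse"`
}

// pageHTML extracts the rendered page HTML from an action=parse reply body.
func pageHTML(body []byte) (string, error) {
	var r parseResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Parse.Text == "" {
		return "", errors.New("response has no parse.text (missing page?)")
	}
	return r.Parse.Text, nil
}

func main() {
	fmt.Println(parseURL("en.wiktionary.org", "olla"))

	// Canned reply standing in for the body of http.Get(parseURL(...)).
	canned := []byte(`{"parse":{"title":"olla","text":"<div>olla</div>"}}`)
	html, err := pageHTML(canned)
	fmt.Println(html, err)
}
```

The returned HTML is what a goquery document would then be built from to read the inflection tables.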

Note: My target language is Finnish, so I wrote this program with Finnish in mind. Hopefully it'll work out of the box for your target language too, but it may well do something wonky with the inflection tables. Fear not though, goquery is easy to work with!

Instructions

Download the necessities
1. Install Go, probably version 1.12.
2. Download a frequency wordlist for your language from here if possible.
  
  Otherwise the generated file might be too big for kindlegen to handle, as kindlegen is a 32-bit program. Finnish, with its 98,184 lemmata, proved too big to process without a frequency list, but your obscure language might be fine.
  If you can’t find one, just omit the -freqlist flag below.
3. Download kindlegen for your platform. You'll use it to convert the .opf + .html files into a .mobi.
Edit the metadata in dict.opf.
1. Don’t forget to modify <DictionaryInLanguage>!
  Set it to the ISO 639-1 code from here.
2. You can replace cover.png too, but it doesn't matter much, as the dictionary won't show up as a book by default.
To generate dict.html, which dict.opf references, run the following, substituting the name of the frequency list you downloaded for fi.txt:
```
go run kindlewick.go -freqlist fi.txt
```
If it's still too big, you can take just the first 50,000 lines (or however many you want) of the frequency list using process substitution in bash/zsh/etc., like so:
```
go run kindlewick.go -freqlist <(head -n 50000 fi.txt)
```
Finally, generate the .mobi file and put it on your Kindle!
```
kindlegen dict.opf -verbose -c2 -o my_dict.mobi
```

Q&A

How are inflections acquired?: Basically, it takes every span inside a table cell; if the span consists of multiple words, it keeps only the last one (olen odottanut → odottanut, since you can only look up a single word at a time on Kindle), and then filters out duplicates.
Why not consult the frequency list before downloading every single page?: Because frequency lists usually contain inflected forms, and if you only see olen in the list, you won't know you have to download the lemma olla. Ergo, it downloads everything and keeps entries where at least one form of the word shows up in the frequency list.

efskap/kindlewick