Commit

UniqueTokens
gcelano committed Jun 28, 2017
1 parent 508a448 commit f3976d5
Showing 15 changed files with 5,825,532 additions and 0 deletions.
10 changes: 10 additions & 0 deletions uniqueTokens/README.md
# Unique Values

This directory contains a database of all the unique tokens found in

* https://github.com/gcelano/LemmatizedAncientGreekXML/tree/master/texts

A token is considered unique by the combination of its word form **and** its POS tag (see `@v` in each `<d/>` element).
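
As a minimal Python sketch of that uniqueness rule (an illustration, not the script used to build this database), the function below assumes tokens have already been extracted from the texts as `(word_form, pos_tag)` pairs. The name `unique_tokens` and the sample tags are hypothetical placeholders in the 9-character Perseus style.

```python
from typing import Iterable

# A token is modeled as a (word_form, pos_tag) pair.
# The sample tags in the demo below are illustrative placeholders.
Token = tuple[str, str]


def unique_tokens(tokens: Iterable[Token]) -> list[Token]:
    """Return each distinct (form, POS tag) pair once, in first-seen order."""
    seen: set[Token] = set()
    result: list[Token] = []
    for form, pos in tokens:
        key = (form, pos)  # uniqueness depends on BOTH the form and the tag
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    corpus = [
        ("λόγος", "n-s---mn-"),
        ("λόγος", "n-s---mn-"),  # exact repeat: collapsed into one entry
        ("ἄλλα", "a-p---na-"),
        ("ἄλλα", "a-p---nn-"),   # same form, different tag: kept as a separate entry
    ]
    for form, pos in unique_tokens(corpus):
        print(form, pos)
```

Keying on the pair rather than the form alone is what keeps an ambiguous form such as ἄλλα as two entries, one per analysis.
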
These unique values have been used to merge the Morpheus and PerseusUnderPhilologic databases in order to retrieve lemmas from the latter two; these lemmas appear in `<e/>` elements (Morpheus) and `<l/>` elements (PerseusUnderPhilologic).
PerseusUnderPhilologic lemmas also record the original POS tag in `@r`, because that database uses a POS tag slightly different from the Morpheus one: it adds one more value in second position, resulting in a 10-character string.
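
The merge can be sketched in Python roughly as follows. This is a hypothetical illustration, not the author's pipeline. It assumes each source database is available as a dict mapping `(word_form, pos_tag)` to a lemma, and it assumes that dropping the extra second-position value from a 10-character PerseusUnderPhilologic tag yields a tag comparable with the Morpheus one. The function names, the dict layout, and the output keys (`v`, `e`, `l`, `r`, chosen only to mirror the names above) are all illustrative.

```python
from typing import Optional

# A unique token: (word_form, pos_tag).
Token = tuple[str, str]


def philologic_to_morpheus_tag(tag: str) -> str:
    """Drop the extra value at index 1 of a 10-character PerseusUnderPhilologic
    tag. ASSUMPTION: what remains lines up with the Morpheus tag."""
    if len(tag) != 10:
        raise ValueError(f"expected a 10-character tag, got {tag!r}")
    return tag[0] + tag[2:]


def merge_lemmas(
    tokens: list[Token],
    morpheus: dict[Token, str],
    philologic: dict[Token, str],
) -> list[dict[str, Optional[str]]]:
    """Attach lemmas from both (hypothetical) lookup tables to each unique token.

    `morpheus` is keyed by (form, Morpheus tag); `philologic` is keyed by
    (form, original 10-character tag). Missing lemmas come back as None.
    """
    # Re-key PerseusUnderPhilologic entries by the shortened tag, keeping the
    # original tag so it can be reported alongside the lemma (cf. @r).
    # If two original tags shorten to the same key, the last one wins.
    philo_index: dict[Token, tuple[str, str]] = {}
    for (form, tag10), lemma in philologic.items():
        philo_index[(form, philologic_to_morpheus_tag(tag10))] = (lemma, tag10)

    records: list[dict[str, Optional[str]]] = []
    for form, pos in tokens:
        philo = philo_index.get((form, pos))
        records.append({
            "form": form,
            "v": pos,                          # tag of the unique token
            "e": morpheus.get((form, pos)),    # Morpheus lemma, if any
            "l": philo[0] if philo else None,  # PerseusUnderPhilologic lemma
            "r": philo[1] if philo else None,  # its original 10-character tag
        })
    return records


if __name__ == "__main__":
    tokens = [("λόγος", "n-s---mn-")]
    morpheus = {("λόγος", "n-s---mn-"): "λόγος"}
    # "x" is a made-up stand-in for the extra second-position value.
    philologic = {("λόγος", "nx-s---mn-"): "λόγος"}
    print(merge_lemmas(tokens, morpheus, philologic))
```

Keeping the original 10-character tag next to the PerseusUnderPhilologic lemma means the normalization step never discards information, which matches the reason given above for storing it in `@r`.
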
