Releases: mideind/GreynirEngine
Version 1.8.1
- Fixes bug in bintokenizer.py which could lead to wrong disambiguation of words within phrases defined in the [ambiguous_phrases] section in Phrases.conf. This could cause strange parse results.
- Many additions and enhancements in Reynir.grammar.
Version 1.8.0
- Added accusative_np, dative_np and genitive_np properties on SimpleTree instances, to convert noun phrases between cases. This inter alia enables correct declension of noun phrases to be shown when checking grammar in text; cf. the ReynirCorrect package.
- Tuning and enhancement of the context-free grammar in Reynir.grammar.
Version 1.7.1
- Changes to the SimpleTree parsing schema, including VP and TO nonterminals. See documentation for details.
- Greatly expanded vocabulary, with words such as rímix, blörraður, vegu and skjöldu.
- Support for passing options from the Reynir instance on to the tokenizer.
- Many enhancements and corrections in the grammar.
- Better support for derived parser classes that add their own fragments to the base grammar, such as for queries.
Version 1.6.0
- Many grammar improvements
- Added conditional section support in .grammar files, with sections delimited by $if()...$endif()
- Added several grammar error checking rules in Reynir.grammar
- Added support for domains and hashtags to grammar, as noun phrases
- Lower-priority productions can now be added separately to nonterminals via > syntax, which is useful for grammar error rules
- Several additions to vocabulary, including fertugsafmæli, heljarinnar and hæstlaunaður
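One way to picture the conditional sections is as a line filter applied to a .grammar file before it is parsed. The sketch below is a toy model, not GreynirEngine's grammar reader: it assumes a section opens with a line of the form $if(flag) and closes with a matching $endif(flag), that the caller supplies the set of enabled flag names, and that nested sections are kept only when all their flags are enabled. The function name filter_conditional_sections and the grammar lines in the usage example are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumed directive syntax (hypothetical): "$if(flag)" opens, "$endif(flag)" closes
_IF = re.compile(r"^\s*\$if\((\w+)\)\s*$")
_ENDIF = re.compile(r"^\s*\$endif\((\w+)\)\s*$")


def filter_conditional_sections(lines: Iterable[str], enabled: Set[str]) -> List[str]:
    """Keep only lines whose enclosing $if() sections are all enabled (toy model)."""
    stack: List[str] = []  # flag names of the currently open sections
    kept: List[str] = []
    for line in lines:
        m_if = _IF.match(line)
        if m_if:
            stack.append(m_if.group(1))
            continue
        m_end = _ENDIF.match(line)
        if m_end:
            # Each $endif must close the most recently opened section
            if not stack or stack[-1] != m_end.group(1):
                raise ValueError(f"Unbalanced $endif({m_end.group(1)})")
            stack.pop()
            continue
        # A line survives only if every open section's flag is enabled
        if all(flag in enabled for flag in stack):
            kept.append(line)
    if stack:
        raise ValueError(f"Unclosed $if({stack[-1]})")
    return kept


if __name__ == "__main__":
    toy_grammar = ["A → b", "$if(extra)", "A → c", "$endif(extra)"]
    print(filter_conditional_sections(toy_grammar, enabled=set()))  # ['A → b']
```

Raising on unbalanced directives, rather than silently ignoring them, makes a malformed grammar file fail early.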
Version 1.5.3
- Better handling of complex composites ("dómsmála-, ferðamála-, nýsköpunar- og iðnaðarráðherra")
- More robust handling of impersonal verbs with arguments ("mig dreymdi kött")
- Updated word lists (ordalisti-first, -last, -all.dawg.bin) for compound word builder
- Added NP-COMPANY nonterminal for company names to simplified trees
- Updated GitHub repository URLs to mideind instead of vthorsteinsson
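Handling such a composite means recognizing that each hyphenated prefix shares the head of the final compound. The sketch below is a toy model of that expansion, not the engine's actual method, which relies on its compound word builder and vocabulary: here the head is taken as the longest entry in a hypothetical KNOWN_HEADS set that is a proper suffix of the last word, and the function name expand_composite is likewise an assumption.

```python
from typing import List

# Hypothetical stand-in for the engine's vocabulary of compound heads
KNOWN_HEADS = {"ráðherra", "herra", "stjóri"}


def expand_composite(phrase: str) -> List[str]:
    """Expand 'a-, b- og cX' into ['aX', 'bX', 'cX'] for a known head X (toy model)."""
    # Treat commas as separators and drop the conjunction ('og' = and, 'eða' = or)
    parts = [w for w in phrase.replace(",", " ").split() if w not in ("og", "eða")]
    if len(parts) < 2:
        return [phrase]
    last, prefixes = parts[-1], parts[:-1]
    # Every element before the final compound must be a hyphenated prefix
    if not all(p.endswith("-") for p in prefixes):
        return [phrase]
    # Greedy choice: the longest known head that is a proper suffix of the last word
    heads = [h for h in KNOWN_HEADS if last.endswith(h) and len(h) < len(last)]
    if not heads:
        return [phrase]
    head = max(heads, key=len)
    # Strip the trailing hyphen from each prefix and attach the shared head
    return [p[:-1] + head for p in prefixes] + [last]


if __name__ == "__main__":
    print(expand_composite("dómsmála-, ferðamála-, nýsköpunar- og iðnaðarráðherra"))
```

The greedy longest-suffix match is the simplest choice for a sketch; deciding where the head really begins generally needs the full vocabulary.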
Version 1.5.2
Added several compound words to vocabulary. Grammar fixes. No changes to the package API.
Version 1.5.1
- Fixed handling of family names, especially those that can also be given names (Hafstein)
- Added functionality for the benefit of ReynirCorrect
- Better support for multiple parsers using multiple grammars simultaneously
- Better support for addresses, especially those with a number+letter (Laugavegur 21B)
- Vocabulary fixes, including fix for BÍN error in Landmannalaugar
Version 1.5.0
- Added parent, enclosing_tag(), index, tidy_text and span to the SimpleTree class.
- Included ord.auka.csv and systematic_additions.csv files, used when building the vocabulary in ord.compressed.
- Reynir.tokenize() is now the function from bintokenizer.py instead of the one from the Tokenizer package.
- The BinErrata.conf file now controls both edits and deletions from BÍN when building the ord.compressed vocabulary file.
- The default tokenization pipeline in bintokenizer.py has been enhanced to support the ReynirCorrect package.
- Better support for parsing and detecting errors in the use of impersonal verbs.
- Added S-HEADING and VP-REV nonterminals to simplified parse trees.
- Added _Sentence.terminal_nodes attribute for retrieving a list of terminal nodes in a parse tree for a sentence.
- Lots of additional support for place names and geographical information.
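The tree navigation features listed above can be illustrated with a small stand-in class. ToyNode below is hypothetical and is not the real SimpleTree; it only mimics the ideas behind a parent link, a child index, an enclosing_tag() lookup that walks up through the ancestors (by exact tag match, an assumption), and left-to-right collection of terminal nodes via a hypothetical iter_leaves() method. The exact semantics and return types of the real attributes may differ.

```python
from typing import Iterator, List, Optional


class ToyNode:
    """Hypothetical stand-in for a parse tree node; not the real SimpleTree."""

    def __init__(self, tag: str, *children: "ToyNode", text: str = "") -> None:
        self.tag = tag
        self.text = text  # only meaningful on terminal (leaf) nodes
        self.parent: Optional["ToyNode"] = None
        self.children: List["ToyNode"] = list(children)
        for child in self.children:
            child.parent = self  # link each child back to this node

    @property
    def index(self) -> Optional[int]:
        """Position among the parent's children (None for the root)."""
        if self.parent is None:
            return None
        # Compare by identity so that identical-looking siblings are not confused
        return next(i for i, c in enumerate(self.parent.children) if c is self)

    def enclosing_tag(self, tag: str) -> Optional["ToyNode"]:
        """Nearest proper ancestor with exactly this tag, or None."""
        node = self.parent
        while node is not None:
            if node.tag == tag:
                return node
            node = node.parent
        return None

    def iter_leaves(self) -> Iterator["ToyNode"]:
        """Yield terminal (childless) nodes in left-to-right order."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.iter_leaves()


if __name__ == "__main__":
    # "Ég sá köttinn" ("I saw the cat"); the tags are illustrative only
    tree = ToyNode(
        "S0",
        ToyNode(
            "IP",
            ToyNode("NP-SUBJ", ToyNode("pfn", text="Ég")),
            ToyNode("VP", ToyNode("so", text="sá"), ToyNode("NP-OBJ", ToyNode("no", text="köttinn"))),
        ),
    )
    print([leaf.text for leaf in tree.iter_leaves()])  # ['Ég', 'sá', 'köttinn']
```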
Version 1.4.1
All abbreviation handling moved to the Tokenizer package. Moved large resources, such as the ord.compressed file, to git-lfs (Git Large File Storage). Made the tokenizer used by the Reynir class overridable.
Version 1.4.0
Improved handling of adjective arguments (tengdur flokknum, viðstödd sýninguna, frjáls ferða sinna). Stricter and more accurate parsing of impersonal verbs; the sentence "Ég dreymi fílinn" no longer parses. Added kind, cat, fl and leaves attributes to SimpleTree. Added config/BinErrata.conf with almost 1,600 corrections to the BÍN fl data, which are applied when building the ord.compressed file.
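Applying corrections from a file such as BinErrata.conf while building a vocabulary amounts to one pass over the source entries that patches some fields and, as the 1.5.0 notes above describe, drops others. The sketch below is a hypothetical illustration only: the Entry layout, the two dictionaries standing in for the errata file, the example values and the function name apply_errata are assumptions, and the code neither reads the real BinErrata.conf format nor produces ord.compressed.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Entry(NamedTuple):
    """Hypothetical simplified vocabulary entry."""
    lemma: str
    category: str  # word class, e.g. 'kk' (masculine noun)
    fl: str        # the 'fl' field that the errata correct


def apply_errata(
    entries: Iterable[Entry],
    fl_fixes: Dict[Tuple[str, str], str],
    deletions: Set[Tuple[str, str]],
) -> List[Entry]:
    """Drop deleted (lemma, category) pairs and correct 'fl' where listed."""
    result: List[Entry] = []
    for e in entries:
        key = (e.lemma, e.category)
        if key in deletions:
            continue  # entry is removed from the vocabulary
        if key in fl_fixes:
            e = e._replace(fl=fl_fixes[key])  # NamedTuple copy with the patched field
        result.append(e)
    return result


if __name__ == "__main__":
    # Illustrative values only, not actual BinErrata.conf content
    source = [Entry("köttur", "kk", "alm"), Entry("Esja", "kvk", "alm"), Entry("xyz", "hk", "alm")]
    print(apply_errata(source, {("Esja", "kvk"): "örn"}, {("xyz", "hk")}))
```

Keying both corrections and deletions on (lemma, category) keeps the pass a single linear scan with constant-time lookups.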