
laurentmih/mitie__rasanlu_CONLL_2003


Introduction

I wanted to benchmark the total_word_feature_extractor.dat file that I had trained using MITIE. To do so, I evaluated it on the CoNLL-2003 Dutch NER task. The scripts in this repository read the CoNLL training sets and convert them to a format usable by Rasa NLU. I then trained a model in Rasa NLU and ran it on the test set, achieving performance equivalent to that of the CoNLL-2003 winners.
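CoNLL NER results are reported as entity-level precision, recall and F1: an entity only counts as correct if both its boundaries and its type exactly match the gold annotation. The sketch below is a simplified, hypothetical reimplementation of that scoring (it is not the official conlleval script and not part of this repository), assuming gold and predicted tags are per-token BIO labels aligned sentence by sentence. The function names `extract_entities` and `entity_f1` are illustrative only.

```python
def extract_entities(tags):
    """Return a set of (first_token, last_token, type) spans from BIO tags.

    Handles both IOB1 and IOB2: an I- tag continues the previous entity only
    if that entity has the same type; otherwise it starts a new entity.
    """
    entities = []
    prev_type = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or etype != prev_type:
            entities.append([i, i, etype])  # start a new entity
        else:
            entities[-1][1] = i  # extend the current entity
        prev_type = etype
    return {tuple(e) for e in entities}


def entity_f1(gold_sentences, pred_sentences):
    """Exact-match entity precision, recall and F1 over aligned sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)  # correct = same span and same type
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG", "O"]]  # wrong type for the 2nd entity
    print(entity_f1(gold, pred))
```

In the toy example, the person entity matches but the second entity has the wrong type, so precision, recall and F1 all come out at 0.5.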

Components

readlines.py

Parses the CoNLL data format and converts it to plain sentences, written to lines_output.json. Use this to get a better overview of the data in the CoNLL files.
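As a rough illustration of what this parsing step involves, here is a minimal sketch (not the repository's actual readlines.py). It assumes the usual CoNLL layout: one token per line with whitespace-separated columns (token first, NER tag last), blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. The sample data and function names are hypothetical, and the sketch prints the sentences rather than writing lines_output.json.

```python
import json

# Hypothetical sample in the Dutch CoNLL layout: token, POS tag, NER tag.
SAMPLE = """-DOCSTART- -DOCSTART- O
Wim N B-PER
Kok N I-PER
bezocht V O
Amsterdam N B-LOC
. Punc O

Het Art O
regent V O
. Punc O
"""


def parse_conll(lines):
    """Group CoNLL rows into sentences, each a list of (token, tag) pairs."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        # Blank lines end a sentence; -DOCSTART- lines are not real tokens.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))  # token first, NER tag last
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def to_plain_sentences(sentences):
    """Join the tokens of each sentence with single spaces."""
    return [" ".join(token for token, _ in sentence) for sentence in sentences]


if __name__ == "__main__":
    sentences = parse_conll(SAMPLE.splitlines())
    print(json.dumps(to_plain_sentences(sentences), indent=2, ensure_ascii=False))
```

Taking the first and last columns keeps the sketch working whether or not the file carries extra columns (such as POS or chunk tags) in between.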

main.py

Reads the CoNLL training file and converts it to a JSON format readable by Rasa NLU, written to traindata.json. This file can then be used to train a model with Rasa NLU, which can subsequently be evaluated on the test sets.
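The core of this conversion is turning per-token BIO tags into character-offset entity annotations. Below is a minimal sketch, not the repository's main.py, assuming the legacy Rasa NLU JSON training format (a `rasa_nlu_data` object holding `common_examples`, each with `text`, `intent` and `entities` carrying `start`, `end`, `value` and `entity`). Because CoNLL sentences have no intent, the `"none"` intent is a hypothetical placeholder, and the function names are illustrative. The sketch prints the JSON; writing it to traindata.json would be a matter of calling `json.dump` on the same object.

```python
import json


def to_rasa_example(tokens, tags, intent="none"):
    """Convert one BIO-tagged sentence into a Rasa NLU common example.

    The text is the tokens joined by single spaces; entity offsets are
    character positions in that text (end is exclusive). Handles IOB1 and
    IOB2: an I- tag of a new type starts a new entity.
    """
    text = " ".join(tokens)
    entities = []
    offset = 0
    prev_label = None
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        offset = end + 1  # skip the joining space
        if tag == "O":
            prev_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != prev_label:
            entities.append({"start": start, "end": end, "entity": label})
        else:
            entities[-1]["end"] = end  # extend the current entity
        prev_label = label
    for ent in entities:
        ent["value"] = text[ent["start"]:ent["end"]]
    # "intent" is a placeholder: CoNLL data carries no intent labels.
    return {"text": text, "intent": intent, "entities": entities}


def to_rasa_training_data(sentences):
    """Wrap (token, tag) sentences in the legacy Rasa NLU JSON structure."""
    examples = [
        to_rasa_example([t for t, _ in s], [g for _, g in s]) for s in sentences
    ]
    return {"rasa_nlu_data": {"common_examples": examples}}


if __name__ == "__main__":
    sentence = [("Wim", "B-PER"), ("Kok", "I-PER"), ("bezocht", "O"),
                ("Amsterdam", "B-LOC"), (".", "O")]
    print(json.dumps(to_rasa_training_data([sentence]), indent=2, ensure_ascii=False))
```

Joining tokens with single spaces loses the original spacing around punctuation, but it keeps the character offsets trivially consistent with the token boundaries.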

About

Testing my trained MITIE word feature extractor with Rasa NLU on the Dutch CoNLL-2003 NER task
