Refactor out unnecessary processing in data pipeline #54

nelson-liu · 2017-05-15T00:40:15Z

Right now, the data pipeline tokenizes the input into both words and characters, even if you only want words. This is fine for now since character tokenization isn't that expensive, but it's not ideal once we want to use NER/POS features, since running the taggers can be quite slow and we don't want to do it unless necessary.
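
One possible shape for this, as a rough sketch rather than the project's actual code: register each processing step under a name and run only the ones the caller asks for. Every name here (`FEATURIZERS`, `process`, `tokenize_words`, `tokenize_characters`, `tag_pos`) is hypothetical, and `tag_pos` is a placeholder standing in for a real, slow tagger.

```python
from typing import Callable, Dict, Iterable, List


def tokenize_words(text: str) -> List[str]:
    # Whitespace tokenization, standing in for the real word tokenizer.
    return text.split()


def tokenize_characters(text: str) -> List[str]:
    # One token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def tag_pos(text: str) -> List[str]:
    # Placeholder for an expensive tagger (assumption): one dummy tag per word.
    return ["UNK" for _ in text.split()]


# Hypothetical registry mapping a feature name to the step that computes it.
FEATURIZERS: Dict[str, Callable[[str], List[str]]] = {
    "words": tokenize_words,
    "characters": tokenize_characters,
    "pos": tag_pos,
}


def process(text: str, features: Iterable[str]) -> Dict[str, List[str]]:
    """Compute only the requested features for `text`."""
    requested = list(features)
    unknown = [name for name in requested if name not in FEATURIZERS]
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    # Steps that were not requested are never called.
    return {name: FEATURIZERS[name](text) for name in requested}
```

With this layout, `process("the cat", ["words"])` never touches the character tokenizer or the tagger, so adding slow NER/POS steps later would only cost something for callers that request them.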

nelson-liu added the enhancement and help wanted labels on May 20, 2017
