This repository has been archived by the owner on Aug 14, 2019. It is now read-only.

Refactor out unnecessary processing in data pipeline #54

Open
nelson-liu opened this issue May 15, 2017 · 0 comments


@nelson-liu
Owner

Right now, the data pipeline tokenizes the input into both words and characters, even if you only want words. This is fine for now, since character tokenization isn't that expensive, but it's not ideal for when we want to use NER/POS features: running the taggers can be quite slow, and we don't want to do it unless necessary.
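
One possible shape for the refactor, as a rough sketch only: have the caller name the features it wants and run only those extractors. Everything here (`process_text`, `FEATURE_EXTRACTORS`, the feature names, and the stand-in `tag_pos`) is hypothetical and not taken from the repository's code; a real tagger would replace the placeholder.

```python
from typing import Callable, Dict, List, Sequence


def tokenize_words(text: str) -> List[str]:
    # Cheap: whitespace word tokenization.
    return text.split()


def tokenize_characters(text: str) -> List[str]:
    # Cheap: one token per character.
    return list(text)


def tag_pos(text: str) -> List[str]:
    # Hypothetical stand-in for an expensive POS tagger.
    # It only returns a placeholder tag per whitespace-separated word.
    return ["UNK"] * len(text.split())


# Hypothetical registry mapping a feature name to the function that computes it.
FEATURE_EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {
    "words": tokenize_words,
    "characters": tokenize_characters,
    "pos_tags": tag_pos,
}


def process_text(text: str, features: Sequence[str]) -> Dict[str, List[str]]:
    """Compute only the requested features; skip every other extractor."""
    unknown = [name for name in features if name not in FEATURE_EXTRACTORS]
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    return {name: FEATURE_EXTRACTORS[name](text) for name in features}
```

With this layout, `process_text("the cat", ["words"])` never calls the character tokenizer or the tagger, so the slow steps cost nothing unless someone asks for them.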
