GitHub - allanj/neural-partialCRF: Neural (LSTM) version of the partial CRF model

LSTM-CRF Model for Named Entity Recognition (or Sequence Labeling)

We provide a neural implementation of partial-crfsuite. Our implementation is built on the LSTM-CRF implementation by this project.
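For readers new to partial CRFs, here is a minimal sketch of the training objective, assuming the standard formulation: the loss is the log-partition over all label paths minus the log-sum over the paths that stay inside the allowed label set at each position. It is pure Python for illustration and is not taken from this repository. The function names and the list-of-lists score layout (`transitions[i][j]` scores moving from label `i` to label `j`) are hypothetical.

```python
import math
from typing import List, Sequence, Set


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))), ignoring -inf entries."""
    finite = [v for v in values if v != float("-inf")]
    if not finite:
        return float("-inf")
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def forward_score(emissions: List[List[float]],
                  transitions: List[List[float]],
                  allowed: List[Set[int]]) -> float:
    """Log-sum of path scores over paths that use only allowed[t] at position t."""
    num_labels = len(transitions)
    neg_inf = float("-inf")
    # Initialise with the first position; disallowed labels get -inf.
    alpha = [emissions[0][y] if y in allowed[0] else neg_inf
             for y in range(num_labels)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][y] for p in range(num_labels)])
            + emissions[t][y]
            if y in allowed[t] else neg_inf
            for y in range(num_labels)
        ]
    return logsumexp(alpha)


def partial_crf_nll(emissions: List[List[float]],
                    transitions: List[List[float]],
                    allowed: List[Set[int]]) -> float:
    """Negative marginal log-likelihood of the partially labeled sequence."""
    every_label = [set(range(len(transitions)))] * len(emissions)
    log_z = forward_score(emissions, transitions, every_label)
    log_constrained = forward_score(emissions, transitions, allowed)
    return log_z - log_constrained
```

When every position allows a single label, this reduces to the ordinary CRF negative log-likelihood. When every label is allowed everywhere, the loss is zero because the sequence carries no supervision.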

Requirements

Python >= 3.6 and PyTorch >= 0.4.1
AllenNLP package (if you use ELMo)

Usage

Put the GloVe embedding file (glove.6B.100d.txt) under the data directory. (You can also use ELMo/BERT/Flair; see below.) Note that if the embedding file does not exist, the embeddings are randomly initialized.
Simply run the following command and you can obtain results comparable to the benchmark above.
```
python3.6 trainer.py
```
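The embedding fallback mentioned above can be sketched as follows. This is an illustrative, pure-Python version and not the repository's loader: the function name `load_embeddings`, the lowercase lookup, and the uniform initialization range are all assumptions.

```python
import os
import random
from typing import Dict, List


def load_embeddings(path: str, vocab: List[str], dim: int = 100,
                    seed: int = 42) -> Dict[str, List[float]]:
    """Return one vector per vocabulary word, random where no vector is found."""
    rng = random.Random(seed)
    pretrained: Dict[str, List[float]] = {}
    if os.path.isfile(path):
        # GloVe text format: a word followed by `dim` space-separated floats.
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                if len(parts) == dim + 1:
                    pretrained[parts[0]] = [float(x) for x in parts[1:]]
    scale = (3.0 / dim) ** 0.5  # assumed range for the random fallback
    table: Dict[str, List[float]] = {}
    for word in vocab:
        vec = pretrained.get(word) or pretrained.get(word.lower())
        if vec is None:
            # Missing file or out-of-vocabulary word: random initialization.
            vec = [rng.uniform(-scale, scale) for _ in range(dim)]
        table[word] = vec
    return table
```

Calling `load_embeddings("data/glove.6B.100d.txt", vocab)` with a missing file yields a fully random table, so training can still start without the download.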

Running with your own data.

Create a folder YourData under the data directory.
Put the train.txt, dev.txt and test.txt files under this directory, and make sure they follow the dataset format: one token per line followed by its label, with | separating alternative labels at a position and a blank line between sentences. We use the IOB encoding scheme. A sample follows.
```
EU B-ORG|B-MISC
rejects O
German B-MISC|B-PER
call O|B-PER
to O
boycott O
British B-MISC
lamb O
. O

Peter B-PER|B-ORG
Blackburn I-PER
```
Note: the validation and test datasets should not contain alternative labels. If your data is in a different format, simply modify the reader in config/reader.py.
Change the dataset argument to YourData in the main script, trainer.py.
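As a rough illustration of the format described above, the sketch below parses token lines with `|`-separated alternative labels into sentences. It is not the reader in config/reader.py; the name `read_partial_conll` and the returned structure are hypothetical.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, candidate_labels) pairs.
Sentence = List[Tuple[str, List[str]]]


def read_partial_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse 'token LABEL|LABEL' lines; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # The label field is the last whitespace-separated column.
        token, label_field = line.rsplit(maxsplit=1)
        current.append((token, label_field.split("|")))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = ["EU B-ORG|B-MISC", "rejects O", "", "Peter B-PER|B-ORG", "Blackburn I-PER"]
    for sentence in read_partial_conll(sample):
        print(sentence)
```

For dev.txt and test.txt each label list has exactly one entry, so the same function covers fully labeled files.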

Using ELMo (and BERT)

There are two ways to use the ELMo and BERT representations: preprocess the input files into vectors and load them in the program, or run the ELMo/BERT model on the input tokens at every forward pass. The latter approach allows us to fine-tune the ELMo and BERT parameters, but its memory consumption is high. Since the first method covers most practical use cases, it is the only one I implemented.

Run the script preprocess/get_elmo_vec.py to produce the vector files for your datasets.
Run the main file with the command: python3.6 trainer.py --context_emb elmo. You are good to go.

BERT can be used in a similar manner. Let me know if you want further functionality. Note that we concatenate ELMo and word embeddings (i.e., GloVe) in our model (check here). You may not need concatenation for BERT.
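The concatenation mentioned above amounts to joining, for each token, its word embedding with its precomputed contextual vector before the LSTM sees it. The sketch below shows the idea with plain lists. It is not the model code, and `concat_features` is a hypothetical name.

```python
from typing import List


def concat_features(word_vecs: List[List[float]],
                    context_vecs: List[List[float]]) -> List[List[float]]:
    """Concatenate per-token word and contextual vectors for one sentence."""
    if len(word_vecs) != len(context_vecs):
        raise ValueError("both inputs need one vector per token")
    # List addition appends the contextual vector to the word vector.
    return [w + c for w, c in zip(word_vecs, context_vecs)]


if __name__ == "__main__":
    words = [[0.1, 0.2], [0.3, 0.4]]               # toy 2-dim word embeddings
    context = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]   # toy 3-dim contextual vectors
    print(concat_features(words, context))
```

The input size of the LSTM then becomes the sum of the two dimensions, which is why switching the contextual embedding on or off changes the model's input width.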

Ongoing plan

Add an option for users to add label constraints. Currently, this requires users to modify the transition parameter matrix directly.
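Until such an option exists, one way to edit the transition matrix by hand is to overwrite the scores of forbidden transitions with a large negative value. The sketch below assumes IOB2-style constraints (I-X may only follow B-X or I-X) and a `transitions[i][j]` layout meaning "from label i to label j". Both are assumptions, as are the function names and the penalty value.

```python
from typing import List


def is_valid_transition(prev: str, curr: str) -> bool:
    """Under the assumed IOB2 scheme, I-X must follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in ("B-" + entity, "I-" + entity)
    return True


def constrain_transitions(transitions: List[List[float]], labels: List[str],
                          penalty: float = -10000.0) -> List[List[float]]:
    """Return a copy of the matrix with forbidden transitions set to `penalty`."""
    constrained = [row[:] for row in transitions]
    for i, prev in enumerate(labels):
        for j, curr in enumerate(labels):
            if not is_valid_transition(prev, curr):
                constrained[i][j] = penalty
    return constrained


if __name__ == "__main__":
    labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
    matrix = [[0.0] * len(labels) for _ in labels]
    for row in constrain_transitions(matrix, labels):
        print(row)
```

A large finite penalty is used here instead of negative infinity so that sums of scores stay finite during training.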

