AshutoshDongare/softskill-NER


License: CC BY-NC 4.0

Fine-tune a pretrained πŸ€— model for SoftSkill NER


This repo shows how to fine-tune a custom NER model to classify soft skills using the πŸ€— Hugging Face pretrained DistilBERT model. The custom training data contains typical soft skills such as "positive attitude", "leadership", and "customer focus".

We will fine-tune the model for soft-skill NER using the πŸ€— Transformers Trainer.

This is the simplest way to fine-tune a πŸ€— Transformers model. Alternatively, you can train with native PyTorch or TensorFlow, which gives you the flexibility to write your own custom training loop if you need specific training behavior.
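A Trainer-based run for token classification is mostly configuration. Below is a minimal configuration sketch, not this repo's exact training script: the label names, hyperparameters, and the `train_dataset` placeholder are all assumptions for illustration.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed label set for this task: 0 = not a soft skill, 1 = soft skill.
label_list = ["O", "SOFTSKILL"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label_list)
)

# Hyperparameters here are illustrative defaults, not tuned values.
args = TrainingArguments(
    output_dir="./skillner_model/",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

# `train_dataset` is assumed to be a tokenized dataset with labels aligned
# to subword tokens, e.g. built from /data/train_ner.json using
# tokenizer(..., is_split_into_words=True).
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
trainer.save_model("./skillner_model/")
```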

The custom dataset has around 119 sentences, tokenized and annotated in the format the Hugging Face model requires for fine-tuning. (Please drop me a line if you want to know how to prepare a tokenized and annotated dataset for NER training.)

The trained model still delivers decent performance with such a small number of training samples, and it is resilient enough to identify soft skills that do not appear in the training data.

For production use cases it is recommended to compile a few hundred to a few thousand training samples.

A sample record for the token-classification task is shown below; `ner_tags` holds one label per token (1 = soft skill, 0 = other).

```json
{
  "id": "101",
  "ner_tags": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
  "tokens": ["a", "good", "project", "manager", "is", "able", "to", "prioritize", "from", "the", "list", "of", "tasks"]
}
```
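A record in this format can be sanity-checked with a few lines of plain Python. This is a minimal sketch using the example record above; the field names follow that sample.

```python
# Check that a sample is consistent (one tag per token) and list the tokens
# tagged as soft skills.
sample = {
    "id": "101",
    "ner_tags": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "tokens": ["a", "good", "project", "manager", "is", "able", "to",
               "prioritize", "from", "the", "list", "of", "tasks"],
}

assert len(sample["ner_tags"]) == len(sample["tokens"]), "one tag per token"

softskill_tokens = [tok for tok, tag in zip(sample["tokens"], sample["ner_tags"])
                    if tag == 1]
print(softskill_tokens)  # ['prioritize']
```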

You may want to take a look at /data/train_ner.json to see which soft skills have been annotated in the training data.

Below are the metrics for this fine-tuning run:

Metrics

Inference

The training script takes a sample sentence and runs inference on it to check whether the NER model is trained properly and can perform soft-skill NER classification. The model is able to classify unseen soft skills such as "composed" and "professional".

  • NER = composed
  • NER = confident
  • NER = professional
  • NER = leadership

Load the saved model for inference

You may also load the saved model the same way you would use any pretrained πŸ€— Transformers model with pipeline.

Below is part of the code showing how you can load the saved model and run inference on it. Note that .from_pretrained() loads from the directory containing the custom-trained model.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and tokenizer from the local directory
model = AutoModelForTokenClassification.from_pretrained("./skillner_model/")
tokenizer = AutoTokenizer.from_pretrained("./skillner_model/")

# Run on GPU if available, otherwise on CPU
device = 0 if torch.cuda.is_available() else -1

NER_INFERENCE = pipeline("ner", model=model, tokenizer=tokenizer, device=device)

ner_results = NER_INFERENCE("your sentence for softskill NER inference")
```
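The "ner" pipeline returns one result per subword token, and DistilBERT's WordPiece tokenizer marks word continuations with a "##" prefix. A minimal sketch of gluing those pieces back into whole words is shown below; the mocked `ner_results` list is illustrative, not real pipeline output.

```python
# Merge WordPiece continuations ("##...") back onto the preceding word.
def merge_wordpieces(ner_results):
    words = []
    for piece in ner_results:
        token = piece["word"]
        if token.startswith("##") and words:
            words[-1] += token[2:]  # glue continuation onto previous word
        else:
            words.append(token)
    return words

# Illustrative (mocked) pipeline output for an entity split into subwords
ner_results = [
    {"word": "pro", "entity": "LABEL_1"},
    {"word": "##fessional", "entity": "LABEL_1"},
    {"word": "leadership", "entity": "LABEL_1"},
]
print(merge_wordpieces(ner_results))  # ['professional', 'leadership']
```

Note that newer versions of πŸ€— Transformers can also do this aggregation for you inside the pipeline itself.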

Citations

This repo is based on the Hugging Face Transformers documentation and examples, compiled for custom NER fine-tuning.

Future enhancements

  • Compile and annotate more training data for NER, e.g. by scraping the Web or Wiki dumps for relevant text.
  • Implement chunked tags such as B-SOFTSKILL/I-SOFTSKILL to mark the beginning of, and the inside of, a soft-skill entity.
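The second enhancement can be sketched in plain Python: converting the current flat 0/1 tags into BIO-style string tags, where the first token of an entity gets B-SOFTSKILL, tokens continuing it get I-SOFTSKILL, and everything else gets O. The tag names are the ones proposed above; this is a sketch, not the repo's implementation.

```python
# Convert flat 0/1 token tags into BIO-style tags.
def to_bio(ner_tags):
    bio = []
    prev = 0
    for tag in ner_tags:
        if tag == 1:
            # Start a new entity unless the previous token was also tagged 1
            bio.append("I-SOFTSKILL" if prev == 1 else "B-SOFTSKILL")
        else:
            bio.append("O")
        prev = tag
    return bio

print(to_bio([0, 1, 1, 0, 1]))
# ['O', 'B-SOFTSKILL', 'I-SOFTSKILL', 'O', 'B-SOFTSKILL']
```

With this scheme, two adjacent but distinct soft skills remain distinguishable, which the flat 0/1 tagging cannot express.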