AshutoshDongare/softskill-NER


License: CC BY-NC 4.0

Fine-tune a pretrained πŸ€— model for SoftSkill NER


This repo shows how to fine-tune a custom NER model to classify soft skills using the πŸ€— Hugging Face pretrained DistilBERT model. The custom training data contains typical soft skills such as "positive attitude", "leadership", and "customer focus".

We will fine-tune the model for soft-skill NER using the πŸ€— Transformers Trainer.

This is the simplest way to fine-tune a πŸ€— Transformers model. Alternatively, you can train with native PyTorch or TensorFlow, which gives you the flexibility to write your own custom training loop if you need specific training behavior.
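A Trainer-based run for token classification is mostly configuration. Below is a minimal configuration sketch, not this repo's exact training script: the label names, hyperparameters, and the `train_dataset` placeholder are all assumptions for illustration.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed label set for this task: 0 = not a soft skill, 1 = soft skill.
label_list = ["O", "SOFTSKILL"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label_list)
)

# Hyperparameters here are illustrative defaults, not tuned values.
args = TrainingArguments(
    output_dir="./skillner_model/",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

# `train_dataset` is assumed to be a tokenized dataset with labels aligned
# to subword tokens, e.g. built from /data/train_ner.json using
# tokenizer(..., is_split_into_words=True).
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
trainer.save_model("./skillner_model/")
```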

The custom dataset has around 119 sentences, tokenized and annotated in the format the Hugging Face model requires for fine-tuning. (Please drop me a line if you want to know how to prepare a tokenized and annotated dataset for NER training.)

The trained model still delivers decent performance with such a small number of training samples, and it is resilient enough to identify soft skills that do not appear in the training data.

For production use cases it is recommended to compile a few hundred to a few thousand training samples.

A sample record for the token-classification task is shown below; `ner_tags` holds one label per token (1 = soft skill, 0 = other).

```json
{
  "id": "101",
  "ner_tags": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
  "tokens": ["a", "good", "project", "manager", "is", "able", "to", "prioritize", "from", "the", "list", "of", "tasks"]
}
```
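A record in this format can be sanity-checked with a few lines of plain Python. This is a minimal sketch using the example record above; the field names follow that sample.

```python
# Check that a sample is consistent (one tag per token) and list the tokens
# tagged as soft skills.
sample = {
    "id": "101",
    "ner_tags": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "tokens": ["a", "good", "project", "manager", "is", "able", "to",
               "prioritize", "from", "the", "list", "of", "tasks"],
}

assert len(sample["ner_tags"]) == len(sample["tokens"]), "one tag per token"

softskill_tokens = [tok for tok, tag in zip(sample["tokens"], sample["ner_tags"])
                    if tag == 1]
print(softskill_tokens)  # ['prioritize']
```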

You may want to take a look at /data/train_ner.json to see which soft skills have been annotated in the training data.

Below are the metrics for this fine-tuning run:

Metrics

Inference

The training script takes a sample sentence and runs inference on it to check whether the NER model is trained properly and can perform soft-skill NER classification. The model is able to classify unseen soft skills such as "composed" and "professional".

  • NER = composed
  • NER = confident
  • NER = professional
  • NER = leadership

Load the saved model for inference

You may also load the saved model the same way you would use any pretrained πŸ€— Transformers model with pipeline.

Below is part of the code showing how you can load the saved model and run inference on it. Note that .from_pretrained() loads from the directory containing the custom-trained model.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and tokenizer from the local directory
model = AutoModelForTokenClassification.from_pretrained("./skillner_model/")
tokenizer = AutoTokenizer.from_pretrained("./skillner_model/")

# Run on GPU if available, otherwise on CPU
device = 0 if torch.cuda.is_available() else -1

NER_INFERENCE = pipeline("ner", model=model, tokenizer=tokenizer, device=device)

ner_results = NER_INFERENCE("your sentence for softskill NER inference")
```
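The "ner" pipeline returns one result per subword token, and DistilBERT's WordPiece tokenizer marks word continuations with a "##" prefix. A minimal sketch of gluing those pieces back into whole words is shown below; the mocked `ner_results` list is illustrative, not real pipeline output.

```python
# Merge WordPiece continuations ("##...") back onto the preceding word.
def merge_wordpieces(ner_results):
    words = []
    for piece in ner_results:
        token = piece["word"]
        if token.startswith("##") and words:
            words[-1] += token[2:]  # glue continuation onto previous word
        else:
            words.append(token)
    return words

# Illustrative (mocked) pipeline output for an entity split into subwords
ner_results = [
    {"word": "pro", "entity": "LABEL_1"},
    {"word": "##fessional", "entity": "LABEL_1"},
    {"word": "leadership", "entity": "LABEL_1"},
]
print(merge_wordpieces(ner_results))  # ['professional', 'leadership']
```

Note that newer versions of πŸ€— Transformers can also do this aggregation for you inside the pipeline itself.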

Citations

This repo is based on the Hugging Face Transformers documentation and examples, compiled for custom NER fine-tuning.

Future enhancements

  • Compile and annotate more training data for NER, e.g. by scraping the Web or Wiki dumps for relevant text.
  • Implement chunked tags such as B-SOFTSKILL/I-SOFTSKILL to mark the beginning of, and the inside of, a soft-skill entity.
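The second enhancement can be sketched in plain Python: converting the current flat 0/1 tags into BIO-style string tags, where the first token of an entity gets B-SOFTSKILL, tokens continuing it get I-SOFTSKILL, and everything else gets O. The tag names are the ones proposed above; this is a sketch, not the repo's implementation.

```python
# Convert flat 0/1 token tags into BIO-style tags.
def to_bio(ner_tags):
    bio = []
    prev = 0
    for tag in ner_tags:
        if tag == 1:
            # Start a new entity unless the previous token was also tagged 1
            bio.append("I-SOFTSKILL" if prev == 1 else "B-SOFTSKILL")
        else:
            bio.append("O")
        prev = tag
    return bio

print(to_bio([0, 1, 1, 0, 1]))
# ['O', 'B-SOFTSKILL', 'I-SOFTSKILL', 'O', 'B-SOFTSKILL']
```

With this scheme, two adjacent but distinct soft skills remain distinguishable, which the flat 0/1 tagging cannot express.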