msra_ner_example

中文说明 | English

This example demonstrates distilling a Chinese-ELECTRA-base model on the MSRA NER task with distributed data-parallel training (single node, multi-GPU).

  • ner_ElectraTrain_dist.sh : trains a teacher model (Chinese-ELECTRA-base) on MSRA NER.
  • ner_ElectraDistill_dist.sh : distills the teacher into an ELECTRA-small model.
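
The example runs in PyTorch's single-node, multi-GPU DistributedDataParallel mode. The sketch below shows the typical setup such a launch relies on (a launcher-provided --local_rank, an NCCL process group, a DistributedSampler); it is illustrative only, and the tiny stand-in model and dataset are placeholders, not code from the actual scripts.

```python
# A minimal sketch (not the scripts' actual code) of the single-node, multi-GPU
# DistributedDataParallel setup, meant to be launched with something like:
#   python -m torch.distributed.launch --nproc_per_node=<num_gpus> sketch.py
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)   # injected by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")                    # MASTER_ADDR/PORT are set by the launcher

model = torch.nn.Linear(10, 2).cuda(args.local_rank)       # stand-in for the ELECTRA model
model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank)

dataset = TensorDataset(torch.randn(64, 10))               # stand-in for the tokenized MSRA data
sampler = DistributedSampler(dataset)                      # each process gets a distinct shard
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)                               # reshuffle the shards every epoch
    for (x,) in loader:
        loss = model(x.cuda(args.local_rank)).sum()        # dummy loss
        loss.backward()                                     # DDP averages gradients across GPUs here
```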

Set the following variables in the shell scripts before running:

  • ELECTRA_DIR_BASE : the directory where the Chinese-ELECTRA-base model is located; it should include vocab.txt, pytorch_model.bin and config.json.

  • OUTPUT_DIR : the directory where logs and trained model weights will be saved.

  • DATA_DIR : the directory containing the MSRA NER dataset in BIO format (a minimal reading sketch follows this list):

    • msra_train_bio.txt
    • msra_test_bio.txt
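
As a reference, here is a minimal reader for these BIO files. It assumes the common layout of one whitespace-separated "token tag" pair per line with a blank line between sentences; check the files themselves for the exact delimiter before relying on it.

```python
# Sketch of a reader for the BIO-formatted MSRA NER files, assuming one
# "token tag" pair per line and blank lines between sentences.
def read_bio(path):
    sentences = []                            # list of (tokens, tags) pairs
    tokens, tags = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line marks a sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            token, tag = line.split()[:2]     # e.g. "京 B-LOC"
            tokens.append(token)
            tags.append(tag)
    if tokens:                                # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences

# Example: train_sentences = read_bio("path/to/DATA_DIR/msra_train_bio.txt")
```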

For distillation:

  • ELECTRA_DIR_SMALL : the directory where the pretrained Chinese-ELECTRA-small weights are located; it should include pytorch_model.bin. This is optional: if you don't provide the ELECTRA-small weights, the student model will be initialized randomly.
  • student_config_file : the model config file (i.e., config.json) for the student. Usually it should be in ${ELECTRA_DIR_SMALL}.
  • trained_teacher_model_file : the fine-tuned ELECTRA-base teacher checkpoint produced by ner_ElectraTrain_dist.sh. A sketch of how these pieces fit together follows this list.
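
The sketch below shows one way these pieces could be wired together with TextBrewer and Transformers 2.8. It is an assumption-laden illustration: the label count, adaptor outputs, temperature, checkpoint format, and the use of Hugging Face's ElectraForTokenClassification are not taken from ner_ElectraDistill_dist.sh.

```python
# Sketch (under assumptions) of the teacher/student setup for distillation.
import torch
from transformers import ElectraConfig, ElectraForTokenClassification
from textbrewer import GeneralDistiller, TrainingConfig, DistillationConfig

ELECTRA_DIR_BASE = "/path/to/chinese-electra-base"           # as in the shell script
STUDENT_CONFIG_FILE = "/path/to/electra-small/config.json"   # student_config_file
TEACHER_CKPT = "/path/to/teacher_finetuned.bin"              # trained_teacher_model_file
NUM_LABELS = 7   # O plus B/I for PER, LOC, ORG is the usual MSRA BIO tagset; adjust to the data

# Teacher: ELECTRA-base with a token-classification head, loaded from the fine-tuned checkpoint
teacher = ElectraForTokenClassification.from_pretrained(ELECTRA_DIR_BASE, num_labels=NUM_LABELS)
teacher.load_state_dict(torch.load(TEACHER_CKPT, map_location="cpu"))  # assumes a plain state_dict

# Student: ELECTRA-small; without pretrained weights it starts from random initialization
student_config = ElectraConfig.from_json_file(STUDENT_CONFIG_FILE)
student_config.num_labels = NUM_LABELS
student = ElectraForTokenClassification(student_config)

def adaptor(batch, model_outputs):
    # Map model outputs to the names TextBrewer's distillation losses expect.
    # With no labels passed to forward(), outputs[0] holds the per-token logits.
    return {"logits": model_outputs[0]}

distiller = GeneralDistiller(
    train_config=TrainingConfig(device="cuda"),
    distill_config=DistillationConfig(temperature=8),
    model_T=teacher, model_S=student,
    adaptor_T=adaptor, adaptor_S=adaptor)

# Training would then look roughly like:
# with distiller:
#     distiller.train(optimizer, train_dataloader, num_epochs=..., callback=...)
```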

The scripts have been tested with PyTorch==1.2 and Transformers==2.8.