Skip to content

Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"

License

Notifications You must be signed in to change notification settings

egrcc/Cross-Domain-CWS

Repository files navigation

Cross-Domain-CWS

About

A TensorFlow implementation of IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation".

Model Overview

model

Figure: Architecture of our proposed model. It mainly includes three components: the forward language model (pink), backward language model (yellow), and BiLSTM segmentation model (blue). We use a gate mechanism to control the influence of the language models on the segmentation model. The outputs of language models are not shown for simplicity. In this example, we assume that “c1c2c3” is a word.

Requirements

  • Python: 2.7
  • TensorFlow >= 1.4.1 (The used version for experiments in our paper is 1.4.1)

How to run

  1. Bulid vocabulary:

    python utils_data.py
  2. Train a model:

    python train.py --model lstmlm --source ctb --target zx --pl True --memory 1.0
  3. Test a model:

    python test.py --model lstmlm --source ctb --target zx --pl True --memory 1.0
  4. Evaluate a model:

    python eval.py ctb zx lstmlm_ctb_True

Citation

If you find the code helpful, please cite the following paper:

Lujun Zhao, Qi Zhang, Peng Wang and Xiaoyu Liu, Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation, In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18), July 9-19, 2018, Stockholm, Sweden.

@InProceedings{zhao2018cws,
  author    = {Zhao, Lujun and Zhang, Qi and Wang, Peng and Liu, Xiaoyu},
  title     = {Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation},
  booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI-18)},
  year      = {2018},
  address   = {Stockholm, Sweden}
}

About

Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published