We implement cross-domain Chinese hedge detection with Keras.
The dataset contains four domains: wiki, abstract (biomedical abstracts), discuss, and result.
In the cross-domain setting, the model is trained on one domain and tested on another, and each experiment is named test_by_train. For example, training on abstract and testing on wiki is called wiki_by_abstract.
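To illustrate the test_by_train naming, here is a short sketch that enumerates ordered pairs of distinct domains. It assumes that every such pair is run (12 experiments for four domains), which this README does not state. `DOMAINS`, `experiment_name`, and `cross_domain_experiments` are hypothetical names, not part of the repository.

```python
from itertools import permutations

# Domain identifiers as listed in this README; running every ordered pair
# is an assumption made for illustration only.
DOMAINS = ["wiki", "abstract", "discuss", "result"]


def experiment_name(test_domain, train_domain):
    """Name a cross-domain experiment as <test>_by_<train>."""
    return "{}_by_{}".format(test_domain, train_domain)


def cross_domain_experiments(domains=DOMAINS):
    """Yield (name, train_domain, test_domain) for each ordered pair of distinct domains."""
    for train_domain, test_domain in permutations(domains, 2):
        yield experiment_name(test_domain, train_domain), train_domain, test_domain
```

For instance, `experiment_name("wiki", "abstract")` returns `"wiki_by_abstract"`, matching the example above.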
We evaluate our model with five-fold cross-validation.
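The README does not say how the five folds are drawn or how they interact with the cross-domain splits. As a minimal, hypothetical sketch of standard k-fold splitting, the generator here shuffles sample indices once and yields five disjoint (train, test) partitions. `k_fold_indices` and its `seed` parameter are illustrative assumptions, not the repository's code.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of near-equal size.

    Yields (train_indices, test_indices) for each fold; every index appears
    in exactly one test fold. This is a generic sketch, not the authors' split.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # Fixed seed so that the folds are reproducible across runs.
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

The per-fold scores would then typically be averaged. scikit-learn's `KFold` provides the same behavior but is not among the listed requirements.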
We trained our word embeddings on a small corpus downloaded from MEDLINE. We plan to release both the embeddings and our data as soon as possible.
Requirements:
- Python 2.7
- Keras 2.0.1
- TensorFlow 1.0.1
- NLTK 3.2.2
- tqdm
To create the data and features:
python hedge_process.py
To convert the data into matrices for training:
python process_data.py
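The contents of process_data.py are not described here. As a minimal sketch, assuming each sentence arrives as a list of tokens, converting sentences into a fixed-width integer matrix suitable for a Keras `Embedding` layer could look like this. `build_vocab`, `to_matrix`, the reserved ids (0 for padding, 1 for unknown tokens), and post-padding are illustrative assumptions, not the repository's code.

```python
from collections import Counter

# Hypothetical reserved tokens: id 0 pads, id 1 stands for out-of-vocabulary tokens.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences, min_count=1):
    """Map each token seen at least min_count times to an integer id.

    Tokens are numbered by descending frequency, ties broken alphabetically.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = {PAD: 0, UNK: 1}
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def to_matrix(sentences, vocab, max_len):
    """Convert token lists into rows of exactly max_len ids.

    Long sentences are truncated at the end and short ones are padded at the end.
    """
    matrix = []
    for sent in sentences:
        ids = [vocab.get(tok, vocab[UNK]) for tok in sent[:max_len]]
        ids += [vocab[PAD]] * (max_len - len(ids))
        matrix.append(ids)
    return matrix


sents = [["a", "b", "a"], ["b", "c"]]
vocab = build_vocab(sents)
print(to_matrix(sents, vocab, 4))  # → [[2, 3, 2, 0], [3, 4, 0, 0]]
```

Keras 2 also provides `keras.preprocessing.sequence.pad_sequences`, which pads and truncates at the front of each sequence by default.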
To run the BiLSTM model:
python main.py