GitHub - PratikBarhate/question-classification: Question Classification for the dataset CogComp QC Dataset

question-classification

Classifier for the question classification dataset (UIUC's CogComp QC Dataset).

Results of the empirical tests are in the results file. Unless stated otherwise, all results are for the combined coarse:fine prediction, i.e. one class out of the 50 classes in total.
More details about the execution and logic are available in the execution details.
A diagrammatic representation of the data flow can be accessed here.

The data flow is different for the Neural Network: it uses only a single coarse model that predicts across all 50 classes.
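The coarse:fine labels mentioned above can be illustrated with a small parsing sketch. It assumes each dataset line has the form `COARSE:fine question text`; the helper name `parse_qc_line`, the `LabeledQuestion` type, and the sample line are illustrative and are not taken from this repository's code.

```python
from typing import NamedTuple


class LabeledQuestion(NamedTuple):
    coarse: str    # coarse class, e.g. "DESC"
    fine: str      # fine class within the coarse class, e.g. "manner"
    question: str  # the raw question text


def parse_qc_line(line: str) -> LabeledQuestion:
    """Split one 'COARSE:fine question text' line into its three parts."""
    # Assumption: the label is the first whitespace-separated token.
    label, _, question = line.strip().partition(" ")
    coarse, sep, fine = label.partition(":")
    if not sep or not question:
        raise ValueError(f"not a 'COARSE:fine question' line: {line!r}")
    return LabeledQuestion(coarse, fine, question)


# Illustrative line, not read from the dataset files.
example = parse_qc_line("DESC:manner How did serfdom develop in and then leave Russia ?")
combined_label = f"{example.coarse}:{example.fine}"
```

The `combined_label` string corresponds to the combined coarse:fine class that the results refer to.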

Install

Python 3.6.3 is required. See requirements.txt for the list of other dependencies, or install them with pip (see below).

Example Linux setup using pyenv to install an older Python version and venv for installing dependencies inside the project dir:

# Install and select Python 3.6.3
pyenv install -v 3.6.3
pyenv local 3.6.3

# Create a project-specific virtual environment for installing dependencies
python -m venv venv
source venv/bin/activate

# Update pip and install required dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Download the English language model
python -m spacy download en_core_web_lg

Execution

The executable files in `bin` work only on Linux or macOS. Windows users should run the Python modules directly, picking the commands that match their needs from the shell scripts.

Check your system's text encoding scheme. The project uses text_file_encoding = "utf8", which can be changed in qc.utils.file_ops.py.
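As a rough illustration of why the encoding setting matters, the sketch below prints nothing but compares the platform default with an explicit encoding. Only the name `text_file_encoding` and its value come from this README; the helper `read_lines` is hypothetical and is not the function in `qc.utils.file_ops`.

```python
import locale
import os
import tempfile

# Mirrors the setting named in this README.
text_file_encoding = "utf8"

# The encoding Python would fall back to when open() is called without one.
system_default = locale.getpreferredencoding(False)


def read_lines(path, encoding=text_file_encoding):
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(path, "r", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Round-trip a non-ASCII question through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding=text_file_encoding) as handle:
        handle.write("What is a café ?\n")
    lines = read_lines(sample_path)
```

If `system_default` differs from the configured value, passing the encoding explicitly, as above, avoids depending on the platform setting.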

Go to the project directory.
First, execute the command ./bin/qc.sh nlp.
Once the Natural Language Processing (NLP) step has computed the annotated natural language properties, you can train one of the models.
To train a model, execute the command ./bin/qc.sh train {ml_algo_model}, e.g. ./bin/qc.sh train svm
To test a model, execute the command ./bin/qc.sh test {ml_algo_model}.

All trained models are saved in a folder named ${ml_algo_model} in the project's root directory.

Machine learning algorithms implemented - {ml_algo_model}

svm = Support Vector Machine
lr = Logistic Regression
linear_svm = Linear Support Vector Classifier (Machine)
nn = Neural Network
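The nlp, train, and test steps can also be chained from Python. The following is a minimal sketch, assuming it runs on Linux or macOS with the project directory as `project_dir`; the function names `pipeline_commands` and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess

# Model names listed in this README.
KNOWN_MODELS = ("svm", "lr", "linear_svm", "nn")


def pipeline_commands(ml_algo_model, run_nlp=True):
    """Return the qc.sh invocations for NLP (optional), training and testing."""
    if ml_algo_model not in KNOWN_MODELS:
        raise ValueError(
            f"unknown model {ml_algo_model!r}; expected one of {KNOWN_MODELS}"
        )
    commands = [["./bin/qc.sh", "nlp"]] if run_nlp else []
    commands.append(["./bin/qc.sh", "train", ml_algo_model])
    commands.append(["./bin/qc.sh", "test", ml_algo_model])
    return commands


def run_pipeline(ml_algo_model, project_dir=".", run_nlp=True):
    """Run each step from the project directory, stopping at the first failure."""
    for command in pipeline_commands(ml_algo_model, run_nlp):
        subprocess.run(command, cwd=project_dir, check=True)
```

For example, `run_pipeline("svm")` would run the three commands in order, and `run_pipeline("lr", run_nlp=False)` would skip the NLP step when its output already exists. If you add your own model name, extend `KNOWN_MODELS` accordingly.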

To clean the outputs

./bin/cleanup.sh nlp - This will delete all the NLP related data.
./bin/cleanup.sh all_models - This will delete all the pre-trained models.
./bin/cleanup.sh model ${ml_algo_model} - This will delete the specific ML model which was pre-trained.
./bin/cleanup.sh all - This will delete all the computed data.

all_models does not clean any additional models that you define yourself. It only cleans the models listed above.

Experimental Code

The method that converts text data to ML features can be modified in the function qc.dataprep.text_features.get_vect.
The feature stack (which data is fed to the ML algorithm) can be modified, transformed, or generated in the file qc.dataprep.feature_stack.

Changes to either of the two items above take effect the next time you run the training process. There is no need to execute the nlp step again.
Machine learning algorithms can be added in the function qc.ml.train.train_one_node (parameter tuning can be done there too). For example, add an extra elif branch in the experimental part of the code:
```
elif ml_algo == "{your_model_name}":
    machine = {Initialize the algorithm you want to use}
```
```
elif ml_algo == "lr_lsvm":
    if cat_type == "coarse":
        machine = linear_model.LogisticRegression(solver="newton-cg")
    else:
        machine = svm.LinearSVC()
```
When executing, use the shell command ./bin/qc.sh train lr_lsvm; this command uses the model you defined, with lr_lsvm as {your_model_name}. In this example, LogisticRegression is used for coarse class prediction and LinearSVC for all of the fine class predictions.

NOTE:

1. Tab = 4 spaces

2. The `python` command should point to the installation that satisfies the dependencies listed above.

3. Alternatively, change the command in the shell script `qc.sh` to the suitable Python command, for example:

python -m {operation} ---> python3 -m {operation}
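To see which interpreter the `python` command resolves to, a small check like the one below can be run with `python`. It is only a sketch: the strict equality with 3.6.3 follows the Install section, the README does not state whether other patch versions also work, and the name `matches_required` is hypothetical.

```python
import sys

REQUIRED_VERSION = (3, 6, 3)  # the version named in the Install section


def matches_required(version_info=None, required=REQUIRED_VERSION):
    """Return True when (major, minor, micro) of the interpreter equals `required`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) == required


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    status = "matches" if matches_required() else "differs from"
    print(f"python is {running}, which {status} the required 3.6.3")
```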

License

MIT

Credits

This project was inspired by a problem we tried to solve: understanding the question for our QA bot. I worked with Akash Pateria on a project named Invoker, which was our undergraduate capstone project. For Invoker we used Python 2.7, practNLPtools, and LinearSVC as the ML algorithm.

This project aims to explore more options for processing natural language (English), test various combinations of features, and improve the accuracy.

References

High-Performance Question Classification Using Semantic Features

