Multi-class classification in NLP based on logistic regression, SVM, ConvNet and ResNet

This repo was built by ruiyang as the final project for a machine learning class at NKU.
@ruiyangsong

Dependencies

Versions newer than those listed should also work.

|library      | version |
|-------------|---------|
|python       | 3.6.9   |
|numpy        | 1.16.5  |
|pandas       | 0.25.1  |
|matplotlib   | 3.1.1   |
|scikit-learn | 0.21.2  |
|jieba        | 0.39    |
|gensim       | 3.8.0   |
|tensorflow   | 1.12.0  |
|keras        | 2.2.4   |
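
To compare an environment against the table above, a small helper along these lines may be useful. It is a sketch, not part of the repo: `MINIMUM_VERSIONS`, `parse_version` and `meets_minimum` are hypothetical names, and the comparison only handles plain dotted numeric versions.

```python
# Minimum versions copied from the dependency table; helper names are illustrative.
MINIMUM_VERSIONS = {
    "numpy": "1.16.5",
    "pandas": "0.25.1",
    "matplotlib": "3.1.1",
    "scikit-learn": "0.21.2",
    "jieba": "0.39",
    "gensim": "3.8.0",
    "tensorflow": "1.12.0",
    "keras": "2.2.4",
}


def parse_version(text):
    """Turn '1.16.5' into (1, 16, 5); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Example: would numpy 1.18.0 satisfy the listed minimum?
    print(meets_minimum("1.18.0", MINIMUM_VERSIONS["numpy"]))
```

The installed version string would typically come from the package itself (for example `numpy.__version__`).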

Directory tree

nlp:.
│  .gitignore
│  baidu_stopwords.txt
│  LICENSE
│  README.md
│  tree
│  userdict.txt
│          
├─data
│      data.csv
│      keywords.csv
│      mode_padding.npz
│      mode_stack.npz
│      mode_sum.npz
│      
├─fig
│      auto-encoder2Dim.png
│      data_hist.png
│      data_pie.png
│      network.emf
│      network.png
│      network.pptx
│      roc_auc.png
│      
├─log
│      compare.log
│      convnet_grid_search.log
│      convnet_padding_111_0.01.log
│      logistic_regression_grid_search.log
│      logistic_regression_sum_1000_0.1.log
│      resnet_grid_search.log
│      resnet_padding_81_0.01.log
│      svm_grid_search.log
│      svm_sum_1000_rbf.log
│      
├─model
│  ├─Conv1D
│  │  ├─convnet_mode_padding_epochs_111_lr_0.01_2020.06.12.08.49.55
│  │  │      history.dict
│  │  │      model.json
│  │  │      model.png
│  │  │      test_rst.npz
│  │  │      weightsFinal.h5
│  │  │      
│  │  └─resnet_mode_padding_epochs_81_lr_0.01_2020.06.12.08.50.57
│  │          history.dict
│  │          model.json
│  │          model.png
│  │          test_rst.npz
│  │          weightsFinal.h5
│  │          
│  ├─LR
│  │  └─mode_sum_maxiter_1000_lr_0.1
│  │          test_rst.npz
│  │          thetas.npz
│  │          
│  ├─svm
│  │  └─mode_sum_C_1000.0_kernel_rbf
│  │          test_rst.npz
│  │          
│  └─word2vec
│          dim100_window3_cnt1.model
│          
└─src
    │  auto_encoder.py
    │  compare.py
    │  convnet.py
    │  logistic_regression.py
    │  resnet.py
    │  svm.py
    │  utils.py
    │  word2vec.py

dataset

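The training set has the format shown below; the test set has no labels (商品编码).
Columns: 样本编号 (sample ID), 商品名称 (product name), 商品价格 (product price), 商品编码 (product code, used as the label).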

样本编号	商品名称	商品价格	商品编码
1	贝蒂斯双瓶礼盒橄榄油	42	101
2	充电强光灯灯珠	33	101
...	...	...	...
12238	转售电力收入	77	110
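
As a rough illustration of this format, the sketch below parses a few rows with Python's standard csv module and counts samples per label. It assumes a comma-separated file whose header matches the column names shown; the actual delimiter and encoding of data/data.csv are not stated here, so adjust as needed. `count_labels` is a hypothetical helper, not a function from the repo.

```python
import csv
import io
from collections import Counter

# Rows taken from the table above, written as comma-separated text (an assumption).
SAMPLE = """样本编号,商品名称,商品价格,商品编码
1,贝蒂斯双瓶礼盒橄榄油,42,101
2,充电强光灯灯珠,33,101
12238,转售电力收入,77,110
"""


def count_labels(handle, label_column="商品编码"):
    """Count how many rows carry each label in a CSV file-like object."""
    reader = csv.DictReader(handle)
    return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    print(count_labels(io.StringIO(SAMPLE)))
```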

Figure: data histogram (fig/data_hist.png)

Figure: data pie chart (fig/data_pie.png)

Usage

Before you start, make sure all the dependencies are correctly installed.
The project is organized into the following steps:

1. Split words (implemented with jieba for Chinese text)
2. Construct word vectors (based on skip-gram, implemented with gensim)
3. Select keywords by tf-idf weights
4. Generate feature tensors
5. Train and evaluate classifiers
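
The keyword selection step can be illustrated with a small, library-free tf-idf sketch. This is not the repo's code: the function names, the unsmoothed idf formula log(N / df), and ranking tokens by their maximum weight in any document are assumptions made for illustration. Documents are assumed to be tokenized already.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized document.

    tf is the relative frequency within the document and idf is log(N / df),
    where df counts the documents that contain the token.
    """
    n_docs = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return weights


def top_keywords(docs, k):
    """Pick the k tokens with the highest tf-idf weight seen in any document."""
    best = {}
    for doc_weights in tfidf_weights(docs):
        for token, weight in doc_weights.items():
            best[token] = max(best.get(token, 0.0), weight)
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:k]]
```

A token that occurs in every document gets idf = log(1) = 0 under this formula, so it is never selected ahead of a more specific token.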

clone the repo and change directory to src

git clone https://github.com/ruiyangsong/nlp.git
cd nlp/src/

Run steps 1 to 4

python word2vec.py
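
The data directory lists mode_sum.npz, mode_stack.npz and mode_padding.npz, but this README does not describe how they are built. The sketch below shows one plausible reading of two of those names: "sum" collapses a sample's word vectors into a single vector, while "padding" keeps the sequence and zero-pads it to a fixed length. Both functions are hypothetical and may differ from what word2vec.py actually does.

```python
def sum_features(word_vectors):
    """Collapse a list of equal-length word vectors into one vector by summing."""
    if not word_vectors:
        return []
    return [sum(values) for values in zip(*word_vectors)]


def pad_features(word_vectors, max_len, dim):
    """Truncate or zero-pad a list of word vectors to exactly max_len rows."""
    rows = [list(vec) for vec in word_vectors[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows
```

Under this reading, summed features would suit models that take a flat vector (logistic regression, SVM) and padded sequences would suit the convolutional models, which is consistent with the commands in the next section.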

Train classifiers

Logistic regression

python logistic_regression.py sum 1000 0.1
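
Judging by the saved model directory name (mode_sum_maxiter_1000_lr_0.1), the three arguments appear to be the feature mode, the maximum number of iterations, and the learning rate. The sketch below shows how the last two typically enter batch gradient descent for a binary logistic regression. It is a generic, library-free illustration, not the code in logistic_regression.py, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, max_iter=1000, lr=0.1):
    """Fit theta (bias first) by batch gradient descent on the mean log-loss."""
    n_features = len(xs[0])
    theta = [0.0] * (n_features + 1)
    for _ in range(max_iter):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)  # prepend 1.0 for the bias term
            error = sigmoid(sum(t * v for t, v in zip(theta, row))) - y
            for j, v in enumerate(row):
                grad[j] += error * v
        theta = [t - lr * g / len(xs) for t, g in zip(theta, grad)]
    return theta


def predict_proba(theta, x):
    """Probability of the positive class for one sample."""
    return sigmoid(sum(t * v for t, v in zip(theta, [1.0] + list(x))))
```

A multi-class problem such as this one could be handled by fitting one such model per class (one-vs-rest) or by a softmax variant; which of these the repo uses is not stated here.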

Support vector machine

python svm.py sum 1000 rbf

ConvNet

python convnet.py padding 111 0.01

ResNet with dilated convolutions

python resnet.py padding 81 0.01
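
A dilated convolution spaces its kernel taps `dilation` positions apart, which widens the receptive field without adding weights. The snippet below is a plain-Python sketch of a 1-D dilated convolution with "valid" padding (written as cross-correlation, as deep learning libraries do), followed by a residual addition. It illustrates the idea only, omits activations and normalization, and makes no claim about the layer configuration in resnet.py.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D 'valid' cross-correlation with kernel taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def residual_block(signal, kernel, dilation=1):
    """Zero-pad so the output length matches the input, then add the input back."""
    span = (len(kernel) - 1) * dilation
    left = span // 2
    padded = [0.0] * left + list(signal) + [0.0] * (span - left)
    conv = dilated_conv1d(padded, kernel, dilation)
    return [x + c for x, c in zip(signal, conv)]
```

With a kernel of length 2, dilation 1 combines neighbouring positions while dilation 2 combines positions two steps apart, so stacking layers with growing dilation lets the network cover a long sequence with few layers.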

Performance comparison

The ROC curve and AUC

Figure: ROC curves (fig/roc_auc.png)
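
For reference, the AUC of a binary (for example one-vs-rest) problem equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. It is illustrative only: the name `auc_score` is hypothetical, and the repo may well compute its curves differently (for instance with scikit-learn).

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 for positive and 0 for negative; tied scores count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of samples, which is fine for a quick check but slower than rank-based implementations on large test sets.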

The predicted scores (LR, SVM, ConvNet, ResNet)

########################################################################################
          |labels    |recall    |precision |F1        |mcc       |F1_micro  |mcc_micro |
LR        ------------------------------------------------------------------------------
          |c1        |0.3395    |0.4670    |0.3932    |0.3359    |0.4065    |0.3405    |
          |c2        |1.0000    |0.7391    |0.8500    |0.8392    |          |          |
          |c3        |0.0000    |0.0000    |0.0000    |0.0000    |          |          |
          |c4        |0.3466    |0.4350    |0.3858    |0.3269    |          |          |
          |c5        |0.0000    |0.0000    |0.0000    |0.0000    |          |          |
          |c6        |0.0000    |0.0000    |0.0000    |0.0000    |          |          |
          |c7        |0.0258    |0.2051    |0.0458    |0.0300    |          |          |
          |c8        |0.0000    |0.0000    |0.0000    |-0.0095   |          |          |
          |c9        |0.8282    |0.2281    |0.3577    |0.2202    |          |          |
          |c10       |1.0000    |0.9655    |0.9825    |0.9811    |          |          |
SVM       ------------------------------------------------------------------------------
          |c1        |0.9779    |0.8466    |0.9075    |0.8981    |0.5617    |0.5130    |
          |c2        |1.0000    |0.9931    |0.9966    |0.9961    |          |          |
          |c3        |0.1512    |0.2653    |0.1926    |0.1558    |          |          |
          |c4        |0.6534    |0.5359    |0.5889    |0.5400    |          |          |
          |c5        |0.6704    |0.4110    |0.5096    |0.4776    |          |          |
          |c6        |0.1014    |0.1829    |0.1304    |0.0957    |          |          |
          |c7        |0.1935    |0.2429    |0.2154    |0.1171    |          |          |
          |c8        |0.4504    |0.4910    |0.4698    |0.4149    |          |          |
          |c9        |0.3359    |0.3275    |0.3316    |0.2031    |          |          |
          |c10       |1.0000    |0.9949    |0.9975    |0.9972    |          |          |
ConvNet   ------------------------------------------------------------------------------
          |c1        |0.1550    |0.9545    |0.2667    |0.3638    |0.3734    |0.3037    |
          |c2        |1.0000    |0.7769    |0.8744    |0.8643    |          |          |
          |c3        |0.2384    |0.1990    |0.2169    |0.1527    |          |          |
          |c4        |0.2988    |0.6818    |0.4155    |0.4142    |          |          |
          |c5        |0.4972    |0.2673    |0.3477    |0.2959    |          |          |
          |c6        |0.5000    |0.0920    |0.1555    |0.0927    |          |          |
          |c7        |0.0742    |0.1933    |0.1072    |0.0453    |          |          |
          |c8        |0.3058    |0.3203    |0.3129    |0.2395    |          |          |
          |c9        |0.0282    |0.3333    |0.0520    |0.0556    |          |          |
          |c10       |1.0000    |1.0000    |1.0000    |1.0000    |          |          |
ResNet    ------------------------------------------------------------------------------
          |c1        |0.6089    |0.7534    |0.6735    |0.6421    |0.4485    |0.3873    |
          |c2        |0.3253    |1.0000    |0.4909    |0.5462    |          |          |
          |c3        |0.5581    |0.2060    |0.3009    |0.2575    |          |          |
          |c4        |0.5339    |0.6837    |0.5996    |0.5652    |          |          |
          |c5        |0.8547    |0.5134    |0.6415    |0.6297    |          |          |
          |c6        |0.2500    |0.0898    |0.1321    |0.0554    |          |          |
          |c7        |0.2742    |0.2640    |0.2690    |0.1607    |          |          |
          |c8        |0.2562    |0.5905    |0.3573    |0.3487    |          |          |
          |c9        |0.1949    |0.5468    |0.2873    |0.2598    |          |          |
          |c10       |1.0000    |0.9949    |0.9975    |0.9972    |          |          |
########################################################################################
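
The per-class columns can be read as one-vs-rest metrics computed from each class's confusion counts. The sketch below shows the standard definitions. Returning 0 when a denominator is zero is an assumption (it matches rows such as c3 for LR, where all four per-class scores are 0.0000), and `per_class_metrics` is a hypothetical name rather than a function from compare.py.

```python
import math


def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest recall, precision, F1 and MCC for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Assumption: undefined ratios are reported as 0.0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}
```

Unlike F1, MCC also uses the true negatives, which is why a class with zero recall and precision can still show a slightly negative MCC (as c8 does for LR).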

network structure

left: ConvNet, right: ResNet
