my-nlp

Chinese Spam Email Classification in Practice

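The dataset consists of two files, ham_data.txt and Spam.data.txt, which contain legitimate (ham) emails and spam emails, respectively.

Each line in a file represents one email.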
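
As a minimal sketch of reading this layout, the helper below loads both files and attaches a label to each line. The function names, the UTF-8 encoding, and the 0/1 label convention are assumptions made for illustration; they are not taken from the repository's code.

```python
def load_emails(path, label, encoding="utf-8"):
    """Read one email per line and pair each non-empty line with a label.

    The encoding is an assumption; adjust it if the files use GBK or similar.
    """
    emails = []
    with open(path, encoding=encoding) as f:
        for line in f:
            text = line.strip()
            if text:  # skip blank lines
                emails.append((text, label))
    return emails


def load_dataset(ham_path="ham_data.txt", spam_path="Spam.data.txt"):
    """Return parallel lists of texts and labels (0 = ham, 1 = spam)."""
    rows = load_emails(ham_path, 0) + load_emails(spam_path, 1)
    texts = [text for text, _ in rows]
    labels = [label for _, label in rows]
    return texts, labels
```

Calling `load_dataset()` from the directory that holds the two files would return every email together with its class, ready for a train/test split.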

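The main steps are:

Extract the data
Normalize and preprocess the data
Extract features (TF-IDF and bag-of-words)
Train the classifiers
- Multinomial naive Bayes on bag-of-words features
- Logistic regression on bag-of-words features
- Support vector machine on bag-of-words features
- Multinomial naive Bayes on TF-IDF features
- Logistic regression on TF-IDF features
- Support vector machine on TF-IDF features
Evaluate the models using precision, recall, and F1 score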
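
The six feature/classifier combinations can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the repository's actual code: `normalize` and `evaluate_all` are hypothetical helpers, character uni- and bigrams stand in for Chinese word segmentation (a segmenter such as jieba could be substituted), `LinearSVC` is one possible choice of support vector machine, and the 70/30 split and random seed are arbitrary.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def normalize(text):
    """Toy normalization: drop punctuation and whitespace, lowercase the rest.

    str.isalnum() is True for CJK characters, so Chinese text is preserved.
    """
    return "".join(ch for ch in text if ch.isalnum()).lower()


def evaluate_all(texts, labels, test_size=0.3, seed=42):
    """Train each feature/classifier pair; return (precision, recall, F1)."""
    cleaned = [normalize(t) for t in texts]
    x_train, x_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Assumption: character n-grams replace word segmentation in this sketch.
    vectorizers = {
        "bow": lambda: CountVectorizer(analyzer="char", ngram_range=(1, 2)),
        "tfidf": lambda: TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),
    }
    classifiers = {
        "naive_bayes": MultinomialNB,
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
        "svm": LinearSVC,
    }
    results = {}
    for feat_name, make_vec in vectorizers.items():
        vec = make_vec()
        # Fit the vocabulary on the training split only, then reuse it.
        train_features = vec.fit_transform(x_train)
        test_features = vec.transform(x_test)
        for clf_name, make_clf in classifiers.items():
            clf = make_clf()
            clf.fit(train_features, y_train)
            predicted = clf.predict(test_features)
            # Scores are reported for the positive class (label 1 = spam).
            precision, recall, f1, _ = precision_recall_fscore_support(
                y_test, predicted, average="binary", zero_division=0
            )
            results[(feat_name, clf_name)] = (precision, recall, f1)
    return results
```

Given lists of texts and 0/1 labels, `evaluate_all(texts, labels)` returns a dictionary keyed by pairs such as `("tfidf", "svm")`, which makes it straightforward to compare the six models side by side.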
