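A class that builds Chinese/English sentences with a Markov chain algorithm, providing interfaces for parsing text and generating language.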


Forec/Markov-Speaking

Markov Speaking (Markov Chain Random Text Generation)

This project is a package that generates random sentences with a Markov chain. If you have any problems or ideas, please email me or open a PR. I would be honored to learn from your help.

Platform

markov_speaking.py is written in Python 2.7 and uses jieba, codecs, random and re. Of these, only jieba is not part of the standard library; install it with pip2 install jieba.

Usage

  • markov_speaking.py provides a class Markov, whose constructor is __init__(self, filepath = None, mode = 0, coding="utf8"). filepath is the file you want to parse; the sentences the class builds are based on this file. mode is 0 to parse English and 1 to parse Chinese. coding sets the codec; the default is UTF-8.
  • Let p be an instance of Markov. Call p.train(filepath, mode, coding), whose signature is train(self, filepath = '', mode = 0, coding="utf8"), to retrain the instance on another file.
  • After you have built and trained p, call p.say(length) to generate a random sentence. length is the maximum length of the generated sentence; the default is 10.
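For readers new to the technique, the following is a minimal sketch of first-order Markov chain text generation for English input. It is not the code in markov_speaking.py: the names build_chain and generate are hypothetical, the regular-expression tokenizer is an assumption, and length is assumed here to count words.

```python
import random
import re
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed directly after it.

    Hypothetical helper for illustration; not part of markov_speaking.py.
    """
    # Assumed tokenizer: runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Duplicates are kept so frequent successors are picked more often.
        chain[current].append(following)
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Random-walk the chain, emitting at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    # Start from a random word that has at least one successor.
    word = rng.choice(sorted(chain))
    output = [word]
    # Stop early if the current word was never followed by anything.
    while len(output) < length and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_chain(sample), length=8, seed=1))
```

Storing successors in a plain list, duplicates included, makes a uniform random choice from that list equivalent to sampling by observed frequency. For Chinese text the tokenizer would have to be replaced by a word segmenter such as jieba, since Chinese words are not separated by spaces.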

Examples For Use

You can download the Chinese novel 《笑傲江湖》 (The Smiling, Proud Wanderer) from here, or the English novel The Standard Bearer from here.

>>> import markov_speaking
>>> p = markov_speaking.Markov('swords.txt', 1)
Building prefix dict from the default dictionary ...
Loading model from cache /home/forec/cache
Dumping model to file cache /home/forec/cache
Loading model cost 1.578 seconds.
Prefix dict has been built succesfully.
>>> p.say(5)
忽然想到一计说道师伯令狐师兄行侠仗义

Update-logs

  • 2016-10-10: Add the project and create the repository.
  • 2016-10-11: Fix a problem in the English part: words were not split by sentence.
  • 2016-10-12: Fix the train function.
  • 2016-10-13: Remove the useless uppercase condition for Chinese.

License

All code in this repository is licensed under the terms found in the file named "LICENSE" in this directory.

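Authorization Statement

I have authorized Shiyanlou (实验楼) to use the code in this repository and to publish a tutorial for this project. You can view the corresponding tutorial here.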
