Markov-Speaking
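A class that uses a Markov chain algorithm to build Chinese or English sentences, with interfaces for parsing text and generating language.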
Markov Speaking (Markov-chain random text generation)
This project is a package that generates random sentences with a Markov chain. If you have any problems or ideas, please email me or open a PR. I would be honored to learn from your help.
Platform
`markov_speaking.py` is written in Python 2.7 and uses jieba, codecs, random and re. Only jieba is outside the standard library; install it with `pip2 install jieba`.
Usage
- `markov_speaking.py` provides a class `Markov`, whose constructor is `__init__(self, filepath = None, mode = 0, coding="utf8")`. `filepath` is the file you want to parse; the sentences the class builds are based on this file. `mode` is 0 to parse English and 1 to parse Chinese. `coding` sets the codec and defaults to UTF-8.
- Let `p` be an instance of `Markov`. The method `train(self, filepath = '', mode = 0, coding="utf8")`, called as `p.train(filepath, mode, coding)`, retrains the instance from a file.
- After `p` has been built and trained, call `p.say(length)` to generate a random sentence. `length` is the maximum length of the generated sentence and defaults to 10.
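To make the idea behind the train and say steps concrete, here is a minimal, self-contained sketch of a first-order, word-level Markov chain for English text. It is not the code in `markov_speaking.py`: the function names `build_chain` and `say`, the sentence-splitting regex, and the optional `rng` parameter are all assumptions made for illustration, and Chinese input would additionally need a word segmenter such as jieba.

```python
import random
import re
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed right after it.

    Hypothetical helper for illustration; not part of markov_speaking.py.
    """
    chain = defaultdict(list)
    # Split into sentences first so the chain never links the last word
    # of one sentence to the first word of the next.
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)


def say(chain, length=10, rng=random):
    """Random-walk the chain and return a sentence of at most `length` words."""
    if not chain or length <= 0:
        return ""
    # Start from a random word that has at least one known successor.
    word = rng.choice(sorted(chain))
    words = [word]
    # Stop at the length limit or when the current word has no successor.
    while len(words) < length and word in chain:
        word = rng.choice(chain[word])
        words.append(word)
    return " ".join(words) + "."


if __name__ == "__main__":
    sample = "the cat sat. the dog sat down. the dog ran away."
    print(say(build_chain(sample), 5))
```

Because successors are stored with repetition, a word that follows another more often in the training text is proportionally more likely to be picked, which is what makes the output resemble the source.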
Examples For Use
You can download the Chinese novel 《笑傲江湖》 (The Smiling, Proud Wanderer) from here, or the English novel 《The Standard Bearer》 from here.
>>> import markov_speaking
>>> p = markov_speaking.Markov('swords.txt', 1)
Building prefix dict from the default dictionary ...
Loading model from cache /home/forec/cache
Dumping model to file cache /home/forec/cache
Loading model cost 1.578 seconds.
Prefix dict has been built succesfully.
>>> p.say(5)
忽然想到一计说道师伯令狐师兄行侠仗义。
Update-logs
- 2016-10-10: Add project and build repository.
- 2016-10-11: Fix a problem in the English part: words were not split by sentence.
- 2016-10-12: Fix train function.
- 2016-10-13: Remove useless Chinese upper condition.
License
All code in this repository is licensed under the terms found in the file named "LICENSE" in this directory.