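NLG for Fun - A Quick Automated Headline Generator in Python
Module: Markovify
The Python module we use here is markovify.
Description of Markovify:
Markovify is a simple, extensible Markov chain generator. Right now, its main use is building Markov models of large text corpora and generating random sentences from them. In theory, however, it could be used for other applications.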
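To make the idea concrete, here is a minimal sketch of a word-level Markov chain in plain Python. It is not how markovify is implemented internally; the function names (build_chain, generate), the <START>/<END> markers, and the toy headlines are all assumptions made up for illustration. The model simply records which word followed each pair of words, then walks those records at random.

```python
import random
from collections import defaultdict

def build_chain(lines, state_size=2):
    """Map each tuple of `state_size` words to the words that followed it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        # Pad with markers so the model learns how headlines begin and end.
        padded = ["<START>"] * state_size + words + ["<END>"]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return chain

def generate(chain, state_size=2, rng=random, max_words=30):
    """Walk the chain from the start state until the end marker is drawn."""
    state = ("<START>",) * state_size
    out = []
    while len(out) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == "<END>":
            break
        out.append(next_word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (next_word,)
    return " ".join(out)

# Hypothetical toy corpus, one headline per line.
toy_headlines = [
    "council approves new water plan",
    "council rejects new housing plan",
    "minister approves new housing plan",
]
chain = build_chain(toy_headlines)
print(generate(chain))
```

Because the walk only ever follows word sequences seen in the corpus, the output can recombine fragments of different headlines (for example "council approves new housing plan") but never invents a transition that was not observed.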
About the dataset:
The dataset can be downloaded from Kaggle Datasets.
Load the required packages
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import markovify #Markov Chain Generator
Read the input text file
inp = pd.read_csv('../input/abcnews-date-text.csv')
inp.head(3)
   publish_date  headline_text
0  20030219      aba decides against community broadcasting lic…
1  20030219      act fire witnesses must be aware of defamation
2  20030219      a g calls for infrastructure protection summit
Build a text model with a Markov chain
text_model = markovify.NewlineText(inp.headline_text, state_size=2)
Automatically generated headlines
# Print five randomly-generated sentences
for i in range(5):
    print(text_model.make_sentence())
iron magnate poised to storm cleanup
meet the png government defends stockdale appointment
the twitter exec charged with animal cruelty trial
pm denies role in pregnancy
shoalhaven business boosts hunter
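One practical caveat: markovify's make_sentence() returns None when it cannot build a sentence that passes its checks, so the loop above can occasionally print None. The helper below is a hedged sketch of one way to guard against that. The name generate_headlines and its parameters are hypothetical, not part of markovify; it only relies on the model exposing make_sentence() and make_short_sentence(max_chars).

```python
def generate_headlines(model, n=5, max_attempts=50, max_chars=None):
    """Collect up to `n` distinct headlines, skipping failed attempts.

    `model` is assumed to be a markovify text model such as the
    `text_model` built earlier. Both generation methods may return
    None, so each candidate is checked before it is kept.
    """
    headlines = []
    for _ in range(max_attempts):
        if max_chars is None:
            candidate = model.make_sentence()
        else:
            # Optional length cap, e.g. for headline-sized output.
            candidate = model.make_short_sentence(max_chars)
        if candidate is not None and candidate not in headlines:
            headlines.append(candidate)
        if len(headlines) == n:
            break
    return headlines

# Usage, assuming `text_model` from the step above:
# for headline in generate_headlines(text_model, n=5):
#     print(headline)
```

Capping the number of attempts keeps the loop from running forever on a small corpus, at the cost of sometimes returning fewer than n headlines.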