Extracting the sentence containing a keyword in Python (ScienceNet, "Extracting sentences with Python", Lü Bo's blog post)

Date: 2023-09-17 06:55:51

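Splitting a paragraph into its sentences is not easy. Sentence beginnings and endings are not very regular, and periods also appear inside sentences (abbreviations, decimal numbers, ellipses). This makes it effectively impossible to split sentences reliably with a single regular expression: sometimes it works, but most of the time it goes wrong somewhere. Here we use the nltk module for the job.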
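To see why a single delimiter is not enough, here is a minimal sketch that treats every period followed by a space as a sentence boundary. The shortened paragraph is an illustrative assumption, loosely based on the example used later in this post.

```python
# Naive approach: treat every ". " as a sentence boundary.
# The sample text is a shortened, illustrative variant of the post's example.
paragraph = "Mr. Smith bought it for 1.5 million dollars, i.e. he paid a lot. Did he mind?"

naive = paragraph.split(". ")

# Two real sentences come back as four fragments, because the periods
# in "Mr." and "i.e." are also followed by a space.
for piece in naive:
    print(repr(piece))
```

The decimal point in "1.5" survives only because it is not followed by a space; the abbreviations do not.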

Part 1: Using regular expressions

import re

paragraph = "Mr. Smith bought for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't. I say. What's wrong with you? I am confused by your activity."

# Match the whitespace that follows the end of a sentence, so the paragraph can then be split on that whitespace

rule = re.compile(r"(?

result = re.split(rule, paragraph)

for sentence in result:
    print(sentence)
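As a runnable Python 3 sketch of this approach, the block below splits the same paragraph on whitespace that follows a period or question mark, using negative lookbehinds to skip abbreviations. The pattern is a commonly used one and is an assumption here; it is not necessarily the expression the author wrote.

```python
import re

paragraph = ("Mr. Smith bought for 1.5 million dollars, i.e. he paid a lot for it. "
             "Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't "
             "true... Well, with a probability of .9 it isn't. I say. What's wrong "
             "with you? I am confused by your activity.")

# Assumed pattern (illustrative):
#   (?<!\w\.\w.)       do not split after forms such as "i.e."
#   (?<![A-Z][a-z]\.)  do not split after titles such as "Mr." or "Jr."
#   (?<=[.?])          the whitespace must follow a period or question mark
rule = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.?])\s")

sentences = rule.split(paragraph)

for sentence in sentences:
    print(sentence)
```

On this paragraph the sketch yields eight sentences. It remains a heuristic: an abbreviation such as "Prof." would still trigger a split, and sentences ending in "!" are not separated at all.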

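# If the paragraph itself contains double quotes, the string literal above raises an error. In that case, enclose the text in triple double quotes or triple single quotes instead; I have tested this and it works. The regular expression also has to change accordingly. Below is code that uses a regular expression to extract sentences from a text file.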

import re

# Open the text file, which must be saved in ANSI format.

# A TXT file saved in Unicode format doesn't work. I don't know why.

# Read the whole file; the with block closes it and avoids shadowing the built-in input()
with open('test.txt') as infile:
    input_result = infile.read()

rule = re.compile(r"(?
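The file-based variant can be sketched in Python 3 as follows. A likely reason the "Unicode" file failed is an encoding mismatch: Windows editors typically save "Unicode" as UTF-16, and reading it without naming that encoding yields unusable text. The helper name `read_sentences`, the temporary file, the sample text, and the pattern are all illustrative assumptions, not the author's code.

```python
import os
import re
import tempfile

# Assumed sentence-boundary pattern (illustrative, not the author's original).
SENTENCE_END = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.?])\s+")


def read_sentences(path, encoding):
    """Read a text file with an explicit encoding and split it into sentences."""
    with open(path, encoding=encoding) as infile:
        text = infile.read()
    return [s for s in SENTENCE_END.split(text.strip()) if s]


# Create a small UTF-16 file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-16") as outfile:
    outfile.write("Mr. Smith paid a lot. Did he mind?\nAdam Jones Jr. thinks he didn't.")

sentences = read_sentences(path, "utf-16")
for sentence in sentences:
    print(sentence)
```

Passing the encoding explicitly means the same function can read an ANSI file (for example with `encoding="cp1252"`) or a UTF-8 file without other changes.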