Filtering Common Words from a Chinese Word Cloud with STOPWORDS in Python

Published: 2023-12-25 04:47:10

Common words can be filtered out of a Chinese word cloud in Python with the following steps:

Step 1: Import the required libraries

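First, import the required libraries: jieba for Chinese word segmentation, wordcloud for generating the word cloud, matplotlib for displaying it, and STOPWORDS, the stopword set that ships with wordcloud. Note that STOPWORDS contains only English words (such as "the" and "and"), so on its own it does not remove Chinese common words such as "的" or "我们"; those have to be added to the set.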

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

Step 2: Read the text file

Next, read the text file the word cloud will be generated from. Here we assume the file is named "chinese_text.txt" and is UTF-8 encoded.

with open('chinese_text.txt', 'r', encoding='utf-8') as f:
    text = f.read()
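Because STOPWORDS only covers English, Chinese stopwords are usually read from a file in the same way. The following is a minimal sketch, assuming a hypothetical UTF-8 file `cn_stopwords.txt` with one word per line; both the file name and the format are assumptions, not part of the original example.

```python
def load_stopwords(path):
    """Read a stopword file with one word per line; blank lines are skipped."""
    with open(path, 'r', encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}

# 'cn_stopwords.txt' is a hypothetical file name; point this at your own list
# cn_stopwords = load_stopwords('cn_stopwords.txt')
```

The returned set can then be merged with the English one, for example `stopwords = set(STOPWORDS) | cn_stopwords`.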

Step 3: Chinese word segmentation

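Segment the text with jieba. Chinese text has no spaces between words, and WordCloud splits its input into words using a regular expression over word characters, so without segmentation an entire run of Chinese characters would be counted as a single "word". Joining jieba's tokens with spaces gives WordCloud proper word boundaries.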

seg_list = jieba.cut(text, cut_all=False)
words = ' '.join(seg_list)
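As an alternative to relying only on WordCloud's stopwords parameter, the tokens can also be filtered before they are joined. This is a minimal sketch operating on a token list shaped like jieba's output; the sample tokens and the small stopword set are illustrative assumptions, not real jieba output.

```python
def filter_tokens(tokens, stopwords):
    """Drop whitespace-only tokens and stopwords, then join the rest with spaces."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return ' '.join(kept)

# Hypothetical tokens, shaped like what jieba.cut yields
tokens = ['我们', '的', '数据', '分析', '，', '是', '数据', '可视化', '\n']
cn_stopwords = {'我们', '的', '是', '，'}
print(filter_tokens(tokens, cn_stopwords))  # 数据 分析 数据 可视化
```

Filtering at this stage also removes newline tokens, which jieba emits as separate items.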

Step 4: Create the word cloud object

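Create a WordCloud object and set some basic parameters such as the font and background color, and pass the stopword set through the stopwords parameter. font_path must point to a font that contains Chinese glyphs (here simhei.ttf, assumed to be in the working directory); otherwise the Chinese words are rendered as empty boxes.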

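# STOPWORDS only contains English words, so add the Chinese common words to filter
# (the words below are examples; extend the list to suit your text)
stopwords = set(STOPWORDS)
stopwords.update(['的', '了', '是', '在', '和', '我们', '一个', '这个'])
wordcloud = WordCloud(font_path='simhei.ttf',  # must be a font with Chinese glyphs
                      background_color='white',
                      width=800,
                      height=600,
                      stopwords=stopwords).generate(words)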
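WordCloud compares stopwords against whole tokens after lowercasing both sides, so an entry only removes a token that is exactly equal to it. The stdlib sketch below approximates that step and builds the word-frequency dictionary by hand; it is a simplification, not wordcloud's actual implementation, which additionally handles collocations, plurals and numbers. The sample text and stopwords are illustrative assumptions.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords):
    """Count tokens in space-separated text, skipping stopwords (case-insensitive, whole-token match)."""
    stop = {w.lower() for w in stopwords}
    tokens = re.findall(r"\w[\w']*", text)  # in Python 3, \w also matches Chinese characters
    return Counter(t for t in tokens if t.lower() not in stop)

sample_text = '我们 的 数据 分析 是 数据 可视化 The data'
freqs = word_frequencies(sample_text, {'的', '是', '我们', 'the'})
print(freqs.most_common(2))  # [('数据', 2), ('分析', 1)]
```

A frequency dictionary built this way can be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the raw text, which gives full control over what is filtered.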

Step 5: Display the word cloud

Display the word cloud with matplotlib.

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

The complete code:

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

with open('chinese_text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

seg_list = jieba.cut(text, cut_all=False)
words = ' '.join(seg_list)

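# STOPWORDS only contains English words, so add the Chinese common words to filter
# (the words below are examples; extend the list to suit your text)
stopwords = set(STOPWORDS)
stopwords.update(['的', '了', '是', '在', '和', '我们', '一个', '这个'])
wordcloud = WordCloud(font_path='simhei.ttf',  # must be a font with Chinese glyphs
                      background_color='white',
                      width=800,
                      height=600,
                      stopwords=stopwords).generate(words)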

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

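That is an example of filtering common words out of a Chinese word cloud in Python, using wordcloud's STOPWORDS set extended with Chinese stopwords.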