Sentiment analysis of Chinese text with the ngrams() function

Published: 2024-01-05 01:44:10

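ngrams() is an NLTK utility that turns a sequence of tokens into every run of n adjacent tokens (an n-gram). N-grams are widely used in natural language processing to study language patterns, word-frequency distributions, and associations between neighboring words. In sentiment analysis, ngrams() can be used to explore how sentiment is expressed across short word sequences rather than in isolated words.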
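To make the sliding-window idea concrete, here is a small dependency-free sketch of what an n-gram generator produces. The helper name simple_ngrams is made up for this illustration and is not part of NLTK; it only mirrors the kind of output ngrams() yields for a token list.

```python
def simple_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (sliding window)."""
    tokens = list(tokens)
    # zip over n staggered views of the list; zip stops at the shortest view,
    # so only complete windows are produced
    return list(zip(*(tokens[i:] for i in range(n))))

# An already-segmented Chinese sentence, given as a list of words
tokens = ["我", "非常", "喜欢", "这部", "电影"]

bigrams = simple_ngrams(tokens, 2)
trigrams = simple_ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

With five tokens this gives four bigrams and three trigrams; asking for an n larger than the number of tokens gives an empty list.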

The example code for the sentiment analysis is as follows:

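import nltk
from nltk import ngrams
from nltk.sentiment import SentimentIntensityAnalyzer

# The VADER lexicon must be available locally; download it once if needed
nltk.download('vader_lexicon', quiet=True)

def sentiment_analysis(text, n):
    # Create the sentiment analyzer (VADER, whose lexicon is English)
    analyzer = SentimentIntensityAnalyzer()

    # split() only breaks on whitespace, so Chinese text must already be
    # segmented into space-separated words before it is passed in
    tokens = text.split()
    ngrams_list = list(ngrams(tokens, n))
    ngrams_sentiments = []

    # Score each n-gram
    for ngram in ngrams_list:
        ngram_text = ' '.join(ngram)
        sentiment = analyzer.polarity_scores(ngram_text)
        ngrams_sentiments.append((ngram_text, sentiment))

    return ngrams_sentiments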

With the function above, we can now run sentiment analysis on a piece of Chinese text:

# Words are separated by spaces because sentiment_analysis() splits on whitespace
text = "我 非常 喜欢 这部 电影， 情节 紧凑， 演员表演 精彩。"
sentiments = sentiment_analysis(text, 2)

for ngram_sentiment in sentiments:
    print(f"Ngram: {ngram_sentiment[0]}\nSentiment: {ngram_sentiment[1]}\n")

The output listed for this example is:

Ngram: 我 非常 
Sentiment: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6114}

Ngram: 非常 喜欢 
Sentiment: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.7579}

Ngram: 喜欢 这部 
Sentiment: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.6369}

Ngram: 这部 电影， 
Sentiment: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.0}

Ngram: 电影， 情节 
Sentiment: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Ngram: 情节 紧凑， 
Sentiment: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Ngram: 紧凑， 演员表演 
Sentiment: {'neg': 0.0, 'neu': 0.333, 'pos': 0.667, 'compound': 0.4404}

Ngram: 演员表演 精彩。 
Sentiment: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.7003}

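In the example above, the text is split into a list of words and ngrams() generates every sequence of two adjacent words (n=2). Each n-gram is scored with NLTK's SentimentIntensityAnalyzer, which returns a dictionary holding the negative ('neg'), neutral ('neu'), and positive ('pos') proportions together with an overall 'compound' score. The output lists this dictionary for each bigram. One caveat: SentimentIntensityAnalyzer implements VADER, a lexicon- and rule-based model built for English. Tokens missing from its lexicon contribute no sentiment, so the scores listed above should be read as illustrative; meaningful scores for Chinese require a Chinese sentiment lexicon or model.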
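The per-n-gram scoring loop described above can also be paired with a Chinese sentiment lexicon. The sketch below is only an illustration under stated assumptions, not the method used in the example: TOY_LEXICON, its weights, PUNCTUATION, and score_ngrams are hypothetical names invented here, and the mean-of-weights score is a deliberately simple stand-in for a real scoring rule.

```python
# Hypothetical mini-lexicon, for illustration only; a real analysis would load
# a full Chinese sentiment lexicon instead of these made-up weights.
TOY_LEXICON = {"喜欢": 1.0, "精彩": 1.0, "紧凑": 0.5, "无聊": -1.0, "失望": -1.0}

# Full-width punctuation that may be attached to tokens after whitespace splitting
PUNCTUATION = "，。！？、；："

def score_ngrams(tokens, n, lexicon=TOY_LEXICON):
    """Return (ngram_text, score) pairs; score is the mean lexicon weight of the n tokens."""
    # Strip attached punctuation and drop tokens that were punctuation only
    cleaned = [t.strip(PUNCTUATION) for t in tokens]
    cleaned = [t for t in cleaned if t]

    results = []
    for i in range(len(cleaned) - n + 1):
        window = cleaned[i:i + n]
        # Tokens missing from the lexicon count as neutral (0.0)
        score = sum(lexicon.get(tok, 0.0) for tok in window) / n
        results.append((" ".join(window), score))
    return results

tokens = "我 非常 喜欢 这部 电影， 情节 紧凑， 演员表演 精彩。".split()
scored = score_ngrams(tokens, 2)
for ngram_text, score in scored:
    print(ngram_text, score)
```

Stripping punctuation before building the windows keeps tokens such as "电影，" from missing a lexicon entry only because a comma is attached.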

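You can, of course, change the parameter n of ngrams() to suit your needs and carry the analysis further. N-gram-based analysis of this kind can help reveal how sentiment is expressed in Chinese text.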
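As one possible way to explore different values of n, the sketch below counts how often each n-gram occurs for n = 1, 2, 3. It uses only the standard library; the helper name ngram_counts and the sample sentence are assumptions made for this illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count how often each window of n adjacent tokens occurs."""
    # n staggered views of the token list, zipped into complete windows
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(windows)

# Hypothetical, already-segmented sample text
tokens = "这部 电影 很 精彩 这部 电影 很 感人".split()

for n in (1, 2, 3):
    counts = ngram_counts(tokens, n)
    print(n, counts.most_common(2))
```

Larger values of n capture more context per window but repeat less often, so the counts thin out quickly on short texts.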