A Basic Word Count Tool in Python

Published: 2023-12-04 13:25:03

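Below is a basic word count tool written in Python. It counts how many times each word occurs in a given text and sorts the results from most to least frequent.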

from collections import Counter
import re

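def word_count(text):
    """Return a list of (word, count) pairs, most frequent first."""
    # Replace punctuation with spaces; \w also matches digits and
    # underscores, so those characters are kept
    cleaned_text = re.sub(r"[^\w\s]", " ", text)
    # Lowercase the text and split it into a list of words
    words = cleaned_text.lower().split()
    # Count the occurrences of each word with Counter
    word_counts = Counter(words)
    # Sort by count, highest first (ties keep first-seen order)
    sorted_word_counts = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)
    return sorted_word_counts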

# Usage example
text = """
Python is an interpreted, high-level, general-purpose programming language.
Python is designed to be easy to read and write.
Python provides constructs that enable clear programming on both small and large scales.
Python is often used as a scripting language for web applications.
"""

result = word_count(text)
for word, count in result:
    print(f"{word}: {count}")

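Running the code above prints the counts from most to least frequent; words with equal counts stay in the order in which they first appear in the text: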

python: 4
is: 3
programming: 2
language: 2
to: 2
and: 2
an: 1
interpreted: 1
high: 1
level: 1
general: 1
purpose: 1
designed: 1
be: 1
easy: 1
read: 1
write: 1
provides: 1
constructs: 1
that: 1
enable: 1
clear: 1
on: 1
both: 1
small: 1
large: 1
scales: 1
often: 1
used: 1
as: 1
a: 1
scripting: 1
for: 1
web: 1
applications: 1

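The code above uses a regular expression from Python's standard-library re module to replace punctuation with spaces. Note that the pattern [^\w\s] leaves digits and underscores in place, because \w matches both. The text is then lowercased and split on whitespace, the Counter class counts the words, and the result is sorted by count in descending order. Finally, a simple loop prints each word with its count.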
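
The same counting and sorting can probably be written more compactly with Counter.most_common, which returns pairs ordered by count, highest first. The sketch below is a variation on the article's function, not a replacement for it. The function name top_words and its parameters n and letters_only are hypothetical, added only for illustration. With letters_only=True it swaps in a different tokenizer (runs of ASCII letters), which drops digits and underscores as well as punctuation.

```python
import re
from collections import Counter


def top_words(text, n=None, letters_only=False):
    """Return (word, count) pairs, most frequent first.

    n and letters_only are hypothetical parameters for this sketch.
    """
    if letters_only:
        # Keep only runs of ASCII letters; digits and underscores are dropped.
        words = re.findall(r"[a-z]+", text.lower())
    else:
        # Same cleaning as word_count: punctuation becomes a space, digits stay.
        words = re.sub(r"[^\w\s]", " ", text).lower().split()
    # most_common orders by count, descending; n=None returns every word.
    return Counter(words).most_common(n)


sample = "Room 101, room 101, room 7."
print(top_words(sample, n=2))                      # [('room', 3), ('101', 2)]
print(top_words(sample, n=2, letters_only=True))   # [('room', 3)]
```

The first call shows that the original pattern keeps number tokens such as 101; the second shows one way to exclude them if only alphabetic words are wanted.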