A Detailed Guide to Python's re Regular Expression Functions

Published: 2023-06-01 07:45:37

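In Python, regular expressions are a common tool for string matching. The re module supports searching, replacing, and splitting strings. This article walks through the main functions of the re module.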

The main functions in the re module:

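1. re.match(pattern, string, flags=0)

Purpose: Tries to match the pattern at the beginning of the string only. If the start of the string does not match, the call fails even when the pattern occurs later. Returns a Match object on success and None on failure.

Parameters:

· pattern: the regular expression.

· string: the string to match against.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.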
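As a minimal sketch of the behavior described above (the sample strings are made up for illustration), the following shows that re.match succeeds only at position 0, and how two flags can be combined:

```python
import re

# re.match only tries the pattern at position 0 of the string.
at_start = re.match(r"hello", "hello world")
not_at_start = re.match(r"hello", "say hello")

print(at_start.group())  # hello
print(not_at_start)      # None

# Flags are combined with the bitwise OR operator.
combined = re.match(r"^hello$", "HELLO\nworld", re.IGNORECASE | re.MULTILINE)
print(combined.group())  # HELLO
```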

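2. re.search(pattern, string, flags=0)

Purpose: Scans the string for the first position where the pattern matches and returns the corresponding Match object. Returns None if no position matches.

Parameters:

· pattern: the regular expression.

· string: the string to match against.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.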
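A short illustration, using an invented sample string, of how re.search reports only the first match along with its position:

```python
import re

text = "id=17, id=42"

# Only the first match is returned, even though "42" also matches.
first = re.search(r"\d+", text)
print(first.group())  # 17
print(first.span())   # (3, 5)

# No match anywhere in the string yields None.
missing = re.search(r"\d+", "no digits here")
print(missing)        # None
```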

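3. re.findall(pattern, string, flags=0)

Purpose: Scans the whole string and returns all non-overlapping matches as a list. If the pattern contains capturing groups, the list holds the captured groups instead of the full matches.

Parameters:

· pattern: the regular expression.

· string: the string to match against.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.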
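The shape of the returned list depends on whether the pattern has capturing groups. A small sketch with an illustrative string:

```python
import re

text = "x=1, y=22, z=333"

# No groups: a list of the matched substrings.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['1', '22', '333']

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)    # [('x', '1'), ('y', '22'), ('z', '333')]
```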

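4. re.split(pattern, string, maxsplit=0, flags=0)

Purpose: Splits the string at each match of the regular expression and returns the resulting list of strings.

Parameters:

· pattern: the regular expression.

· string: the string to split.

· maxsplit: the maximum number of splits. The default, 0, means there is no limit and the string is split at every match.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.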
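The effect of maxsplit is easiest to see side by side. The sketch below uses made-up input and also shows that a capturing group in the pattern keeps the separators in the result:

```python
import re

text = "a, b;c  d"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)
print(parts)    # ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first split; the rest is left intact.
limited = re.split(r"[,;\s]+", text, maxsplit=1)
print(limited)  # ['a', 'b;c  d']

# A capturing group keeps the separators in the output list.
with_seps = re.split(r"(-)", "1-2-3")
print(with_seps)  # ['1', '-', '2', '-', '3']
```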

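5. re.sub(pattern, repl, string, count=0, flags=0)

Purpose: Searches the string for matches of the regular expression and replaces each match with the given replacement.

Parameters:

· pattern: the regular expression.

· repl: the replacement. This is usually a string (which may contain backreferences such as \1), but it can also be a function that receives the Match object and returns the replacement string.

· string: the string to search.

· count: the maximum number of replacements. The default, 0, means every match is replaced.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.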
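The following sketch illustrates count, backreferences in repl, and a function used as repl. The helper name double and the sample strings are hypothetical, chosen only for the example:

```python
import re

date = "2023-06-01"

# Replace every match (count defaults to 0, meaning no limit).
all_replaced = re.sub(r"-", "/", date)
print(all_replaced)    # 2023/06/01

# Replace only the first match.
first_replaced = re.sub(r"-", "/", date, count=1)
print(first_replaced)  # 2023/06-01

# Backreferences reorder the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(reordered)       # 01/06/2023


def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group()) * 2)


# A function as repl is called once per match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)         # 6 apples and 20 pears
```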

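6. re.compile(pattern, flags=0)

Purpose: Compiles a regular expression into a Pattern object. The object provides methods such as match, search, findall, split, and sub, so the same pattern can be reused without passing it each time.

Parameters:

· pattern: the regular expression.

· flags: the matching mode. Several flags can be combined with |, for example re.MULTILINE|re.IGNORECASE.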
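A compiled Pattern object exposes the same operations as the module-level functions. A brief sketch, with an illustrative variable name number:

```python
import re

# Compile once, then reuse the Pattern object's methods.
number = re.compile(r"\d+")

matched = number.match("123abc").group()
searched = number.search("abc456").group()
found = number.findall("1 22 333")
replaced = number.sub("#", "a1b22")

print(matched)   # 123
print(searched)  # 456
print(found)     # ['1', '22', '333']
print(replaced)  # a#b#
```

Compiling is mostly a readability and reuse choice when one pattern is applied in many places.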

Commonly used flags:

1. re.IGNORECASE or re.I

Purpose: Performs case-insensitive matching.

Usage: re.compile(pattern, re.IGNORECASE).
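A quick comparison with and without the flag, on an invented sample string:

```python
import re

text = "Python PYTHON python"

# Without the flag only the exact-case occurrence matches.
case_sensitive = re.findall(r"python", text)
print(case_sensitive)    # ['python']

# With re.IGNORECASE all three spellings match.
case_insensitive = re.findall(r"python", text, re.IGNORECASE)
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
```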

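2. re.MULTILINE or re.M

Purpose: Multi-line matching. ^ and $ match at the start and end of every line, not only at the start and end of the whole string.

Usage: re.compile(pattern, re.MULTILINE).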
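The difference shows up as soon as the input contains newlines. A minimal sketch with made-up text:

```python
import re

text = "first line\nsecond line\nthird line"

# By default ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)    # ['first']

# With re.MULTILINE ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_starts)  # ['first', 'second', 'third']
```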

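3. re.DOTALL or re.S

Purpose: Makes the . character match any character at all, including a newline. Without this flag, . matches anything except a newline.

Usage: re.compile(pattern, re.DOTALL).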
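To illustrate, the following sketch tries to capture text that spans two lines. The markup-like sample string is only an example:

```python
import re

text = "<p>line one\nline two</p>"

# Without re.DOTALL, "." cannot cross the newline, so there is no match.
without_flag = re.search(r"<p>(.*)</p>", text)
print(without_flag)  # None

# With re.DOTALL, "." also matches the newline.
with_flag = re.search(r"<p>(.*)</p>", text, re.DOTALL)
print(repr(with_flag.group(1)))  # 'line one\nline two'
```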

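4. re.ASCII or re.A

Purpose: Makes \w, \W, \b, \B, \d, \D, \s, and \S perform ASCII-only matching instead of full Unicode matching.

Usage: re.compile(pattern, re.ASCII).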
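The flag changes what character classes such as \w accept, not which strings can be searched. A small sketch, with non-ASCII characters written as escapes so the sample does not depend on file encoding:

```python
import re

# "caf\u00e9" is "cafe" with an accented e; "\u4e2d\u6587" is two CJK characters.
text = "caf\u00e9 \u4e2d\u6587 abc"

# Default (Unicode) matching: \w accepts letters from any script.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)

# ASCII matching: \w accepts only [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(ascii_words)  # ['caf', 'abc']
```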

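5. re.UNICODE or re.U

Purpose: Matches according to Unicode rules. For str patterns in Python 3 this is already the default, so the flag is redundant and is kept only for backward compatibility.

Usage: re.compile(pattern, re.UNICODE).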

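6. re.VERBOSE or re.X

Purpose: Allows comments inside the regular expression. Whitespace in the pattern is ignored (except inside a character class or when escaped), so the pattern can be laid out over several lines.

Usage: re.compile(pattern, re.VERBOSE).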
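A sketch of a commented pattern. The phone-number format and the name phone are hypothetical, used only to show the layout:

```python
import re

# Whitespace and "#" comments inside the pattern are ignored.
phone = re.compile(
    r"""
    (\d{3})   # first group: three digits
    -         # literal separator
    (\d{4})   # second group: four digits
    """,
    re.VERBOSE,
)

groups = phone.match("555-1234").groups()
print(groups)  # ('555', '1234')
```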

Example code:

import re

# re.match example: the pattern is found at the start, so this prints "match: hello"
pattern = r'hello'
string = 'hello world'
match_obj = re.match(pattern, string)
if match_obj:
    print('match:', match_obj.group())
else:
    print('match failed.')

# re.search example: the pattern is found later in the string, so this prints "search: hello"
pattern = r'hello'
string = 'world hello'
search_obj = re.search(pattern, string)
if search_obj:
    print('search:', search_obj.group())
else:
    print('search failed.')

# re.findall example: the whole string is one run of digits, so the list has a single element
pattern = r'\d+'
string = '123456789'
findall_list = re.findall(pattern, string)
print('findall:', findall_list)

# re.split example: split on runs of whitespace
pattern = r'\s+'
string = 'hello world'
split_list = re.split(pattern, string)
print('split:', split_list)

# re.sub example: replace "hello" with "hi"
pattern = r'hello'
repl = 'hi'
string = 'hello world'
sub_string = re.sub(pattern, repl, string)
print('sub:', sub_string)

# re.compile example: compile once, then call match on the Pattern object
pattern = r'\d+'
string = '123456789'
compile_obj = re.compile(pattern)
match_obj = compile_obj.match(string)
if match_obj:
    print('compiled match:', match_obj.group())
else:
    print('compiled match failed.')

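Summary: The functions in the re module are commonly used string-matching tools in Python and are widely applied when parsing text files and network data. Knowing them well is valuable for writing efficient Python programs.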