Python Regular Expression Functions: 10 Examples for More Efficient Text Processing

Published: 2023-06-03 16:33:59

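Python's regular expressions are a powerful tool for text processing and make searching and replacing text efficient. This article introduces 10 commonly used regular expression functions in Python's re module, including re.search(), re.findall(), and re.sub(), each with a short example showing how it simplifies a text-processing task.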

1. re.search()

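re.search() scans a string for the first position where the regular expression pattern matches and returns a Match object. If nothing matches, it returns None.

Basic syntax:

re.search(pattern, string, flags=0)

Parameters:

pattern: the regular expression pattern

string: the string to search

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: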

import re

text = "hello world!"
pattern = "world"

# re.search() returns a Match object, or None when nothing matches
match = re.search(pattern, text)
if match:
    print("Found at position", match.start())
else:
    print("Not found")

Output:

Found at position 6
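
The flags parameter described above can be illustrated with a small sketch. The sample text here is made up for illustration; re.IGNORECASE is a standard flag in the re module.

```python
import re

text = "Hello World!"

# Without flags, matching is case-sensitive, so lowercase "world" is not found.
case_sensitive = re.search("world", text)

# re.IGNORECASE lets the same pattern match "World".
case_insensitive = re.search("world", text, flags=re.IGNORECASE)

print(case_sensitive)             # None
print(case_insensitive.group())   # World
print(case_insensitive.span())    # (6, 11)
```

Because re.search() may return None, it is usually safest to test the result before calling methods such as group() on it.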

2. re.findall()

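re.findall() finds every non-overlapping substring that matches the regular expression pattern and returns them as a list.

Basic syntax:

re.findall(pattern, string, flags=0)

Parameters:

pattern: the regular expression pattern

string: the string to search

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: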

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\w+"  # raw string, so the backslash reaches the regex engine unchanged

matches = re.findall(pattern, text)
print(matches)

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
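
One detail worth knowing is that the shape of the list returned by re.findall() depends on the capture groups in the pattern. The following is a minimal sketch with made-up sample data:

```python
import re

text = "alice=30, bob=25, carol=41"

# No capture groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One capture group: each item is only that group's text.
names = re.findall(r"(\w+)=\d+", text)

# Two capture groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['alice=30', 'bob=25', 'carol=41']
print(names)  # ['alice', 'bob', 'carol']
print(pairs)  # [('alice', '30'), ('bob', '25'), ('carol', '41')]
```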

3. re.finditer()

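re.finditer() is similar to re.findall(): it also finds every substring that matches the pattern. Instead of a list of strings, however, it returns an iterator that yields one Match object per match.

Basic syntax:

re.finditer(pattern, string, flags=0)

Parameters:

pattern: the regular expression pattern

string: the string to search

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: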

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\w+"

# Each item produced by the iterator is a Match object
matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())

Output:

The
quick
brown
fox
jumps
over
the
lazy
dog
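
Because re.finditer() yields Match objects rather than plain strings, the position of each match is available as well. A short sketch on a shortened version of the sample sentence:

```python
import re

text = "The quick brown fox"

# Collect (matched text, start index, end index) for every word.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

print(spans)
# [('The', 0, 3), ('quick', 4, 9), ('brown', 10, 15), ('fox', 16, 19)]
```

This is the usual reason to prefer re.finditer() over re.findall(): it also avoids building the full result list up front when the input is large.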

4. re.split()

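re.split() splits a string at every match of the regular expression pattern and returns the pieces as a list.

Basic syntax:

re.split(pattern, string, maxsplit=0, flags=0)

Parameters:

pattern: the regular expression pattern

string: the string to split

maxsplit: optional maximum number of splits; 0 (the default) splits at every match.

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: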

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\s"  # split at each whitespace character

words = re.split(pattern, text)
print(words)

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
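
A few behaviours of re.split() are easy to miss: a single-character pattern produces empty strings between adjacent separators, maxsplit limits the number of cuts, and a capture group keeps the separators in the result. The sketch below uses made-up input to show each case:

```python
import re

text = "a  b,c"  # note the two spaces between "a" and "b"

# One split per whitespace character leaves an empty string between the two spaces.
single = re.split(r"\s", text)

# A character class with + treats a run of separators as one.
multi = re.split(r"[\s,]+", text)

# maxsplit=1 stops after the first split.
limited = re.split(r"[\s,]+", text, maxsplit=1)

# A capture group keeps the separator in the output list.
kept = re.split(r"(,)", "b,c")

print(single)   # ['a', '', 'b,c']
print(multi)    # ['a', 'b', 'c']
print(limited)  # ['a', 'b,c']
print(kept)     # ['b', ',', 'c']
```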

5. re.sub()

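re.sub() finds the substrings that match the regular expression pattern, replaces them with the given replacement, and returns the resulting string.

Basic syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

pattern: the regular expression pattern

repl: the replacement for each match, given as a string or as a function that takes a Match object and returns a string

string: the string to search

count: optional maximum number of replacements; 0 (the default) replaces every match.

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: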

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\s"

# count=1 replaces only the first whitespace character
new_text = re.sub(pattern, "-", text, count=1)
print(new_text)

Output:

The-quick brown fox jumps over the lazy dog
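
The repl argument can do more than insert a fixed string. As a sketch, the example below shows a replacement that refers back to captured groups and a replacement function; the helper name double and the sample strings are made up for illustration.

```python
import re

# Backreferences \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-06-03")

# A function receives each Match object and returns the replacement text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

print(swapped)  # 03/06/2023
print(doubled)  # 6 apples and 20 pears
```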

6. re.subn()

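re.subn() works like re.sub() and also replaces matching substrings, but it returns a tuple: the first element is the string after replacement and the second element is the number of replacements made.

Basic syntax:

re.subn(pattern, repl, string, count=0, flags=0)

Parameters:

pattern: the regular expression pattern

repl: the replacement for each match, given as a string or as a function that takes a Match object and returns a string

string: the string to search

count: optional maximum number of replacements; 0 (the default) replaces every match.

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: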

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\s"

# subn() returns (new_string, number_of_replacements)
new_text, count = re.subn(pattern, "-", text, count=1)
print(new_text)
print(count)

Output:

The-quick brown fox jumps over the lazy dog
1
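
The replacement count is handy when code needs to know whether anything changed. The following sketch assumes a hypothetical helper, normalize_whitespace, that is not part of the re module:

```python
import re

def normalize_whitespace(text):
    """Collapse runs of two or more whitespace characters into one space.

    Returns the cleaned text and the number of runs that were replaced.
    """
    return re.subn(r"\s{2,}", " ", text)

cleaned, n = normalize_whitespace("The  quick   brown fox")

print(cleaned)  # The quick brown fox
print(n)        # 2
```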

7. re.escape()

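re.escape() escapes the special characters in a string so that the regular expression engine treats them as literal text rather than as metacharacters.

Basic syntax:

re.escape(pattern)

Parameters:

pattern: the string to escape

Example code: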

import re

text = "The quick (brown) fox jumps over the lazy dog"
pattern = "(brown)"

# Escaping makes the parentheses literal instead of a capture group
escaped_pattern = re.escape(pattern)
match = re.search(escaped_pattern, text)
if match:
    print("Found at position", match.start())
else:
    print("Not found")

Output:

Found at position 10
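
Escaping matters most when the text to find comes from user input or data rather than from a hand-written pattern. The sketch below, with a made-up price string, shows what happens with and without re.escape():

```python
import re

text = "Price: $5.00 (approx.)"
literal = "$5.00"

# Unescaped, "$" is an end-of-string anchor, so this pattern cannot match here.
unescaped = re.search(literal, text)

# Escaped, every character is matched literally.
escaped = re.search(re.escape(literal), text)

print(unescaped)        # None
print(escaped.group())  # $5.00
print(escaped.start())  # 7
```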

8. re.fullmatch()

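re.fullmatch() checks whether the entire string matches the regular expression pattern and returns a Match object. If the whole string does not match, it returns None.

Basic syntax:

re.fullmatch(pattern, string, flags=0)

Parameters:

pattern: the regular expression pattern

string: the string to match

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: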

import re

text1 = "The"  # a single word, so \w+ covers the whole string
text2 = "The quick brown fox jumps over the lazy dog"  # contains spaces, which \w does not match
pattern = r"\w+"

match1 = re.fullmatch(pattern, text1)
match2 = re.fullmatch(pattern, text2)

if match1:
    print("Matched:", match1.group())
else:
    print("Not matched")

if match2:
    print("Matched:", match2.group())
else:
    print("Not matched")

Output:

Matched: The
Not matched
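
The difference between re.search(), re.match(), and re.fullmatch() is easiest to see in input validation. The sketch below uses a hypothetical helper, is_four_digits, to check that a value is exactly four digits:

```python
import re

PATTERN = r"\d{4}"

def is_four_digits(value):
    """Return True only when the whole string is exactly four digits."""
    return re.fullmatch(PATTERN, value) is not None

# search() finds the pattern anywhere; match() only anchors the start.
found_anywhere = re.search(PATTERN, "id 12345").group()
found_at_start = re.match(PATTERN, "12345").group()

print(found_anywhere)           # 1234
print(found_at_start)           # 1234
print(is_four_digits("12345"))  # False
print(is_four_digits("1234"))   # True
```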

9. re.compile()

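re.compile() compiles a regular expression pattern into a pattern object that can be reused for repeated search or replace operations.

Basic syntax:

re.compile(pattern, flags=0)

Parameters:

pattern: the regular expression pattern

flags: optional flags that modify how the pattern is matched, for example re.IGNORECASE (ignore case) or re.MULTILINE (let ^ and $ match at every line).

Example code: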

import re

text = "The quick brown fox jumps over the lazy dog"

# The compiled object exposes the same operations as methods
pattern = re.compile(r"\w+")
matches = pattern.findall(text)
print(matches)

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
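
A compiled pattern carries its flags with it, which keeps repeated calls short. The following sketch reuses one compiled object for counting and replacing; the name word_re and the sample lines are made up for illustration.

```python
import re

# Flags are given once at compile time and apply to every later call.
word_re = re.compile(r"\bthe\b", flags=re.IGNORECASE)

lines = ["The quick brown fox", "jumps over the lazy dog", "no article here"]

# Count occurrences of the word "the" in each line, ignoring case.
counts = [len(word_re.findall(line)) for line in lines]

# The same object also provides sub(), search(), split(), and so on.
replaced = word_re.sub("a", lines[1])

print(counts)    # [1, 1, 0]
print(replaced)  # jumps over a lazy dog
```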

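10. Match object methods

A Match object is the result returned by re.search() and re.fullmatch() and yielded by re.finditer() (re.findall() returns plain strings or tuples instead). It holds information about the match, such as the matched string and the match position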