Python Programming in Practice - Practical Python Tools and Libraries - Regular Expression Matching (the re Module)

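Regular expressions (regex for short) are a powerful tool for pattern matching and text processing on strings. In Python, the standard-library module re handles matching, searching, replacing, and similar operations.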

I. Introduction to the re Module

Import it with:

import re

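Common uses:

Matching: check whether a string fits a pattern
Extraction: pull specific information out of text
Replacement: rewrite text according to rules
Splitting: break a string apart on a regex pattern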

II. Basic Matching Methods

1. `re.match()`: match at the start of the string

import re

result = re.match(r"Hello", "Hello Python")
if result:
    print("Match found:", result.group())

Output:

Match found: Hello

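Note: re.match only tries to match at the beginning of the string. If the pattern occurs only later in the string, it returns None.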
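The difference between match and search is easiest to see side by side. A minimal sketch, assuming a hypothetical sample string "Hello Python"; it also shows the None check that keeps .group() from raising an AttributeError:

```python
import re

text = "Hello Python"  # hypothetical sample string

# re.match only attempts a match at index 0, so "Python" (at index 6) is not found
at_start = re.match(r"Python", text)
print(at_start)  # None

# re.search scans forward and returns the first occurrence anywhere
anywhere = re.search(r"Python", text)
print(anywhere.group())  # Python

# Guard against None before calling .group(), or an AttributeError is raised
print(at_start.group() if at_start else "no match at the start")
```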

2. `re.search()`: search the whole string

result = re.search(r"Python", "Hello Python World")
print(result.group())

Output:

Python

Note: search scans the entire string and returns the first match as a Match object, or None if nothing matches.
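The Match object returned by search carries more than the matched text. A short sketch, reusing the string from the example above, showing the position methods start(), end(), and span():

```python
import re

result = re.search(r"Python", "Hello Python World")
if result:
    print(result.group())                # Python
    print(result.start(), result.end())  # 6 12 (end is exclusive)
    print(result.span())                 # (6, 12)
```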

3. `re.findall()`: return a list of all matches

text = "Phone: 12345, Fax: 67890"
numbers = re.findall(r"\d+", text)
print(numbers)

Output:

['12345', '67890']
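findall changes the shape of its result when the pattern contains capture groups, which is easy to trip over. A minimal sketch, assuming a hypothetical "Label: number" string; it also shows re.finditer, which yields Match objects instead of plain strings:

```python
import re

text = "Phone: 12345, Fax: 67890"  # hypothetical sample string

# No groups: each item is the whole match
whole = re.findall(r"\d+", text)
print(whole)   # ['12345', '67890']

# One group: each item is only that group's text
labels = re.findall(r"(\w+): \d+", text)
print(labels)  # ['Phone', 'Fax']

# Several groups: each item is a tuple of the groups
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)   # [('Phone', '12345'), ('Fax', '67890')]

# finditer yields Match objects one at a time, so positions are available too
for m in re.finditer(r"\d+", text):
    print(m.group(), m.start())  # 12345 7, then 67890 19
```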

4. `re.sub()`: replace matched text

text = "I like Java, and I also like JavaScript"
new_text = re.sub(r"Java", "Python", text)
print(new_text)

Output:

I like Python, and I also like PythonScript
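In the example above, the pattern Java also matched the first four letters of JavaScript. A sketch of two common refinements, assuming hypothetical sample strings: the word-boundary anchor \b restricts the match to the whole word, re.subn reports how many replacements were made, and a function can compute each replacement from its Match object:

```python
import re

text = "I like Java, and I also like JavaScript"  # hypothetical sample string

# \b is a word boundary, so the "Java" inside "JavaScript" is no longer replaced
fixed = re.sub(r"\bJava\b", "Python", text)
print(fixed)  # I like Python, and I also like JavaScript

# re.subn returns the new string together with the number of replacements made
new_text, count = re.subn(r"\bJava\b", "Python", text)
print(count)  # 1

# The replacement can be a function that receives each Match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```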

5. `re.split()`: split a string on a pattern

text = "apple,banana;orange|grape"
fruits = re.split(r"[,;|]", text)
print(fruits)

Output:

['apple', 'banana', 'orange', 'grape']
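Real input is rarely as tidy as the example above. A sketch, assuming hypothetical fruit-list strings, of three re.split variations: absorbing whitespace around separators, keeping the separators via a capturing group, and capping the number of splits with maxsplit:

```python
import re

# Hypothetical input with stray spaces around the separators
messy = "apple , banana; orange|grape"
# \s* on both sides swallows surrounding whitespace along with the separator
cleaned = re.split(r"\s*[,;|]\s*", messy)
print(cleaned)     # ['apple', 'banana', 'orange', 'grape']

# A capturing group keeps the separators in the result list
with_seps = re.split(r"([,;|])", "apple,banana;orange")
print(with_seps)   # ['apple', ',', 'banana', ';', 'orange']

# maxsplit limits the number of splits; the remainder stays intact
first_only = re.split(r"[,;|]", "apple,banana;orange", maxsplit=1)
print(first_only)  # ['apple', 'banana;orange']
```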

III. Common Regex Symbols

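Symbol	Meaning	Example
`.`	Any single character except a newline	`a.c` → matches `abc`, `a9c`
`\d`	Digit: [0-9], plus other Unicode decimal digits unless re.ASCII is set	`\d+` → one or more digits
`\w`	Word character: letter, digit, or underscore (non-ASCII letters such as Chinese characters are included by default)	`\w+` → a word
`\s`	Whitespace (space, tab, newline, etc.)	`\s+`
`^`	Start of the string	`^Hello`
`$`	End of the string	`world$`
`[]`	Character set	`[abc]` → a, b, or c
`[^]`	Negated character set	`[^0-9]` → any non-digit
`*`	Zero or more repetitions	`a*`
`+`	One or more repetitions	`a+`
`?`	Zero or one repetition	`a?`
`{m,n}`	Between m and n repetitions	`\d{3,5}` → 3 to 5 digits
`()`	Group	`(ab)+`
`|`	Alternation (or)	`cat|dog`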
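A quick sketch exercising several rows of the table above on hypothetical sample strings. Note the {3,5} line: the quantifier is greedy, so a six-digit run yields only its first five digits, and the leftover single digit is too short to match:

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello world"))   # True
ends = bool(re.search(r"world$", "Hello world"))     # True

# . matches any character except a newline, so "a\nc" is skipped
dots = re.findall(r"a.c", "abc a9c a\nc")             # ['abc', 'a9c']

# | tries the alternatives; {3,5} is greedy and needs at least 3 digits
pets = re.findall(r"cat|dog", "cat, dog, bird")       # ['cat', 'dog']
runs = re.findall(r"\d{3,5}", "12 345 678901")        # ['345', '67890']

# [^0-9] matches every non-digit, so deleting those leaves only digits
digits_only = re.sub(r"[^0-9]", "", "Tel: 010-1234")  # '0101234'

print(starts, ends, dots, pets, runs, digits_only)
```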

IV. Grouping and Extraction

1. Example: extracting email addresses

text = "Contact me: email1@test.com or email2@sample.org"
emails = re.findall(r"[\w.-]+@[\w.-]+\.\w+", text)
print(emails)

Output:

['email1@test.com', 'email2@sample.org']

2. Capturing groups with parentheses

text = "Price: ¥99.5"
match = re.search(r"¥(\d+\.\d+)", text)
print("Extracted price:", match.group(1))

Output:

Extracted price: 99.5
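The pattern above requires a decimal point, so a price like ¥120 would not match, and calling .group(1) on the resulting None raises an AttributeError. A sketch, assuming hypothetical price strings, that makes the decimal part optional with a non-capturing group and checks for a missing match:

```python
import re

# Hypothetical inputs: a decimal price, a whole-number price, and no price at all
texts = ["Price: ¥99.5", "Price: ¥120", "Price: TBD"]

prices = []
for text in texts:
    # (?:...) groups without capturing; the ? after it makes the decimal part optional
    match = re.search(r"¥(\d+(?:\.\d+)?)", text)
    # group(0) is the whole match ("¥99.5"); group(1) is only the captured number
    prices.append(match.group(1) if match else None)

print(prices)  # ['99.5', '120', None]
```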

3. Named groups

text = "Date: 2025-11-11"
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
print(match.groupdict())

Output:

{'year': '2025', 'month': '11', 'day': '11'}
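Named groups can also be read one at a time and referenced from a replacement string. A sketch, assuming a hypothetical date of 2025-11-03 (chosen so day and month differ) and a hypothetical constant DATE_PATTERN; it reorders the date to DD/MM/YYYY with the \g<name> syntax of re.sub:

```python
import re

DATE_PATTERN = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
text = "Date: 2025-11-03"  # hypothetical sample date

match = re.search(DATE_PATTERN, text)
print(match.group("year"))  # 2025
print(match["month"])       # 11 (subscript access, Python 3.6+)
year, month, day = match.groups()  # named groups are still numbered as well

# In a replacement string, \g<name> refers to a named group
reordered = re.sub(DATE_PATTERN, r"\g<day>/\g<month>/\g<year>", text)
print(reordered)  # Date: 03/11/2025
```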

V. Hands-on Example: Extracting Link URLs and Titles

import re

html = """
<a href="https://example.com/page1">Page 1</a>
<a href="https://example.com/page2">Page 2</a>
"""

pattern = r'<a href="(https?://[^"]+)">([^<]+)</a>'
matches = re.findall(pattern, html)

for url, title in matches:
    print(f"Title: {title}, URL: {url}")

Output:

Title: Page 1, URL: https://example.com/page1
Title: Page 2, URL: https://example.com/page2
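Regex-based HTML scraping is brittle: the pattern above requires href to be the first attribute and double-quoted, and it rejects tags inside the link text, so it would find nothing in the input below. As a hedged alternative (a swapped-in technique, not part of the original example), the standard library's html.parser.HTMLParser tolerates those variations. LinkCollector is a hypothetical class name, and this sketch ignores nested or unclosed <a> tags:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical input: an extra attribute, single quotes, and a nested <b> tag
html = """
<a class="nav" href="https://example.com/page1">Page 1</a>
<a href='https://example.com/page2'>Page <b>2</b></a>
"""

collector = LinkCollector()
collector.feed(html)
for url, title in collector.links:
    print(f"Title: {title}, URL: {url}")
```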

VI. Practical Tips

Use raw strings to avoid backslash-escaping confusion: write r"\d+" rather than "\\d+".
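A classic case where forgetting the r prefix silently breaks a pattern is \b. A sketch on a hypothetical sentence: in a normal string literal "\b" is the backspace character, so the pattern never matches, while the raw string passes the backslash through to the regex engine:

```python
import re

sentence = "a cat sat"  # hypothetical sample text

# In a normal string literal, "\b" is the backspace character (\x08),
# so this pattern looks for backspaces around "cat" and finds nothing
plain = re.findall("\bcat\b", sentence)
print(plain)    # []

# In a raw string the backslash reaches the regex engine, so \b means word boundary
raw = re.findall(r"\bcat\b", sentence)
print(raw)      # ['cat']

# The non-raw equivalent needs every backslash doubled
escaped = re.findall("\\bcat\\b", sentence)
print(escaped)  # ['cat']
```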

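Compile a regular expression you use repeatedly. The module-level functions already cache recently used patterns, so the speedup is usually modest; the main benefits are skipping that cache lookup and keeping the pattern in one named place: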

pattern = re.compile(r"\d+")
print(pattern.findall("123abc456"))  # ['123', '456']
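A compiled pattern is typically created once and reused, for example inside a loop. A sketch, assuming a hypothetical list of input lines; it also shows that flags such as re.IGNORECASE are fixed at compile time and that the compiled object offers the same operations as the module-level functions:

```python
import re

# Compile once, outside the loop; the IGNORECASE flag is baked into the object
word = re.compile(r"python", re.IGNORECASE)

lines = ["I love Python", "PYTHON 3.12", "no match here"]  # hypothetical input
hits = [line for line in lines if word.search(line)]
print(hits)          # ['I love Python', 'PYTHON 3.12']

# Same operations as the module functions, without repeating the pattern or flags
shortened = word.sub("Py", "Python and python")
print(shortened)     # Py and Py
print(word.pattern)  # python
```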

Use lazy (non-greedy) quantifiers to get the shortest possible match: *?, +?, {m,n}?

Example:

text = "<p>Python</p><p>Regex</p>"
print(re.findall(r"<p>.*?</p>", text))

Output:

['<p>Python</p>', '<p>Regex</p>']
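To see why the lazy form matters, compare it with the greedy version on the same string. A sketch reusing the example above: greedy .* runs to the last </p>, lazy .*? stops at the first one, and adding a capture group returns only the inner text:

```python
import re

text = "<p>Python</p><p>Regex</p>"

# Greedy .* extends to the LAST </p>, swallowing both paragraphs in one match
greedy = re.findall(r"<p>.*</p>", text)
print(greedy)  # ['<p>Python</p><p>Regex</p>']

# Lazy .*? stops at the first </p> that lets the match succeed
lazy = re.findall(r"<p>.*?</p>", text)
print(lazy)    # ['<p>Python</p>', '<p>Regex</p>']

# With one capture group, findall returns only the text between the tags
inner = re.findall(r"<p>(.*?)</p>", text)
print(inner)   # ['Python', 'Regex']
```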

VII. Summary

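Task	Function	Description
Match at start	`re.match()`	Match at the beginning of the string
Search	`re.search()`	Find the first match anywhere in the string
Find all	`re.findall()`	Return a list of all matches
Replace	`re.sub()`	Replace matched text
Split	`re.split()`	Split a string on a pattern
Precompile	`re.compile()`	Compile a pattern once for reuse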
