Strings and Regular Expressions

1. Common String Operations

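Strings are among the most frequently handled data types in a program, so mastering their basic operations is a foundation of programming.

Core concept: a string is a sequence of characters. You can index and slice it like a list, but it is immutable: you cannot change an individual character in place, and every operation that appears to modify a string actually creates a new one.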
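The immutability described above can be checked directly. This is a minimal sketch; the variable names (`mutated`, `capitalized`) are only illustrative.

```python
s = "hello"

# str does not support item assignment, so this raises TypeError.
try:
    s[0] = "H"
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead; the original object is left untouched.
capitalized = "H" + s[1:]

print(mutated)      # False
print(capitalized)  # Hello
print(s)            # hello
```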

Examples of common operations:

Accessing and slicing:
- s = "hello, world"
- s[0] → 'h' (first character)
- s[-1] → 'd' (last character)
- s[0:5] → 'hello' (slice from index 0 up to, but not including, index 5)
- s[7:] → 'world' (from index 7 to the end)
Concatenation:
- s1 = "hello"
- s2 = "world"
- s1 + ", " + s2 → 'hello, world'
Finding:
- s = "hello, world"
- s.find("world") → 7 (index of the first occurrence of the substring, or -1 if it is not found)
- "ll" in s → True (tests whether the substring is present)
Replacing:
- s.replace("world", "Python") → 'hello, Python' (returns a new string)
Splitting:
- s.split(",") → ['hello', ' world'] (splits the string into a list at each separator)
Joining:
- parts = ['hello', 'world']
- " ".join(parts) → 'hello world' (joins all strings in the list with the given separator)
Case conversion:
- s = "Hello"
- s.upper() → 'HELLO'
- s.lower() → 'hello'
Stripping whitespace:
- s = " some text "
- s.strip() → 'some text' (removes whitespace from both ends)
- s.lstrip() → 'some text ' (removes whitespace on the left)
- s.rstrip() → ' some text' (removes whitespace on the right)
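The operations listed above are often chained. The sketch below cleans up a comma-separated line using split, strip, lower, join, and find; the input string `raw` is made up for illustration.

```python
raw = "  Hello, World ,python  "

# Split on commas, then strip and lower-case each piece.
parts = [piece.strip().lower() for piece in raw.split(",")]

# Join the cleaned pieces back together with a single space.
cleaned = " ".join(parts)

# find() returns the index of the first match, or -1 when there is none.
position = cleaned.find("world")
missing = cleaned.find("java")

print(parts)     # ['hello', 'world', 'python']
print(cleaned)   # hello world python
print(position)  # 6
print(missing)   # -1
```

Each step returns a new object, so `raw` still holds the original text afterwards.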


2. Formatting Strings

Main methods

f-string (recommended, Python 3.6+): the most modern and readable approach, and generally the fastest.

name = "Alice"
age = 30
message = f"My name is {name} and I am {age} years old."
# Result: "My name is Alice and I am 30 years old."

str.format() method: the standard approach before f-strings were introduced, and still very powerful.

name = "Bob"
age = 25
message = "My name is {} and I am {} years old.".format(name, age)
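# Or use named arguments, which is clearer
message = "My name is {n} and I am {a} years old.".format(n=name, a=age)

% operator: the older C-style (printf-style) approach. It is no longer recommended for new code, but you still need to recognize it when reading legacy code.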

name = "Charlie"
age = 40
message = "My name is %s and I am %d years old." % (name, age)
# %s stands for a string, %d for an integer

Format specifiers (controlling the output format):
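A few commonly used format specifiers are sketched below. The same specifier syntax works after the colon in an f-string and inside the braces of str.format(); the sample values are chosen only for illustration.

```python
pi = 3.14159265
count = 1234567
ratio = 0.256
label = "id"

fixed = f"{pi:.2f}"         # two decimal places
grouped = f"{count:,}"      # thousands separator
percent = f"{ratio:.1%}"    # percentage with one decimal place
padded = f"{42:05d}"        # zero-padded to a width of 5
left = f"{label:<6}|"       # left-aligned in a field of width 6
right = f"{label:>6}|"      # right-aligned
centered = f"{label:^6}|"   # centered
same = "{:.2f}".format(pi)  # str.format() accepts the same specifier

print(fixed, grouped, percent, padded)  # 3.14 1,234,567 25.6% 00042
print(left)
print(right)
print(centered)
```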

3. String Encoding and Decoding

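Core issue: computers store only binary bytes, whereas people work with characters. Encoding converts characters into bytes; decoding converts bytes back into characters.

Unicode: an international standard that assigns a unique number (a code point) to nearly every character in the world's writing systems. It is an abstract character set, not a byte format.
UTF-8: the most widely used encoding of Unicode. It is variable-length and uses 1 to 4 bytes per Unicode character: ASCII characters take 1 byte, and most Chinese characters take 3 bytes.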
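The variable-length behavior can be observed by encoding one character from each size class. This is a small sketch; the sample characters are written as escapes so the file's own encoding does not affect the result.

```python
# "A", "é" (U+00E9), "中" (U+4E2D), and an emoji (U+1F600)
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s)")

lengths = [len(ch.encode("utf-8")) for ch in samples]

# len() on a str counts characters; len() on bytes counts bytes.
word = "\u4f60\u597d"  # "你好"
char_count = len(word)
byte_count = len(word.encode("utf-8"))
```

Here `lengths` comes out as 1, 2, 3, and 4 bytes, and the two-character word occupies 6 bytes.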

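Why it matters: when handling files, network requests, or database storage, encoding with one scheme and decoding with another produces garbled text (mojibake) or decoding errors.

Example: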

# A string containing Chinese characters (in Python 3, str holds Unicode text)
s = "你好，世界"

# Encoding: str -> bytes
# Convert s to a byte sequence using UTF-8.
# Common values for errors: 'strict' (the default), 'replace', 'ignore'
utf8_bytes = s.encode('utf-8', errors='replace')
print(f"UTF-8 encoded bytes: {utf8_bytes}")
# Output: UTF-8 encoded bytes: b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c'

gbk_bytes = s.encode('gbk', errors='replace')
print(f"GBK encoded bytes: {gbk_bytes}")
# Output: GBK encoded bytes: b'\xc4\xe3\xba\xc3\xa3\xac\xca\xc0\xbd\xe7'

# Decoding: bytes -> str
# Convert utf8_bytes back to a string using UTF-8.
decoded_s = utf8_bytes.decode('utf-8')
print(f"UTF-8 decoded string: {decoded_s}")
# Output: UTF-8 decoded string: 你好，世界

# Decoding with the wrong encoding either raises an error or yields garbled text
try:
    gbk_decoded_s = utf8_bytes.decode('gbk')
except UnicodeDecodeError as e:
    print(f"Decoding as GBK failed: {e}")
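The effect of the errors argument is easiest to see with a codec that cannot represent the input. The sketch below encodes a non-ASCII string to ASCII with each of the three handlers mentioned above, then decodes a deliberately corrupted byte sequence; the sample values are assumptions made for illustration.

```python
text = "caf\u00e9"  # "café"; the last character is not ASCII

# 'strict' is the default handler and raises on the first bad character.
try:
    text.encode("ascii")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# 'replace' substitutes "?" when encoding; 'ignore' silently drops the character.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")

# When decoding, 'replace' inserts U+FFFD for each undecodable byte.
damaged = b"\xe4\xbd\xa0\xff"  # valid UTF-8 for "你", then an invalid byte
recovered = damaged.decode("utf-8", errors="replace")

print(strict_result)  # UnicodeEncodeError
print(replaced)       # b'caf?'
print(ignored)        # b'caf'
```

Both 'replace' and 'ignore' lose information, so 'strict' is usually the safer choice unless losing data is acceptable.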

4. Data Validation

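Data validation ensures that user input or data from external sources matches the expected format, type, and range before the program processes it. It is a key step in keeping a program robust and secure.

What to validate:

Presence: the data is not empty.
Type: the data is an integer, a float, a string, and so on.
Format: the data matches a specific pattern, such as an email address, a mobile phone number, or a date format.
Range: a numeric value falls within an interval, such as an age between 0 and 120.
Validity: the value is meaningful; for example, the date 2023-02-30 is well formed but invalid.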
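The validity check in the last item can be delegated to the standard library, which rejects dates that do not exist. This is a minimal sketch; the helper name `is_valid_date` and the YYYY-MM-DD format are assumptions.

```python
from datetime import datetime


def is_valid_date(text: str) -> bool:
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        # Raised for a malformed string and for an impossible date.
        return False


print(is_valid_date("2023-02-30"))  # False: February has no 30th day
print(is_valid_date("2024-02-29"))  # True: 2024 is a leap year
print(is_valid_date("not a date"))  # False
```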

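How to implement it:

Conditionals: use if/else together with string methods such as isdigit() and isalpha().
Exception handling: use a try-except block to catch conversion errors.
Regular expressions: a powerful tool for validating complex formats.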
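As an illustration of format validation with regular expressions, the sketch below checks an email address and a mobile number. Both patterns are deliberately simplified assumptions: real email syntax is far more permissive, and the phone pattern assumes an 11-digit mainland China mobile number.

```python
import re

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
PHONE_RE = re.compile(r"1[3-9][0-9]{9}")


def is_email(text: str) -> bool:
    # fullmatch() requires the whole string to match, not just a prefix.
    return EMAIL_RE.fullmatch(text) is not None


def is_phone(text: str) -> bool:
    return PHONE_RE.fullmatch(text) is not None


print(is_email("alice@example.com"))  # True
print(is_email("alice@example"))      # False: no dot in the domain
print(is_phone("13812345678"))        # True
print(is_phone("1381234567"))         # False: only 10 digits
```

Using fullmatch() rather than search() avoids accepting input that merely contains a valid-looking fragment.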

Example: validating a user-supplied age

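user_input = "25"  # simulated user input

# 1. Check that the input consists only of digits
if not user_input.isdigit():
    print("Error: please enter a number.")
else:
    # 2. Convert to the target type
    age = int(user_input)
    # 3. Check the range
    if 0 <= age <= 120:
        print(f"Validation passed, age is: {age}")
    else:
        print("Error: age must be between 0 and 120.")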
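An alternative sketch uses exception handling instead of isdigit(). One reason to consider it: isdigit() returns True for some characters that int() cannot convert, such as the superscript "²", so the conversion above could still raise. The function name `parse_age` and its error messages are hypothetical.

```python
def parse_age(text: str) -> int:
    """Return the age as an int, or raise ValueError describing the problem."""
    stripped = text.strip()
    # Presence check
    if not stripped:
        raise ValueError("age is required")
    # Type check: let int() decide what counts as a whole number
    try:
        age = int(stripped)
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None
    # Range check
    if not 0 <= age <= 120:
        raise ValueError("age must be between 0 and 120")
    return age


for candidate in ["25", " 30 ", "abc", "-5", "\u00b2", ""]:
    try:
        print(repr(candidate), "->", parse_age(candidate))
    except ValueError as exc:
        print(repr(candidate), "-> error:", exc)
```

Raising an exception rather than printing keeps the validation reusable: the caller decides whether to re-prompt, log, or abort.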