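📅 Today's Topic
- Core topic: defining Pydantic data models, type validation, and common pitfalls
- Use cases: API parameter validation, configuration file parsing, data serialization and deserialization, building robust data-processing pipelines
- One-line summary: Pydantic is a widely used Python data validation library that combines automatic type checking with data conversion, so validating input often takes a single line of code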
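🧩 Core Principles (simplified)
- Pydantic builds on Python type hints and validates the type and structure of data at runtime
- Core logic: a model class that inherits from BaseModel automatically gains data validation, serialization, and schema (documentation) generation
- Core value: it replaces hand-written parameter checks, which reduces bugs and speeds up development, and its generated JSON Schema lets frameworks such as FastAPI produce API documentation automatically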
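The three capabilities listed above can be seen in a few lines. This is a minimal sketch that assumes Pydantic v2 is installed; the `Item` model and its fields are hypothetical names chosen only for illustration.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    """Hypothetical model used only for this illustration."""
    name: str
    price: float
    quantity: int = 1


# Validation with conversion: in the default (lax) mode the numeric
# strings are converted to float and int.
item = Item(name="pen", price="1.5", quantity="3")

# Serialization: the model turns back into plain Python data.
item_dict = item.model_dump()

# Documentation: a JSON Schema is derived from the type hints.
schema = Item.model_json_schema()

# Invalid input raises ValidationError instead of passing through silently.
try:
    Item(name="pen", price="not a number")
    rejected = False
except ValidationError:
    rejected = True

print(item_dict)
print(sorted(schema["properties"]))
print(rejected)
```

The same class definition drives all three steps, which is why no separate checking or conversion code is needed.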
💻 Hands-on Code (copy and run)
1. Environment setup (minimal)
# Install Pydantic (the examples below use the v2 API; Python 3.8+)
pip install pydantic
# Verify the installation
python -c "import pydantic; print(pydantic.VERSION)"
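2. Core usage example (most common scenario: API parameter validation)
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field, field_validator


# 1. Define the data model (inherit from BaseModel)
class UserCreate(BaseModel):
    # Accept both the alias ("createdAt") and the field name ("created_at") as input,
    # so that the output of model_dump() can be validated again later
    model_config = ConfigDict(populate_by_name=True)

    # Basic field definitions
    username: str = Field(..., min_length=3, max_length=20, description="User name")
    email: str = Field(..., pattern=r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
    age: int = Field(..., ge=1, le=120, description="Age")
    # Optional fields
    hobbies: Optional[List[str]] = None
    is_active: bool = True
    # Field alias (for example, to map an external or database field name)
    created_at: datetime = Field(default_factory=datetime.now, alias="createdAt")

    # Custom validator
    @field_validator('username')
    @classmethod
    def username_must_not_contain_spaces(cls, v):
        print("field_validator of username_must_not_contain_spaces is validating the username...")
        if ' ' in v:
            raise ValueError('Username must not contain spaces')
        return v.lower()  # Normalize to lowercase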
Valid data:
# 2. Validate data with the model: an example with valid data
from pydantic import ValidationError


def valid_sample():
    try:
        print("\n------------------ Valid data test ------------------")
        user_data = {
            "username": "john_DOE",  # username_must_not_contain_spaces lowercases this to john_doe
            "email": "john@example.com",
            "age": 25,
            "hobbies": ["coding", "reading"],
            "is_active": True
        }
        user = UserCreate(**user_data)
        print("✅ Validation passed:")
        print(f"Username: {user.username}")
        print(f"Email: {user.email}")
        print(f"Age: {user.age}")
        print(f"Hobbies: {user.hobbies}")
    except ValidationError as e:
        print(f"❌ Validation failed: {e}")


if __name__ == "__main__":
    valid_sample()
Output:
------------------ Valid data test ------------------
field_validator of username_must_not_contain_spaces is validating the username...
✅ Validation passed:
Username: john_doe
Email: john@example.com
Age: 25
Hobbies: ['coding', 'reading']
Invalid data:
# 3. Validate data with the model: an example with invalid data
from pydantic import ValidationError


def invalid_sample():
    print("\n------------------ Invalid data test ------------------")
    try:
        invalid_data = {
            "username": "ab",  # Too short
            "email": "invalid-email",  # Wrong format
            "age": 150,  # Out of range
            "hobbies": "not_a_list"  # Wrong type
        }
        UserCreate(**invalid_data)
    except ValidationError as e:
        print(f"❌ Caught a validation error: {e}")


if __name__ == "__main__":
    invalid_sample()
Output:
------------------ Invalid data test ------------------
❌ Caught a validation error: 4 validation errors for UserCreate
username
  String should have at least 3 characters [type=string_too_short, input_value='ab', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/string_too_short
email
  String should match pattern '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' [type=string_pattern_mismatch, input_value='invalid-email', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/string_pattern_mismatch
age
  Input should be less than or equal to 120 [type=less_than_equal, input_value=150, input_type=int]
    For further information visit https://errors.pydantic.dev/2.12/v/less_than_equal
hobbies
  Input should be a valid list [type=list_type, input_value='not_a_list', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/list_type
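When the errors need to be handled in code rather than printed, `ValidationError.errors()` returns the same information as a list of dictionaries. The following is a minimal sketch assuming Pydantic v2; the `Signup` model and the `collect_errors` helper are hypothetical, with `Signup` acting as a trimmed-down stand-in for `UserCreate` so the block runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):
    """Hypothetical stand-in for UserCreate, reduced to two fields."""
    username: str = Field(..., min_length=3, max_length=20)
    age: int = Field(..., ge=1, le=120)


def collect_errors(data: dict) -> dict:
    """Return {field name: error type} for invalid data, or {} if valid."""
    try:
        Signup(**data)
    except ValidationError as exc:
        # Each entry is a dict with keys such as "loc", "msg", and "type".
        return {err["loc"][0]: err["type"] for err in exc.errors()}
    return {}


problems = collect_errors({"username": "ab", "age": 150})
print(problems)
```

A mapping like this is convenient for returning per-field messages from an API handler.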
Model serialization and deserialization:
# 4. Serialization and deserialization test
def serialization_sample(user: UserCreate):
    print("\n------------------ Serialization test ------------------")
    print("📖 --- dict serialization test ---")
    user_dict = user.model_dump()
    print(f"Serialized result: {user_dict}")
    print("--- dict deserialization test ---")
    # model_dump() emits field names (created_at), not the alias (createdAt);
    # the model must accept field names as input for created_at to round-trip
    deserialized_user = UserCreate.model_validate(user_dict)
    print(f"dict deserialization result: {deserialized_user}")
    print("")
    print("📜 --- JSON serialization test ---")
    user_json = user.model_dump_json()
    print(f"JSON: {user_json}")
    print("--- JSON deserialization test ---")
    deserialized_user_json = UserCreate.model_validate_json(user_json)
    print(f"JSON deserialization result: {deserialized_user_json}")


if __name__ == "__main__":
    # serialization_sample() needs a UserCreate instance; hobbies is left unset here
    # (constructing the instance prints the validator message once before the header)
    sample_user = UserCreate(username="john_DOE", email="john@example.com", age=25)
    serialization_sample(sample_user)
Output:
------------------ Serialization test ------------------
📖 --- dict serialization test ---
Serialized result: {'username': 'john_doe', 'email': 'john@example.com', 'age': 25, 'hobbies': None, 'is_active': True, 'created_at': datetime.datetime(2026, 2, 26, 21, 29, 28, 272138)}
--- dict deserialization test ---
field_validator of username_must_not_contain_spaces is validating the username...
dict deserialization result: username='john_doe' email='john@example.com' age=25 hobbies=None is_active=True created_at=datetime.datetime(2026, 2, 26, 21, 29, 28, 272138)
📜 --- JSON serialization test ---
JSON: {"username":"john_doe","email":"john@example.com","age":25,"hobbies":null,"is_active":true,"created_at":"2026-02-26T21:29:28.272138"}
--- JSON deserialization test ---
field_validator of username_must_not_contain_spaces is validating the username...
JSON deserialization result: username='john_doe' email='john@example.com' age=25 hobbies=None is_active=True created_at=datetime.datetime(2026, 2, 26, 21, 29, 28, 272138)
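The `createdAt` alias never shows up in the output above because `model_dump()` uses field names unless told otherwise. Here is a small sketch of how aliases behave on input and output, assuming Pydantic v2; the `Event` model is hypothetical.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict, Field


class Event(BaseModel):
    """Hypothetical model with one aliased field."""
    # populate_by_name lets input use either "createdAt" or "created_at".
    model_config = ConfigDict(populate_by_name=True)

    name: str
    created_at: datetime = Field(alias="createdAt")


# Input keyed by the alias, as an external API might send it.
event = Event.model_validate({"name": "deploy", "createdAt": "2024-01-01T12:00:00"})

by_name = event.model_dump()                # keys are field names
by_alias = event.model_dump(by_alias=True)  # keys are aliases

# Both dictionaries validate back into an equal model.
assert Event.model_validate(by_name) == event
assert Event.model_validate(by_alias) == event

print(sorted(by_name))
print(sorted(by_alias))
```

Without `populate_by_name`, input keyed by the field name would not populate the aliased field, so a dump made with field names would not round-trip.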
3. Advanced usage: nested models
# 1. Define the nested models
class Address(BaseModel):
    street: str
    city: str
    zipcode: str


class UserProfile(BaseModel):
    user_id: int
    personal_info: UserCreate  # Reuses the model defined earlier
    address: Address
    tags: List[str] = []


# 2. Test the nested models
def nested_model_sample():
    print("\n📰 ------------------ Nested model test ------------------")
    profile_data = {
        "user_id": 123,
        "personal_info": {
            "username": "alice",
            "email": "alice@example.com",
            "age": 30
        },
        "address": {
            "street": "123 Main St",
            "city": "Beijing",
            "zipcode": "100000"
        },
        "tags": ["developer", "python"]
    }
    profile = UserProfile(**profile_data)
    print(f"User ID: {profile.user_id}")
    print(f"City: {profile.address.city}")
    print(f"Tags: {profile.tags}")


if __name__ == "__main__":
    nested_model_sample()
Output:
📰 ------------------ Nested model test ------------------
field_validator of username_must_not_contain_spaces is validating the username...
User ID: 123
City: Beijing
Tags: ['developer', 'python']
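Errors inside a nested model are reported with the full path to the offending field. This is a minimal sketch assuming Pydantic v2; `Location`, `Shipment`, and `error_paths` are hypothetical names that mirror the nesting shown above.

```python
from pydantic import BaseModel, ValidationError


class Location(BaseModel):
    city: str
    zipcode: str


class Shipment(BaseModel):
    shipment_id: int
    destination: Location  # nested model


def error_paths(data: dict) -> list:
    """Return the location tuple of every validation error in data."""
    try:
        Shipment(**data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


# "zipcode" is missing from the nested dictionary.
paths = error_paths({"shipment_id": 1, "destination": {"city": "Beijing"}})
print(paths)

# Valid nested data serializes back into nested dictionaries.
shipment = Shipment(shipment_id=1, destination={"city": "Beijing", "zipcode": "100000"})
print(shipment.model_dump())
```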
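4. Key takeaways (four to remember)
from typing import Optional
from pydantic import BaseModel, Field, field_validator


# 1. Model definition: inherit from BaseModel and use type hints
class MyModel(BaseModel):
    # 2. Field constraints: Field() sets the validation rules
    field: str = Field(..., min_length=1, max_length=100)  # Required field
    optional_field: Optional[str] = None  # Optional field

    # 3. Custom validation: the @field_validator decorator
    @field_validator('field')
    @classmethod
    def validate_field(cls, v):
        if not v.strip():  # Illustrative condition: reject whitespace-only values
            raise ValueError("Validation failed")
        return v


# 4. Data conversion: automatic type conversion and serialization
data = {"field": "hello"}  # Illustrative input
model = MyModel(**data)  # Create the object
dict_data = model.model_dump()  # Serialize to a dict
json_data = model.model_dump_json()  # Serialize to JSON
obj_from_dict = MyModel.model_validate(dict_data)  # Deserialize from a dict
obj_from_json = MyModel.model_validate_json(json_data)  # Deserialize from JSON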
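⚠️ Pitfalls to Avoid (key points)
1. Pitfall 1: mutable default values
# In a plain Python class or function signature, a mutable default such as [] is shared
# between instances. Pydantic copies the default for every new instance, so the list
# below is NOT shared, but the intent is easy to misread
class ImplicitModel(BaseModel):
    items: List[str] = []  # Works in Pydantic; each instance gets its own copy
# ✅ Preferred: state the intent explicitly with default_factory
class GoodModel(BaseModel):
    items: List[str] = Field(default_factory=list)  # A new list is created for each instance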
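How a list default behaves in a model can be checked directly. This is a short sketch assuming Pydantic v2; `LiteralDefault` and `FactoryDefault` are hypothetical names.

```python
from typing import List

from pydantic import BaseModel, Field


class LiteralDefault(BaseModel):
    items: List[str] = []  # Pydantic copies this default for each instance


class FactoryDefault(BaseModel):
    items: List[str] = Field(default_factory=list)  # explicit new list per instance


a, b = LiteralDefault(), LiteralDefault()
a.items.append("x")

c, d = FactoryDefault(), FactoryDefault()
c.items.append("y")

print(a.items, b.items)
print(c.items, d.items)
```

In both models the second instance keeps its own empty list. `default_factory` is still worth using because it makes the intent explicit and matches what plain classes and dataclasses require.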
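2. Pitfall 2: forward references (self-referencing models and circular imports)
# ❌ Inside its own class body the name Node does not exist yet
# class Node(BaseModel):
#     children: List[Node] = []  # NameError: name 'Node' is not defined
# ✅ Use a string reference; Pydantic resolves it once the class is defined.
# The same string form helps when models in different modules refer to each other
class Node(BaseModel):
    children: List["Node"] = []  # String (forward) reference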
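A string reference is enough for Pydantic to validate arbitrarily deep recursive data. The sketch below assumes Pydantic v2; `TreeNode` and `count_nodes` are hypothetical names used for illustration.

```python
from typing import List

from pydantic import BaseModel


class TreeNode(BaseModel):
    """Hypothetical self-referencing model."""
    name: str
    children: List["TreeNode"] = []  # string reference to the class itself


def count_nodes(node: TreeNode) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


tree = TreeNode.model_validate({
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1"}]},
        {"name": "b"},
    ],
})

print(count_nodes(tree))
print(type(tree.children[0]).__name__)
```

Every nested dictionary is converted into a `TreeNode` instance, so the recursive helper can rely on attribute access at each level.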
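3. Pitfall 3: time zones when serializing datetime
from datetime import datetime, timezone
# ❌ No time zone information: the serialized value carries no offset
dt_naive = datetime.now()  # naive datetime
# ✅ Time zone information is kept (stdlib timezone.utc is used here; pytz.UTC behaves the same way)
dt_aware = datetime.now(timezone.utc)  # timezone-aware datetime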
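One way to keep naive values out of a model altogether is Pydantic's `AwareDatetime` type. This is a sketch assuming Pydantic v2; the `LogEntry` model is hypothetical, and this is one option rather than an approach the original text prescribes.

```python
from datetime import datetime, timezone

from pydantic import AwareDatetime, BaseModel, ValidationError


class LogEntry(BaseModel):
    """Hypothetical model whose timestamp must carry a time zone."""
    message: str
    timestamp: AwareDatetime


# A timezone-aware value survives a JSON round trip with its offset intact.
entry = LogEntry(message="started", timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
restored = LogEntry.model_validate_json(entry.model_dump_json())

# A naive value is rejected, so a missing time zone is caught at validation time.
try:
    LogEntry(message="started", timestamp=datetime(2024, 1, 1, 12, 0))
    naive_rejected = False
except ValidationError:
    naive_rejected = True

print(restored.timestamp.utcoffset())
print(naive_rejected)
```

Rejecting naive values at the model boundary means the ambiguity is caught where the data enters, instead of appearing later as an offset-less string in serialized output.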
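✅ Today's Summary
- The core of Pydantic is declarative data validation: type hints drive automatic validation and conversion
- Key points: inheriting from BaseModel, Field constraints, field_validator validators, and built-in serialization
- Highly practical: it saves considerable effort in API development, configuration management, and data cleaning
- Mature ecosystem: it is deeply integrated with FastAPI and has become a staple of modern Python development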