python从入门到精通-第12章: 工业级实践 — 从脚本到工程


Java/Kotlin 开发者习惯了 Maven/Gradle 的标准化工程体系:约定目录结构、声明式依赖、生命周期管理、插件生态。Python 生态长期碎片化,但 2020 年代后逐渐收敛到 pyproject.toml + 现代工具链。本章从项目结构出发,覆盖依赖管理、代码质量、测试策略、CI/CD、安全实践和可观测性,帮你把 Python 从"写脚本"提升到"工程交付"。


12.1 项目结构

Java/Kotlin 对比

// Maven 标准目录结构(Gradle 也遵循同样约定)
my-project/
├── pom.xml                          // 唯一构建配置
├── src/
│   ├── main/
│   │   ├── java/com/example/app/    // 源码
│   │   └── resources/               // 资源文件
│   └── test/
│       ├── java/com/example/app/    // 测试代码
│       └── resources/               // 测试资源
├── .mvn/                            // Maven Wrapper
└── mvnw / mvnw.cmd                  // Wrapper 脚本
// Gradle Kotlin DSL 项目
my-project/
├── build.gradle.kts                 // 构建脚本
├── settings.gradle.kts              // 项目设置
├── src/
│   ├── main/kotlin/com/example/     // 源码
│   ├── main/resources/
│   ├── test/kotlin/com/example/     // 测试
│   └── test/resources/
└── gradle/
    └── wrapper/                     // Gradle Wrapper

Python 实现

# === 推荐: src layout(现代 Python 项目标准) ===
#
# my-project/
# ├── pyproject.toml          # 唯一项目配置(替代 setup.py/setup.cfg)
# ├── src/
# │   └── my_package/         # 实际包名
# │       ├── __init__.py
# │       ├── __main__.py     # python -m 入口
# │       ├── core.py
# │       └── utils.py
# ├── tests/
# │   ├── conftest.py         # pytest 共享 fixtures
# │   ├── test_core.py
# │   └── test_utils.py
# ├── docs/
# ├── .pre-commit-config.yaml
# ├── .github/
# │   └── workflows/
# │       └── ci.yml
# └── README.md

# === 不推荐: flat layout ===
#
# my-project/
# ├── pyproject.toml
# ├── my_package/             # 直接放在根目录
# │   ├── __init__.py
# │   └── core.py
# └── tests/
#
# 问题: 当你在项目根目录运行 Python 时,
# 当前目录会被加入 sys.path,导致 import my_package
# 找到的是本地未安装的代码,而非 pip install 后的版本
# 这会掩盖打包错误
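"遮蔽"效果可以用几行代码模拟: 排在 sys.path 前面的目录中的包,会优先于任何已安装的同名包被导入(my_pkg 为假想包名,仅作演示):

```python
import pathlib
import sys
import tempfile

# 用临时目录模拟 flat layout: 把项目根目录插到 sys.path 前面后,
# 其中的 my_pkg/ 会优先于 site-packages 中已安装的版本被导入
with tempfile.TemporaryDirectory() as d:
    pkg = pathlib.Path(d) / "my_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("SOURCE = 'local-checkout'")
    sys.path.insert(0, d)
    try:
        import my_pkg
        assert my_pkg.SOURCE == "local-checkout"  # 导入到的是本地副本
    finally:
        sys.path.remove(d)
        sys.modules.pop("my_pkg", None)
print("local my_pkg shadowed any installed version")
```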

# === __main__.py 入口 ===
# src/my_package/__main__.py
import sys
from my_package.core import main

if __name__ == "__main__":
    # python -m my_package 会执行这个文件
    sys.exit(main())
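sys.exit(main()) 把 main() 的 int 返回值变成进程退出码,shell 和 CI 依赖它判断成败。最小演示(用 -c 模拟一个 main 返回 3 的程序,不依赖具体包):

```python
import subprocess
import sys

# 子进程中 sys.exit(3) → 进程退出码 3,CI 据此判定失败
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(proc.returncode)  # → 3
```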
# === pyproject.toml 完整配置 ===
# 这是现代 Python 项目的"pom.xml",所有工具共享一份配置

> pyproject.toml 的完整字段说明详见 [1.4 pyproject.toml](01-environment-tooling.md),本节聚焦工业级项目的配置实践。

# pyproject.toml
"""
[build-system]
# 构建后端: 用什么工具把源码打包
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-package"
version = "1.0.0"
description = "A production-ready Python package"
readme = "README.md"
requires-python = ">=3.10"
license = {text = "MIT"}
authors = [
    {name = "Developer", email = "dev@example.com"},
]
# 依赖 — 相当于 Maven 的 <dependencies>
dependencies = [
    "pydantic>=2.0",
    "httpx>=0.25",
    "structlog>=23.0",
]

# 可选依赖分组 — 相当于 Maven 的 <scope>
[project.optional-dependencies]
dev = [
    "ruff>=0.4",
    "mypy>=1.8",
    "pre-commit>=3.6",
]
test = [
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "pytest-cov>=5.0",
    "hypothesis>=6.100",
]
prod = [
    "gunicorn>=22.0",
    "uvicorn[standard]>=0.30",
]

# 入口点 — 相当于 Maven 的 mainClass
[project.scripts]
my-cli = "my_package.core:main"

# 工具配置区域 — 所有工具共享这一个文件
[tool.ruff]
target-version = "py310"
line-length = 100

[tool.mypy]
python_version = "3.10"
strict = true

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
"""
# === Monorepo 管理 ===
# Python 没有 Maven multi-module 或 Gradle composite build 的原生支持
# 但可以用 workspace 工具管理

# 方案 1: uv workspace(推荐)
# pyproject.toml
"""
[project]
name = "my-monorepo"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = []

[tool.uv.workspace]
members = ["packages/*"]

[tool.uv.sources]
# 包间依赖直接引用本地路径
core = { workspace = true }
api = { workspace = true }
"""

# packages/core/pyproject.toml
"""
[project]
name = "core"
version = "0.1.0"
dependencies = ["pydantic>=2.0"]
"""

# packages/api/pyproject.toml
"""
[project]
name = "api"
version = "0.1.0"
dependencies = ["core", "fastapi>=0.110"]
"""

# 方案 2: pip install -e(开发模式,适合简单 monorepo)
# pip install -e ./packages/core
# pip install -e ./packages/api
# 修改源码后无需重新安装

核心差异

| 维度 | Maven/Gradle | Python (pyproject.toml) |
| --- | --- | --- |
| 配置文件 | pom.xml / build.gradle.kts | pyproject.toml(统一) |
| 目录约定 | 强制 src/main/java | 推荐 src layout,不强制 |
| 包管理 | 单一(Maven Central) | PyPI + 私有仓库 |
| 多模块 | 原生支持 | uv workspace / pip -e |
| 构建生命周期 | compile → test → package | 无标准生命周期,按工具各自运行 |
| 入口 | mainClass / public static void main | __main__.py + [project.scripts] |

常见陷阱

# 陷阱 1: flat layout 导致的导入歧义
# 项目结构:
# my-project/
# ├── my_package/__init__.py
# └── tests/test_something.py

# 在 my-project/ 目录下运行 pytest:
# import my_package  # 找到的是本地未安装的代码!
# 但 CI 中 pip install 后运行,找到的是安装后的代码
# 两者行为可能不同(比如缺少 __pycache__、资源文件路径不同)

# 解决: 用 src layout + pip install -e .

# 陷阱 2: python my_package/main.py vs python -m my_package
# 前者: sys.path[0] 是 my_package/ 目录本身,包的父目录不在 sys.path,相对导入会失败
# 后者: 正确设置 sys.path,__package__ 变量正确,相对导入正常
# 永远用 python -m 运行包

# 陷阱 3: __init__.py 放太多代码
# Java 的包没有"初始化"概念,Python 的 __init__.py 有
# 不要在 __init__.py 中放业务逻辑,只做公开 API 的 re-export

何时使用

  • src layout: 所有可安装的包(99% 的情况)
  • flat layout: 仅限纯脚本项目(不打包、不发布)
  • monorepo: 多个紧密耦合的包共享开发

12.2 依赖管理

Java/Kotlin 对比

<!-- Maven: 声明式依赖,GAV 坐标 -->
<dependencies>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.17.0</version>
    </dependency>
    <!-- scope 控制依赖范围 -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <version>5.10.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
<!-- Maven 依赖传递是自动的:引入 A,A 依赖 B,B 自动引入 -->
// Gradle Kotlin DSL
dependencies {
    // implementation: 编译时需要,运行时需要,不传递
    implementation("com.fasterxml.jackson.module:jackson-module-kotlin:2.17.0")
    // testImplementation: 仅测试
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.2")
    // api: 编译时需要,运行时需要,且传递给下游
    api("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.8.0")
}
// Gradle 版本目录: libs.versions.toml 统一管理版本

Python 实现

# === 方案 1: uv(推荐,2024 年以来迅速普及的工具) ===
# Rust 编写,极快,兼容 pip 接口

# 初始化项目
# $ uv init my-project
# $ cd my-project

# 添加依赖(自动写入 pyproject.toml + uv.lock)
# $ uv add pydantic httpx
# $ uv add --dev pytest ruff mypy     # 开发依赖

# 安装
# $ uv sync                           # 根据 uv.lock 安装

# 运行
# $ uv run pytest                     # 在虚拟环境中运行
# $ uv run python -m my_package       # 在虚拟环境中运行

# pyproject.toml 中 uv 自动生成的部分:
"""
[project]
dependencies = [
    "pydantic>=2.0,<3",
    "httpx>=0.25,<1",
]

[dependency-groups]
dev = [
    "pytest>=8.0,<9",
    "ruff>=0.4,<1",
]
"""

# === 方案 2: Poetry(成熟稳定) ===
# $ poetry init
# $ poetry add pydantic httpx
# $ poetry add --group dev pytest ruff
# $ poetry install
# $ poetry run pytest

# pyproject.toml:
"""
[tool.poetry.dependencies]
python = "^3.10"
pydantic = "^2.0"
httpx = "^0.25"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
ruff = "^0.4"

[tool.poetry.group.test.dependencies]
pytest-cov = "^5.0"
hypothesis = "^6.100"
"""

# === 方案 3: pip-tools(最轻量) ===
# requirements.in(声明依赖)
# requirements.txt(锁定版本,由 pip-compile 生成)
# dev-requirements.in
# dev-requirements.txt

# $ pip-compile requirements.in           # 生成锁定文件
# $ pip-compile dev-requirements.in
# $ pip install -r requirements.txt
# $ pip install -r dev-requirements.txt
# === 锁文件对比 ===

# poetry.lock: TOML 格式,内容哈希校验
# [[package]]
# name = "pydantic"
# version = "2.7.0"
# ...

# uv.lock: TOML 格式,更紧凑
# [[package]]
# name = "pydantic"
# version = "2.7.0"
# source = { registry = "https://pypi.org/simple" }
# ...

# 两者作用相同: 锁定所有直接+间接依赖的精确版本
# 相当于 Maven 的 dependency:tree 被"冻结"

# === 私有仓库配置 ===
# pyproject.toml
"""
[[tool.uv.index]]
name = "private"
url = "https://pypi.example.com/simple/"
# 如果需要认证,用环境变量或 netrc 文件

# poetry 配置私有仓库
[[tool.poetry.source]]
name = "private"
url = "https://pypi.example.com/simple/"
priority = "supplemental"
"""

# netrc 文件 (~/.netrc) — 认证信息
# machine pypi.example.com
# login my-username
# password my-token

# 环境变量方式(推荐 CI 使用)
# UV_INDEX_URL=https://user:token@pypi.example.com/simple/

核心差异

| 维度 | Maven/Gradle | Python (uv/Poetry) |
| --- | --- | --- |
| 依赖声明 | XML / Kotlin DSL | pyproject.toml (TOML) |
| 版本锁定 | 无(每次解析传递依赖) | 锁文件(uv.lock / poetry.lock) |
| 传递依赖 | 自动(BFS 解析) | 自动(pip/uv 解析) |
| 依赖范围 | compile/test/runtime/provided | dependencies / [dependency-groups] |
| 依赖冲突 | 最近优先(Maven)/ 严格(Gradle) | 统一解析,默认取最高兼容版本 |
| 私有仓库 | settings.xml / repositories {} | [[tool.uv.index]] / netrc |

常见陷阱

# 陷阱 1: 不用锁文件
# pip install pydantic  # 只锁定直接依赖,不锁定传递依赖
# 下次安装可能得到不同的传递依赖版本 → 构建不可复现
# 解决: 永远用 uv sync / poetry install(自动使用锁文件)

# 陷阱 2: requirements.txt 没有哈希
# pip install -r requirements.txt  # 不校验包完整性
# 解决: pip install --require-hashes -r requirements.txt
# 或用 uv/poetry(自动校验)

# 陷阱 3: Python 没有"provided" scope
# Java 的 provided scope(如 servlet-api)表示运行时由容器提供
# Python 没有等价物。如果依赖是可选的,用 try/except import:
try:
    import boto3  # type: ignore
    HAS_BOTO3 = True
except ImportError:
    HAS_BOTO3 = False
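try/except import 之后,通常再配一个统一的报错入口,缺依赖时给出可操作的安装提示,而不是晦涩的 NameError(require / upload_to_s3 为示意命名,extras 名 aws 为假设):

```python
HAS_BOTO3 = False  # 沿用上文的探测结果;这里固定为 False 便于演示


def require(flag: bool, package: str, extra: str) -> None:
    """可选依赖缺失时,报出带安装命令的错误"""
    if not flag:
        raise RuntimeError(
            f"{package} 未安装: pip install 'my-package[{extra}]'"
        )


def upload_to_s3(path: str) -> None:
    require(HAS_BOTO3, "boto3", "aws")
    # ... 真正的上传逻辑 ...


try:
    upload_to_s3("/tmp/data.csv")
except RuntimeError as e:
    print(e)  # boto3 未安装: pip install 'my-package[aws]'
```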

何时使用

  • uv: 新项目首选,速度极快,兼容 pip 生态
  • Poetry: 已有项目、需要插件生态
  • pip-tools: 最简单场景、Docker 镜像构建
  • 锁文件: 生产环境必须使用

12.3 代码质量

Java/Kotlin 对比

<!-- Maven: Checkstyle + SpotBugs + PMD -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-checkstyle-plugin</artifactId>
    <version>3.3.1</version>
    <configuration>
        <configLocation>google_checks.xml</configLocation>
    </configuration>
    <executions>
        <execution>
            <goals><goal>check</goal></goals>
            <phase>verify</phase>
        </execution>
    </executions>
</plugin>
// Gradle: ktlint + detekt
// build.gradle.kts
plugins {
    id("org.jlleitschuh.gradle.ktlint") version "12.1.0"
    id("io.gitlab.arturbosch.detekt") version "1.23.6"
}

ktlint {
    android.set(false)
    outputColorName.set("RED")
}

Python 实现

# === Ruff: 一个工具替代 Checkstyle + SpotBugs + ktlint + Google Java Format ===
# Rust 编写,极快(比 flake8 快 10-100 倍)
# 集成 linter + formatter + import sorting + 自动修复

# pyproject.toml 中的 Ruff 配置:
"""
[tool.ruff]
# 目标 Python 版本
target-version = "py310"
# 行长度(Black 默认 88,我们用 100)
line-length = 100
# 排除目录
exclude = [".git", ".venv", "node_modules", "*.egg-info"]

# Linter 规则
[tool.ruff.lint]
# 启用的规则集
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # pyflakes
    "I",    # isort(import 排序)
    "N",    # pep8-naming
    "UP",   # pyupgrade(自动升级语法到目标版本)
    "B",    # flake8-bugbear(常见 bug 模式)
    "SIM",  # flake8-simplify(简化建议)
    "C4",   # flake8-comprehensions(更好的推导式)
    "RUF",  # ruff 特有规则
]
# 忽略的规则
ignore = [
    "E501",  # 行太长(交给 formatter 处理)
]

# 按文件忽略特定规则
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["F401"]  # __init__.py 中未使用的 import 是正常的(re-export)
"tests/*" = ["S101"]      # 测试中允许 assert(启用 S 规则集时生效)

# isort 配置
[tool.ruff.lint.isort]
known-first-party = ["my_package"]

# Formatter 配置
[tool.ruff.format]
quote-style = "double"
indent-style = "space"
"""

# === 运行 Ruff ===
# $ ruff check .                    # 检查
# $ ruff check --fix .              # 自动修复
# $ ruff format .                   # 格式化(替代 Black)
# $ ruff check --fix . && ruff format .  # 一键修复+格式化
# === mypy: 静态类型检查(相当于 Java 编译器的类型检查) ===
# Java 在编译期就检查类型,Python 默认不检查
# mypy 是最成熟的 Python 类型检查器

# pyproject.toml 中的 mypy 配置:
"""
[tool.mypy]
python_version = "3.10"
# strict 模式: 开启所有严格检查
strict = true
# 额外配置
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_any_generics = true
# 第三方库缺少类型存根时,为特定模块放宽检查
[[tool.mypy.overrides]]
module = ["httpx.*", "structlog.*"]
ignore_missing_imports = true
"""

# === pre-commit hooks: 提交前自动检查 ===
# 相当于 Maven 的 verify phase 在 commit 时自动运行
# .pre-commit-config.yaml:
"""
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
        args: [--maxkb=500]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        additional_dependencies: [pydantic]
        entry: mypy src/
"""

# 安装 pre-commit hooks:
# $ pre-commit install
# 之后每次 git commit 自动运行上述检查
# === 具体示例: Ruff 能捕获的问题 ===

# 问题 1: 可变默认参数(Python 经典陷阱)
def append_to(element, target=[]):  # RUFF: B006 mutable-argument-default
    target.append(element)
    return target

# 修复
def append_to(element, target: list | None = None):
    if target is None:
        target = []
    target.append(element)
    return target
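B006 之所以是 bug 而非风格问题: 默认值在函数定义时只求值一次,之后所有调用共享同一个列表。最小演示:

```python
def append_buggy(element, target=[]):  # 默认列表在 def 时创建,仅此一次
    target.append(element)
    return target


first = append_buggy(1)
second = append_buggy(2)
print(first)   # [1, 2] — 两次调用共享同一个列表!
print(second)  # [1, 2]
assert first is second
```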


# 问题 2: 未使用的循环变量
for _ in range(10):  # 正确: 用 _ 表示忽略
    pass

for i in range(10):  # RUFF: B007 unused-loop-control-variable
    pass


# 问题 3: 冗余推导式
items = [1, 2, 3]
result = [x for x in items]  # RUFF: C416 unnecessary-comprehension
# 修复
result = list(items)


# 问题 4: 打开文件不用 with 语句(文件句柄可能泄漏)
import os
if os.path.exists(path):  # RUFF: SIM115 open-file-with-context-handler
    f = open(path)  # 应该用 with 语句

# 修复
with open(path) as f:
    ...

核心差异

| 维度 | Java/Kotlin | Python (Ruff + mypy) |
| --- | --- | --- |
| Linter | Checkstyle / ktlint / detekt | Ruff(一个工具替代全部) |
| Formatter | Google Java Format / ktlint | Ruff format(替代 Black) |
| 类型检查 | 编译器内置 | mypy(需额外运行) |
| 静态分析 | SpotBugs / Error Prone | Ruff bugbear rules |
| 自动修复 | IDE 辅助 | ruff --fix(命令行自动修复) |
| 提交检查 | Maven verify phase | pre-commit hooks |
| 速度 | 慢(JVM 启动) | 极快(Rust 原生) |

常见陷阱

# 陷阱 1: 只用 mypy 不用 Ruff
# mypy 只检查类型,不检查代码风格和常见 bug
# Ruff 检查风格和 bug,不检查类型
# 两者互补,必须同时使用

# 陷阱 2: mypy strict 对第三方库报错
# 很多第三方库没有类型注解
# 解决: 用 [[tool.mypy.overrides]] 为特定库放宽检查

# 陷阱 3: pre-commit 和 CI 检查不一致
# pre-commit 可能用了不同版本的 Ruff/mypy
# 解决: pre-commit-config.yaml 中指定版本,CI 中用相同版本

何时使用

  • Ruff: 所有 Python 项目,替代 flake8 + Black + isort + pyupgrade
  • mypy strict: 生产项目,特别是多人协作
  • pre-commit: 所有项目,防止不合规代码进入仓库

12.4 测试策略

Java/Kotlin 对比

// JUnit 5: 注解驱动测试
import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.*;
import org.junit.jupiter.params.*;
import org.junit.jupiter.params.provider.*;

class CalculatorTest {
    private Calculator calc;

    @BeforeEach
    void setUp() {
        calc = new Calculator();
    }

    @Test
    void testAdd() {
        assertEquals(5, calc.add(2, 3));
    }

    @ParameterizedTest
    @CsvSource({"2,3,5", "0,0,0", "-1,1,0"})
    void testAddParams(int a, int b, int expected) {
        assertEquals(expected, calc.add(a, b));
    }

    @Test
    @Disabled("TODO: fix later")
    void testDisabled() { }
}

// Mockito: Mock 依赖
import static org.mockito.Mockito.*;

class ServiceTest {
    @Test
    void testWithMock() {
        Repository repo = mock(Repository.class);
        when(repo.findById(1L)).thenReturn(Optional.of(new User("Alice")));

        Service service = new Service(repo);
        User result = service.getUser(1L);

        assertEquals("Alice", result.getName());
        verify(repo).findById(1L);
    }
}
// Kotlin-test + MockK
class CalculatorTest {
    private lateinit var calc: Calculator

    @BeforeEach
    fun setUp() {
        calc = Calculator()
    }

    @Test
    fun `add two numbers`() {
        assertEquals(5, calc.add(2, 3))
    }

    @Test
    fun `mock repository`() {
        val repo = mockk<Repository>()
        every { repo.findById(1L) } returns User("Alice")

        val service = Service(repo)
        assertEquals("Alice", service.getUser(1L).name)

        verify { repo.findById(1L) }
    }
}

Python 实现

# === pytest: fixtures, parametrize, markers ===
# tests/conftest.py — 共享 fixtures(相当于 JUnit 的 @BeforeEach)
import pytest
from my_package.core import Calculator, UserService, UserRepository


# fixture: 相当于 @BeforeEach + @BeforeAll 的灵活版本
@pytest.fixture
def calculator():
    """每个测试函数获得独立的 Calculator 实例"""
    return Calculator()


@pytest.fixture
def user_service():
    """注入 mock repository 的 service"""
    repo = UserRepository()
    service = UserService(repo)
    yield service
    # yield 后是清理逻辑(相当于 @AfterEach)
    service.shutdown()


# fixture 作用域
@pytest.fixture(scope="session")
def db_connection():
    """整个测试 session 共享一个数据库连接"""
    conn = create_connection()
    yield conn
    conn.close()


@pytest.fixture(scope="module")
def shared_cache():
    """同一个测试文件内共享"""
    return {}


# tests/test_calculator.py
import pytest


class TestCalculator:
    """测试类: 纯粹的组织手段,不需要继承任何基类"""

    def test_add(self, calculator):
        # assert 是 Python 内置关键字,不需要静态导入
        assert calculator.add(2, 3) == 5
        assert calculator.add(-1, 1) == 0

    def test_divide_by_zero(self, calculator):
        # pytest.raises: 相当于 assertThrows
        with pytest.raises(ZeroDivisionError, match="division by zero"):
            calculator.divide(1, 0)

    def test_add_negative(self, calculator):
        assert calculator.add(-5, -3) == -8


# 参数化测试: 相当于 @ParameterizedTest + @CsvSource
@pytest.mark.parametrize(
    "a, b, expected",
    [
        (2, 3, 5),
        (0, 0, 0),
        (-1, 1, 0),
        (100, 200, 300),
    ],
)
def test_add_parametrized(calculator, a, b, expected):
    assert calculator.add(a, b) == expected


# 组合参数化
@pytest.mark.parametrize("x", [1, 2])
@pytest.mark.parametrize("y", [10, 20])
def test_multiply_matrix(calculator, x, y):
    assert calculator.multiply(x, y) == x * y
    # 会生成 4 个测试: (1,10), (1,20), (2,10), (2,20)


# markers: 分类测试
@pytest.mark.slow
def test_large_computation(calculator):
    import time
    time.sleep(2)
    assert calculator.fibonacci(100) > 0


@pytest.mark.integration
def test_database_query(user_service):
    result = user_service.find_by_name("Alice")
    assert result is not None


# pyproject.toml 中注册 markers:
"""
[tool.pytest.ini_options]
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: marks tests as integration tests",
]
testpaths = ["tests"]
asyncio_mode = "auto"
"""

# 运行:
# $ pytest                           # 运行所有测试
# $ pytest tests/test_calculator.py  # 运行单个文件
# $ pytest -k "test_add"             # 按名称过滤
# $ pytest -m "not slow"             # 排除 slow 测试
# $ pytest -x                        # 第一个失败就停止
# $ pytest --cov=my_package          # 覆盖率
# $ pytest -n auto                   # 并行执行(pytest-xdist)
# === pytest fixture 深入: 依赖注入与作用域 ===
import pytest
from dataclasses import dataclass

# === fixture 作用域 ===
# scope 控制 fixture 的生命周期: function(默认) < class < module < package < session

@pytest.fixture(scope="session")
def db_connection():
    """整个测试会话只创建一次(如数据库连接)"""
    print("\n[setup] 创建数据库连接")
    conn = {"connected": True}
    yield conn
    print("\n[teardown] 关闭数据库连接")
    conn["connected"] = False

@pytest.fixture(scope="function")
def clean_db(db_connection):
    """每个测试函数执行前清理数据(依赖 db_connection)"""
    db_connection["data"] = []
    return db_connection

# === fixture 依赖注入 ===
@pytest.fixture
def user_client(clean_db):
    """fixture 可以依赖其他 fixture"""
    def create_user(name):
        clean_db["data"].append(name)
        return {"name": name, "id": len(clean_db["data"])}
    return create_user

def test_create_user(user_client):
    u1 = user_client("Alice")
    u2 = user_client("Bob")
    assert u1["id"] == 1
    assert u2["id"] == 2

# === conftest.py: 共享 fixture ===
# tests/conftest.py 中的 fixture 自动对所有测试可用
# 不需要 import,pytest 自动发现

# === 参数化 fixture ===
@pytest.fixture(params=["sqlite", "postgres", "mysql"])
def db_engine(request):
    return request.param

def test_all_engines(db_engine):
    assert isinstance(db_engine, str)
# === mock/patch: unittest.mock ===
# 相当于 Mockito / MockK
from unittest.mock import AsyncMock, Mock, patch, MagicMock, call
import pytest


# 方式 1: 直接创建 Mock
def test_mock_basic():
    mock_repo = Mock()
    mock_repo.find_by_id.return_value = {"name": "Alice"}
    mock_repo.find_by_id.side_effect = lambda x: {"name": f"User-{x}"}

    # 调用
    result = mock_repo.find_by_id(1)
    assert result == {"name": "User-1"}

    # 验证调用
    mock_repo.find_by_id.assert_called_once_with(1)
    mock_repo.find_by_id.assert_has_calls([call(1)])
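普通 Mock 接受任意属性和方法名,拼写错误不会报错;用 spec 约束到真实接口可以提早暴露这类问题(Repo 为示意类):

```python
from unittest.mock import Mock


class Repo:
    def find_by_id(self, uid): ...


strict = Mock(spec=Repo)  # 只允许 Repo 上存在的属性/方法
strict.find_by_id.return_value = {"name": "Alice"}
assert strict.find_by_id(1) == {"name": "Alice"}

try:
    strict.find_by_idd(1)  # 方法名拼错: spec Mock 抛 AttributeError
except AttributeError:
    print("caught typo")
# 而普通 Mock() 会静默返回一个新的子 Mock,测试照样通过 → 假阳性
```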


# 方式 2: patch — 替换模块中的对象(最常用)
# 相当于 MockK 的 every { ... } returns ... 模式
def test_with_patch():
    # patch 替换 my_package.core.requests.get
    with patch("my_package.core.requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = {"data": "hello"}

        from my_package.core import fetch_data
        result = fetch_data("https://api.example.com")

        assert result == {"data": "hello"}
        mock_get.assert_called_once_with("https://api.example.com")


# 方式 3: patch 作为 decorator
@patch("my_package.core.requests.get")
def test_with_decorator(mock_get):
    mock_get.return_value.status_code = 200
    mock_get.return_value.json.return_value = {"data": "hello"}

    from my_package.core import fetch_data
    result = fetch_data("https://api.example.com")

    assert result == {"data": "hello"}


# 方式 4: patch 作为 fixture(推荐)
@pytest.fixture
def mock_http():
    with patch("my_package.core.requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = {"data": "hello"}
        yield mock_get


def test_with_fixture(mock_http):
    from my_package.core import fetch_data
    result = fetch_data("https://api.example.com")
    assert result == {"data": "hello"}


# side_effect: 模拟异常
def test_mock_exception():
    mock_repo = Mock()
    mock_repo.find.side_effect = ValueError("not found")

    with pytest.raises(ValueError, match="not found"):
        mock_repo.find(1)


# Async mock: AsyncMock 返回可 await 的 Mock(需从 unittest.mock 导入)

async def test_async_mock():
    mock_client = Mock()
    mock_client.fetch = AsyncMock(return_value={"status": "ok"})

    result = await mock_client.fetch("/api")
    assert result == {"status": "ok"}
# === pytest-asyncio: 异步测试 ===
# JUnit 5 没有原生的异步测试支持(需第三方库);pytest 通过 pytest-asyncio 插件提供
import asyncio
import pytest


@pytest.fixture
async def async_client():
    """异步 fixture"""
    client = await create_client()
    yield client
    await client.close()


@pytest.mark.asyncio
async def test_async_operation(async_client):
    """异步测试函数"""
    result = await async_client.fetch("/api/users")
    assert len(result) > 0


@pytest.mark.asyncio
async def test_concurrent_requests(async_client):
    """并发请求测试"""
    tasks = [async_client.fetch(f"/api/users/{i}") for i in range(10)]
    results = await asyncio.gather(*tasks)
    assert len(results) == 10
# === pytest-cov: 覆盖率 ===
# 相当于 JaCoCo

# 运行:
# $ pytest --cov=my_package --cov-report=term-missing --cov-report=html
# 输出:
# my_package/core.py     85%    12    2
# my_package/utils.py    100%   0     0
# TOTAL                  92%    12    2

# pyproject.toml 配置:
"""
[tool.coverage.run]
source = ["my_package"]
branch = true

[tool.coverage.report]
fail_under = 80
show_missing = true
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
]
"""
# === Hypothesis: 属性测试(Property-Based Testing) ===
# 相当于 jqwik / ScalaCheck
# 不是给定输入断言输出,而是描述属性,Hypothesis 自动生成输入

from hypothesis import given, strategies as st, settings


# 策略: 定义输入数据的生成规则
# st.integers() 相当于 Arbitrary<Integer>
@given(st.integers(), st.integers())
def test_add_commutative(a, b):
    """加法交换律: a + b == b + a,对所有整数成立"""
    assert a + b == b + a


@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    """排序是幂等的: 排序两次结果相同"""
    assert sorted(sorted(lst)) == sorted(lst)


@given(st.text())
def test_reverse_twice(s):
    """反转两次等于原字符串"""
    assert s == s[::-1][::-1]


@given(st.integers(min_value=0))
@settings(max_examples=200)  # 增加测试用例数
def test_fibonacci_monotonic(n):
    """斐波那契数列单调不减(fib(1) == fib(2),严格递增不成立,故用 >=)"""
    from my_package.core import fibonacci
    if n > 1:
        assert fibonacci(n) >= fibonacci(n - 1)


# 自定义策略
from hypothesis import strategies as st

UserStrategy = st.builds(
    dict,
    name=st.text(min_size=1, max_size=50),
    age=st.integers(min_value=0, max_value=150),
    email=st.emails(),
)

@given(UserStrategy)
def test_user_validation(user):
    """所有生成的用户数据都能通过 Pydantic 验证"""
    from my_package.models import User
    validated = User(**user)
    assert validated.name == user["name"]
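Hypothesis 的核心思想不依赖库也能演示 — 随机生成大量输入,验证性质对所有输入成立。下面用 random 手工模拟(Hypothesis 额外做了失败用例收缩、边界值生成等):

```python
import random

random.seed(42)  # 固定种子,保证可复现
for _ in range(200):
    lst = [random.randint(-1000, 1000) for _ in range(random.randint(0, 20))]
    assert sorted(sorted(lst)) == sorted(lst)              # 幂等性
    assert sorted(lst, reverse=True) == sorted(lst)[::-1]  # 反向一致性
print("200 组随机输入全部通过")
```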

核心差异

| 维度 | JUnit 5 / Kotlin-test | pytest |
| --- | --- | --- |
| 测试发现 | 注解 @Test | 自动发现 test_ 前缀函数 |
| 断言 | assertEquals / assertThrows | assert + pytest.raises |
| 生命周期 | @Before / @After | fixtures(更灵活) |
| 参数化 | @ParameterizedTest | @pytest.mark.parametrize |
| Mock | Mockito / MockK | unittest.mock |
| 异步测试 | 无原生支持 | pytest-asyncio |
| 属性测试 | jqwik | Hypothesis |
| 覆盖率 | JaCoCo | pytest-cov |
| 测试基类 | 需要继承 | 不需要继承,纯函数 |

常见陷阱

# 陷阱 1: patch 的路径错误
# 错误: patch("requests.get")                   # 替换的是定义处
# 正确: patch("my_package.core.requests.get")   # 替换使用处的引用
# 原则: patch WHERE IT'S USED, NOT WHERE IT'S DEFINED

# 陷阱 2: Mock 的属性链
# mock_repo.user.name 返回 Mock 对象,永远 truthy
# 解决: 明确设置 return_value
mock_repo.user.name = "Alice"  # 正确

# 陷阱 3: fixture 作用域误用
# 默认 scope="function",每个测试函数都重新创建
# 如果 fixture 创建成本高(如数据库连接),用 scope="session"
# 但要注意 session 级 fixture 的状态隔离问题

# 陷阱 4: 异步测试忘记 @pytest.mark.asyncio
# async def test_xxx():  # 如果没有 @pytest.mark.asyncio,pytest 不会 await 它
# 解决: pyproject.toml 中设置 asyncio_mode = "auto"
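陷阱 2 的最小演示 — 未配置的属性链返回子 Mock,永远 truthy,但不等于任何真实值:

```python
from unittest.mock import Mock

repo = Mock()
assert bool(repo.user.name)       # 子 Mock 永远 truthy → 条件判断被"骗过"
assert repo.user.name != "Alice"  # 但并不等于任何真实字符串

repo.user.name = "Alice"          # 显式赋值后才是真实值
assert repo.user.name == "Alice"
```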

何时使用

  • pytest: 所有 Python 项目,不犹豫
  • pytest-asyncio: 涉及 asyncio 的项目
  • Hypothesis: 数据处理、算法、验证逻辑
  • pytest-cov: 持续监控覆盖率,CI 中设最低门槛

12.5 CI/CD

Java/Kotlin 对比

<!-- Maven: CI 中通常就 mvn verify -->
<!-- Jenkinsfile -->
<!--
pipeline {
    agent any
    tools {
        maven 'Maven 3.9'
        jdk 'JDK 21'
    }
    stages {
        stage('Build') { steps { sh 'mvn compile' } }
        stage('Test') { steps { sh 'mvn test' } }
        stage('Package') { steps { sh 'mvn package -DskipTests' } }
        stage('Deploy') { steps { sh 'mvn deploy' } }
    }
}
-->
// Gradle: CI 中通常 ./gradlew check
// GitHub Actions for Java
/*
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
      - run: ./gradlew check
*/

Python 实现

# .github/workflows/ci.yml — 完整的 Python CI 配置
name: CI

on:
  push:
    branches: [main, develop]
    tags: ["v*"]        # 发布 job 依赖 tag 触发
  pull_request:
    branches: [main]

jobs:
  # === 矩阵测试: 多 Python 版本 ===
  # 相当于 Java CI 中对 JDK 11/17/21 分别构建测试
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Lint (Ruff)
        run: uv run ruff check .

      - name: Format check (Ruff)
        run: uv run ruff format --check .

      - name: Type check (mypy)
        run: uv run mypy src/

      - name: Run tests
        run: uv run pytest --cov=my_package --cov-report=term-missing --cov-fail-under=80

      - name: Security audit
        run: uv run pip-audit

  # === 发布到 PyPI ===
  publish:
    needs: test
    if: startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Trusted Publishing(不需要 API token)

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Build package
        run: uv build

      - name: Publish to PyPI
        run: uv publish
        # Trusted Publishing: PyPI 信任 GitHub Actions 的 OIDC token
        # 不需要在 GitHub Secrets 中存储 PyPI API token
# === Docker 多阶段构建 ===
# 相当于 Maven 的 Docker 构建: 多阶段编译 + 最小运行镜像

# 阶段 1: 安装依赖(利用 Docker 缓存)
FROM python:3.12-slim AS builder

WORKDIR /app

# 先复制依赖文件,利用缓存层
COPY pyproject.toml uv.lock ./

# 安装 uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# 安装依赖到虚拟环境
RUN uv sync --frozen --no-dev --no-install-project

# 复制源码并安装项目本身(hatchling 构建需要 README.md)
COPY README.md ./
COPY src/ ./src/
RUN uv sync --frozen --no-dev

# 阶段 2: 最小运行镜像
FROM python:3.12-slim AS runtime

WORKDIR /app

# 从 builder 复制虚拟环境
COPY --from=builder /app/.venv /app/.venv

# 复制源码
COPY --from=builder /app/src/ ./src/

# 非 root 用户运行(安全最佳实践)
RUN useradd --create-home appuser
USER appuser

ENV PATH="/app/.venv/bin:$PATH"
# 日志实时输出,不缓冲(注意: Dockerfile 的 ENV 行不支持行尾注释)
ENV PYTHONUNBUFFERED=1

EXPOSE 8000

# 入口: 用 python -m 运行
CMD ["python", "-m", "my_package"]
# === 发布流程 ===

# 1. 版本管理: 用语义化版本
# 修改 pyproject.toml 中的 version
# 或用 uv version 命令:
# $ uv version --bump patch   # 1.0.0 → 1.0.1
# $ uv version --bump minor   # 1.0.0 → 1.1.0
# $ uv version --bump major   # 1.0.0 → 2.0.0

# 2. 构建
# $ uv build
# 生成 dist/my_package-1.0.0.tar.gz (sdist)
# 和 dist/my_package-1.0.0-py3-none-any.whl (wheel)

# 3. 检查
# $ uv run twine check dist/*
# 验证包元数据是否正确

# 4. 发布到 TestPyPI(预发布验证)
# $ uv publish --index testpypi

# 5. 发布到 PyPI(正式发布)
# $ uv publish
# 或用 Trusted Publishing(推荐): CI 中自动发布,无需 token

# pyproject.toml 中配置发布信息:
"""
[project.urls]
Homepage = "https://github.com/example/my-package"
Documentation = "https://my-package.readthedocs.io"
Repository = "https://github.com/example/my-package"
Changelog = "https://github.com/example/my-package/releases"

[tool.uv]
dev-dependencies = [
    "twine>=5.0",
    "build>=1.0",
]
"""

核心差异

| 维度 | Java CI/CD | Python CI/CD |
| --- | --- | --- |
| 构建工具 | mvn / gradlew | uv / poetry |
| 环境管理 | SDKMAN / JDK 安装 | uv python install |
| 矩阵测试 | 多 JDK 版本 | 多 Python 版本 |
| 产物 | JAR / WAR / Docker | wheel / sdist / Docker |
| 发布目标 | Maven Central / Artifactory | PyPI / TestPyPI |
| 认证方式 | GPG 签名 + settings.xml | Trusted Publishing / API token |
| Docker 镜像 | eclipse-temurin:21-jre | python:3.12-slim |

常见陷阱

# 陷阱 1: Docker 中 pip 缓存失效
# 每次代码变更都重新安装所有依赖
# 解决: 先 COPY pyproject.toml,再 COPY src/
# 利用 Docker 层缓存,依赖不变时跳过安装

# 陷阱 2: CI 中 Python 版本不一致
# 本地用 3.12,CI 用 3.10 → 类型注解语法可能不兼容
# 解决: pyproject.toml 中 requires-python = ">=3.10"
# CI 中矩阵测试所有支持的版本

# 陷阱 3: 忘记 PYTHONUNBUFFERED
# Docker 中 Python 默认缓冲 stdout
# 日志不会实时输出到 docker logs
# 解决: ENV PYTHONUNBUFFERED=1

# 陷阱 4: 发布时忘记 --frozen
# 不加 --frozen,uv sync 可能更新锁文件
# CI 中应该用 --frozen 确保严格按锁文件安装

何时使用

  • GitHub Actions: 最简单的 CI 方案,Python 生态首选
  • 矩阵测试: 支持多版本时必须使用
  • 多阶段 Docker: 生产部署必须使用
  • Trusted Publishing: 发布到 PyPI 的推荐方式(无需管理 token)

12.6 安全实践

Java/Kotlin 对比

<!-- Maven: OWASP Dependency-Check 插件 -->
<plugin>
    <groupId>org.owasp</groupId>
    <artifactId>dependency-check-maven</artifactId>
    <version>9.2.0</version>
    <executions>
        <execution>
            <goals><goal>check</goal></goals>
        </execution>
    </executions>
</plugin>

<!-- Java: 密钥管理通常用 Vault 或 KMS -->
<!-- 环境变量: System.getenv("DB_PASSWORD") -->
// Kotlin: 输入验证
data class UserRequest(
    @field:NotBlank val name: String,
    @field:Email val email: String,
    @field:Min(0) val age: Int,
)

// Bean Validation (JSR 380)
fun createUser(@Valid request: UserRequest): User { ... }

Python 实现

# === 依赖安全扫描 ===
# 相当于 OWASP Dependency-Check

# 方案 1: pip-audit(推荐,Python 官方支持)
# $ pip-audit --desc
# 检查已知漏洞(基于 PyPI Advisory Database + OSV)
# 输出:
# Name         Version ID             Summary
# ----         ------- --             -------
# requests     2.25.0  PYSEC-2023-XX  Unintended leak of Proxy-Authorization header

# 方案 2: safety
# $ safety check --full-report
# 基于 pyup.io 的漏洞数据库

# CI 中集成:
# $ uv run pip-audit --strict  # 发现漏洞时退出码非零

# pyproject.toml:
"""
[tool.uv]
dev-dependencies = [
    "pip-audit>=2.7",
]
"""
# === Secrets 管理 ===
# 原则: 永远不要把密钥硬编码在代码中

# 方式 1: 环境变量(最基础)
import os

DATABASE_URL = os.environ["DATABASE_URL"]  # 缺少时抛 KeyError
API_KEY = os.environ.get("API_KEY", "default")  # 可选的,有默认值
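环境变量的值永远是字符串,bool(os.environ["DEBUG"]) 对 "false" 也返回 True,这是个常见坑;布尔配置需要显式解析(env_bool 为示意函数):

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """把环境变量解析为布尔值: '1'/'true'/'yes'/'on' 视为 True"""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


os.environ["DEBUG"] = "false"
print(bool(os.environ["DEBUG"]))  # True — 非空字符串永远 truthy,这是坑!
print(env_bool("DEBUG"))          # False — 正确解析
```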

# 方式 2: python-dotenv(开发环境)
# .env 文件(不提交到 git!)
"""
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
API_KEY=dev-key-12345
DEBUG=true
"""

# .gitignore 中添加:
# .env
# .env.local
# .env.*.local

from dotenv import load_dotenv

load_dotenv()  # 从 .env 加载到 os.environ
DATABASE_URL = os.environ["DATABASE_URL"]

# 方式 3: Pydantic Settings(推荐,类型安全 + 验证)
from pydantic_settings import BaseSettings, SettingsConfigDict


class AppConfig(BaseSettings):
    """类型安全的配置管理,相当于 Spring Boot 的 @ConfigurationProperties"""

    database_url: str
    api_key: str
    debug: bool = False
    max_connections: int = 10
    allowed_origins: list[str] = ["*"]

    model_config = SettingsConfigDict(
        env_file=".env",       # 从 .env 加载
        env_file_encoding="utf-8",
        case_sensitive=False,  # 环境变量不区分大小写
    )


# 使用
config = AppConfig()  # 自动从环境变量 + .env 加载并验证
print(config.database_url)  # 类型: str
print(config.max_connections)  # 类型: int,自动转换

# 缺少必要配置时:
# config = AppConfig()  # → ValidationError: database_url field required

# 前缀支持(微服务常见)
class DatabaseConfig(BaseSettings):
    host: str = "localhost"
    port: int = 5432
    name: str

    model_config = SettingsConfigDict(env_prefix="DB_")
    # 环境变量: DB_HOST, DB_PORT, DB_NAME
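方式 1(裸环境变量)在变量较多时,缺一个抛一个 KeyError,排查起来很零碎。可以封装一个小工具一次性报出所有缺失项。下面是极简示意(`require_env` 为本文虚构的函数名,仅演示思路):

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """一次性读取多个必需环境变量,缺失时汇总报错,而不是逐个抛 KeyError。"""
    missing = [n for n in names if n not in os.environ]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}


# 用法示意(这里显式写入环境变量,实际由部署环境注入)
os.environ["DATABASE_URL"] = "postgresql://localhost:5432/demo"
os.environ["API_KEY"] = "dev-key-12345"

cfg = require_env("DATABASE_URL", "API_KEY")
print(cfg["API_KEY"])  # dev-key-12345
```

相比逐个 `os.environ[...]`,启动时一次性失败能把配置问题暴露在最早的时刻。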
# === 输入验证: Pydantic ===
# 相当于 Bean Validation (JSR 380)

from pydantic import BaseModel, EmailStr, Field, field_validator
from datetime import date


class UserCreate(BaseModel):
    """请求体验证模型"""

    name: str = Field(min_length=1, max_length=100)
    email: EmailStr  # 自动验证邮箱格式
    age: int = Field(ge=0, le=150)  # 0 <= age <= 150
    birth_date: date | None = None
    password: str = Field(min_length=8, pattern=r"[A-Za-z0-9!@#$%^&*]{8,}")

    @field_validator("name")
    @classmethod
    def name_must_not_contain_special_chars(cls, v: str) -> str:
        if any(c in v for c in "!@#$%^&*"):
            raise ValueError("name must not contain special characters")
        return v.strip()


# 使用: 非法输入会抛出 ValidationError
try:
    user = UserCreate(
        name="",                # 违反 min_length=1
        email="not-an-email",   # 邮箱格式错误
        age=-1,                 # 违反 ge=0
        password="short",       # 违反 min_length=8
    )
except Exception as e:
    print(e)
    # ValidationError: validation errors for UserCreate
    # name / email / age / password 各字段分别给出失败原因

# 合法输入正常通过:
user = UserCreate(
    name="Alice",
    email="alice@example.com",
    age=30,
    password="secure123!",
)

# Pydantic 在 FastAPI 中自动集成:
# from fastapi import FastAPI
# app = FastAPI()
# @app.post("/users")
# async def create_user(user: UserCreate):  # 自动验证请求体
#     return user.model_dump()
# === 避免 eval/exec/pickle 反序列化攻击 ===

# 1. eval: 永远不要用 eval 处理用户输入
import ast

# 危险!用户输入会被当作 Python 代码执行
# user_input = "__import__('os').system('rm -rf /')"
# result = eval(user_input)  # 灾难!

# 安全替代: ast.literal_eval(只解析字面量)
user_input = "[1, 2, 3, 'hello']"
result = ast.literal_eval(user_input)  # 安全: 返回 [1, 2, 3, 'hello']
# 只支持: 字符串、字节、数字、元组、列表、字典、集合、布尔、None

# 2. exec: 同样危险,永远不要执行用户输入
# exec(user_input)  # 灾难!

# 3. pickle: 反序列化可以执行任意代码
import pickle
import json

# 危险!pickle 文件可以包含恶意代码
# with open("user_data.pkl", "rb") as f:
#     data = pickle.load(f)  # 可能执行任意代码

# 安全替代: JSON(只序列化数据,不序列化代码)
data = {"name": "Alice", "age": 30}
serialized = json.dumps(data)  # 安全
deserialized = json.loads(serialized)  # 安全

# 如果必须用 pickle,只加载自己生成的文件
# 并且在沙箱环境中运行
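如果确实绕不开 pickle(比如缓存自己生成的对象),常见的缓解手段是加一层 HMAC 签名:只反序列化能通过签名校验的数据,拒绝被篡改或来路不明的内容。下面是一个极简示意(密钥应来自环境变量或 KMS,这里硬编码仅为演示):

```python
import hashlib
import hmac
import pickle

# 示例密钥,实际应从环境变量 / KMS 读取
SECRET_KEY = b"replace-with-key-from-env"


def dumps_signed(obj) -> bytes:
    """序列化并在前面附上 32 字节的 HMAC-SHA256 签名。"""
    payload = pickle.dumps(obj)
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return sig + payload


def loads_signed(blob: bytes):
    """先校验签名,再反序列化;签名不匹配时绝不调用 pickle.loads。"""
    sig, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"name": "Alice", "age": 30})
restored = loads_signed(blob)
print(restored)  # {'name': 'Alice', 'age': 30}
```

注意这只防"数据被篡改/来源不明",不防"密钥泄露";能用 JSON 时仍然优先 JSON。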
# === 安全配置清单 ===

# pyproject.toml:
"""
[tool.ruff.lint]
select = [
    # ... 其他规则 ...
    "S",    # flake8-bandit: 安全规则
]
ignore = [
    "S101",   # 允许 assert(测试中需要)
    "S105",   # 允许硬编码密码(开发环境,CI 中应启用)
]

[tool.bandit]
# Bandit: 专门的安全扫描工具
# $ bandit -r src/
skips = ["B101", "B601"]
"""

# .pre-commit-config.yaml 中添加安全检查:
"""
repos:
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8
    hooks:
      - id: bandit
        args: ["-c", "pyproject.toml"]
        additional_dependencies: ["bandit[toml]"]
"""
bandit: Python 代码安全静态扫描

vs Java: SpotBugs / FindSecBugs

# 安装
pip install bandit

# 扫描整个项目
bandit -r src/

# 扫描并输出 JSON(CI/CD 集成)
bandit -r src/ -f json -o bandit-report.json

# 只检查高危问题
bandit -r src/ -ll

常见检测项:

| 检测项 | 风险说明 |
| --- | --- |
| B102: exec 使用 | 代码注入风险 |
| B106: 硬编码密码 | 凭证泄露 |
| B301: pickle 使用 | 反序列化攻击 |
| B608: 硬编码 SQL | SQL 注入风险 |
| B324: hashlib 弱哈希 | MD5/SHA1 不安全 |
# bandit 会标记这些问题:
password = "hardcoded_password"  # B106
exec(user_input)                  # B102
import pickle; pickle.loads(data)  # B301
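针对 B324 的弱哈希问题,一个最小示意:安全场景用 SHA-256 及以上;确需 MD5 做非安全用途(如缓存键)时,Python 3.9+ 可以用 usedforsecurity=False 显式声明意图:

```python
import hashlib

data = b"some-content"

# bandit B324 会标记: MD5/SHA1 用于安全场景
# weak = hashlib.md5(data).hexdigest()  # 不要用于密码/签名

# 安全替代: SHA-256 及以上
digest = hashlib.sha256(data).hexdigest()
print(len(digest))  # 64 个十六进制字符

# MD5 仅用于非安全目的(如缓存键)时,Python 3.9+ 可显式声明:
cache_key = hashlib.md5(data, usedforsecurity=False).hexdigest()

# 注意: 密码存储不要用裸哈希,应使用带盐的慢哈希(PBKDF2/scrypt/argon2)
pwd_hash = hashlib.pbkdf2_hmac("sha256", b"secret-password", b"per-user-salt", 100_000)
```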

核心差异

| 维度 | Java/Kotlin | Python |
| --- | --- | --- |
| 依赖扫描 | OWASP Dependency-Check | pip-audit / safety |
| 密钥管理 | Vault / KMS / env | 环境变量 / pydantic-settings |
| 输入验证 | Bean Validation | Pydantic |
| 反序列化安全 | 默认较安全(原生序列化也有风险) | pickle 不安全,用 JSON |
| 代码安全扫描 | SpotBugs / FindSecBugs | Bandit |
| 密钥轮换 | Spring Cloud Config | 外部配置服务 + 重载 |

常见陷阱

# 陷阱 1: .env 文件提交到 git
# 解决: .gitignore 中添加 .env
# 或用 pre-commit 检查:
# detect-secrets 工具可以扫描代码中的密钥

# 陷阱 2: 日志中打印敏感信息
import structlog
logger = structlog.get_logger()

# 危险
# logger.info("user login", password=user.password)

# 安全: 只记录必要信息
logger.info("user login", user_id=user.id)

# 陷阱 3: SQL 注入(ORM 不一定安全)
# 危险: 原始 SQL 拼接
# cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")

# 安全: 参数化查询
# cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# 陷阱 4: YAML 反序列化
# yaml.load() 可以执行任意代码(类似 pickle)
# 解决: yaml.safe_load()(只解析基本类型)
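陷阱 3 的参数化查询可以用标准库 sqlite3 直观演示(sqlite3 的占位符是 `?`,psycopg 等驱动是 `%s`,原理相同):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

# 危险: 字符串拼接,user_id 若为 "1 OR 1=1",WHERE 条件就被绕过了
user_id = "1 OR 1=1"
injected = conn.execute(f"SELECT name FROM users WHERE id = {user_id}").fetchall()
print(injected)  # [('Alice',)] — 注入生效,返回了所有行

# 安全: 参数化查询,输入只会被当作一个值,不会被解析为 SQL
rows = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()
print(rows)  # [] — "1 OR 1=1" 只是个匹配不到任何 id 的字符串
```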

何时使用

  • pip-audit: CI 中每次构建都运行
  • pydantic-settings: 所有需要配置管理的项目
  • Pydantic 验证: API 入口、配置解析、数据处理
  • Bandit: CI 中定期扫描
  • JSON 替代 pickle: 所有序列化场景

12.7 可观测性

Java/Kotlin 对比

<!-- Logback JSON 配置 -->
<appender name="json" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
        <includeMdc>true</includeMdc>
    </encoder>
</appender>

<!-- Micrometer: 指标收集 -->
<!-- 暴露 /actuator/prometheus 端点 -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Spring Boot Actuator: 健康检查 -->
<!-- /actuator/health 端点自动提供 -->
// Kotlin logging
import io.github.oshai.kotlinlogging.KotlinLogging
private val logger = KotlinLogging.logger {}

class UserService {
    fun createUser(name: String) {
        logger.info { "Creating user: $name" }
    }
}

Python 实现

# === 结构化日志: structlog ===
# 相当于 Logback JSON + Kotlin Logging

import structlog
import logging
import json


# 配置 structlog(通常在应用启动时配置一次)
structlog.configure(
    processors=[
        # 添加日志级别、时间戳等标准字段
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        # 异常时添加 traceback
        structlog.processors.format_exc_info,
        # 输出为 JSON(生产环境)
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

logger = structlog.get_logger()


# 使用: 结构化日志,每个字段都是独立 key-value
logger.info("user_created", user_id=42, name="Alice", email="alice@example.com")
# 输出 JSON:
# {"event":"user_created","user_id":42,"name":"Alice","email":"alice@example.com",
#  "level":"info","timestamp":"2024-01-15T10:30:00Z"}

logger.error(
    "database_connection_failed",
    host="db.example.com",
    port=5432,
    error="connection refused",
)
# 输出 JSON:
# {"event":"database_connection_failed","host":"db.example.com","port":5432,
#  "error":"connection refused","level":"error","timestamp":"..."}

# 上下文绑定: 自动附加到所有日志
structlog.contextvars.bind_contextvars(
    request_id="req-123",
    service="user-api",
    version="1.0.0",
)

logger.info("processing_request")  # 自动包含 request_id, service, version
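structlog.contextvars 之所以能做到"绑定一次、处处生效"而又不串请求,靠的是标准库 contextvars:每个 asyncio 任务持有独立的上下文副本。可以脱离 structlog,用纯标准库验证这一行为:

```python
import asyncio
import contextvars

# 每个任务各自持有这个变量的独立副本
request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


async def handler(rid: str) -> str:
    request_id.set(rid)      # 只影响当前任务的上下文
    await asyncio.sleep(0)   # 让出控制权,两个任务交错执行
    return request_id.get()  # 仍然读到本任务设置的值,不会串


async def main() -> list[str]:
    return await asyncio.gather(handler("req-1"), handler("req-2"))


print(asyncio.run(main()))  # ['req-1', 'req-2']
```

这也是后文陷阱里强调"用 contextvars 而不是 thread-local"的原因:线程局部变量在同一事件循环线程上的并发任务之间会互相覆盖。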
# === python-json-logger: 更轻量的替代方案 ===
# 如果你不想引入 structlog,用标准 logging + JSON formatter

import logging
import json
from pythonjsonlogger import jsonlogger


# 自定义 JSON formatter
class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super().add_fields(log_record, record, message_dict)
        # level/timestamp 由下方 rename_fields 统一重命名,这里只补充额外字段
        log_record["logger"] = record.name
        if hasattr(record, "request_id"):
            log_record["request_id"] = record.request_id


# 配置标准 logging
handler = logging.StreamHandler()
formatter = CustomJsonFormatter(
    "%(asctime)s %(levelname)s %(message)s",
    rename_fields={"asctime": "timestamp", "levelname": "level"},
)
handler.setFormatter(formatter)

root_logger = logging.getLogger()
root_logger.addHandler(handler)
root_logger.setLevel(logging.INFO)

# 使用
logger = logging.getLogger("my_package")
logger.info("user_created", extra={"user_id": 42, "name": "Alice"})
# 输出: {"timestamp":"...","level":"INFO","user_id":42,"name":"Alice","message":"user_created"}
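如果连 python-json-logger 也不想引入,标准库 logging.Formatter 自己也能拼出最小可用的 JSON 行。下面是一个示意实现(字段按需裁剪):

```python
import json
import logging


class MinimalJsonFormatter(logging.Formatter):
    """仅用标准库把每条日志输出为单行 JSON。"""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # 合并 msg 与 args
        }
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(MinimalJsonFormatter())
demo_logger = logging.getLogger("demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.info("user_created")
```

代价是没有 python-json-logger 的 rename_fields、extra 字段自动合并等便利,适合依赖极简的小工具。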
# === 指标: Prometheus client ===
# 相当于 Micrometer

from prometheus_client import Counter, Histogram, Gauge, start_http_server, Summary
import random
import time


# 定义指标(通常在模块级别定义)
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    ["method", "endpoint"],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
)

ACTIVE_CONNECTIONS = Gauge(
    "db_active_connections",
    "Active database connections",
)

TASK_DURATION = Summary(
    "task_duration_seconds",
    "Task duration",
    ["task_name"],
)


# 在业务代码中使用
def handle_request(method: str, endpoint: str):
    REQUEST_COUNT.labels(method=method, endpoint=endpoint, status="200").inc()

    with REQUEST_LATENCY.labels(method=method, endpoint=endpoint).time():
        # 业务逻辑
        time.sleep(random.uniform(0.01, 0.1))


def process_task(task_name: str):
    with TASK_DURATION.labels(task_name=task_name).time():
        # 任务处理
        time.sleep(random.uniform(0.1, 0.5))


# 暴露 /metrics 端点
# start_http_server(8000)  # 在独立线程中启动 HTTP server
# 访问 http://localhost:8000/metrics 获取 Prometheus 格式的指标

# 自定义指标收集
class BusinessMetrics:
    """业务指标封装"""

    def __init__(self):
        self.orders_created = Counter(
            "orders_created_total",
            "Total orders created",
            ["region", "category"],
        )
        self.order_value = Histogram(
            "order_value_dollars",
            "Order value distribution",
            ["region"],
            buckets=[10, 50, 100, 500, 1000, 5000],
        )
        self.inventory_level = Gauge(
            "inventory_items_remaining",
            "Current inventory level",
            ["product_id"],
        )

    def record_order(self, region: str, category: str, value: float):
        self.orders_created.labels(region=region, category=category).inc()
        self.order_value.labels(region=region).observe(value)

    def update_inventory(self, product_id: str, count: int):
        self.inventory_level.labels(product_id=product_id).set(count)


metrics = BusinessMetrics()
metrics.record_order("us", "electronics", 299.99)
metrics.update_inventory("prod-123", 42)
# === 分布式追踪: OpenTelemetry ===
# 相当于 Micrometer Tracing / Spring Cloud Sleuth

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME


# 初始化(应用启动时执行一次)
def setup_tracing(service_name: str = "my-service"):
    resource = Resource.create({SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)

    # 导出到 OTLP collector(Jaeger/Zipkin/Tempo)
    otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

    trace.set_tracer_provider(provider)
    return trace.get_tracer(__name__)


tracer = setup_tracing("user-api")


# 在业务代码中使用
def create_user(name: str, email: str) -> dict:
    with tracer.start_as_current_span("create_user") as span:
        span.set_attribute("user.name", name)
        span.set_attribute("user.email", email)

        # 模拟数据库操作
        with tracer.start_as_current_span("db.insert_user") as db_span:
            user_id = save_to_database(name, email)
            db_span.set_attribute("db.user_id", user_id)  # 挂在 db span 上,而非外层 span

        # 模拟发送通知
        with tracer.start_as_current_span("notification.send_welcome"):
            send_welcome_email(email)

        return {"id": user_id, "name": name}


# 自动 instrument: 自动为 HTTP/数据库/Redis 等添加 span
# pip install opentelemetry-instrumentation-httpx
# opentelemetry-instrument python -m my_package
# === 健康检查端点 ===
# 相当于 Spring Boot Actuator /health

from dataclasses import dataclass
from enum import Enum


class HealthStatus(str, Enum):
    UP = "UP"
    DOWN = "DOWN"
    DEGRADED = "DEGRADED"


@dataclass
class HealthCheck:
    status: HealthStatus
    checks: dict[str, dict]

    def to_dict(self) -> dict:
        return {
            "status": self.status.value,
            "checks": self.checks,
        }


async def check_database() -> dict:
    """检查数据库连接"""
    try:
        # 实际项目中执行 SELECT 1
        return {"status": "UP", "response_time_ms": 5}
    except Exception as e:
        return {"status": "DOWN", "error": str(e)}


async def check_redis() -> dict:
    """检查 Redis 连接"""
    try:
        # 实际项目中执行 PING
        return {"status": "UP", "used_memory_mb": 128}
    except Exception as e:
        return {"status": "DOWN", "error": str(e)}


async def check_disk_space() -> dict:
    """检查磁盘空间"""
    import shutil
    usage = shutil.disk_usage("/")
    percent = usage.used / usage.total * 100
    status = "UP" if percent < 90 else "DEGRADED" if percent < 95 else "DOWN"
    return {"status": status, "usage_percent": round(percent, 1)}


async def health_check() -> HealthCheck:
    """聚合健康检查"""
    import asyncio

    checks = {}
    overall_status = HealthStatus.UP

    # 并行执行所有检查
    results = await asyncio.gather(
        check_database(),
        check_redis(),
        check_disk_space(),
        return_exceptions=True,
    )

    check_names = ["database", "redis", "disk_space"]
    for name, result in zip(check_names, results):
        if isinstance(result, Exception):
            checks[name] = {"status": "DOWN", "error": str(result)}
            overall_status = HealthStatus.DOWN
        else:
            checks[name] = result
            if result["status"] == "DOWN":
                overall_status = HealthStatus.DOWN
            elif result["status"] == "DEGRADED" and overall_status == HealthStatus.UP:
                overall_status = HealthStatus.DEGRADED

    return HealthCheck(status=overall_status, checks=checks)


# FastAPI 集成示例:
# from fastapi import FastAPI
# app = FastAPI()
#
# @app.get("/health")
# async def health():
#     return await health_check()
# # 输出:
# # {
# #   "status": "UP",
# #   "checks": {
# #     "database": {"status": "UP", "response_time_ms": 5},
# #     "redis": {"status": "UP", "used_memory_mb": 128},
# #     "disk_space": {"status": "UP", "usage_percent": 45.2}
# #   }
# # }
# === 完整可观测性配置示例 ===

# pyproject.toml:
"""
[project]
dependencies = [
    "structlog>=23.0",
    "prometheus-client>=0.20",
    "opentelemetry-api>=1.24",
    "opentelemetry-sdk>=1.24",
    "opentelemetry-exporter-otlp>=1.24",
    "pydantic-settings>=2.0",
]

[project.optional-dependencies]
observability = [
    "opentelemetry-instrumentation-httpx>=0.45b",
    "opentelemetry-instrumentation-redis>=0.45b",
    "opentelemetry-instrumentation-sqlalchemy>=0.45b",
]
"""

# app/observability.py — 统一初始化
"""
import structlog
import logging
from prometheus_client import start_http_server
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME


def setup_observability(
    service_name: str,
    log_level: str = "INFO",
    otlp_endpoint: str = "http://localhost:4317",
    metrics_port: int = 8000,
) -> None:
    # 1. 结构化日志
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.make_filtering_bound_logger(
            getattr(logging, log_level)
        ),
        context_class=dict,
        logger_factory=structlog.PrintLoggerFactory(),
    )

    # 2. 分布式追踪
    resource = Resource.create({SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    trace.set_tracer_provider(provider)

    # 3. Prometheus 指标端点
    start_http_server(metrics_port)
"""

核心差异

| 维度 | Java/Kotlin | Python |
| --- | --- | --- |
| 结构化日志 | Logback JSON + Kotlin Logging | structlog / python-json-logger |
| 指标收集 | Micrometer + Prometheus | prometheus-client |
| 分布式追踪 | Micrometer Tracing / Sleuth | OpenTelemetry |
| 健康检查 | Spring Boot Actuator | 自定义端点 |
| 自动 instrument | Spring 自动装配 | opentelemetry-instrumentation-* |
| 日志上下文 | MDC / ThreadLocal | structlog contextvars |

常见陷阱

# 陷阱 1: 用 print 而不是 logger
# print 输出到 stdout,没有级别、没有结构、无法被日志系统收集
# 解决: 全局搜索 print(),替换为 logger

# 陷阱 2: 日志中包含敏感信息
# structlog 默认会序列化所有传入的字段
# 如果不小心传入 password 字段,会被记录到日志中
# 解决: structlog 有 processor 可以过滤敏感字段

# 陷阱 3: Prometheus 指标标签基数爆炸
# 用 user_id 作为标签 → 每个用户一个时间序列 → 内存爆炸
# 解决: 只用低基数标签(method, endpoint, status)
# 高基数数据放在 span attributes 中

# 陷阱 4: OpenTelemetry 上下文传播
# Python 的 asyncio 需要正确传播上下文
# 用 contextvars 而不是 thread-local
# structlog.contextvars 已经基于 contextvars
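陷阱 2 提到的敏感字段过滤,可以实现成一个普通函数:structlog 的 processor 就是接收 (logger, method_name, event_dict) 的可调用对象,因此不依赖 structlog 也能单独测试。下面是一个示意(敏感字段名单为示例假设,按项目实际调整):

```python
# 示例名单: 命中这些 key 的字段值会被打码
SENSITIVE_KEYS = {"password", "token", "secret", "authorization"}


def redact_sensitive(logger, method_name, event_dict):
    """structlog processor: 将敏感字段的值替换为占位符。"""
    for key in list(event_dict):
        if key.lower() in SENSITIVE_KEYS:
            event_dict[key] = "[REDACTED]"
    return event_dict


# processor 是普通函数,可以直接单测:
event = {"event": "user_login", "user_id": 42, "password": "hunter2"}
print(redact_sensitive(None, "info", event))
# {'event': 'user_login', 'user_id': 42, 'password': '[REDACTED]'}
```

接入方式就是把它放进 processors 链的前部,例如 structlog.configure(processors=[redact_sensitive, ...])。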

何时使用

  • structlog: 所有微服务、API 项目
  • prometheus-client: 需要指标监控的服务
  • OpenTelemetry: 分布式系统、微服务架构
  • 健康检查: 所有生产部署的服务
  • python-json-logger: 简单场景、已有 logging 代码的项目

本章小结

| 主题 | Java/Kotlin 工具 | Python 推荐工具 |
| --- | --- | --- |
| 项目结构 | Maven/Gradle 标准目录 | src layout + pyproject.toml |
| 依赖管理 | Maven/Gradle | uv(首选)/ Poetry |
| 代码质量 | Checkstyle + SpotBugs + ktlint | Ruff + mypy |
| 测试 | JUnit 5 + Mockito + JaCoCo | pytest + Hypothesis + pytest-cov |
| CI/CD | Jenkins / GitHub Actions | GitHub Actions + uv |
| 安全 | OWASP + Vault | pip-audit + pydantic-settings |
| 可观测性 | Logback + Micrometer + Sleuth | structlog + prometheus-client + OTel |

核心原则: Python 的工程化工具链正在快速收敛到 pyproject.toml 作为唯一配置入口,uv 作为统一包管理器。这个趋势和 Java 生态从 Ant → Maven → Gradle 的收敛过程类似。尽早采用现代工具链,避免在过时的工具上浪费时间。