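Getting Started with YASA

This article describes how to deploy and develop with the YASA engine, covering environment setup and development and debugging methods, to help developers get started with the YASA engine quickly.

Author: jiachunpeng

Core contributor to the YASA community, focusing on program analysis techniques.

GitHub: github.com/jiachunpeng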

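0 Deploying a YASA Development Environment from Scratch

0.0 Requirements

Operating system: this tutorial primarily targets Linux (Ubuntu 20.04+ recommended); macOS users can use it as a reference.

Windows users are advised to use WSL2 (Windows Subsystem for Linux 2).

Node.js: version 18 or later (an LTS release is recommended)
Package manager: npm (installed together with Node.js)

0.1 Preparing the Node.js Environment

If Node.js 18+ is already installed, skip this step.

Install with nvm (recommended):

# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc

# Install and use the LTS release
nvm install --lts
nvm use --lts
node -v  # verify the version

Install with the system package manager (Ubuntu/Debian):

sudo apt update
sudo apt install nodejs npm
node -v  # verify the version; make sure it is >= 18

If the version provided by the package manager is too old, switch to the nvm method; there is no need to uninstall the system-installed node first.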

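0.2 Getting the Source Code

git clone https://github.com/antgroup/YASA-Engine.git
cd YASA-Engine

0.3 Building the Project

The YASA engine is written in TypeScript and must be compiled before it can run. The project provides a build.sh script that installs dependencies, runs type checking and tests, and compiles the code in one step:

# Option 1: use the npm script (recommended)
npm run build

# Option 2: run the build script directly
bash build.sh

Build process:

The build script automatically performs the following steps:

Clean previous build output
Install dependencies (npm install)
Run TypeScript type checking
Run the test suite
Compile TypeScript to JavaScript (output to the dist/ directory)
Package the binary (produces a yasa-engine-* executable)

Note: the first build can take a while.

0.4 Verifying the Installation

After the build completes, verify it in either of the following ways:

# Option 1: use the compiled JavaScript
node dist/main.js --help

# Option 2: use the packaged binary (if it was generated)
./yasa-engine-linux-x64 --help

If the help text is printed, the installation succeeded.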

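0.5 Development and Debugging Configuration

Option 1: run TypeScript directly (recommended for development)

The project already includes tsx as a dependency, so TypeScript files can be run directly without compiling each time:

npx tsx src/main.ts --help

With this approach, code changes take effect immediately and can be debugged right away. The trade-off is some performance overhead, and tools such as 0x cannot produce accurate performance traces.

Note: if a run fails in this mode, try deleting the dist folder and debugging again; the current YASA code may have an issue here.

VSCode debug configuration (TypeScript):

Add the following to .vscode/launch.json: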

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Debug TypeScript",
            "runtimeExecutable": "npx",
            "runtimeArgs": ["tsx"],
            "skipFiles": ["<node_internals>/**"],
            "program": "${workspaceFolder}/src/main.ts",
            "args": ["--help"],
            "console": "integratedTerminal"
        }
    ]
}

Option 2: debug the compiled JavaScript

To debug the compiled code, add the following to .vscode/launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Run YASA engine",
            "skipFiles": ["<node_internals>/**"],
            "program": "${workspaceFolder}/dist/main.js",
            "args": ["--help"],
            "console": "integratedTerminal",
            "preLaunchTask": "npm: build"
        }
    ]
}

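0.6 Running Test Cases

Getting the test benchmark

# Clone the test benchmark outside the YASA-Engine directory
cd ..
git clone https://github.com/alipay/ant-application-security-testing-benchmark.git
cd YASA-Engine

Running the full test

# Analyze the entire test benchmark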
node dist/main.js \
  --sourcePath ../ant-application-security-testing-benchmark/ \
  --checkerPackIds taint-flow-javascript-default \
  --language javascript \
  --report ./report/js \
  --ruleConfigFile ./resource/example-rule-config/rule_config_js.json

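Command-line parameters:

Parameter	Description
`--sourcePath`	Path to the source code to analyze (file or directory)
`--checkerPackIds`	Checker pack ID, e.g. `taint-flow-javascript-default`
`--checkerIds`	ID of a single checker (mutually exclusive with `--checkerPackIds`), e.g. `taint_flow_js_input`
`--language`	Language to analyze: `javascript`, `java`, `go`, `python`, etc.
`--report`	Output directory for the report
`--ruleConfigFile`	Path to the rule configuration file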
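
To make the mutual exclusion between --checkerIds and --checkerPackIds concrete, here is a small TypeScript sketch that assembles the argument list for the command above. AnalysisOptions and buildYasaArgs are hypothetical helper names and are not part of YASA; only the flag names come from the table.

```typescript
// Hypothetical helper for illustration; only the flag names come from the table above.
interface AnalysisOptions {
  sourcePath: string
  language: string
  report: string
  ruleConfigFile: string
  checkerIds?: string
  checkerPackIds?: string
}

function buildYasaArgs(options: AnalysisOptions): string[] {
  // The two checker selectors are mutually exclusive.
  if (options.checkerIds !== undefined && options.checkerPackIds !== undefined) {
    throw new Error('--checkerIds and --checkerPackIds cannot be used together')
  }
  const args: string[] = ['--sourcePath', options.sourcePath]
  if (options.checkerIds !== undefined) args.push('--checkerIds', options.checkerIds)
  if (options.checkerPackIds !== undefined) args.push('--checkerPackIds', options.checkerPackIds)
  args.push('--language', options.language, '--report', options.report)
  args.push('--ruleConfigFile', options.ruleConfigFile)
  return args
}

const exampleArgs = buildYasaArgs({
  sourcePath: '../ant-application-security-testing-benchmark/',
  checkerPackIds: 'taint-flow-javascript-default',
  language: 'javascript',
  report: './report/js',
  ruleConfigFile: './resource/example-rule-config/rule_config_js.json',
})
console.log(['node', 'dist/main.js', ...exampleArgs].join(' '))
```

The returned array has the same shape as the "args" list in a launch.json configuration, so it can also serve as a checklist when writing one by hand.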

Running a simple example

The following simple test case is well suited to debugging and to learning how the engine works:

node dist/main.js \
  --sourcePath ../ant-application-security-testing-benchmark/sast-js/case/completeness/single_app_tracing/function_call/library_function/json_002_F.js \
  --checkerIds taint_flow_js_input \
  --language javascript \
  --report ./report/js \
  --ruleConfigFile ./resource/example-rule-config/rule_config_js.json

VSCode debug configuration example:

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Debug simple example",
            "runtimeExecutable": "npx",
            "runtimeArgs": ["tsx"],
            "skipFiles": ["<node_internals>/**"],
            "program": "${workspaceFolder}/src/main.ts",
            "args": [
                "--sourcePath",
                "../ant-application-security-testing-benchmark/sast-js/case/completeness/single_app_tracing/function_call/library_function/json_002_F.js",
                "--checkerIds",
                "taint_flow_js_input",
                "--language",
                "javascript",
                "--report",
                "./report/js",
                "--ruleConfigFile",
                "./resource/example-rule-config/rule_config_js.json"
            ],
            "console": "integratedTerminal"
        }
    ]
}

Tip: adjust the --sourcePath argument to match your actual paths.

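1 Understanding the YASA Workflow Through a Simple Example

1.1 Test Case

The following simple test case helps explain the YASA engine's workflow. It contains a taint data flow: from the source (taint_src), through function calls and JSON processing, to the sink (execSync).

Run command: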

node dist/main.js \
  --sourcePath ../ant-application-security-testing-benchmark/sast-js/case/completeness/single_app_tracing/function_call/library_function/json_002_F.js \
  --checkerIds taint_flow_js_input \
  --language javascript \
  --report ./report/js \
  --ruleConfigFile ./resource/example-rule-config/rule_config_js.json

Code under test:

const { execSync } = require('child_process');

function json_002_F(__taint_src) {
  process(JSON.stringify("aa"));

  function process(arg) {
    let obj = JSON.parse(arg);
    __taint_sink(obj);
  }
}

function __taint_sink(o) {
  execSync(o);  // sink: command execution
}

const taint_src = "taint_src_value";  // source: tainted data

json_002_F(taint_src);

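Code notes:

taint_src is the taint source (Source)
json_002_F receives the tainted data, and the call chain passes through JSON.stringify and JSON.parse
execSync, called inside __taint_sink, is the taint sink (Sink)
The engine is expected to detect the taint flow from taint_src to execSync and generate one alarm record. Note that, in the listing above, process is called with JSON.stringify("aa") rather than with __taint_src.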

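1.2 Engine Workflow

The YASA engine's analysis flow has three main stages. Each stage is explained below with real code excerpts:

Legend:

[Engine]: core engine functionality; checker developers normally do not need to modify it
[Checker]: the parts checker developers need to implement and pay attention to
[Tool]: tool-layer components such as checker-manager; checker developers should understand them but normally do not need to modify them

It helps to set breakpoints at the key functions below (analyzeSingleFile, executeAnalysisPipeline, and so on) and follow the flow with the debugger; for debugging setup, see Sections 0.5 and 0.6.

1.2.1 Preprocessing Stage

Entry function: [Engine] analyzeSingleFile in analyzer.ts calls executeAnalysisPipeline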

// src/engine/analyzer/common/analyzer.ts
async analyzeSingleFile(source: any, fileName: any) {
  try {
    if (typeof this.preProcess4SingleFile === 'function' && typeof this.symbolInterpret === 'function') {
      return await this.executeAnalysisPipeline(() => this.preProcess4SingleFile(source, fileName))
    }
    // ...
  }
}

Execution flow:

[Engine] Set the preprocessing flag: Rules.setPreprocessReady(false) marks entry into the preprocessing stage

// src/engine/analyzer/common/analyzer.ts
private async executeAnalysisPipeline(preProcessFn: () => void | Promise<void>): Promise<any> {
  this.performanceTracker.start('preProcess')
  
  Rules.setPreprocessReady(false)  // mark the start of the preprocessing stage
  
  const result = preProcessFn()
  if (result instanceof Promise) {
    await result
  }
  this.performanceTracker.end('preProcess')
  // ...
}

[Engine] Run preProcess4SingleFile() to parse the source code into an AST and traverse it to collect information

// src/engine/analyzer/javascript/common/js-analyzer.ts
preProcess4SingleFile(source: any, fileName: any) {
  this.initTopScope()
  this.state = this.initState()
  
  // Parse the source code into an AST
  this.uast = Parser.parseSingleFile(fileName, options, this.sourceCodeCache)
  
  if (this.uast) {
    this.initModuleScope(this.uast, fileName)
    // Traverse the AST and process the module
    this.processModule(this.uast, fileName)
  }
}

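[Tool] While traversing the AST, the engine invokes the callbacks registered by checkers through checker-manager

// src/engine/checker-manager/checker-manager.ts (illustrative)
// When the engine encounters a function definition during AST traversal, it calls:
checkerManager.checkAtFunctionDefinition(analyzer, scope, node, state, info)
// This triggers every checker that has registered triggerAtFunctionDefinition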

[Checker] In the triggerAtFunctionDefinition callback, the checker adds the function to sourceScope

// src/checker/taint/js/js-taint-checker.ts
// Checker developers need to implement this method
triggerAtFunctionDefinition(analyzer: any, scope: any, node: any, state: any, info: any) {
  if (config.analyzer !== 'JavaScriptAnalyzer') {
    return
  }
  // Add the function definition to sourceScope (used for later tag propagation)
  commonUtil.fillSourceScope(info.fclos, this.sourceScope)
}
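
The callback hand-off in this stage can be pictured with a small, self-contained sketch. It is not the real checker-manager: MiniCheckerManager, MiniChecker, FunctionInfo, and the trimmed-down argument list are hypothetical names, used only to show how an engine might fan one event out to every checker that implements the matching triggerAt* method.

```typescript
// Hypothetical, simplified model of callback dispatch; not YASA's actual checker-manager.
interface FunctionInfo {
  name: string
}

interface MiniChecker {
  id: string
  // Optional callback: only checkers that define it are notified.
  triggerAtFunctionDefinition?(info: FunctionInfo): void
}

class MiniCheckerManager {
  private checkers: MiniChecker[] = []

  register(checker: MiniChecker): void {
    this.checkers.push(checker)
  }

  // Called by the (imaginary) engine whenever it visits a function definition.
  checkAtFunctionDefinition(info: FunctionInfo): void {
    for (const checker of this.checkers) {
      if (typeof checker.triggerAtFunctionDefinition === 'function') {
        checker.triggerAtFunctionDefinition(info)
      }
    }
  }
}

// A checker that records every function it is told about,
// similar in spirit to filling sourceScope.
const collectedFunctions: string[] = []
const collectingChecker: MiniChecker = {
  id: 'collecting-checker',
  triggerAtFunctionDefinition(info: FunctionInfo): void {
    collectedFunctions.push(info.name)
  },
}
// A checker without the callback is simply skipped.
const silentChecker: MiniChecker = { id: 'silent-checker' }

const manager = new MiniCheckerManager()
manager.register(collectingChecker)
manager.register(silentChecker)
manager.checkAtFunctionDefinition({ name: 'json_002_F' })
manager.checkAtFunctionDefinition({ name: '__taint_sink' })
console.log(collectedFunctions)
```

The point of the indirection is that the engine does not need to know which checkers exist; it only reports events, and each checker opts in by defining the callback.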

1.2.2 Analysis Initialization Stage

[Engine] Run startAnalyze() to trigger the checkers' initialization callbacks, then mark preprocessing as complete

// src/engine/analyzer/common/analyzer.ts
this.performanceTracker.start('startAnalyze')
this.startAnalyze()  // internally calls checkerManager.checkAtStartOfAnalyze
this.performanceTracker.end('startAnalyze')

Rules.setPreprocessReady(true)  // mark preprocessing as complete

[Tool] checker-manager calls the triggerAtStartOfAnalyze method of every checker

// src/engine/checker-manager/checker-manager.ts (illustrative)
// When the engine calls startAnalyze(), it triggers:
checkerManager.checkAtStartOfAnalyze(analyzer, scope, node, state, info)
// This calls triggerAtStartOfAnalyze on every registered checker

[Checker] In triggerAtStartOfAnalyze, the checker determines the analysis entry points (EntryPoints)

// src/checker/taint/js/js-taint-checker.ts
// Checker developers need to implement this method
triggerAtStartOfAnalyze(analyzer: any, scope: any, node: any, state: any, info: any) {
  if (config.analyzer !== 'JavaScriptAnalyzer') {
    return
  }
  const { topScope, fileManager } = analyzer
  // Prepare entry points: obtained from the config file, the call graph, file entries, and so on
  this.prepareEntryPoints(analyzer, topScope, fileManager)
  analyzer.entryPoints.push(...this.entryPoints)
  // ...
}

[Checker] Add taint tags to the functions in sourceScope

// src/checker/taint/js/js-taint-checker.ts
// Checker developers need to implement this logic
triggerAtStartOfAnalyze(analyzer: any, scope: any, node: any, state: any, info: any) {
  // ...
  // Add the JS_INPUT tag to the functions in sourceScope (used for later taint propagation)
  this.addSourceTagForSourceScope(TAINT_TAG_NAME_JS_TAINT, this.sourceScope.value)
  // Add tags for the content of the rule configuration
  this.addSourceTagForcheckerRuleConfigContent(TAINT_TAG_NAME_JS_TAINT, this.checkerRuleConfigContent)
}

1.2.3 Symbolic Execution Stage

[Engine] Iterate over the entry points and symbolically execute each EntryPoint

// src/engine/analyzer/javascript/common/js-analyzer.ts
symbolInterpret() {
  const { entryPoints } = this
  const state = this.initState(this.topScope)
  
  if (_.isEmpty(entryPoints)) {
    logger.info('[symbolInterpret]：EntryPoints are not found')
    return true
  }
  
  // Iterate over each entry point
  for (const entryPoint of entryPoints) {
    if (entryPoint.type === constValue.ENGIN_START_FUNCALL) {
      // Execute the entry-point function (this triggers the various checker callbacks)
      this.executeCall(entryPoint.entryPointSymVal, argValues, state, entryPoint.entryPointSymVal.ast, scope)
    }
  }
}

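[Tool] During symbolic execution, the engine triggers checker callbacks at key nodes through checker-manager

// src/engine/checker-manager/checker-manager.ts (illustrative)
// When the engine encounters an identifier, it calls:
checkerManager.checkAtIdentifier(analyzer, scope, node, state, info)
// Before a function call, the engine calls: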
checkerManager.checkAtFunctionCallBefore(analyzer, scope, node, state, info)

[Checker] In the triggerAtIdentifier callback, identify Sources and mark taint

// src/checker/taint/js/js-taint-checker.ts
// Checker developers need to implement this method
triggerAtIdentifier(analyzer: any, scope: any, node: any, state: any, info: any) {
  if (config.analyzer !== 'JavaScriptAnalyzer') {
    return
  }
  // Check whether this is a taint source and, if so, mark it (using the IntroduceTaint utility class)
  IntroduceTaint.introduceTaintAtIdentifier(node, info.res, this.sourceScope.value)
}

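[Engine] During symbolic execution, the engine automatically tracks how tainted data propagates through the program (via the symbolic value system)
[Checker] Detect Sinks in triggerAtFunctionCallBefore

// src/checker/taint/js/js-taint-checker.ts
// Checker developers need to implement this method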
triggerAtFunctionCallBefore(analyzer: any, scope: any, node: any, state: any, info: any) {
  if (config.analyzer !== 'JavaScriptAnalyzer') {
    return
  }
  const { fclos, argvalues } = info
  // Check for taint sources among the function call arguments
  IntroduceTaint.introduceFuncArgTaintByRuleConfig(fclos?.object, node, argvalues, funcCallArgTaintSource)
  // Check whether this is a Sink
  this.checkSinkAtFunctionCall(node, fclos, argvalues)
  // ...
}

// Checker developers need to implement this method
checkSinkAtFunctionCall(node: any, fclos: any, argvalues: any) {
  const rules = this.checkerRuleConfigContent.sinks?.FuncCallTaintSink
  let rule = matchSinkAtFuncCall(node, fclos, rules)
  
  if (rule) {
    const args = Rules.prepareArgs(argvalues, fclos, rule)
    // Use the SanitizerChecker utility to check whether the data passed through a Sanitizer
    const ndResultWithMatchedSanitizerTagsArray = SanitizerChecker.findTagAndMatchedSanitizer(
      node, fclos, args, null, TAINT_TAG_NAME_JS_TAINT, true, sanitizers
    )
    // If tainted data reaches the Sink without passing through a Sanitizer, raise an alarm
    if (ndResultWithMatchedSanitizerTagsArray) {
      const taintFlowFinding = this.buildTaintFinding(/* ... */)
      if (!TaintOutputStrategy.isNewFinding(this.resultManager, taintFlowFinding)) return
      this.resultManager.newFinding(taintFlowFinding, TaintOutputStrategy.outputStrategyId)
    }
  }
}

[Engine] Output the detection report after the analysis completes

// src/engine/analyzer/common/analyzer.ts
this.endAnalyze()
this.performanceTracker.logPerformance(this)
return this.recordCheckerFindings()  // record and return the findings

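1.2.4 Summary: What Checker Developers Need to Focus On

For developers who only write checkers and do not modify the engine, the work centers on implementing rule logic; the engine handles the underlying analysis automatically.
The parts that deserve your attention are listed below:

Methods checker developers need to implement:

triggerAtFunctionDefinition: collect function definitions during preprocessing and add them to sourceScope
triggerAtStartOfAnalyze:
- Prepare the analysis entry points (EntryPoints)
- Add taint tags to the relevant functions
triggerAtIdentifier: identify and mark taint sources (Source)
triggerAtFunctionCallBefore: detect taint sinks (Sink) and generate alarms
checkSinkAtFunctionCall: the concrete Sink detection logic

In practice, a callback can perform any operation; typically, sources are introduced at variable declarations and sinks are detected at function calls.

Tools checker developers should know:

IntroduceTaint: taint-marking utility class, used to mark Sources
SanitizerChecker: Sanitizer detection utility, used to determine whether taint has been sanitized
checker-manager: invokes checker callbacks at the appropriate times
Rules: rule configuration management, used to read Source/Sink/Sanitizer configuration

Parts checker developers do not need to worry about:

AST parsing and traversal (handled by the engine)
Symbolic execution and taint propagation (handled by the engine)
Entry-point execution flow (handled by the engine)
Performance tracking and report generation (handled by the engine)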

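1.3 TodoList for Writing a Checker

Developing a new checker involves the following steps:

1.3.1 Create the Checker File

Copy an existing checker as a template
- Find the checker file closest to the target language or framework
- Copy it to a suitable directory and rename it
Change the class name and checker ID
- Rename the class to the new checker's name
- Update checkerId in the constructor and make sure it is unique across the project
Define the taint tag name
- If the checker inherits from an abstract checker, it usually uses the taint tag defined by the parent class
- If a custom tag is needed, define a constant and use it in the relevant methods

1.3.2 Implement the Required Callbacks

triggerAtFunctionDefinition: collect function definitions during preprocessing
- Add functions to sourceScope for later tag propagation
triggerAtStartOfAnalyze: analysis initialization
- Prepare the analysis entry points (EntryPoints)
- Add taint tags to the functions in sourceScope
- Add tags for the content of the rule configuration
triggerAtIdentifier: identify taint sources (Source)
- Use IntroduceTaint.introduceTaintAtIdentifier to mark taint
triggerAtFunctionCallBefore: detect taint sinks (Sink)
- Check for taint sources among the function call arguments
- Call checkSinkAtFunctionCall to detect Sinks
checkSinkAtFunctionCall: the concrete Sink detection logic
- Match Sinks from the rule configuration
- Use SanitizerChecker to check whether the data has been sanitized
- Generate the alarm (this.resultManager.newFinding)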
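
The callback list above can be condensed into a skeleton that runs on its own. This is a teaching sketch, not a YASA checker: SkeletonTaintChecker, SketchFinding, and the string-based arguments are hypothetical simplifications (the real callbacks receive analyzer, scope, node, state, and info), and taint is tracked here by identifier name only, whereas the engine tracks it through symbolic values.

```typescript
// Hypothetical skeleton; types and logic are simplified stand-ins, not YASA's API.
interface SketchFinding {
  sink: string
  taintedArg: string
}

class SkeletonTaintChecker {
  readonly checkerId = 'taint_flow_example_input' // made-up ID; must be unique in a real project
  readonly entryPoints: string[] = []
  readonly findings: SketchFinding[] = []
  private collectedFunctions: string[] = []
  private taintedNames = new Set<string>()
  private sourceNames: string[]
  private sinkNames: string[]

  constructor(sourceNames: string[], sinkNames: string[]) {
    this.sourceNames = sourceNames
    this.sinkNames = sinkNames
  }

  // Preprocessing: remember every function definition.
  triggerAtFunctionDefinition(funcName: string): void {
    this.collectedFunctions.push(funcName)
  }

  // Analysis initialization: choose entry points (here, every collected function).
  triggerAtStartOfAnalyze(): void {
    this.entryPoints.push(...this.collectedFunctions)
  }

  // Symbolic execution: mark identifiers that match a configured source.
  triggerAtIdentifier(name: string): void {
    if (this.sourceNames.indexOf(name) !== -1) this.taintedNames.add(name)
  }

  // Symbolic execution: inspect each call before it runs.
  triggerAtFunctionCallBefore(callee: string, args: string[]): void {
    this.checkSinkAtFunctionCall(callee, args)
  }

  // Report a finding when a tainted argument reaches a configured sink.
  checkSinkAtFunctionCall(callee: string, args: string[]): void {
    if (this.sinkNames.indexOf(callee) === -1) return
    for (const arg of args) {
      if (this.taintedNames.has(arg)) this.findings.push({ sink: callee, taintedArg: arg })
    }
  }
}

// Replay the events an engine would emit for a tiny program.
const skeleton = new SkeletonTaintChecker(['taint_src'], ['execSync'])
skeleton.triggerAtFunctionDefinition('json_002_F')
skeleton.triggerAtStartOfAnalyze()
skeleton.triggerAtIdentifier('taint_src')
skeleton.triggerAtFunctionCallBefore('console.log', ['taint_src'])
skeleton.triggerAtFunctionCallBefore('execSync', ['taint_src'])
console.log(skeleton.findings)
```

The call to console.log is ignored because it is not a configured sink, while the call to execSync produces one finding.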

1.3.3 Configure the Checker

Register the checker in checker-config.json
- Add a new checker entry containing checkerId, checkerPath, description, and optionally demoRuleConfigPath
Create or join a checker pack in checker-pack-config.json
- Add the new checkerId to the checkerIds of an existing checker pack
- Or create a new checker pack entry
Add rules to the rule configuration file (rule_config_*.json)
- Include the new checker ID in the checkerIds array
- Configure Source rules (FuncCallReturnValueTaintSource, IdentifierTaintSource, FuncCallArgTaintSource)
- Configure Sink rules (FuncCallTaintSink)
- Configure Sanitizer rules
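
Because checkerId must be unique across the project, a quick check before registering a new entry can save a confusing debugging session. The sketch below models a registration entry with the four fields named above; the overall layout of checker-config.json, the example paths, and the helper findDuplicateCheckerIds are assumptions made for illustration.

```typescript
// Field names follow the list above; the surrounding file layout is an assumption.
interface CheckerConfigEntry {
  checkerId: string
  checkerPath: string
  description: string
  demoRuleConfigPath?: string // optional
}

// Returns every checkerId that appears more than once.
function findDuplicateCheckerIds(entries: CheckerConfigEntry[]): string[] {
  const seen = new Set<string>()
  const duplicates = new Set<string>()
  for (const entry of entries) {
    if (seen.has(entry.checkerId)) duplicates.add(entry.checkerId)
    seen.add(entry.checkerId)
  }
  return Array.from(duplicates)
}

// Made-up entries: the paths and the second ID are placeholders.
const exampleEntries: CheckerConfigEntry[] = [
  { checkerId: 'taint_flow_js_input', checkerPath: 'path/to/existing-checker', description: 'existing checker' },
  { checkerId: 'taint_flow_example_input', checkerPath: 'path/to/new-checker', description: 'new checker' },
  { checkerId: 'taint_flow_example_input', checkerPath: 'path/to/copied-checker', description: 'accidental duplicate' },
]
const duplicateIds = findDuplicateCheckerIds(exampleEntries)
console.log(duplicateIds)
```

A duplicate typically appears when a template checker is copied and the checkerId in the copy is not updated.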

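1.4 Key Concepts

1.4.1 EntryPoints (Analysis Entry Points)

An analysis entry point is where symbolic execution starts. There are two kinds:

Function entry: analysis starts from a specific function
File entry: analysis starts from a file's top-level code

The engine determines entry points through call graph analysis, configuration files, and similar means.

1.4.2 Source, Sink, and Sanitizer

Source (taint source): an origin of untrusted data such as user input, e.g. req.query, req.body
Sink (taint sink): a dangerous operation, e.g. execSync, eval, SQL queries
Sanitizer (sanitizing function): a function that makes tainted data safe, e.g. escape, parameterized queries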
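
The relationship between the three roles can be shown with a toy model in which every value carries a set of taint tags. This is only a conceptual sketch: TaintedValue, INPUT_TAG, and the helper functions are hypothetical, and YASA's symbolic value system is considerably richer than this.

```typescript
// Toy model: a value carries a set of taint tags. All names here are hypothetical.
interface TaintedValue {
  text: string
  tags: Set<string>
}

const INPUT_TAG = 'USER_INPUT'

// Source: wraps untrusted data and attaches a tag.
function fromSource(text: string): TaintedValue {
  return { text, tags: new Set<string>([INPUT_TAG]) }
}

// Ordinary operation: the result inherits the tags of its operand.
function prepend(prefix: string, value: TaintedValue): TaintedValue {
  return { text: prefix + value.text, tags: new Set<string>(value.tags) }
}

// Sanitizer: returns a quoted value with the tags cleared.
function shellQuote(value: TaintedValue): TaintedValue {
  const quoted = "'" + value.text.replace(/'/g, "'\\''") + "'"
  return { text: quoted, tags: new Set<string>() }
}

// Sink: instead of executing anything, report whether tagged data arrived.
function checkedSink(value: TaintedValue): string {
  return value.tags.has(INPUT_TAG) ? 'ALARM' : 'OK'
}

const userInput = fromSource('report.txt; rm -rf /')
const unsafeCommand = prepend('cat ', userInput)
const safeCommand = prepend('cat ', shellQuote(userInput))

const unsafeVerdict = checkedSink(unsafeCommand)
const safeVerdict = checkedSink(safeCommand)
console.log(unsafeVerdict) // ALARM
console.log(safeVerdict)   // OK
```

The unsanitized path keeps the tag all the way to the sink and raises an alarm; the sanitized path arrives without the tag and does not.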

1.4.3 Function Closure (fclos)

A function closure is the data structure the engine uses to represent a function. Its main fields are:

fclos = FunctionValue({
   fdef: node,        // function definition node
   sid: funcName,      // function name
   qid: targetQid,     // Qualified ID, a unique identifier
   parent: scope,      // parent scope
   ast: node,          // AST of the function body
   // ... other properties
})

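1.4.4 RuleConfig (Rule Configuration)

The rule configuration file defines the rules for identifying Sources, Sinks, and Sanitizers, and is the basis on which checkers operate. For details, refer to the example configuration files.

2 Developing a Taint Analysis Checker for the Django Framework

2.1 How It Differs from Generic Python Taint Analysis

2.1.1 Entry Points Are Discovered Differently

Django uses URL routing configuration: routes are defined in urls.py rather than registered directly through function calls. Generic Python taint analysis relies on file entries and function-call boundaries, so it may fail to discover Django's route entry points.

Example Django routing configuration: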

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('articles/<int:article_id>/', views.article_detail),
    path('users/<str:username>/', views.user_profile),
]

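2.1.2 Sources Are Identified Differently

Django Sources mainly include:

Path parameters: extracted from the URL path, such as article_id in <int:article_id>
The request object: the first parameter of a Django view function is usually request, which carries request.GET, request.POST, and so on

2.1.3 Multiple View Types

Django supports two kinds of views:

Function-based views: plain functions
Class-based views: classes that inherit from View and are converted to view functions through the as_view() method

2.2 Writing Test Cases

The following simple Django test case verifies route discovery and Source identification:

You can ask a large language model to write such cases; the ones it produces tend to be comprehensive.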

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('articles/<int:article_id>/', views.article_detail),
    path('users/<str:username>/', views.user_profile),
]

# views.py
from django.http import HttpResponse
from django.shortcuts import render
import subprocess

def article_detail(request, article_id):
    # Vulnerability 1: path parameter used directly in command execution
    subprocess.call(['cat', f'/articles/{article_id}'])  # article_id is a Source
    return HttpResponse(f'Article {article_id}')

def user_profile(request, username):
    # Vulnerability 2: path parameter used directly in a SQL query
    query = f"SELECT * FROM users WHERE username='{username}'"  # username is a Source
    # ... execute the SQL
    return HttpResponse(f'User {username}')

# Class-based view example
from django.views import View

class ArticleView(View):
    def get(self, request, article_id):
        # Vulnerability 3: path parameter in a class-based view
        subprocess.call(['ls', f'/articles/{article_id}'])  # article_id is a Source
        return HttpResponse('OK')

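Key points:

Route configuration must be discovered from urls.py
Path parameters must be extracted from the route path (such as article_id in <int:article_id>)
Both function-based and class-based views must be recognized
Path parameters and the request parameter must be marked as Sources

2.3 Implementation Approach

To develop a Django-specific taint analysis checker, we need to solve several core problems that are specific to the framework:

2.3.1 Route Discovery

Problem: how do we discover the route configuration in urls.py?

Solution:

Identify urls.py files in triggerAtCompileUnit
Check whether the file imports Django's URL configuration module (django.urls or django.conf.urls)
Identify assignments to the urlpatterns variable in triggerAtAssignment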

// src/checker/taint/python/django-taint-checker.ts
triggerAtCompileUnit(analyzer: any, scope: any, node: any, state: any, info: any) {
  const fileName = node.loc?.sourcefile
  if (!fileName) return
  if (!fileName.endsWith('/urls.py')) return
  
  // Check whether the Django URL module is imported
  node.body.forEach((exp: any) => {
    if (exp.type === 'VariableDeclaration') {
      if (exp.init.type !== 'ImportExpression') return
      const str = AstUtil.prettyPrint(exp)
      if (str.includes('django') && str.includes('urls') && (str.includes('re_path') || str.includes('path'))) {
        registerFile.add(fileName)
      } else if (str.includes('django') && str.includes('conf') && str.includes('urls') && str.includes('url')) {
        registerFile.add(fileName)
      }
    }
  })
}

triggerAtAssignment(analyzer: any, scope: any, node: any, state: any, info: any) {
  const fileName = node.loc?.sourcefile
  if (!fileName) return
  if (registerFile.size === 0 || !registerFile.has(fileName)) {
    return
  }

  // Identify the urlpatterns assignment
  if (node.left.name === 'urlpatterns') {
    const { right } = node
    this.collectDjangoEntrypointAndSource(analyzer, scope, state, right)
  }
}

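2.3.2 Extracting the Route Configuration

Problem: how do we extract route information from urlpatterns?

Solution:

Parse calls to path(), re_path(), and url()
Extract path parameters from the route path string (such as <int:article_id>)
Identify the view function or class-based view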

collectDjangoEntrypointAndSource(analyzer: any, scope: any, state: any, value: any) {
  const elementGroups: any[] = []
  this.extractElementsFromNode(elementGroups, value)  // handles lists and list concatenation
  
  for (const element of elementGroups) {
    if (element.type === 'CallExpression' && element.callee) {
      const { callee } = element
      // Handle MemberAccess (e.g. django.urls.path) and Identifier (e.g. a directly imported path)
      let methodName: string | null = null
      if (callee.type === 'MemberAccess' && callee.property?.name) {
        methodName = callee.property.name
      } else if (callee.type === 'Identifier') {
        methodName = callee.name || null
      }
      if (methodName !== 'path' && methodName !== 're_path' && methodName !== 'url') {
        continue
      }
      // Get the arguments of the path call
      if (element.arguments && element.arguments.length >= 2) {
        const targetSrcName = this.extractParamNames(element.arguments[0].value)
        const viewFunction = element.arguments[1]
        if (viewFunction.type === 'Identifier' || viewFunction.type === 'MemberAccess') {
          this.collectFuncViewEntrypointAndSource(analyzer, scope, state, viewFunction, targetSrcName)
        } else if (viewFunction.type === 'CallExpression' && viewFunction.callee) {
          if (viewFunction.callee.type === 'MemberAccess' && viewFunction.callee.property.name === 'as_view') {
            this.collectClassViewEntrypointAndSource(analyzer, scope, state, viewFunction, targetSrcName)
          }
        }
      }
    }
  }
}

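2.3.3 Extracting Path Parameters

Problem: how do we extract parameter names from a route path?

Solution: use a regular expression that matches the <type:param> or <param> format

extractParamNames(route: string): string[] {
  // Matches <type:param> or <param>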
  const regex = /<(?:(?:\w+):)?(\w+)>/g
  const params: string[] = []
  let match: RegExpExecArray | null
  while ((match = regex.exec(route)) !== null) {
    params.push(match[1])  // capture the parameter name
  }
  return params
}
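
To see what the regular expression captures, the same logic can be run as a standalone function on a few route strings. The function body mirrors the excerpt above; the sample routes other than the two from urls.py are made up for illustration.

```typescript
// Standalone copy of the extraction logic shown above, for experimentation.
function extractParamNames(route: string): string[] {
  // Matches <type:param> or <param>; the optional "type:" prefix is not captured.
  const regex = /<(?:(?:\w+):)?(\w+)>/g
  const params: string[] = []
  let match: RegExpExecArray | null
  while ((match = regex.exec(route)) !== null) {
    params.push(match[1])
  }
  return params
}

const typedParam = extractParamNames('articles/<int:article_id>/')
const untypedAndTyped = extractParamNames('blog/<slug>/<int:year>/') // made-up route
const noParams = extractParamNames('articles/')
const namedGroup = extractParamNames('^articles/(?P<year>[0-9]{4})/$') // re_path-style pattern

console.log(typedParam)      // [ 'article_id' ]
console.log(untypedAndTyped) // [ 'slug', 'year' ]
console.log(noParams)        // []
console.log(namedGroup)      // [ 'year' ]
```

The last case shows that the same expression also picks up the name in a regex named group of the form (?P<name>...), because the angle-bracketed name has the same shape as an untyped path parameter.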

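2.3.4 Adding Source Tags

Problem: how do we mark path parameters and the request parameter as Sources?

Solution:

When collecting entry points, add the path parameters to sourceScope
Also add the view function's request parameter to sourceScope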

collectFuncViewEntrypointAndSource(analyzer: any, scope: any, state: any, 
                                   viewFunction: ASTObject, targetSrcName: string[]) {
  const ep = analyzer.processInstruction(scope, viewFunction, state)
  if (ep.vtype === 'fclos') {
    analyzer.entryPoints.push(completeEntryPoint(ep))
    
    // Add the path parameter as a Source
    if (targetSrcName.length > 0) {
      const targetName = targetSrcName[0]
      for (const param of ep.fdef.parameters) {
        if (param.id.name === targetName) {
          this.sourceScope.value.push({
            path: param.id.name,
            kind: 'PYTHON_INPUT',
            scopeFile: extractRelativePath(param?.loc?.sourcefile, Config.maindir),
            scopeFunc: ep.fdef?.id?.name,
            locStart: param.loc.start.line,
            locEnd: param.loc.end.line,
          })
        }
      }
    }
    
    // Add the request parameter as a Source
    for (const param of ep.fdef.parameters) {
      if (param.id.name === 'request') {
        this.sourceScope.value.push({
          path: param.id.name,
          kind: 'PYTHON_INPUT',
          scopeFile: extractRelativePath(param?.loc?.sourcefile, Config.maindir),
          scopeFunc: ep.fdef?.id?.name,
          locStart: param.loc.start.line,
          locEnd: param.loc.end.line,
        })
      }
    }
  }
}

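2.3.5 Handling Class-Based Views

Problem: how do we handle class-based views (such as ArticleView.as_view())?

Solution:

Extract the class object
Find the HTTP methods in the class (get, post, and so on)
Create an entry point for each method and add Sources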

collectClassViewEntrypointAndSource(analyzer: any, scope: any, state: any,
                                    viewFunction: ASTObject, targetSrcName: string[]) {
  // Extract the class
  const clsObj = viewFunction.callee.object
  const clsSymVal = analyzer.processInstruction(scope, clsObj, state)
  const httpMethods = new Set(['get', 'post', 'put', 'delete', 'patch', 'head', 'options'])
  const entrypoints = Object.entries(clsSymVal.value)
    .filter(([key, value]: [string, any]) => httpMethods.has(key) && value.vtype === 'fclos')
    .map(([, value]: [string, any]) => value)
  
  if (targetSrcName.length > 0) {
    const targetName = targetSrcName[0]
    for (const ep of entrypoints as any[]) {
      // Add the path parameter as a Source
      for (const param of ep.fdef.parameters) {
        if (param.id.name === targetName) {
          this.sourceScope.value.push({
            path: param.id.name,
            kind: 'PYTHON_INPUT',
            scopeFile: extractRelativePath(param?.loc?.sourcefile, Config.maindir),
            scopeFunc: ep.fdef?.id?.name,
            locStart: param.loc.start.line,
            locEnd: param.loc.end.line,
          })
        }
      }
      analyzer.entryPoints.push(completeEntryPoint(ep))
    }
  } else {
    for (const ep of entrypoints as any[]) {
      // Add the request parameter as a Source
      for (const param of ep.fdef.parameters) {
        if (param.id.name === 'request') {
          this.sourceScope.value.push({
            path: param.id.name,
            kind: 'PYTHON_INPUT',
            scopeFile: extractRelativePath(param?.loc?.sourcefile, Config.maindir),
            scopeFunc: ep.fdef?.id?.name,
            locStart: param.loc.start.line,
            locEnd: param.loc.end.line,
          })
        }
      }
      analyzer.entryPoints.push(completeEntryPoint(ep))
    }
  }
}
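
The function-view and class-view collectors repeat the same push logic. One possible way to factor it out is sketched below, using simplified stand-in types. Note that this variant deliberately behaves differently from the excerpts: it marks every route parameter plus request in a single pass, whereas the code above only looks at targetSrcName[0] and, for class views, adds request only when the route has no path parameters. MiniParam, MiniSourceEntry, and collectParamSources are hypothetical names.

```typescript
// Hypothetical simplified shapes; the real parameter nodes and sourceScope entries carry more fields.
interface MiniParam {
  name: string
  line: number
}

interface MiniSourceEntry {
  path: string
  kind: string
  scopeFunc: string
  locStart: number
  locEnd: number
}

// Builds one source entry per parameter that is either `request` or a route parameter.
function collectParamSources(
  funcName: string,
  params: MiniParam[],
  routeParamNames: string[],
): MiniSourceEntry[] {
  const wanted = new Set<string>(['request', ...routeParamNames])
  return params
    .filter((param) => wanted.has(param.name))
    .map((param) => ({
      path: param.name,
      kind: 'PYTHON_INPUT',
      scopeFunc: funcName,
      locStart: param.line,
      locEnd: param.line,
    }))
}

// Models: def get(self, request, article_id) routed as 'articles/<int:article_id>/'
const classViewSources = collectParamSources(
  'get',
  [
    { name: 'self', line: 4 },
    { name: 'request', line: 4 },
    { name: 'article_id', line: 4 },
  ],
  ['article_id'],
)
console.log(classViewSources.map((entry) => entry.path))
```

With such a helper, both collectors would only need to resolve their entry points and then append the returned entries to sourceScope.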

2.4 Implementation Details

2.4.1 Handling List Concatenation

Django's urlpatterns may be assembled by concatenating lists:

urlpatterns = [
    path('articles/', views.article_list),
] + [
    path('users/', views.user_list),
]

BinaryExpression nodes must therefore be handled recursively:

extractElementsFromNode(elementGroups: any[], node: ASTObject | null): void {
  if (!node) return
  if (node.type === 'ObjectExpression' && node.properties) {
    elementGroups.push(...(node.properties.map((prop: any) => prop.value).filter(Boolean) as ASTObject[]))
  } else if (node.type === 'BinaryExpression') {
    // Handle urlpatterns = [] + [...]
    this.extractElementsFromNode(elementGroups, node.left || null)
    this.extractElementsFromNode(elementGroups, node.right || null)
  }
}
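
The recursion is easier to follow on a tiny hand-built tree. The sketch below re-implements the same flattening on a minimal node type and applies it to a model of [a, b] + [c] + [d]. MiniNode, flattenListElements, and the two builder helpers are hypothetical; the node type names ObjectExpression and BinaryExpression are taken from the excerpt above.

```typescript
// Minimal stand-in for the AST node shape used above; the real ASTObject type is richer.
interface MiniNode {
  type: string
  name?: string
  properties?: { value: MiniNode }[]
  left?: MiniNode
  right?: MiniNode
}

// Collects list elements, descending through `+` concatenations left to right.
function flattenListElements(collected: MiniNode[], node: MiniNode | null): void {
  if (!node) return
  if (node.type === 'ObjectExpression' && node.properties) {
    collected.push(...node.properties.map((prop) => prop.value))
  } else if (node.type === 'BinaryExpression') {
    flattenListElements(collected, node.left || null)
    flattenListElements(collected, node.right || null)
  }
}

// Small builders to keep the example tree readable.
const callNode = (name: string): MiniNode => ({ type: 'CallExpression', name })
const listNode = (...items: MiniNode[]): MiniNode => ({
  type: 'ObjectExpression',
  properties: items.map((value) => ({ value })),
})

// Models: [a, b] + [c] + [d], which parses as ([a, b] + [c]) + [d]
const concatenated: MiniNode = {
  type: 'BinaryExpression',
  left: {
    type: 'BinaryExpression',
    left: listNode(callNode('a'), callNode('b')),
    right: listNode(callNode('c')),
  },
  right: listNode(callNode('d')),
}

const flattened: MiniNode[] = []
flattenListElements(flattened, concatenated)
const flattenedNames = flattened.map((node) => node.name)
console.log(flattenedNames) // [ 'a', 'b', 'c', 'd' ]
```

Because the left operand is visited before the right, the elements come out in source order regardless of how deeply the concatenations nest.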

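2.4.2 Inheriting from PythonTaintAbstractChecker

The Django checker inherits from PythonTaintAbstractChecker and reuses the base Python taint analysis functionality; it only needs to implement the framework-specific route discovery and Source identification logic.

Differences between the Django checker and the generic Python checker:

Entry points are collected differently:
- Generic Python checker: collects entry points in triggerAtStartOfAnalyze through findPythonFcEntryPointAndSource
- Django checker: collects entry points directly from urlpatterns in triggerAtAssignment
Sources are marked differently:
- Generic Python checker: adds tags for sourceScope in triggerAtStartOfAnalyze
- Django checker: adds path parameters and the request parameter to sourceScope directly while collecting entry points
Inherited common functionality:
- triggerAtIdentifier: identifies and marks taint sources (inherited from the parent class)
- triggerAtFunctionCallBefore: detects taint sinks (inherited from the parent class)
- checkSinkAtFunctionCall: the concrete Sink detection logic (inherited from the parent class)

2.4.3 What Was Developed, Mapped to the 1.3 TodoList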

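3 Questions to Focus On Next

Read the code with the following questions in mind:

Path traversal mechanism
1. How many times is a program point analyzed? Exploded graph? Iterative? Single pass?
2. How is a function analyzed when it is called from different call sites? Is it cloned? Are states merged?
3. Look closely at how branch, loop, and function call statements are handled
How the symbolic system works
1. How are values represented?
2. How is state represented?
Timing and mechanism of state merging
1. What data structure holds the state propagated between program points, and how is it registered?
2. What does the extension mechanism look like? How are merge timing decisions and merge operations implemented?
3. How are sources introduced and propagated? How do sanitizers take effect?
Concrete syntax of ruleconfig
1. How many ways are there to define a source?
2. How many ways are there to define a sink?
3. How are sanitizers configured?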
