OCLint from Getting Started to Practice (Part 2): OCLint Workflow and Source Code Walkthrough

Keywords

oclint, workflow, AST

Estimated reading time

10-15 min

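What is OCLint

OCLint is a static code analysis tool built on Clang tooling. It aims to improve quality and reduce defects by inspecting C, C++, and Objective-C code and looking for potential problems such as possible bugs, unused code, complicated code, redundant code, code smells, and bad practices.

Current status

OCLint still has a long way to go before it is complete, but it keeps improving in many areas, such as accuracy, performance, and usability.

OCLint started as a research project at the University of Houston. Since then, it has been rewritten and has grown into a 100% open source project. OCLint is intended to serve both academia and industry. The goal is to spread awareness of the importance of code quality and to make OCLint an essential tool for developers who program in C, C++, and Objective-C.

OCLint is compatible with macOS and the major BSD and Linux variants. A Windows port is an experimental, community-driven effort.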

Basic usage

Analysis reports can be generated with oclint, oclint-json-compilation-database, and oclint-xcodebuild.

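Usage of the oclint command

Usage: oclint [subcommand] [options] <source0> [... <sourceN>]

Options:

Generic options:

-help                         - Display available options (-help-hidden for more)
-help-list                    - Display a list of available options (-help-list-hidden for more)
-version                      - Display the version of this program

OCLint options:

-R=<directory>                - Add a directory to the rule loading path
-allow-duplicated-violations  - Allow duplicated violations in the OCLint report
-disable-rule=<rule name>     - Disable a rule
-enable-clang-static-analyzer - Enable the Clang Static Analyzer and merge its results into the OCLint report
-enable-global-analysis       - Compile every source and analyze across the global context (may cause high memory load, depending on the number of source files)
-extra-arg=<string>           - Additional argument to append to the compiler command line
-extra-arg-before=<string>    - Additional argument to prepend to the compiler command line
-list-enabled-rules           - List the enabled rules
-max-priority-1=<threshold>   - Maximum number of priority 1 violations allowed
-max-priority-2=<threshold>   - Maximum number of priority 2 violations allowed
-max-priority-3=<threshold>   - Maximum number of priority 3 violations allowed
-no-analytics                 - Disable the anonymous analytics
-o=<path>                     - Path to write the report to
-p=<string>                   - Build path
-rc=<parameter>=<value>       - Override the default behavior of rules
-report-type=<name>           - Change the type of the output report
-rule=<rule name>             - Explicitly pick rules

-p <build-path> is used to read the compilation database.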

Usage of the oclint-json-compilation-database command

usage: oclint-json-compilation-database [-h] [-v] [-debug] [-i INCLUDES]
                                           [-e EXCLUDES]
                                           [oclint_args [oclint_args ...]]

   OCLint for JSON Compilation Database (compile_commands.json)

   positional arguments:
     oclint_args           arguments that are passed to the OCLint invocation

   optional arguments:
     -h, --help            show this help message and exit
     -v                    show the invocation command with its arguments
     -debug, --debug       invoke OCLint in debug mode
     -i INCLUDES, -include INCLUDES, --include INCLUDES   extract files matching the pattern
     -e EXCLUDES, -exclude EXCLUDES, --exclude EXCLUDES   remove files matching the pattern
     -p build-path         specify the directory containing compile_commands.json
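
To make the -i and -e options more concrete, here is a minimal C++ sketch of include/exclude filtering over a list of source paths. It is not the tool's implementation: the function name filterSources and the exact matching semantics (regular-expression search, includes applied before excludes) are assumptions made for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of OCLint: keeps the files selected by the
// include patterns (every file when no include pattern is given), then drops
// each file that matches any exclude pattern.
std::vector<std::string> filterSources(const std::vector<std::string> &files,
                                       const std::vector<std::string> &includes,
                                       const std::vector<std::string> &excludes)
{
    // true when the path matches at least one of the patterns
    auto matchesAny = [](const std::string &file,
                         const std::vector<std::string> &patterns) {
        for (const auto &pattern : patterns)
        {
            if (std::regex_search(file, std::regex(pattern)))
            {
                return true;
            }
        }
        return false;
    };

    std::vector<std::string> kept;
    for (const auto &file : files)
    {
        bool included = includes.empty() || matchesAny(file, includes);
        if (included && !matchesAny(file, excludes))
        {
            kept.push_back(file);
        }
    }
    return kept;
}
```

With an exclude pattern such as Pods, a sketch like this would drop third-party sources and keep the application's own files, which is a common reason to reach for -e.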

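oclint-xcodebuild

Running oclint-xcodebuild in the project root folder should generate a compile_commands.json file. This is a great convenience for developers who use Xcode: the tool extracts the compiler arguments from the Xcode build log and writes them to compile_commands.json. (Alternatively, xcpretty can generate a JSON compilation database directly with its json-compilation-database reporter.)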

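Using rules

As mentioned in the previous article, OCLint rules ship as dylib files. If the existing rules do not meet your needs, you can write your own rules and compile them into a dylib.
A default installation provides 71 rules, located in lib/oclint/rules under the installation directory. Some rules take a configurable threshold, such as the length checks, where you choose the limit beyond which a violation is reported.
So how is a threshold set?

For example:

To lower the long-line threshold to 50 characters, pass the following option
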

-rc LONG_LINE=50
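
The effect of such an override can be sketched in a few lines of C++. This is only an illustration of the idea, not OCLint's code: applyOverride and longLines are hypothetical names, and the fallback of 100 characters is an assumed default.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: parses one "-rc" style argument such as "LONG_LINE=50" into
// the map. Returns false when there is no key, no '=', or no leading number.
bool applyOverride(const std::string &argument, std::map<std::string, int> &config)
{
    std::string::size_type pos = argument.find('=');
    if (pos == std::string::npos || pos == 0)
    {
        return false;
    }
    int value = 0;
    try
    {
        value = std::stoi(argument.substr(pos + 1));
    }
    catch (const std::exception &)
    {
        return false;
    }
    config[argument.substr(0, pos)] = value;
    return true;
}

// Hypothetical: returns the 1-based numbers of the lines whose length exceeds
// the LONG_LINE threshold (assumed default: 100 characters).
std::vector<int> longLines(const std::string &source,
                           const std::map<std::string, int> &config)
{
    int threshold = 100;
    std::map<std::string, int>::const_iterator it = config.find("LONG_LINE");
    if (it != config.end())
    {
        threshold = it->second;
    }

    std::vector<int> result;
    std::istringstream stream(source);
    std::string line;
    int number = 0;
    while (std::getline(stream, line))
    {
        ++number;
        if (static_cast<int>(line.size()) > threshold)
        {
            result.push_back(number);
        }
    }
    return result;
}
```

Keeping the overrides in a key/value map means a rule only has to look up its own key and fall back to a default when the key is absent.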

The full table of configurable parameters (OCLint 20.11) is provided here

Source structure

main.cpp in oclint/driver is the entry point of the program. A condensed skeleton of the file is shown below

int main(int argc, const char **argv)
{
    llvm::cl::SetVersionPrinter(oclintVersionPrinter);
    // build the command-line options parser
    CommonOptionsParser optionsParser(argc, argv, OCLintOptionCategory);
    // process the configuration
    oclint::option::process(argv[0]);

    ...

    // construct the analyzer
    oclint::RulesetBasedAnalyzer analyzer(oclint::option::rulesetFilter().filteredRules());
    // construct the driver
    oclint::Driver driver;

    // run the analysis
    driver.run(optionsParser.getCompilations(), optionsParser.getSourcePathList(), analyzer);

    std::unique_ptr<oclint::Results> results(std::move(getResults()));

    ostream *out = outStream();
    // write the report
    reporter()->report(results.get(), *out);
    disposeOutStream(out);

    return handleExit(results.get());
}
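
The body of handleExit is not part of the excerpt above. Assuming it relates to the -max-priority-N options listed earlier, the decision it has to make could be sketched as follows. The function name exceedsThreshold and its signature are hypothetical, and the limits used in any call are inputs rather than documented defaults.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: counts[i] is the number of priority-(i + 1) violations
// found, limits[i] is the matching -max-priority-(i + 1) threshold.
// Returns true when any priority level has more violations than allowed.
bool exceedsThreshold(const std::array<int, 3> &counts,
                      const std::array<int, 3> &limits)
{
    for (std::size_t index = 0; index < counts.size(); ++index)
    {
        if (counts[index] > limits[index])
        {
            return true;
        }
    }
    return false;
}
```

A check of this shape is what lets a CI job fail the build once the number of violations grows past the agreed limits.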

Next, look at the key snippets of the core Driver class. Three functions do most of the work

constructCompilers, invoke, run

// build the compilers
static void constructCompilers(std::vector<oclint::CompilerInstance *> &compilers,
    CompileCommandPairs &compileCommands,
    std::string &mainExecutable)
{
    for (auto &compileCommand : compileCommands) // iterate over the compile commands
    {
        std::vector<std::string> adjustedCmdLine =
            adjustArguments(compileCommand.second.CommandLine, compileCommand.first);

#ifndef NDEBUG
        printCompileCommandDebugInfo(compileCommand, adjustedCmdLine);
#endif

        LOG_VERBOSE("Compiling ");
        LOG_VERBOSE(compileCommand.first.c_str());
        std::string targetDir = stringReplace(compileCommand.second.Directory, "\\ ", " ");

        if(chdir(targetDir.c_str()))
        {
            throw oclint::GenericException("Cannot change dictionary into \"" +
                targetDir + "\", "
                "please make sure the directory exists and you have permission to access!");
        }
        // create the clang::CompilerInvocation object
        clang::CompilerInvocation *compilerInvocation =
            newCompilerInvocation(mainExecutable, adjustedCmdLine);
        // wrap the clang CompilerInvocation in OCLint's own CompilerInstance
        oclint::CompilerInstance *compiler = newCompilerInstance(compilerInvocation);
        compiler->start(); // in essence, obtains a clang::FrontendAction and executes it
        if (!compiler->getDiagnostics().hasErrorOccurred() && compiler->hasASTContext())
        {
            LOG_VERBOSE(" - Success");
            compilers.push_back(compiler); // keep the wrapped CompilerInstance
        }
        else
        {
            LOG_VERBOSE(" - Failed");
        }
        LOG_VERBOSE_LINE("");
    }
}

// the function that actually triggers the analysis
static void invoke(CompileCommandPairs &compileCommands,
    std::string &mainExecutable, oclint::Analyzer &analyzer)
{
    std::vector<oclint::CompilerInstance *> compilers; // container for the compilers
    constructCompilers(compilers, compileCommands, mainExecutable); // build the compilers

    // collect a collection of AST contexts
    std::vector<clang::ASTContext *> localContexts;
    for (auto compiler : compilers) // iterate over the compilers
    {
        localContexts.push_back(&compiler->getASTContext()); // collect each AST context
    }

    // use the analyzer to do the actual analysis
    analyzer.preprocess(localContexts); // hand the contexts to the analyzer for preprocessing
    analyzer.analyze(localContexts);
    analyzer.postprocess(localContexts);

    // send out the signals to release or simply leak resources
    for (size_t compilerIndex = 0; compilerIndex != compilers.size(); ++compilerIndex)
    {
        compilers.at(compilerIndex)->end();
        delete compilers.at(compilerIndex);
    }
}
// the core method called from main.cpp; runs the analysis
void Driver::run(const clang::tooling::CompilationDatabase &compilationDatabase,
    llvm::ArrayRef<std::string> sourcePaths, oclint::Analyzer &analyzer)
{
    CompileCommandPairs compileCommands; // container of (source path, compile command) pairs
    constructCompileCommands(compileCommands, compilationDatabase, sourcePaths); // fill the pairs

    static int staticSymbol; // static symbol whose address is used to locate the executable
    // get the path of the oclint executable
    std::string mainExecutable = llvm::sys::fs::getMainExecutable("oclint", &staticSymbol);

    if (option::enableGlobalAnalysis())
    {
        invoke(compileCommands, mainExecutable, analyzer);
    }
    else
    {
        for (auto &compileCommand : compileCommands)
        {
            CompileCommandPairs oneCompileCommand { compileCommand };
            invoke(oneCompileCommand, mainExecutable, analyzer);
        }
    }

    if (option::enableClangChecker())
    {
        invokeClangStaticAnalyzer(compileCommands, mainExecutable);
    }
}
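
The branch on enableGlobalAnalysis decides how the compile commands are grouped before invoke is called: all at once, or one at a time. A minimal sketch of just that grouping, with plain strings standing in for compile commands and the hypothetical name planInvocations, could look like this:

```cpp
#include <cassert>
#include <string>
#include <vector>

using Batch = std::vector<std::string>;

// Hypothetical helper: returns the batches that would each be handed to one
// invoke() call. Plain strings stand in for the real compile command pairs.
std::vector<Batch> planInvocations(const std::vector<std::string> &sources,
                                   bool globalAnalysis)
{
    std::vector<Batch> batches;
    if (globalAnalysis)
    {
        // one invocation sees every translation unit at once
        batches.push_back(sources);
    }
    else
    {
        // one invocation per translation unit
        for (const auto &source : sources)
        {
            batches.push_back(Batch{source});
        }
    }
    return batches;
}
```

The single-batch case is why the -enable-global-analysis help text warns about memory load: every AST context in the batch stays alive until the one invoke call has finished.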

Last is the RulesetBasedAnalyzer class, which contains very little code, as shown below

void RulesetBasedAnalyzer::analyze(std::vector<clang::ASTContext *> &contexts)
{
    for (const auto& context : contexts)
    {
        LOG_VERBOSE("Analyzing ");
        auto violationSet = new ViolationSet();
        // the rule carrier: context is the data handed to the rules for analysis,
        // violationSet stores the results they produce
        auto carrier = new RuleCarrier(context, violationSet);
        LOG_VERBOSE(carrier->getMainFilePath().c_str());
        for (RuleBase *rule : _filteredRules) // iterate over the filtered rules
        {
            rule->takeoff(carrier); // call the rule's takeoff
        }
        ResultCollector *results = ResultCollector::getInstance(); // get the result collector instance
        results->add(violationSet); // add the violations produced by the rules to the collector
        LOG_VERBOSE_LINE(" - Done");
    }
}

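As the code above shows, the analyzer iterates over the rule set and calls each rule's takeoff method. The base class of every rule is RuleBase, which holds a RuleCarrier instance as a member. The RuleCarrier contains the ASTContext of the file being analyzed and a violationSet, which stores the information about violations. A rule's job is to inspect the ASTContext of its ruleCarrier member and, when it finds a violation, write the result into the ruleCarrier's violationSet.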

From this brief reading of the source, we arrive at the following call-relationship diagram of OCLint's core classes.

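Advanced: custom rules

So far we have covered OCLint's basic usage and its workflow.

Next comes a more flexible, but also more demanding, part: custom rules.

Rules must implement the RuleBase class or one of its derived abstract classes. Different rules work at different levels of abstraction: some have to dig deep into the control flow of the code, while others detect defects just by reading the source text as strings.

OCLint provides three abstract classes for writing custom rules: AbstractSourceCodeReaderRule (source code reader rule), AbstractASTVisitorRule (AST visitor rule), and AbstractASTMatcherRule (AST matcher rule).

According to the official documentation, AST matcher rules are very readable, so unless performance is a major concern, they are what we will probably write most of the time.

AST visitor rules are based on the visitor pattern. You only need to override certain methods (the abstract class exposes a set of hooks that are called as nodes are visited) and put the checking logic for the corresponding node there. (Because OCLint works on the abstract syntax tree generated by Clang, knowing the Clang AST API helps a great deal when writing rules.)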
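
To show the visitor idea without depending on Clang, here is a sketch over a tiny hand-rolled tree. Node, MiniVisitor, and GotoCounter are hypothetical names and the node kinds are plain strings; a real rule would override the hooks of AbstractASTVisitorRule and receive Clang AST nodes instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy syntax tree node, not a Clang type.
struct Node
{
    std::string kind;
    std::vector<Node> children;
};

// Hypothetical base visitor: walks the tree and calls one hook per node kind.
class MiniVisitor
{
public:
    virtual ~MiniVisitor() = default;

    void traverse(const Node &node)
    {
        if (node.kind == "IfStmt")
        {
            visitIfStmt(node);
        }
        else if (node.kind == "GotoStmt")
        {
            visitGotoStmt(node);
        }
        for (const auto &child : node.children)
        {
            traverse(child);
        }
    }

protected:
    // default hooks do nothing; a rule overrides only the ones it needs
    virtual void visitIfStmt(const Node &) {}
    virtual void visitGotoStmt(const Node &) {}
};

// A toy rule: counts goto statements and ignores every other node kind.
class GotoCounter : public MiniVisitor
{
public:
    int count = 0;

protected:
    void visitGotoStmt(const Node &) override
    {
        ++count;
    }
};
```

The traversal lives in the base class, so the rule contains nothing but the check for the node kinds it cares about.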

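AST matcher rules are based on pattern matching. You construct some matchers and register them. Whenever a match is found, the callback method is invoked with the matching AST node as its argument, and you can collect violation information inside the callback. (The Clang AST matcher reference has more on matchers.)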
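
The matcher-plus-callback flow can be sketched the same way, again without Clang. TreeNode, hasKind, and matchAll are hypothetical names; the real AbstractASTMatcherRule builds on Clang's AST matchers, which are far more expressive than the single predicate used here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy syntax tree node, not a Clang type.
struct TreeNode
{
    std::string kind;
    std::string name;
    std::vector<TreeNode> children;
};

using Matcher = std::function<bool(const TreeNode &)>;
using Callback = std::function<void(const TreeNode &)>;

// Hypothetical matcher factory: matches every node of the given kind.
Matcher hasKind(const std::string &kind)
{
    return [kind](const TreeNode &node) { return node.kind == kind; };
}

// Walks the tree and invokes the callback with every node the matcher accepts.
void matchAll(const TreeNode &node, const Matcher &matcher, const Callback &callback)
{
    if (matcher(node))
    {
        callback(node);
    }
    for (const auto &child : node.children)
    {
        matchAll(child, matcher, callback);
    }
}
```

Compared with the visitor sketch, the rule author describes what to look for declaratively and only writes the reaction, which is the readability advantage the official documentation points to.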

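That is all for now. The takeaway is that OCLint provides abstract classes for implementing custom rules; the actual code is covered in the next part.

What's next

So far we have seen what capabilities OCLint offers and how to use them. Next, we will explain in detail how to write and debug custom rules. With built-in and custom rules combined, most static-checking scenarios should be covered.

Next chapter: OCLint from Getting Started to Practice (Part 3): Custom Rules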
