03_apollo_scripts Submodule: In-Depth Analysis of the Overall Software Architecture


1. Overview

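  apollo_scripts is the script management module of the Apollo autonomous driving platform. It automates building, deployment, running, and testing. The module consists of shell scripts and Python tools that handle environment configuration, module startup, device initialization, and code quality checks, and it supports the full lifecycle from development to deployment, including building, testing, running, and maintenance.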

2. Software Architecture Diagram

graph TB
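    subgraph "User Interface Layer"
        A1[Command-line interface CLI]
        A2[Docker container interface]
        A3[IDE integration interface]
        A4[Web management interface]
    end
    
    subgraph "Script Management Layer"
        A[Apollo Scripts Manager]
        
        subgraph "Build Scripts Group"
            B[Build Scripts]
            B1[apollo_buildify.sh - code formatting]
            B2[apollo_action.sh - build action management]
            B3[apollo_clean.sh - clean build artifacts]
            B4[apollo_format.sh - code format check]
            B5[buildifier.sh - BUILD file formatting]
            B6[clang_format.sh - C++ code formatting]
            B7[yapf.sh - Python code formatting]
            B8[apollo_lint.sh - code style check]
            B9[apollo_ci.sh - continuous integration]
        end
        
        subgraph "Runtime Scripts Group"
            C[Runtime Scripts]
            C1[env.sh - environment initialization]
            C2[apollo_base.sh - base function library]
            C3[bootstrap.sh - system bootstrap]
            C4[cyber_launch.sh - Cyber component launch]
            C5[Module launch scripts]
            C51[dreamview.sh - visualization UI]
            C52[perception.sh - perception module]
            C53[localization.sh - localization module]
            C54[planning.sh - planning module]
            C55[control.sh - control module]
            C56[canbus.sh - CAN bus communication]
            C57[routing.sh - routing module]
            C58[prediction.sh - prediction module]
        end
        
        subgraph "Test Scripts Group"
            D[Test Scripts]
            D1[replay.sh - data replay]
            D2[record_bag.sh - data recording]
            D3[performance_test.sh - performance testing]
            D4[unit_test_runner.sh - unit testing]
            D5[integration_test.sh - integration testing]
            D6[functional_test.sh - functional testing]
        end
        
        subgraph "Utility Scripts Group"
            E[Utility Scripts]
            E1[device_setup.sh - hardware device setup]
            E2[data_management.sh - data management]
            E3[configuration_tools.sh - configuration management]
            E4[model_download.sh - model download]
            E5[map_tools.sh - map tools]
            E6[common_functions.sh - common function library]
            E7[log_analyzer.sh - log analysis]
            E8[diagnostic_tool.sh - diagnostics]
        end
        
        subgraph "Configuration Scripts Group"
            F[Configuration Scripts]
            F1[apollo_config.sh - system configuration]
            F2[switch_vehicle.sh - vehicle switching]
            F3[install_scripts.sh - installation scripts]
            F4[environment_setup.sh - environment setup]
            F5[vehicle_calibrations.sh - vehicle calibration]
            F6[security_config.sh - security configuration]
        end
        
        subgraph "Maintenance Scripts Group"
            G[Maintenance Scripts]
            G1[monitor.sh - system monitoring]
            G2[log_management.sh - log management]
            G3[data_cleaner.sh - data cleanup]
            G4[ota.sh - online update]
            G5[health_check.sh - health check]
            G6[backup_restore.sh - backup and restore]
        end
        
        subgraph "Deployment Scripts Group"
            H[Deployment Scripts]
            H1[apollo_deploy.sh - deployment script]
            H2[remote_deploy.sh - remote deployment]
            H3[package_builder.sh - package building]
            H4[image_creator.sh - image creation]
            H5[container_manager.sh - container management]
        end
    end
    
    subgraph "Underlying Support Layer"
        J[Operating system layer - Linux Ubuntu]
        K[Docker container engine]
        L[Bazel build system]
        M[Python runtime]
        N[Bash shell environment]
        O[Protobuf compiler]
        P[CMake build tool]
    end
    
    subgraph "External Dependencies"
        Q[Hardware drivers]
        R[Sensor devices]
        S[Network services]
        T[Cloud service platform]
    end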
    
    A1 --> A
    A2 --> A
    A3 --> A
    A4 --> A
    
    A --> B
    A --> C
    A --> D
    A --> E
    A --> F
    A --> G
    A --> H
    
    B --> L
    B --> P
    B --> O
    
    C --> N
    C --> M
    C --> K
    
    D --> K
    D --> J
    
    E --> J
    E --> Q
    
    F --> J
    F --> S
    
    G --> J
    G --> T
    
    H --> K
    H --> S
    H --> T
    
    J -.-> A
    K -.-> A
    L -.-> A
    M -.-> A
    N -.-> A
    O -.-> A
    P -.-> A
    
    Q -.-> E
    R -.-> E
    S -.-> F
    T -.-> H

3. Call Flow Diagram

flowchart TD
    Start([User starts Apollo]) --> PreCheck{Check runtime environment}
    PreCheck -->|Environment OK| InitEnv[Initialize environment variables]
    PreCheck -->|Environment error| ErrorEnv[Report environment error]
    ErrorEnv --> End([End])
    
    InitEnv --> LoadConfig[Load system configuration]
    LoadConfig --> CheckDocker{Check Docker environment}
    CheckDocker -->|Inside Docker| InDockerOps[In-container operations]
    CheckDocker -->|Outside Docker| OutDockerOps[Host operations]
    
    InDockerOps --> DetectArch[Detect system architecture]
    OutDockerOps --> DetectArch
    
    DetectArch -->|x86_64| SetupX86[Configure x86_64 environment]
    DetectArch -->|aarch64| SetupARM[Configure ARM environment]
    
    SetupX86 --> DeviceSetup[Initialize hardware devices]
    SetupARM --> DeviceSetup
    
    DeviceSetup --> CreateDirs[Create required directories]
    CreateDirs --> SetupDevices[Configure CAN/GPU devices]
    SetupDevices --> VerifySetup{Verify device configuration}
    VerifySetup -->|Succeeded| SystemReady[System ready]
    VerifySetup -->|Failed| RetrySetup[Retry configuration]
    RetrySetup --> VerifySetup
    
    SystemReady --> WaitForCmd{Wait for user command}
    WaitForCmd -->|Start module| StartModule[Start the given module]
    WaitForCmd -->|Stop module| StopModule[Stop the given module]
    WaitForCmd -->|Build| BuildSystem[Build the Apollo system]
    WaitForCmd -->|Run tests| RunTests[Run test suites]
    WaitForCmd -->|Deploy| DeploySystem[Deploy to target]
    WaitForCmd -->|Monitor| MonitorSystem[Monitor system status]
    WaitForCmd -->|Record data| RecordData[Record data]
    WaitForCmd -->|Clean data| CleanData[Clean historical data]
    
    %% Module start flow
    StartModule --> ParseModule[Parse module arguments]
    ParseModule --> CheckModule{Check module status}
    CheckModule -->|Not running| LaunchModule[Launch module]
    CheckModule -->|Already running| NotifyRunning[Report module already running]
    NotifyRunning --> ReturnReady[Return to system ready]
    
    LaunchModule --> FindLaunchFile[Locate launch file]
    FindLaunchFile --> ExecuteLaunch[Run cyber_launch start]
    ExecuteLaunch --> VerifyLaunch{Verify launch status}
    VerifyLaunch -->|Launch succeeded| LogSuccess[Log success]
    VerifyLaunch -->|Launch failed| LogFailure[Log failure]
    LogSuccess --> ReturnReady
    LogFailure --> ReturnReady
    
    %% Module stop flow
    StopModule --> IdentifyProcess[Identify module process]
    IdentifyProcess --> KillProcess[Terminate module process]
    KillProcess --> VerifyStop{Verify stop status}
    VerifyStop -->|Stopped| LogStop[Log stop]
    VerifyStop -->|Not stopped| ForceKill[Force kill]
    ForceKill --> VerifyStop
    LogStop --> ReturnReady
    
    %% Build flow
    BuildSystem --> ParseBuildArgs[Parse build arguments]
    ParseBuildArgs --> CheckBuildEnv{Check build environment}
    CheckBuildEnv -->|Environment OK| CleanBuild[Clean build cache]
    CheckBuildEnv -->|Environment error| SetupBuildEnv[Set up build environment]
    SetupBuildEnv --> CleanBuild
    
    CleanBuild --> DetermineTargets[Determine build targets]
    DetermineTargets --> ExecuteBuild[Run Bazel build]
    ExecuteBuild --> VerifyBuild{Verify build result}
    VerifyBuild -->|Build succeeded| PostBuild[Post-build processing]
    VerifyBuild -->|Build failed| ReportBuildErr[Report build error]
    PostBuild --> ReturnReady
    ReportBuildErr --> ReturnReady
    
    %% Test flow
    RunTests --> SetupTestEnv[Set up test environment]
    SetupTestEnv --> RunUnitTest[Run unit tests]
    RunUnitTest --> RunIntegrationTest[Run integration tests]
    RunIntegrationTest --> RunFunctionalTest[Run functional tests]
    RunFunctionalTest --> GenTestReport[Generate test report]
    GenTestReport --> ReturnReady
    
    %% Deployment flow
    DeploySystem --> ValidateTarget[Validate deployment target]
    ValidateTarget --> PreparePackage[Prepare deployment package]
    PreparePackage --> TransferPackage[Transfer deployment package]
    TransferPackage --> InstallPackage[Install deployment package]
    InstallPackage --> ConfigureDeploy[Configure deployment environment]
    ConfigureDeploy --> VerifyDeploy{Verify deployment result}
    VerifyDeploy -->|Deployment succeeded| LogDeploy[Log deployment success]
    VerifyDeploy -->|Deployment failed| Rollback[Roll back deployment]
    LogDeploy --> ReturnReady
    Rollback --> ReturnReady
    
    %% Monitoring flow
    MonitorSystem --> CollectMetrics[Collect system metrics]
    CollectMetrics --> AnalyzeData[Analyze metric data]
    AnalyzeData --> CheckThresholds{Check thresholds}
    CheckThresholds -->|Normal| ContinueMonitor[Continue monitoring]
    CheckThresholds -->|Abnormal| RaiseAlert[Raise alert]
    ContinueMonitor --> CollectMetrics
    RaiseAlert --> NotifyAdmin[Notify administrator]
    NotifyAdmin --> ContinueMonitor
    
    %% Data recording flow
    RecordData --> DecideStorage[Decide storage location]
    DecideStorage --> CreateTaskDir[Create task directory]
    CreateTaskDir --> SelectChannels[Select channels to record]
    SelectChannels --> StartRecording[Start recording]
    StartRecording --> MonitorDisk{Monitor disk space}
    MonitorDisk -->|Enough space| ContinueRecord[Continue recording]
    MonitorDisk -->|Low space| StopRecord[Stop recording]
    ContinueRecord --> MonitorDisk
    StopRecord --> CompressData[Compress data]
    CompressData --> ReturnReady
    
    %% Data cleanup flow
    CleanData --> ScanOldData[Scan old data]
    ScanOldData --> FilterData[Select data to clean]
    FilterData --> ConfirmClean[Confirm cleanup]
    ConfirmClean --> ExecuteClean[Run cleanup]
    ExecuteClean --> VerifyClean{Verify cleanup result}
    VerifyClean -->|Cleanup succeeded| LogClean[Log cleanup]
    VerifyClean -->|Cleanup failed| ReportCleanErr[Report cleanup error]
    LogClean --> ReturnReady
    ReportCleanErr --> ReturnReady
    
    ReturnReady --> WaitForCmd
    
    subgraph "Core Flow"
        CoreFlow[SystemReady]
    end
    
    subgraph "Error Handling Flow"
        ErrFlow[Error Handling]
        ErrFlow --> LogError[Log error]
        ErrFlow --> AttemptRecovery[Attempt recovery]
        ErrFlow --> NotifyUser[Notify user]
    end
    
    LogFailure -.-> ErrFlow
    ReportBuildErr -.-> ErrFlow
    ReportCleanErr -.-> ErrFlow

4. UML Class Diagram

classDiagram
    %% Base abstraction layer
    class ApolloScriptBase {
        <<abstract>>
        +String TOP_DIR
        +String APOLLO_ROOT_DIR
        +String ARCH
        +Boolean APOLLO_IN_DOCKER
        +int APOLLO_OUTSIDE_DOCKER
        +String CMDLINE_OPTIONS
        +Boolean ENABLE_PROFILER
        +String APOLLO_BIN_PREFIX
        +Map env_vars
        +initialize_environment()
        +set_lib_path()
        +create_data_dir()
        +determine_bin_prefix()
        +setup_device()
        +decide_task_dir()
        +check_in_docker() 
        +pathprepend(String value, String var)
        +pathappend(String value, String var)
        +info(String msg)
        +warning(String msg)
        +error(String msg)
        +ok(String msg)
        +fatal(String msg)
        +check_function_exists(String func_name)
        +is_stopped_customized_path(String module_path, String module)
    }
    
    %% Build system layer
    class BuildSystem {
        +String DISABLED_TARGETS
        +String SHORTHAND_TARGETS
        +int USE_GPU
        +Boolean USE_ESD_CAN
        +int USE_OPT
        +String BUILD_TYPE
        +determine_build_targets(String... components)
        +determine_disabled_targets(String... components)
        +_chk_n_set_gpu_arg(String arg)
        +_determine_perception_disabled()
        +build(String... targets)
        +clean()
        +verify_build()
        +setup_build_environment()
        +configure_build_options()
    }
    
    class BuildOptimizer {
        +int MAX_JOBS
        +String BUILD_CACHE_DIR
        +Boolean USE_INCREMENTAL_BUILD
        +optimize_build_performance()
        +enable_cache_mechanism()
        +limit_concurrent_jobs()
    }
    
    %% Module management layer
    class ModuleLauncher {
        +String LAUNCH_FILE_PATH
        +String MODULE_STATUS
        +start(String module, String... args)
        +start_customized_path(String module_path, String module, String... args)
        +stop(String module)
        +check_module_status(String module)
        +wait_for_exit(String module)
        +list_running_modules()
    }
    
    class ModuleRegistry {
        +Map registered_modules
        +List essential_modules
        +register_module(ModuleInfo info)
        +unregister_module(String module_name)
        +get_module_info(String module_name)
        +get_essential_modules()
        +validate_module_dependencies()
    }
    
    class ModuleInfo {
        +String name
        +String path
        +String launch_file
        +List dependencies
        +Boolean is_essential
        +String description
        +ModuleInfo(String name, String path, String launch_file)
        +getName()
        +getPath()
        +getLaunchFile()
        +getDependencies()
    }
    
    %% Device management layer
    class DeviceSetup {
        +String CAN_DEVICE_PATTERN
        +String GPU_DEVICE_PATTERN
        +int NUM_CAN_PORTS
        +setup_device_for_amd64()
        +setup_device_for_aarch64()
        +setup_can_devices()
        +check_gpu_devices()
        +setup_shared_mem()
        +initialize_hardware()
        +validate_device_access()
    }
    
    class HardwareValidator {
        +List required_devices
        +Map device_paths
        +validate_required_hardware()
        +check_device_permissions()
        +test_device_functionality()
        +generate_hardware_report()
    }
    
    %% Configuration management layer
    class ConfigManager {
        +String VEHICLE_NAME
        +String BRIDGE_PORT
        +String DASHBOARD_PORT
        +String CONFIG_DIR
        +load_config()
        +validate_config()
        +apply_config()
        +save_config()
        +switch_vehicle(String vehicle_id)
        +validate_vehicle_config(String vehicle_id)
    }
    
    class VehicleConfig {
        +String vehicle_id
        +String model
        +String calibration_file
        +Map parameters
        +VehicleConfig(String vehicle_id)
        +getCalibrationFile()
        +getParameter(String key)
        +setParameter(String key, Object value)
        +validate()
    }
    
    %% Data management layer
    class DataManager {
        +String BAG_PATH
        +String LOG_PATH
        +String TASK_DIR
        +String DATA_RETENTION_DAYS
        +manage_logs()
        +clean_data()
        +backup_data()
        +record_bag(List channels)
        +stop_record()
        +rotate_logs()
        +compress_old_data()
    }
    
    class DataRecorder {
        +String RECORDING_TASK_ID
        +String CURRENT_BAG_FILE
        +Boolean is_recording
        +start_recording(List channels)
        +stop_recording()
        +pause_recording()
        +resume_recording()
        +get_recording_status()
    }
    
    %% Test management layer
    class TestRunner {
        +String TEST_FILTER
        +String TEST_TIMEOUT
        +String TEST_REPORT_DIR
        +run_unit_tests()
        +run_integration_tests()
        +run_functional_tests()
        +generate_test_report()
        +analyze_coverage()
        +validate_test_results()
    }
    
    class TestCaseManager {
        +List test_cases
        +TestResultAggregator aggregator
        +add_test_case(TestCase tc)
        +run_all_tests()
        +get_test_results()
        +generate_coverage_report()
    }
    
    class TestCase {
        +String name
        +String description
        +String command
        +int timeout
        +TestCase(String name, String command)
        +execute()
        +getName()
        +getTimeout()
    }
    
    %% Deployment management layer
    class DeploymentManager {
        +String TARGET_HOST
        +String DEPLOY_PATH
        +String DEPLOY_PACKAGE
        +deploy_to_remote()
        +rollback_version()
        +verify_deployment()
        +update_config()
        +check_target_compatibility()
    }
    
    class PackageBuilder {
        +String PACKAGE_FORMAT
        +List components
        +String OUTPUT_DIR
        +create_package(List components)
        +extract_package(String path)
        +verify_package_integrity()
        +install_package(String package_path)
        +calculate_checksum(String file_path)
    }
    
    %% Monitoring system layer
    class MonitorSystem {
        +String METRICS_INTERVAL
        +Map system_metrics
        +List alert_handlers
        +collect_cpu_usage()
        +collect_memory_usage()
        +collect_disk_usage()
        +collect_network_stats()
        +send_alert(String message)
        +log_event(String event)
        +start_monitoring()
        +stop_monitoring()
    }
    
    class MetricsCollector {
        +SystemMetrics current_metrics
        +List sources
        +collect_system_metrics()
        +collect_process_metrics()
        +collect_network_metrics()
        +aggregate_metrics()
    }
    
    class AlertHandler {
        +String handler_type
        +String destination
        +handle_alert(Alert alert)
        +send_notification(String message)
        +log_alert(Alert alert)
    }
    
    %% Execution management layer
    class ScriptExecutor {
        +String current_command
        +ExecutionResult last_result
        +execute_command(String cmd)
        +handle_error(Error error)
        +log_operation(String operation)
        +validate_execution_env()
    }
    
    class ExecutionResult {
        +int exit_code
        +String stdout
        +String stderr
        +long execution_time
        +ExecutionResult(int code, String out, String err)
        +isSuccessful()
        +getExitCode()
        +getStdout()
        +getStderr()
    }
    
    %% Main controller
    class MainController {
        +BuildSystem build_system
        +ModuleLauncher module_launcher
        +ConfigManager config_manager
        +DataManager data_manager
        +TestRunner test_runner
        +DeploymentManager deployment_manager
        +MonitorSystem monitor_system
        +PackageBuilder package_builder
        +initialize_system()
        +process_command(String[] args)
        +manage_lifecycle()
        +handle_shutdown()
    }
    
    %% Inheritance relationships
    ApolloScriptBase <|-- BuildSystem
    ApolloScriptBase <|-- ModuleLauncher
    ApolloScriptBase <|-- DeviceSetup
    ApolloScriptBase <|-- ConfigManager
    ApolloScriptBase <|-- DataManager
    ApolloScriptBase <|-- TestRunner
    ApolloScriptBase <|-- DeploymentManager
    ApolloScriptBase <|-- MonitorSystem
    
    %% Associations
    BuildSystem --> BuildOptimizer : uses
    ModuleLauncher --> ModuleRegistry : manages
    ModuleRegistry --> ModuleInfo : contains
    DeviceSetup --> HardwareValidator : uses
    ConfigManager --> VehicleConfig : manages
    DataManager --> DataRecorder : uses
    TestRunner --> TestCaseManager : uses
    TestCaseManager --> TestCase : contains
    DeploymentManager --> PackageBuilder : uses
    MonitorSystem --> MetricsCollector : uses
    MonitorSystem --> AlertHandler : uses
    ScriptExecutor --> ExecutionResult : creates
    
    %% Main controller associations
    MainController --> BuildSystem : orchestrates
    MainController --> ModuleLauncher : orchestrates
    MainController --> DeviceSetup : orchestrates
    MainController --> ConfigManager : orchestrates
    MainController --> DataManager : orchestrates
    MainController --> TestRunner : orchestrates
    MainController --> DeploymentManager : orchestrates
    MainController --> MonitorSystem : orchestrates
    MainController --> PackageBuilder : orchestrates
    MainController --> ScriptExecutor : uses

5. State Machine

stateDiagram-v2
    [*] --> SystemInit : launch script
    SystemInit --> EnvSetup : initialize environment variables
    EnvSetup --> CheckDocker : check Docker environment
    CheckDocker --> InDockerState : inside container, set container environment
    CheckDocker --> OutDockerState : outside container, set host environment
    InDockerState --> DetectPlatform : detect platform architecture
    OutDockerState --> DetectPlatform
    
    DetectPlatform --> SetupAMD64 : x86_64, configure x86_64 environment
    DetectPlatform --> SetupARM64 : aarch64, configure ARM64 environment
    SetupAMD64 --> DeviceInitialization : initialize devices
    SetupARM64 --> DeviceInitialization
    
    DeviceInitialization --> CreateDataDirs : create data directories
    CreateDataDirs --> SetupHardware : configure hardware devices
    SetupHardware --> SystemReady : system ready
    
    SystemReady --> WaitForCommand : wait for user command
    WaitForCommand --> BuildProcess : build command
    WaitForCommand --> ModuleStart : start module
    WaitForCommand --> ModuleStop : stop module
    WaitForCommand --> TestProcess : run tests
    WaitForCommand --> DeployProcess : deploy command
    WaitForCommand --> MonitorProcess : monitor command
    WaitForCommand --> RecordProcess : record data
    WaitForCommand --> CleanProcess : cleanup command
    
    %% Build process states
    state BuildProcess {
        [*] --> ParseArgs : parse arguments
        ParseArgs --> ValidateEnv : validate environment
        ValidateEnv --> PrepareBuild : environment valid
        ValidateEnv --> BuildError : environment invalid
        PrepareBuild --> SelectTargets : select build targets
        SelectTargets --> ExecuteBazel : run Bazel build
        ExecuteBazel --> PostBuild : build succeeded
        ExecuteBazel --> BuildError : build failed
        PostBuild --> BuildComplete : build complete
        BuildError --> [*]
        BuildComplete --> [*]
    }
    
    %% Module start states
    state ModuleStart {
        [*] --> ParseModuleArgs : parse module arguments
        ParseModuleArgs --> CheckModuleStatus : check module status
        CheckModuleStatus --> ModuleRunning : module already running
        CheckModuleStatus --> LocateLaunchFile : module not running, locate launch file
        ModuleRunning --> [*]
        LocateLaunchFile --> LaunchViaCyber : launch via cyber_launch
        LaunchViaCyber --> WaitLaunchResult : wait for launch result
        WaitLaunchResult --> VerifyModule : launch succeeded, verify module status
        WaitLaunchResult --> ModuleStartError : launch failed
        VerifyModule --> ModuleStarted : verification passed
        VerifyModule --> ModuleStartError : verification failed
        ModuleStartError --> [*]
        ModuleStarted --> [*]
    }
    
    %% Module stop states
    state ModuleStop {
        [*] --> IdentifyModule : identify module
        IdentifyModule --> FindProcess : find process
        FindProcess --> KillProcess : process found, terminate it
        FindProcess --> ModuleNotRunning : process not found
        KillProcess --> VerifyStop : verify stop status
        VerifyStop --> ModuleStopped : stopped
        VerifyStop --> ForceKill : not stopped, force kill
        ForceKill --> VerifyStop
        ModuleNotRunning --> [*]
        ModuleStopped --> [*]
    }
    
    %% Test process states
    state TestProcess {
        [*] --> SetupTestEnv : set up test environment
        SetupTestEnv --> RunUnitTests : run unit tests
        RunUnitTests --> RunIntegrationTests : passed, run integration tests
        RunUnitTests --> TestsFailed : failed
        RunIntegrationTests --> RunFunctionalTests : passed, run functional tests
        RunIntegrationTests --> TestsFailed : failed
        RunFunctionalTests --> GenerateReports : passed, generate reports
        RunFunctionalTests --> TestsFailed : failed
        GenerateReports --> TestsComplete : tests complete
        TestsFailed --> [*]
        TestsComplete --> [*]
    }
    
    %% Deployment process states
    state DeployProcess {
        [*] --> ValidateTarget : validate deployment target
        ValidateTarget --> PreparePackage : target valid, prepare package
        ValidateTarget --> DeployError : target invalid
        PreparePackage --> TransferPackage : transfer package
        TransferPackage --> InstallPackage : transfer succeeded, install package
        TransferPackage --> DeployError : transfer failed
        InstallPackage --> ConfigureSystem : install succeeded, configure system
        InstallPackage --> DeployError : install failed
        ConfigureSystem --> VerifyDeploy : configuration succeeded, verify deployment
        ConfigureSystem --> DeployError : configuration failed
        VerifyDeploy --> DeploySuccess : verification succeeded
        VerifyDeploy --> DeployError : verification failed
        DeployError --> [*]
        DeploySuccess --> [*]
    }
    
    %% Monitoring process states
    state MonitorProcess {
        [*] --> InitializeMonitors : initialize monitors
        InitializeMonitors --> CollectMetrics : collect metrics
        CollectMetrics --> AnalyzeData : analyze data
        AnalyzeData --> CheckThresholds : check thresholds
        CheckThresholds --> ContinueMonitor : normal
        CheckThresholds --> TriggerAlert : abnormal, trigger alert
        ContinueMonitor --> CollectMetrics : next collection cycle
        TriggerAlert --> NotifyAdmin : notify administrator
        NotifyAdmin --> ContinueMonitor
    }
    
    %% Recording process states
    state RecordProcess {
        [*] --> SelectChannels : select channels to record
        SelectChannels --> CreateTaskDir : create task directory
        CreateTaskDir --> StartBagRecord : start bag recording
        StartBagRecord --> MonitorDiskUsage : monitor disk usage
        MonitorDiskUsage --> ContinueRecord : enough space
        MonitorDiskUsage --> StopAndAlert : low space, stop and alert
        ContinueRecord --> MonitorDiskUsage : next monitoring cycle
        StopAndAlert --> CompressData : compress data
        CompressData --> RecordComplete : recording complete
        RecordComplete --> [*]
    }
    
    %% Cleanup process states
    state CleanProcess {
        [*] --> ScanData : scan data
        ScanData --> IdentifyOldData : identify old data
        IdentifyOldData --> ConfirmClean : confirm cleanup
        ConfirmClean --> ExecuteClean : run cleanup
        ExecuteClean --> VerifyClean : verify cleanup
        VerifyClean --> CleanComplete : cleanup succeeded
        VerifyClean --> CleanError : cleanup failed
        CleanError --> [*]
        CleanComplete --> [*]
    }
    
    %% Error handling states
    state ErrorHandling {
        [*] --> LogError : log error
        LogError --> AssessSeverity : assess severity
        AssessSeverity --> SystemShutdown : fatal error
        AssessSeverity --> AttemptRecovery : recoverable error
        AttemptRecovery --> ReturnToReady : recovery succeeded
        AttemptRecovery --> SystemShutdown : recovery failed
        SystemShutdown --> [*]
        ReturnToReady --> [*]
    }
    
    %% Connect error handling
    BuildProcess --> ErrorHandling : build error
    ModuleStart --> ErrorHandling : start error
    ModuleStop --> ErrorHandling : stop error
    TestProcess --> ErrorHandling : test error
    DeployProcess --> ErrorHandling : deployment error
    RecordProcess --> ErrorHandling : recording error
    CleanProcess --> ErrorHandling : cleanup error
    
    %% Return to the system-ready state
    BuildComplete --> SystemReady
    ModuleStarted --> SystemReady
    ModuleStopped --> SystemReady
    TestsComplete --> SystemReady
    DeploySuccess --> SystemReady
    RecordComplete --> SystemReady
    CleanComplete --> SystemReady
    ReturnToReady --> SystemReady
    ModuleRunning --> SystemReady
    ModuleNotRunning --> SystemReady
    
    %% System shutdown
    SystemReady --> SystemShutdown : shutdown signal received
    ErrorHandling --> SystemShutdown : shutdown on system error
    SystemShutdown --> [*]

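6. Source Code Analysis

6.1. Core Initialization Scripts

6.1.1. apollo_base.sh Initialization Flow

  apollo_base.sh is the foundation for the other Apollo scripts: it initializes environment variables and defines shared helper functions.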

#!/usr/bin/env bash
TOP_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd -P)"
source "${TOP_DIR}/scripts/apollo.bashrc"

ARCH="$(uname -m)"

APOLLO_OUTSIDE_DOCKER=0
CMDLINE_OPTIONS=
SHORTHAND_TARGETS=
DISABLED_TARGETS=

: ${CROSSTOOL_VERBOSE:=0}
: ${NVCC_VERBOSE:=0}
: ${HIPCC_VERBOSE:=0}

: ${USE_ESD_CAN:=false}
USE_GPU=-1

use_cpu=-1
use_gpu=-1
use_nvidia=-1
use_amd=-1

ENABLE_PROFILER=true

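  The initialization flow consists of four steps:

  1. Resolve the top-level directory (TOP_DIR) from the script's own location
  2. Source the base bash configuration (scripts/apollo.bashrc)
  3. Detect the system architecture with uname -m
  4. Initialize the flags and configuration defaults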
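
  Several of the defaults above use the `: ${VAR:=default}` idiom, which keeps a value supplied by the caller and only assigns the default when the variable is unset or empty. A minimal sketch of that behavior follows; DEMO_VERBOSE is a hypothetical variable used only for illustration and is not part of Apollo.

```shell
#!/usr/bin/env bash
# Sketch: ": ${VAR:=default}" assigns the default only if VAR is unset or empty.

unset DEMO_VERBOSE            # hypothetical variable, not part of Apollo
: "${DEMO_VERBOSE:=0}"        # unset -> the default 0 is assigned
first="${DEMO_VERBOSE}"

DEMO_VERBOSE=1                # simulate a value provided by the caller
: "${DEMO_VERBOSE:=0}"        # already set -> the value is preserved
second="${DEMO_VERBOSE}"

echo "first=${first} second=${second}"
```

  This is why variables such as CROSSTOOL_VERBOSE can be overridden from the calling environment, while plain assignments such as USE_GPU=-1 always reset the value.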

6.1.2. Environment Path Configuration

function set_lib_path() {
  local CYBER_SETUP="${APOLLO_ROOT_DIR}/cyber/setup.bash"
  [ -e "${CYBER_SETUP}" ] && . "${CYBER_SETUP}"
  pathprepend ${APOLLO_ROOT_DIR}/modules/tools PYTHONPATH
  pathprepend ${APOLLO_ROOT_DIR}/modules/teleop/common PYTHONPATH
  pathprepend /apollo/modules/teleop/common/scripts
}

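  This function sets up the module search paths. It first checks whether the CyberRT setup file (cyber/setup.bash) exists and sources it if so, then prepends the tools and teleop directories to PYTHONPATH. The last call passes no variable name, so the teleop scripts directory goes into whichever variable pathprepend uses by default (presumably PATH) rather than PYTHONPATH.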
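
  To make the calling convention concrete (value first, optional variable name second), here is a hypothetical stand-in for pathprepend. It is a sketch only: the real helper is assumed to be defined in scripts/apollo.bashrc and may behave differently, for example in how it treats an entry that is already present.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for pathprepend; not the Apollo implementation.
demo_pathprepend() {
  local value="$1"
  local var="${2:-PATH}"          # variable name defaults to PATH
  local current="${!var:-}"       # indirect expansion reads the named variable
  case ":${current}:" in
    *":${value}:"*) return 0 ;;   # already present, nothing to do
  esac
  export "${var}=${value}${current:+:${current}}"
}

# DEMO_PYTHONPATH is a made-up variable so the real PYTHONPATH stays untouched.
DEMO_PYTHONPATH="/opt/lib"
demo_pathprepend "/apollo/modules/tools" DEMO_PYTHONPATH
echo "${DEMO_PYTHONPATH}"
```

  Calling the sketch a second time with the same directory leaves the variable unchanged, which keeps repeated sourcing from growing the path.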

6.1.3. Data Directory Creation

function create_data_dir() {
    local DATA_DIR="${APOLLO_ROOT_DIR}/data"
    mkdir -p "${DATA_DIR}/log"
    mkdir -p "${DATA_DIR}/bag"
    mkdir -p "${DATA_DIR}/core"
}

  Creates the required data directories for logs, bags (recorded data), and core dumps.

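6.2. Device Initialization Mechanism

6.2.1. Device Initialization Flow

  Depending on the system architecture, the script calls a different device initialization function: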

function setup_device() {
    if [ "$(uname -s)" != "Linux" ]; then
        info "Not on Linux, skip mapping devices."
        return
    fi
    if [[ "${ARCH}" == "x86_64" ]]; then
        setup_device_for_amd64
    else
        setup_device_for_aarch64
    fi
}

6.2.2. x86_64 Device Initialization

function setup_device_for_amd64() {
    # setup CAN device
    local NUM_PORTS=8
    for i in $(seq 0 $((${NUM_PORTS} - 1))); do
        if [[ -e /dev/can${i} ]]; then
            continue
        elif [[ -e /dev/zynq_can${i} ]]; then
            # soft link if sensorbox exist
            sudo ln -s /dev/zynq_can${i} /dev/can${i}
        else
            break
            # sudo mknod --mode=a+rw /dev/can${i} c 52 ${i}
        fi
    done

    # Check Nvidia device
    if [[ ! -e /dev/nvidia0 ]]; then
        warning "No device named /dev/nvidia0"
    fi
    if [[ ! -e /dev/nvidiactl ]]; then
        warning "No device named /dev/nvidiactl"
    fi
    if [[ ! -e /dev/nvidia-uvm ]]; then
        warning "No device named /dev/nvidia-uvm"
    fi
    if [[ ! -e /dev/nvidia-uvm-tools ]]; then
        warning "No device named /dev/nvidia-uvm-tools"
    fi
    if [[ ! -e /dev/nvidia-modeset ]]; then
        warning "No device named /dev/nvidia-modeset"
    fi
}

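  This function prepares the CAN device nodes used for vehicle communication: for each of up to eight ports it keeps an existing /dev/canN, symlinks /dev/zynq_canN to it when a sensorbox is present, and stops at the first port that has neither. It then warns about any missing NVIDIA GPU device node without aborting.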
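
  The five NVIDIA checks share one shape, so they could be expressed as a loop. The sketch below is an illustrative alternative, not the Apollo code: check_device_nodes and the DEV_ROOT override are hypothetical names, and DEV_ROOT exists only so the check can run against a temporary directory instead of real hardware.

```shell
#!/usr/bin/env bash
# Sketch: report missing device nodes with a loop instead of repeated ifs.
check_device_nodes() {
  local dev_root="${DEV_ROOT:-/dev}"   # hypothetical override for testing
  local missing=0
  local name
  for name in "$@"; do
    if [[ ! -e "${dev_root}/${name}" ]]; then
      echo "WARNING: No device named ${dev_root}/${name}" >&2
      missing=$((missing + 1))
    fi
  done
  echo "${missing}"                    # number of missing nodes on stdout
}

# Simulate a machine where only two of the five nodes exist.
DEV_ROOT="$(mktemp -d)"
touch "${DEV_ROOT}/nvidia0" "${DEV_ROOT}/nvidiactl"
missing_count="$(check_device_nodes nvidia0 nvidiactl nvidia-uvm nvidia-uvm-tools nvidia-modeset)"
echo "missing=${missing_count}"
rm -rf "${DEV_ROOT}"
```

  Warnings go to stderr and the count goes to stdout, so a caller can decide whether missing devices should be fatal.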

6.2.3. aarch64 Device Initialization

function setup_device_for_aarch64() {
    local can_dev="/dev/can0"
    local socket_can_dev="can0"
    if [ ! -e "${can_dev}" ]; then
        warning "No CAN device named ${can_dev}. "
    fi

    if [[ -x "$(command -v ip)" ]]; then
        if ! ip link show type can | grep "${socket_can_dev}" &> /dev/null; then
            warning "No SocketCAN device named ${socket_can_dev}."
        else
            sudo modprobe can
            sudo modprobe can_raw
            sudo modprobe mttcan
            sudo ip link set "${socket_can_dev}" type can bitrate 500000 sjw 4 berr-reporting on loopback off
            sudo ip link set up "${socket_can_dev}"
        fi
    else
        warning "ip command not found."
    fi
}

6.3. Module Management Mechanism

6.3.1. Module Startup Flow

  The core function for starting a module:

function start_customized_path() {
    MODULE_PATH=$1
    MODULE=$2
    shift 2

    is_stopped_customized_path "${MODULE_PATH}" "${MODULE}"
    if [ $? -eq 1 ]; then
        # todo(zero): Better to move nohup.out to data/log/nohup.out
        eval "nohup cyber_launch start ${APOLLO_ROOT_DIR}/modules/${MODULE_PATH}/launch/${MODULE}.launch &"
        sleep 0.5
        is_stopped_customized_path "${MODULE_PATH}" "${MODULE}"
        if [ $? -eq 0 ]; then
            ok "Launched module ${MODULE}."
            return 0
        else
            error "Could not launch module ${MODULE}. Is it already built?"
            return 1
        fi
    else
        info "Module ${MODULE} is already running - skipping."
        return 2
    fi
}

6.3.2. Module Status Check

function is_stopped_customized_path() {
    MODULE_PATH=$1
    MODULE=$2
    NUM_PROCESSES="$(pgrep -f "modules/${MODULE_PATH}/launch/${MODULE}.launch" | grep -cv '^1$')"
    if [ "${NUM_PROCESSES}" -eq 0 ]; then
        return 1
    else
        return 0
    fi
}

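  Despite its name, this function follows an inverted convention: it returns 1 when no matching launch process is found (the module is stopped) and 0 when at least one is running. The grep -cv '^1$' stage counts the PIDs reported by pgrep while excluding PID 1. Callers such as start_customized_path therefore test for $? -eq 1 to detect a stopped module.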
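
  The counting step can be tried in isolation by feeding PIDs on stdin instead of calling pgrep. In this sketch both function names are hypothetical, and is_running_from_pids deliberately uses the conventional shell meaning (exit status 0 means running), which is the opposite of is_stopped_customized_path.

```shell
#!/usr/bin/env bash
# Sketch: count PIDs other than 1, like "pgrep ... | grep -cv '^1$'".
count_non_init_pids() {
  # Reads one PID per line on stdin; -c counts lines, -v inverts the match.
  grep -cv '^1$' || true     # grep exits 1 when the count is 0; ignore that
}

# Hypothetical wrapper with conventional semantics: exit 0 means "running".
is_running_from_pids() {
  local n
  n="$(count_non_init_pids)"
  [ "${n}" -gt 0 ]
}

printf '1\n4321\n' | is_running_from_pids && echo "running"
printf '1\n' | is_running_from_pids || echo "stopped"
```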

6.4. Build System Implementation

6.4.1. Determining Build Targets

function determine_build_targets() {
    local targets_all
    if [[ "$#" -eq 0 ]]; then
        targets_all="$(python3 ${TOP_DIR}/scripts/find_all_package.py)"
        echo "${targets_all}"
        return
    fi

    for component in "$@"; do
        local build_targets
        if [ "${component}" = "cyber" ]; then
            build_targets="cyber"
        elif [[ -d "${TOP_DIR}/modules/${component}" ]]; then
            build_targets="modules/${component}"
        else
            error "Directory ${TOP_DIR}/modules/${component} not found. Exiting ..."
            exit 1
        fi
        if [ -z "${targets_all}" ]; then
            targets_all="${build_targets}"
        else
            targets_all="${targets_all} ${build_targets}"
        fi
    done
    echo "${targets_all}"
}

6.4.2. Build Option Handling

  The script uses getopts to process build options:

while getopts ":cdef:g:hij:mn:pt:uv" opt; do
  case $opt in
    c)
      ACTION=clean
      ;;
    d)
      if [ -z "${SHORTHAND_TARGETS}" ]; then
        SHORTHAND_TARGETS="all"
      fi
      USE_DBG=1
      ;;
    e)
      ENABLE_PROFILER=false
      ;;
    f)
      ADDTIONAL_OPTIONS="${ADDTIONAL_OPTIONS} --compilation_mode=${OPTARG}"
      ;;
    g)
      ADDTIONAL_OPTIONS="${ADDTIONAL_OPTIONS} --cxxopt=-g${OPTARG}"
      ;;
    h)
      usage
      exit 0
      ;;
    i)
      USE_OPT=1
      ;;
    j)
      ADDTIONAL_OPTIONS="${ADDTIONAL_OPTIONS} -j${OPTARG}"
      ;;
    m)
      USE_GPU=0
      ;;
    n)
      ADDTIONAL_OPTIONS="${ADDTIONAL_OPTIONS} --jobs=${OPTARG}"
      ;;
    p)
      ACTION=build
      ;;
    t)
      if [ -z "${SHORTHAND_TARGETS}" ]; then
        SHORTHAND_TARGETS="all"
      fi
      ADDTIONAL_OPTIONS="${ADDTIONAL_OPTIONS} --test_timeout=${OPTARG}"
      ;;
    u)
      USE_GPU=1
      ;;
    v)
      set -x
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
  esac
done
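
  The `\?)` and `:)` branches of such a loop only receive the offending option in OPTARG when getopts runs in silent mode, which is selected by a leading colon in the option string. A minimal sketch of that behavior, using a made-up two-option parser (parse_demo is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: with a leading ":" in the optstring, getopts runs in silent mode:
#   unknown option   -> opt is "?" and OPTARG holds the offending letter
#   missing argument -> opt is ":" and OPTARG holds the option letter
parse_demo() {
  local opt OPTIND=1 OPTARG
  while getopts ":cj:" opt; do
    case "${opt}" in
      c) echo "action=clean" ;;
      j) echo "jobs=${OPTARG}" ;;
      \?) echo "invalid=-${OPTARG}" ;;
      :) echo "missing-arg=-${OPTARG}" ;;
    esac
  done
}

parse_demo -c -j 8   # a flag plus an option with its argument
parse_demo -x        # unknown option
parse_demo -j        # option missing its argument
```

  Without the leading colon, getopts prints its own diagnostic, reports both error cases as "?", and leaves OPTARG unset.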

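6.5. Configuration Management Scripts

6.5.1. Vehicle Configuration Switching

  Switches the active vehicle configuration by repointing a symbolic link: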

function switch_vehicle() {
    local vehicle_id=$1
    local vehicle_dir="${APOLLO_ROOT_DIR}/modules/calibration/data/${vehicle_id}"
    
    if [ ! -d "${vehicle_dir}" ]; then
        error "Invalid vehicle id: ${vehicle_id}. Directory does not exist: ${vehicle_dir}"
        usage
        # Stop here so an invalid id never removes the existing link
        return 1
    fi

    # Repoint the "current" symbolic link at the selected calibration data
    rm -rf "${APOLLO_ROOT_DIR}/modules/calibration/data/current"
    ln -s "${vehicle_dir}" "${APOLLO_ROOT_DIR}/modules/calibration/data/current"

    ok "Successfully switched to vehicle: ${vehicle_id}"
}
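
  The link switch can be exercised safely in a temporary directory. The sketch below is an illustrative variant rather than the Apollo script: switch_link and the vehicle names are hypothetical, and it uses ln -sfn to replace the link in one step instead of removing it first.

```shell
#!/usr/bin/env bash
# Sketch: repoint a "current" symlink at a vehicle directory.
# All paths are temporary stand-ins for the calibration data tree.
switch_link() {
  local data_root="$1" vehicle_id="$2"
  local vehicle_dir="${data_root}/${vehicle_id}"
  if [[ ! -d "${vehicle_dir}" ]]; then
    echo "Invalid vehicle id: ${vehicle_id}" >&2
    return 1                      # stop before touching the existing link
  fi
  # -s symbolic, -f replace an existing link, -n do not follow it
  ln -sfn "${vehicle_dir}" "${data_root}/current"
}

root="$(mktemp -d)"
mkdir -p "${root}/vehicle_a" "${root}/vehicle_b"
switch_link "${root}" vehicle_a
switch_link "${root}" vehicle_b
switch_link "${root}" no_such_vehicle || echo "rejected"
current_target="$(readlink "${root}/current")"
echo "${current_target##*/}"
rm -rf "${root}"
```

  The rejected call leaves the link pointing at the last valid vehicle, which is the property the early return is meant to guarantee.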

7. Design Patterns

7.1. Template Method Pattern

function start_customized_path() {
    MODULE_PATH=$1
    MODULE=$2
    shift 2

    is_stopped_customized_path "${MODULE_PATH}" "${MODULE}"  # check current state
    if [ $? -eq 1 ]; then                                   # skeleton of the algorithm
        eval "nohup cyber_launch start ${APOLLO_ROOT_DIR}/modules/${MODULE_PATH}/launch/${MODULE}.launch &"
        sleep 0.5
        is_stopped_customized_path "${MODULE_PATH}" "${MODULE}"
        if [ $? -eq 0 ]; then
            ok "Launched module ${MODULE}."
            return 0
        else
            error "Could not launch module ${MODULE}. Is it already built?"
            return 1
        fi
    else
        info "Module ${MODULE} is already running - skipping."
        return 2
    fi
}

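  This function fixes the general sequence for starting a module (check state, launch, verify), while the module path and name are supplied by the caller, that is, by the individual module start scripts. Shell has no subclasses, so this is the template method idea expressed through parameters rather than inheritance.

7.2. Strategy Pattern

  For device initialization, Apollo Scripts uses a strategy-style dispatch to handle the different architectures: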

function setup_device() {
    if [ "$(uname -s)" != "Linux" ]; then
        info "Not on Linux, skip mapping devices."
        return
    fi
    if [[ "${ARCH}" == "x86_64" ]]; then
        setup_device_for_amd64  # x86_64 strategy
    else
        setup_device_for_aarch64  # aarch64 strategy
    fi
}

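  Here, setup_device_for_amd64 and setup_device_for_aarch64 are two alternative device setup strategies, and the script picks one based on the current architecture. Note that any architecture other than x86_64 falls through to the aarch64 branch.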
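
  A stricter form of this dispatch is sketched below for comparison. It is not the Apollo code: the setup functions are hypothetical stubs that only print a message, and unlike setup_device it rejects unknown architectures instead of treating them as aarch64.

```shell
#!/usr/bin/env bash
# Sketch: pick a setup function by architecture through an explicit mapping.
demo_setup_amd64()   { echo "amd64 setup"; }     # hypothetical stub
demo_setup_aarch64() { echo "aarch64 setup"; }   # hypothetical stub

select_setup() {
  local arch="$1"
  case "${arch}" in
    x86_64)  echo "demo_setup_amd64" ;;
    aarch64) echo "demo_setup_aarch64" ;;
    *)       return 1 ;;           # unknown architecture: no strategy
  esac
}

run_setup() {
  local handler
  handler="$(select_setup "$1")" || { echo "unsupported arch: $1" >&2; return 1; }
  "${handler}"                     # call the selected strategy by name
}

run_setup x86_64
run_setup aarch64
run_setup riscv64 || echo "skipped"
```

  Separating selection from execution means a new architecture only needs one more case branch and one more function.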

7.3. Factory Pattern

  The build system uses a factory-like function to produce build targets:

function determine_build_targets() {
    # ...
    for component in "$@"; do
        local build_targets
        if [ "${component}" = "cyber" ]; then
            build_targets="cyber"
        elif [[ -d "${TOP_DIR}/modules/${component}" ]]; then
            build_targets="modules/${component}"
        else
            error "Directory ${TOP_DIR}/modules/${component} not found. Exiting ..."
            exit 1
        fi
        # ...
    done
    # ...
}

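  Depending on its arguments, the function yields different build targets: cyber maps to cyber, an existing modules/${component} directory maps to that path, and anything else aborts with an error. This resembles a simple factory.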
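
  The mapping rule can be tried against a throwaway directory tree. This sketch mirrors the rule rather than reproducing the Apollo function: resolve_targets and DEMO_TOP are hypothetical, and the function returns an error instead of exiting so it can be called from tests.

```shell
#!/usr/bin/env bash
# Sketch: map component names to build target paths.
resolve_targets() {
  local top="$1"; shift
  local targets=() component
  for component in "$@"; do
    if [[ "${component}" == "cyber" ]]; then
      targets+=("cyber")
    elif [[ -d "${top}/modules/${component}" ]]; then
      targets+=("modules/${component}")
    else
      echo "Directory ${top}/modules/${component} not found" >&2
      return 1
    fi
  done
  echo "${targets[*]}"             # space-separated list of targets
}

# DEMO_TOP is a temporary stand-in for the source tree root.
DEMO_TOP="$(mktemp -d)"
mkdir -p "${DEMO_TOP}/modules/planning" "${DEMO_TOP}/modules/control"
resolved="$(resolve_targets "${DEMO_TOP}" cyber planning control)"
echo "${resolved}"
rm -rf "${DEMO_TOP}"
```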

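7.4. Singleton Pattern

  Environment variables and global configuration are set when apollo_base.sh is sourced, and the functions defined there read them directly, which loosely resembles the singleton pattern: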

TOP_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd -P)"

  This keeps global state such as TOP_DIR consistent within a script run.

7.5. Adapter Pattern

if [ -f /.dockerenv ]; then
    APOLLO_IN_DOCKER=true
else
    APOLLO_IN_DOCKER=false
fi

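  This snippet detects whether the script runs inside a Docker container by checking for /.dockerenv, and exposes the result through a single variable, APOLLO_IN_DOCKER, so that later code does not need to repeat the check.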
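
  Wrapping the check in a function makes it easy to test outside a container. In the sketch below, detect_in_docker and the DOCKERENV_MARKER override are hypothetical; only the /.dockerenv default comes from the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: the /.dockerenv check as a function with an overridable marker path.
detect_in_docker() {
  local marker="${DOCKERENV_MARKER:-/.dockerenv}"
  if [[ -f "${marker}" ]]; then
    echo "true"
  else
    echo "false"
  fi
}

marker_dir="$(mktemp -d)"
DOCKERENV_MARKER="${marker_dir}/.dockerenv"
outside="$(detect_in_docker)"       # marker file absent
touch "${DOCKERENV_MARKER}"
inside="$(detect_in_docker)"        # marker file present
rm -rf "${marker_dir}"
echo "outside=${outside} inside=${inside}"
```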

7.6. Command Pattern

  Module management wraps each operation in a shell function, in the spirit of the command pattern:

function start() {
    MODULE=$1
    shift

    start_customized_path "${MODULE}" "${MODULE}" "$@"
}

function stop() {
    MODULE=$1

    pkill -f "modules/${MODULE}/launch/${MODULE}.launch" || true
    sleep 1
}

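7.7. Observer Pattern

  The monitoring script approximates the observer pattern with a polling loop that checks system state and reacts to it: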

while true; do
    check_system_status
    check_module_health
    sleep $MONITOR_INTERVAL
done

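  The monitor acts as the observer by checking system status and module health every MONITOR_INTERVAL seconds, rather than being notified of events.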
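
  A bounded version of such a loop, with checks registered as function names, might look like the sketch below. Everything in it is hypothetical (register_check, poll, and the two stub checks); it only illustrates the polling structure and is not taken from an Apollo script.

```shell
#!/usr/bin/env bash
# Sketch: a bounded polling loop that calls registered check functions.
CHECKS=()
register_check() { CHECKS+=("$1"); }

check_ok()   { return 0; }   # stub check that always passes
check_fail() { return 1; }   # stub check that always fails

poll() {
  local rounds="$1" interval="$2" failures=0 i check
  for ((i = 0; i < rounds; i++)); do
    for check in "${CHECKS[@]}"; do
      "${check}" || { echo "ALERT: ${check} failed in round ${i}"; failures=$((failures + 1)); }
    done
    sleep "${interval}"
  done
  return "$((failures > 0))"   # non-zero exit status if any check failed
}

register_check check_ok
register_check check_fail
poll 2 0 || echo "polling finished with alerts"
```

  Bounding the number of rounds keeps the sketch testable; a long-running monitor would loop until it receives a stop signal.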

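  Together, these patterns make Apollo Scripts easier to extend and maintain, and give the Apollo autonomous driving platform dependable script support.

8. Summary

  Through a set of shell scripts, the apollo_scripts module automates building, deploying, running, and testing the Apollo system. It is modular: base scripts provide shared functionality and specialized scripts handle specific tasks, which together form a complete script ecosystem.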