Jiyue Intelligent Driving Solution - Baidu Apollo - Cyber RT Source Code Walkthrough

Overview

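Cyber RT is a high-performance, high-throughput, low-latency computing runtime framework. Dynamic loading and directed acyclic graphs (DAGs) are among the key techniques it uses to achieve that performance.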

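Cyber RT uses an engineering framework built on Component modules and DAG-based dynamic loading and configuration. Algorithm modules are created as Components, and a DAG topology defines the dependencies between Components so that they can be dynamically loaded and configured. This gives unified scheduling of algorithms and unified allocation of resources. The framework decouples algorithms from engineering, so that engineering can focus on engineering and algorithms can focus on algorithms. (From the official website)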

Source Code Walkthrough

cyber.h

Let's start with cyber.h.

#ifndef CYBER_CYBER_H_
#define CYBER_CYBER_H_

#include <memory>
#include <string>
#include <utility>

#include "cyber/common/log.h" // logging
#include "cyber/component/component.h" // Component class header
#include "cyber/init.h" // initialization
#include "cyber/node/node.h" // Node implementation
#include "cyber/task/task.h" // task execution
#include "cyber/time/time.h"
#include "cyber/timer/timer.h" // Timer class

namespace apollo {
namespace cyber {

std::unique_ptr<Node> CreateNode(const std::string& node_name,
                                 const std::string& name_space = ""); // create a Node

}  // namespace cyber
}  // namespace apollo

#endif  // CYBER_CYBER_H_

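Judging from the included headers, apart from utilities such as log and time, the main pieces are node, component, task and timer, which can be regarded as the basic building blocks of Cyber RT.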

Node

cyber/node/node.h

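The class comment tells us that:

Node is the most basic unit of the system; all functionality is built around Nodes.
Node acts as the communication intermediary, the bridge over which modules exchange information.
A Node supports several communication modes: read/write, for data-stream communication, and service/client, for request-response communication.
Duplicate names are not allowed among topology objects, including node names, readers/writers, and services/clients.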

/**
 * @class Node
 * @brief Node is the fundamental building block of Cyber RT.
 * every module contains and communicates through the node.
 * A module can have different types of communication by defining
 * read/write and/or service/client in a node.
 * @warning Duplicate name is not allowed in topo objects, such as node,
 * reader/writer, service/client in the topo.
 */

node.h mainly provides a set of methods for creating the internal communication bridges. The main member variables are:

  std::map<std::string, std::shared_ptr<ReaderBase>> readers_;

  std::unique_ptr<NodeChannelImpl> node_channel_impl_ = nullptr;
  std::unique_ptr<NodeServiceImpl> node_service_impl_ = nullptr;
  
template <typename Request, typename Response>
auto Node::CreateService(
    const std::string& service_name,
    const typename Service<Request, Response>::ServiceCallback&
        service_callback) -> std::shared_ptr<Service<Request, Response>> {
  return node_service_impl_->template CreateService<Request, Response>(
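      service_name, service_callback); // `template` marks CreateService as a member function template; the keyword is mandatory only when the object's type depends on a template parameter, otherwise `<` would be parsed as less-than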
}
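The comment above concerns C++'s `template` disambiguator. The following self-contained sketch shows the case where the keyword is genuinely required: the object's type is itself a template parameter, so the compiler cannot know that the member is a template. `ServiceImplSketch` and `CreateViaImpl` are hypothetical names, not Cyber RT APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for NodeServiceImpl: a class with a member function template.
struct ServiceImplSketch {
  template <typename Request, typename Response>
  std::string CreateService(const std::string& name) {
    return "service:" + name;
  }
};

// Inside this template, `impl` has the dependent type Impl, so the compiler
// cannot know that `impl.CreateService` names a template. Without the
// `template` keyword, `<` would be parsed as less-than and compilation fails.
template <typename Impl, typename Request, typename Response>
std::string CreateViaImpl(Impl& impl, const std::string& name) {
  return impl.template CreateService<Request, Response>(name);
}
```

For example, `CreateViaImpl<ServiceImplSketch, int, int>(s, "echo")` yields `"service:echo"`.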

readers_ stores the readers created by CreateReader; node_channel_impl_ backs CreateWriter (and CreateReader); node_service_impl_ backs CreateService and CreateClient.

Reader

cyber/node/reader.h

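The comment makes the following points:

Data can be received by passing in a callback and processed inside that callback:

   // handle each message in a callback
   auto reader = node->CreateReader<MessageType>(
       "topic", [](const std::shared_ptr<MessageType>& msg) {
         // process the message
       });

Observer mode, watching the messages cached in the buffer:

   // pull cached messages with Observe
   auto reader = node->CreateReader<MessageType>("topic");
   reader->Observe();
   auto msg = reader->GetLatestObserved();

Each Reader manages one ChannelBuffer that stores messages and manages their lifetime. Its length is bounded by pending_queue_size; when it is exceeded, older messages are dropped.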

/**
 * @class Reader
 * @brief Reader subscribes a channel, it has two main functions:
 * 1. You can pass a `CallbackFunc` to handle the message then it arrived
 * 2. You can Observe messages that Blocker cached. Reader automatically push
 * the message to Blocker's `PublishQueue`, and we can use `Observe` to fetch
 * messages from `PublishQueue` to `ObserveQueue`. But, if you have set
 * CallbackFunc, you can ignore this. One Reader uses one `ChannelBuffer`, the
 * message we are handling is stored in ChannelBuffer Reader will Join the
 * topology when init and Leave the topology when shutdown
 * @warning To save resource, `ChannelBuffer` has limited length,
 * it's passed through the `pending_queue_size` param. pending_queue_size is
 * default set to 1, So, If you handle slower than writer sending, older
 * messages that are not handled will be lost. You can increase
 * `pending_queue_size` to resolve this problem.
 */
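To make the pending_queue_size behavior concrete, here is a minimal sketch of a bounded buffer that evicts the oldest message when it is full. `PendingQueueSketch` is a hypothetical class; Cyber RT's actual ChannelBuffer is built on its own cache buffer and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical stand-in for a Reader's ChannelBuffer: keeps at most
// pending_queue_size messages and evicts the oldest when a new one arrives.
template <typename T>
class PendingQueueSketch {
 public:
  explicit PendingQueueSketch(std::size_t pending_queue_size = 1)
      : capacity_(pending_queue_size == 0 ? 1 : pending_queue_size) {}

  void Push(const T& msg) {
    if (queue_.size() == capacity_) queue_.pop_front();  // drop the oldest
    queue_.push_back(msg);
  }

  // Returns the oldest retained message, if any.
  std::optional<T> Pop() {
    if (queue_.empty()) return std::nullopt;
    T front = queue_.front();
    queue_.pop_front();
    return front;
  }

  std::size_t Size() const { return queue_.size(); }

 private:
  std::size_t capacity_;
  std::deque<T> queue_;
};
```

With the default size of 1, a consumer that falls behind only ever sees the newest message, which is exactly the loss the warning describes; a larger size trades memory for tolerance of bursts.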

ChannelImpl

cyber/node/node_channel_impl.h

A channel is the bridge that connects Nodes.

/**
 * @class NodeChannelImpl
 * @brief The implementation for Node to create Objects connected by Channels.
 * e.g. Channel Reader and Writer
 */
 
 explicit NodeChannelImpl(const std::string& node_name)
      : is_reality_mode_(true), node_name_(node_name) {
    ...
    if (is_reality_mode_) {
      node_manager_ = service_discovery::TopologyManager::Instance()->node_manager();
      node_manager_->Join(node_attr_, RoleType::ROLE_NODE);
    }
  }

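When NodeChannelImpl is constructed, it puts the node name into a RoleAttributes struct and calls Join to register it with the NodeManager. NodeChannelImpl also provides CreateWriter and CreateReader, which create the Writers and Readers used for communication.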

TopologyManager

cyber/service_discovery/topology_manager.h

TopologyManager appears in both Reader and Writer, whose JoinTheTopology and LeaveTheTopology methods use it to add elements to and remove them from the topology.

/**
 * @class TopologyManager
 * @brief elements in Cyber -- Node, Channel, Service, Writer, Reader, Client
 * and Server's relationship is presented by Topology. You can Imagine that a
 * directed graph -- Node is the container of Server/Client/Writer/Reader, and
 * they are the vertice of the graph and Channel is the Edge from Writer flow to
 * the Reader, Service is the Edge from Server to Client. Thus we call Writer
 * and Server `Upstream`, Reader and Client `Downstream` To generate this graph,
 * we use TopologyManager, it has three sub managers -- NodeManager: You can
 * find Nodes in this topology ChannelManager: You can find Channels in this
 * topology, and their Writers and Readers ServiceManager: You can find Services
 * in this topology, and their Servers and Clients TopologyManager use
 * fast-rtps' Participant to communicate. It can broadcast Join or Leave
 * messages of those elements. Also, you can register you own `ChangeFunc` to
 * monitor topology change
 */

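The comment tells us that the relationships between elements in Cyber RT are represented by a Topology:

Basic elements
- Node
- Channel
- Service
- Writer
- Reader
- Client
- Server
Topology structure
- Form: directed graph
- Vertices:
  - Servers/Clients/Writers/Readers are the vertices of the graph
  - A Node is the container that holds them
- Edges:
  - Channel: connects a Writer to a Reader
  - Service: connects a Server to a Client
Data-flow concepts
- Upstream:
  - Writer
  - Server
- Downstream:
  - Reader
  - Client
Manager structure
- TopologyManager contains three sub-managers:
  - NodeManager: manages Nodes; checks whether a Node is in the Topology
  - ChannelManager: manages Channels and their Writers/Readers; checks whether a Channel is in the Topology and finds its Writers and Readers
  - ServiceManager: manages Services and their Servers/Clients; checks whether a Service is in the Topology and finds its Servers and Clients
Communication mechanism
- Uses the fast-rtps Participant to communicate, so it can observe elements joining and leaving the topology network
- It can broadcast the following messages for elements:
  - Join
  - Leave
Monitoring
- A custom ChangeFunc can be registered
- to monitor topology changes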
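The channel part of this topology can be modeled, loosely, as a map from each channel to its writer nodes (upstream) and reader nodes (downstream), where every writer/reader pair on the same channel forms an edge. The sketch below is a hypothetical in-process model: the real ChannelManager learns about Join/Leave through fast-rtps messages rather than direct method calls, and its ChangeFunc signature differs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Role { kWriter, kReader };
enum class Change { kJoin, kLeave };

// Hypothetical model of ChannelManager: channel -> writer nodes / reader nodes.
class ChannelTopologySketch {
 public:
  using ChangeFunc = std::function<void(Change, Role, const std::string& node,
                                        const std::string& channel)>;

  void AddChangeListener(ChangeFunc f) { listeners_.push_back(std::move(f)); }

  void Join(Role role, const std::string& node, const std::string& channel) {
    Members(role)[channel].insert(node);
    Notify(Change::kJoin, role, node, channel);
  }

  void Leave(Role role, const std::string& node, const std::string& channel) {
    Members(role)[channel].erase(node);
    Notify(Change::kLeave, role, node, channel);
  }

  // Every (upstream writer node, downstream reader node) edge on `channel`.
  std::vector<std::pair<std::string, std::string>> Edges(
      const std::string& channel) const {
    std::vector<std::pair<std::string, std::string>> edges;
    auto w = writers_.find(channel);
    auto r = readers_.find(channel);
    if (w == writers_.end() || r == readers_.end()) return edges;
    for (const auto& up : w->second)
      for (const auto& down : r->second) edges.emplace_back(up, down);
    return edges;
  }

 private:
  std::map<std::string, std::set<std::string>>& Members(Role role) {
    return role == Role::kWriter ? writers_ : readers_;
  }
  void Notify(Change c, Role role, const std::string& node,
              const std::string& channel) {
    for (auto& f : listeners_) f(c, role, node, channel);
  }

  std::map<std::string, std::set<std::string>> writers_;
  std::map<std::string, std::set<std::string>> readers_;
  std::vector<ChangeFunc> listeners_;
};
```

The channel and node names used with it (for example `"/image"`, `"camera"`) are purely illustrative.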

Component

cyber/component/component_base.h cyber/component/component.h cyber/component/timer_component.h

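Component is the base class Cyber RT provides for building functional modules; it can be thought of as Cyber RT's wrapper around an algorithm module. Together with the Component's DAG file, Cyber RT can load the module dynamically.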

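The Components loaded by Cyber RT form a decentralized network. Each Component is an algorithm module whose dependencies are defined in its configuration file; a system consists of many Components, connected through these dependencies into a computation graph.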

There are two kinds of Component: apollo::cyber::Component and apollo::cyber::TimerComponent.

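Component provides message fusion for up to 4 channels. When reading from multiple Channels, the first Channel is the main Channel; whenever a message arrives on the main Channel, Cyber RT calls the Component's apollo::cyber::Component::Proc once to process the data.
TimerComponent does not provide message fusion. Unlike Component, its apollo::cyber::TimerComponent::Proc is not triggered by a main channel but is called periodically by the system; the developer sets the interval in the configuration file.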
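A minimal sketch of the main-channel trigger, assuming that a message on a secondary channel only refreshes a cache of its latest value and that Proc fires when the main channel receives a message and the other channel already has data. `FusionSketch` is hypothetical; Cyber RT implements fusion inside its data layer (DataVisitor), which is more involved.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical two-channel fusion in the spirit of Component<M0, M1>:
// channel 0 is the main channel and is the only one that triggers Proc.
template <typename M0, typename M1>
class FusionSketch {
 public:
  using Proc = std::function<void(const M0&, const M1&)>;
  explicit FusionSketch(Proc proc) : proc_(std::move(proc)) {}

  // Secondary channel: only remember the newest message.
  void OnChannel1(const M1& msg) { latest1_ = msg; }

  // Main channel: fuse with the newest channel-1 message. Returns true if
  // Proc was invoked, false if channel 1 has not produced anything yet.
  bool OnChannel0(const M0& msg) {
    if (!latest1_) return false;
    proc_(msg, *latest1_);
    return true;
  }

 private:
  Proc proc_;
  std::optional<M1> latest1_;
};
```

Because only the main channel triggers processing, the rate of Proc calls follows the main channel; a faster secondary channel simply overwrites its cached value.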

Both kinds of Component inherit from ComponentBase, which provides config parsing as well as smart pointers to the Node and to the ReaderBase objects.

component_base.h

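 // enable_shared_from_this lets the object (subclasses included) safely obtain a shared_ptr to itself, avoids creating several independent shared_ptrs (separate control blocks) for the same object, and keeps lifetime management consistent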
class ComponentBase : public std::enable_shared_from_this<ComponentBase> {
    ...
protected:
    // methods for loading configuration files
    void LoadConfigFiles(const ComponentConfig& config) {
        ...
    }
    
    void LoadConfigFiles(const TimerComponentConfig& config) {
        ...
    }
    std::atomic<bool> is_shutdown_ = {false};

    std::shared_ptr<Node> node_ = nullptr;

    std::string config_file_path_ = "";

    std::vector<std::shared_ptr<ReaderBase>> readers_;
    
};
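The following sketch illustrates why ComponentBase derives from std::enable_shared_from_this: an object already managed by a shared_ptr can produce further shared_ptr/weak_ptr handles that share the same control block, whereas wrapping `this` in a new shared_ptr would create a second control block and lead to a double delete. The classes are hypothetical; the dynamic_pointer_cast mirrors what Component does later when it captures `self`.

```cpp
#include <cassert>
#include <memory>

class BaseSketch : public std::enable_shared_from_this<BaseSketch> {
 public:
  virtual ~BaseSketch() = default;  // polymorphic, so dynamic_pointer_cast works
};

class DerivedSketch : public BaseSketch {
 public:
  // shared_from_this() returns shared_ptr<BaseSketch> sharing the existing
  // control block; cast it back to the derived type and hand out a weak_ptr.
  std::weak_ptr<DerivedSketch> WeakSelf() {
    return std::dynamic_pointer_cast<DerivedSketch>(shared_from_this());
  }
};
```

Holding only a weak_ptr means callbacks can detect that the object is gone instead of keeping it alive.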

component.h

/**
 * @brief .
 * The Component can process up to four channels of messages. The message type
 * is specified when the component is created. The Component is inherited from
 * ComponentBase. Your component can inherit from Component, and implement
 * Init() & Proc(...), They are picked up by the CyberRT. There are 4
 * specialization implementations.
 *
 * @tparam M0 the first message.
 * @tparam M1 the second message.
 * @tparam M2 the third message.
 * @tparam M3 the fourth message.
 * @warning The Init & Proc functions need to be overloaded, but don't want to
 * be called. They are called by the CyberRT Frame.
 *
 */
template <typename M0 = NullType, typename M1 = NullType,
          typename M2 = NullType, typename M3 = NullType>
class Component : public ComponentBase {
 public:
  Component() {}
  ~Component() override {}

  /**
   * @brief init the component by protobuf object.
   *
   * @param config which is defined in 'cyber/proto/component_conf.proto'
   *
   * @return returns true if successful, otherwise returns false
   */
  bool Initialize(const ComponentConfig& config) override;
  bool Process(const std::shared_ptr<M0>& msg0, const std::shared_ptr<M1>& msg1,
               const std::shared_ptr<M2>& msg2,
               const std::shared_ptr<M3>& msg3);

 private:
  /**
   * @brief The process logical of yours.
   *
   * @param msg0 the first channel message.
   * @param msg1 the second channel message.
   * @param msg2 the third channel message.
   * @param msg3 the fourth channel message.
   *
   * @return returns true if successful, otherwise returns false
   */
  virtual bool Proc(const std::shared_ptr<M0>& msg0,
                    const std::shared_ptr<M1>& msg1,
                    const std::shared_ptr<M2>& msg2,
                    const std::shared_ptr<M3>& msg3) = 0;
};

  

// No-message specialization (Component<NullType, NullType, NullType, NullType>)
// Capability: processes no messages.
// Use: components that do not need to process messages, e.g. for initialization or configuration.
template <>
class Component<NullType, NullType, NullType, NullType> : public ComponentBase {
 public:
  Component() {}
  ~Component() override {}
  bool Initialize(const ComponentConfig& config) override;
};

  
// Single-message specialization (Component<M0, NullType, NullType, NullType>)
// Capability: processes one message channel.
// Use: simple components that only receive messages from one channel.
template <typename M0>
class Component<M0, NullType, NullType, NullType> : public ComponentBase {
 public:
  Component() {}
  ~Component() override {}
  bool Initialize(const ComponentConfig& config) override;
  bool Process(const std::shared_ptr<M0>& msg);

 private:
  virtual bool Proc(const std::shared_ptr<M0>& msg) = 0;
};

  
// Two-message specialization (Component<M0, M1, NullType, NullType>)
// Capability: processes two message channels.
// Use: components that receive messages from two channels and process them jointly, e.g. sensor data fusion.
template <typename M0, typename M1>
class Component<M0, M1, NullType, NullType> : public ComponentBase {
 public:
  Component() {}
  ~Component() override {}
  bool Initialize(const ComponentConfig& config) override;
  bool Process(const std::shared_ptr<M0>& msg0,
               const std::shared_ptr<M1>& msg1);

 private:
  virtual bool Proc(const std::shared_ptr<M0>& msg,
                    const std::shared_ptr<M1>& msg1) = 0;
};

  
// Three-message specialization (Component<M0, M1, M2, NullType>)
// Capability: processes three message channels.
// Use: more complex components that receive messages from three channels, e.g. multi-sensor data processing.
template <typename M0, typename M1, typename M2>
class Component<M0, M1, M2, NullType> : public ComponentBase {
 public:
  Component() {}
  ~Component() override {}
  bool Initialize(const ComponentConfig& config) override;
  bool Process(const std::shared_ptr<M0>& msg0, const std::shared_ptr<M1>& msg1,
               const std::shared_ptr<M2>& msg2);

 private:
  virtual bool Proc(const std::shared_ptr<M0>& msg,
                    const std::shared_ptr<M1>& msg1,
                    const std::shared_ptr<M2>& msg2) = 0;
};

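In addition to the primary template, which handles four channels, Component has four specializations that handle zero to three channels. The Initialize function of each version calls ComponentBase's LoadConfigFiles to parse the config into the corresponding struct, and creates the Node and the Readers. Process returns the result of Proc. Developers define their own component by inheriting from Component and overriding Init() and Proc(); Init and Proc are driven by the Cyber RT framework and should not be called directly.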
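The pattern of a public Process that forwards to a private pure-virtual Proc, combined with NullType-based partial specialization, can be sketched in reduced form. This hypothetical two-parameter version omits Initialize, config parsing, and Node/Reader creation; only the dispatch structure is kept.

```cpp
#include <cassert>
#include <memory>

struct NullType {};  // placeholder meaning "no channel in this position"

class ComponentBaseSketch {
 public:
  virtual ~ComponentBaseSketch() = default;
};

// Primary template: two channels (the real Component goes up to four).
template <typename M0 = NullType, typename M1 = NullType>
class ComponentSketch : public ComponentBaseSketch {
 public:
  // Called by the framework; forwards to the user's Proc.
  bool Process(const std::shared_ptr<M0>& m0, const std::shared_ptr<M1>& m1) {
    return Proc(m0, m1);
  }

 private:
  virtual bool Proc(const std::shared_ptr<M0>&, const std::shared_ptr<M1>&) = 0;
};

// Partial specialization selected when only one message type is given.
template <typename M0>
class ComponentSketch<M0, NullType> : public ComponentBaseSketch {
 public:
  bool Process(const std::shared_ptr<M0>& m0) { return Proc(m0); }

 private:
  virtual bool Proc(const std::shared_ptr<M0>&) = 0;
};

// A user component only overrides Proc.
class CounterComponent : public ComponentSketch<int> {
 public:
  int sum = 0;

 private:
  bool Proc(const std::shared_ptr<int>& msg) override {
    sum += *msg;
    return true;
  }
};
```

Keeping Proc private means user code cannot call it directly, which matches the "don't call Init and Proc yourself" rule: only the base class's Process reaches it.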

timer_component.h -- the timer component

/**
 * @brief .
 * TimerComponent is a timer component. Your component can inherit from
 * Component, and implement Init() & Proc(), They are called by the CyberRT
 * frame.
 */
class TimerComponent : public ComponentBase {
 public:
  TimerComponent();
  ~TimerComponent() override;

  /**
   * @brief init the component by protobuf object.
   *
   * @param config which is define in 'cyber/proto/component_conf.proto'
   *
   * @return returns true if successful, otherwise returns false
   */
  // initialization: parses the config, etc.
  bool Initialize(const TimerComponentConfig& config) override;
  void Clear() override;
  bool Process();
  uint32_t GetInterval() const;

 private:
  /**
   * @brief The Proc logic of the component, which called by the CyberRT frame.
   *
   * @return returns true if successful, otherwise returns false
   */
  // Proc is called by the Cyber RT framework
  virtual bool Proc() = 0;

  uint32_t interval_ = 0;
  std::unique_ptr<Timer> timer_;
};

TimerComponent's timing is implemented mainly by the Timer class: TimerComponent creates a Timer internally and calls Start on it.

cyber/timer/timer.h

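The Timer class manages timed tasks efficiently through a TimingWheel and supports both one-shot and periodic tasks. The design exploits the properties of a ring buffer, so adding and firing a task costs O(1) even when many timers are registered. Through asynchronous execution and error correction (see accumulated_error_ns in TimerTask), Timer keeps tasks accurate while preserving system performance.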

/**
 * @class Timer
 * @brief Used to perform oneshot or periodic timing tasks
 *
 */
class Timer {
 public:
  Timer();

  /**
   * @brief Construct a new Timer object
   *
   * @param opt Timer option
   */
  explicit Timer(TimerOption opt);

  /**
   * @brief Construct a new Timer object
   *
   * @param period The period of the timer, unit is ms
   * @param callback The tasks that the timer needs to perform
   * @param oneshot True: perform the callback only after the first timing cycle
   *                False: perform the callback every timed period
   */
  Timer(uint32_t period, std::function<void()> callback, bool oneshot);
  ~Timer();

  /**
   * @brief Set the Timer Option object
   *
   * @param opt The timer option will be set
   */
  void SetTimerOption(TimerOption opt);

  /**
   * @brief Start the timer
   *
   */
  void Start();

  /**
   * @brief Stop the timer
   *
   */
  void Stop();

 private:
  bool InitTimerTask();
  uint64_t timer_id_;
  TimerOption timer_opt_;
  TimingWheel* timing_wheel_ = nullptr;
  std::shared_ptr<TimerTask> task_;
  std::atomic<bool> started_ = {false};
};

// TimerTask struct
struct TimerTask {
  explicit TimerTask(uint64_t timer_id) : timer_id_(timer_id) {}
  uint64_t timer_id_ = 0;
  std::function<void()> callback;
  uint64_t interval_ms = 0;
  uint64_t remainder_interval_ms = 0;
  uint64_t next_fire_duration_ms = 0;
  int64_t accumulated_error_ns = 0;
  uint64_t last_execute_time_ns = 0;
  std::mutex mutex;
};

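Internally, Timer mainly handles initialization, task initialization, and starting/stopping the timer. Its two key members are a TimingWheel, which manages the timers, and a TimerTask, which represents the timed task. Each timed task is assigned its own timer id, and the TimerTask is initialized in InitTimerTask().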

timing_wheel.h

class TimingWheel {
 public:
  ~TimingWheel() {
    if (running_) {
      Shutdown();
    }
  }

  void Start();
  void Shutdown();
  void Tick();
  void AddTask(const std::shared_ptr<TimerTask>& task);
  void AddTask(const std::shared_ptr<TimerTask>& task,
               const uint64_t current_work_wheel_index);
  void Cascade(const uint64_t assistant_wheel_index);
  void TickFunc();
  inline uint64_t TickCount() const { return tick_count_; }

 private:
  ...
  TimerBucket work_wheel_[WORK_WHEEL_SIZE];
  TimerBucket assistant_wheel_[ASSISTANT_WHEEL_SIZE];
  ...
};

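TimerBucket

Definition: TimerBucket is a container that stores timed tasks (TimerTask).
Structure:
- Holds a task list (task_list_) of weak pointers to TimerTask.
- Uses a mutex (mutex_) to protect access to the task list.
Functionality:
- Provides AddTask, which appends a task to the task list.
- Provides accessors for the task list and the mutex.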

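TimingWheel defines two TimerBucket arrays to store timed tasks: the work wheel (work_wheel_) and the assistant wheel (assistant_wheel_).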

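How the ring buffer works: when TimingWheel::AddTask is called, it computes an index into the ring, which is later used to find the right TimerBucket when the task is due: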

auto work_wheel_index = current_work_wheel_index +
                        static_cast<uint64_t>(std::ceil(
                            static_cast<double>(task->next_fire_duration_ms) /
                            TIMER_RESOLUTION_MS));

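Execution: TickFunc runs in a dedicated thread and calls Tick() in a loop; Tick() runs the callbacks of the tasks stored in the current bucket. After each Tick the thread sleeps for TIMER_RESOLUTION_MS, i.e. it wakes up every 2 ms. A task due in 10 ms is therefore placed ceil(10 / 2) = 5 slots ahead of the current index, and the loop runs it 5 ticks later. After each wake-up the thread advances the ring index current_work_wheel_index_; when that index wraps around to 0, the assistant wheel index is advanced and the tasks in that assistant slot are cascaded (Cascade) into the work wheel.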

while (running_) {
  Tick();
  // AINFO_EVERY(1000) << "Tick " << TickCount();
  tick_count_++;
  rate.Sleep();
  {
    std::lock_guard<std::mutex> lock(current_work_wheel_index_mutex_);
    current_work_wheel_index_ =
        GetWorkWheelIndex(current_work_wheel_index_ + 1);
  }
  if (current_work_wheel_index_ == 0) {
    {
      std::lock_guard<std::mutex> lock(current_assistant_wheel_index_mutex_);
      current_assistant_wheel_index_ =
          GetAssistantWheelIndex(current_assistant_wheel_index_ + 1);
    }
    Cascade(current_assistant_wheel_index_);
  }
}
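The index arithmetic and tick loop described above can be sketched as a single-level wheel. Assumptions, not Cyber RT's code: there is no assistant wheel or cascading, delays lie between one resolution step and one full revolution, and Tick advances the index before running the slot (the real loop runs Tick first and advances afterwards, with a sleep in between). As in TimerBucket, slots hold weak_ptrs, so destroyed tasks are skipped. The 512 slots and 2 ms resolution are illustrative values.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <vector>

struct TaskSketch {
  std::function<void()> callback;
  uint64_t delay_ms = 0;  // assumed: >= resolution and < one full revolution
};

// Simplified single-level timing wheel (no assistant wheel, no cascading).
class TimingWheelSketch {
 public:
  TimingWheelSketch(uint64_t slots, uint64_t resolution_ms)
      : buckets_(slots), resolution_ms_(resolution_ms) {}

  // slot = current + ceil(delay / resolution), wrapped around the ring.
  uint64_t AddTask(const std::shared_ptr<TaskSketch>& task) {
    auto ticks = static_cast<uint64_t>(std::ceil(
        static_cast<double>(task->delay_ms) / resolution_ms_));
    uint64_t slot = (current_ + ticks) % buckets_.size();
    buckets_[slot].push_back(task);  // kept as weak_ptr, like TimerBucket
    return slot;
  }

  // One tick = one resolution step: advance the index, then run that slot.
  void Tick() {
    current_ = (current_ + 1) % buckets_.size();
    auto& bucket = buckets_[current_];
    for (auto& weak : bucket) {
      if (auto task = weak.lock()) task->callback();  // skip destroyed tasks
    }
    bucket.clear();  // one-shot only; a periodic task would be re-added
  }

 private:
  std::vector<std::list<std::weak_ptr<TaskSketch>>> buckets_;
  uint64_t resolution_ms_;
  uint64_t current_ = 0;
};
```

With `TimingWheelSketch wheel(512, 2)`, a 10 ms task lands in slot 5 and fires on the fifth Tick; insertion and firing never scan other slots, which is where the O(1) cost comes from.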

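In the source we also notice that TimerTask is referenced through weak pointers. This prevents reference cycles and keeps lifetime management correct:

If TimerTask held a std::shared_ptr to itself, or to other objects that in turn held a std::shared_ptr to the TimerTask, a reference cycle would form.
A reference cycle causes a memory leak: the reference counts never drop to zero, so the objects are never freed.
A weak pointer does not affect the object's lifetime. It lets you check whether the object still exists without incrementing the reference count.
Through lock(), a weak pointer can be safely converted to a std::shared_ptr; if the object has already been destroyed, lock() returns an empty shared_ptr.
In a timer system, tasks may be cancelled or rescheduled. Weak pointers ensure that once a task has been destroyed, any remaining reference to it cannot cause undefined behavior.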

bool Timer::InitTimerTask() {
  ...
    std::weak_ptr<TimerTask> task_weak_ptr = task_;
    task_->callback = [callback = this->timer_opt_.callback, task_weak_ptr]() {
      auto task = task_weak_ptr.lock();
      if (task) {
        std::lock_guard<std::mutex> lg(task->mutex);
        callback();
      }
    };
  ...
  return true;
}

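In Timer::InitTimerTask(), the task's callback uses a weak pointer:

task_weak_ptr.lock() tries to obtain a shared_ptr to the TimerTask.
If the TimerTask still exists, lock() returns a valid shared_ptr.
If the TimerTask has been destroyed, lock() returns an empty shared_ptr, so the destroyed object is never accessed.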
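The weak_ptr capture in InitTimerTask can be reproduced in isolation. In this sketch (hypothetical `TimerTaskSketch` and `MakeTask`), the callback stored inside the task does not keep the task alive, and it silently becomes a no-op once the task is destroyed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>

struct TimerTaskSketch {
  std::function<void()> callback;
  std::mutex mutex;
};

// The callback captures only a weak_ptr to its own task. Capturing a
// shared_ptr instead would form a cycle: task -> callback -> task.
std::shared_ptr<TimerTaskSketch> MakeTask(std::function<void()> user_cb) {
  auto task = std::make_shared<TimerTaskSketch>();
  std::weak_ptr<TimerTaskSketch> task_weak_ptr = task;
  task->callback = [user_cb, task_weak_ptr]() {
    if (auto t = task_weak_ptr.lock()) {  // empty if the task is gone
      std::lock_guard<std::mutex> lg(t->mutex);
      user_cb();
    }
  };
  return task;
}
```

A copy of the callback that outlives the task (as a timing-wheel slot might) can still be invoked safely; it simply does nothing.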

Scheduler

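Scheduling is another important part of Cyber RT; let's understand it through the scheduler implementation. First, in Init, a SysMo object is initialized, and SysMo in turn periodically calls the scheduler's CheckSchedStatus: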

void SysMo::Checker() {
  while (cyber_unlikely(!shut_down_.load())) {
    scheduler::Instance()->CheckSchedStatus();
    std::unique_lock<std::mutex> lk(lk_);
    cv_.wait_for(lk, std::chrono::milliseconds(sysmo_interval_ms_));
  }
}
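SysMo::Checker follows a common periodic-monitor pattern. Below is a minimal standalone sketch, assuming a generic `check` callback in place of CheckSchedStatus and adding a predicate to the wait so that shutdown wakes the thread immediately instead of after the interval. `PeriodicCheckerSketch` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class PeriodicCheckerSketch {
 public:
  PeriodicCheckerSketch(std::function<void()> check, int interval_ms)
      : check_(std::move(check)), interval_ms_(interval_ms),
        thread_([this] { Loop(); }) {}

  ~PeriodicCheckerSketch() { Shutdown(); }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      shut_down_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  // Run the check, then sleep for interval_ms unless shutdown wakes us.
  void Loop() {
    while (!shut_down_.load()) {
      check_();  // e.g. scheduler::Instance()->CheckSchedStatus()
      std::unique_lock<std::mutex> lk(mutex_);
      cv_.wait_for(lk, std::chrono::milliseconds(interval_ms_),
                   [this] { return shut_down_.load(); });
    }
  }

  std::function<void()> check_;
  int interval_ms_;
  std::atomic<bool> shut_down_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
  std::thread thread_;  // declared last so the other members exist when it starts
};
```

Setting the flag under the mutex before notifying guarantees the waiting thread cannot miss the wake-up.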

scheduler_factory.cc

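The scheduler is created in the Instance function in scheduler_factory; its main job is to create a scheduler of the right type according to the policy, which is read from conf/<process_group>.conf and defaults to classic.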

Scheduler* Instance() {
  Scheduler* obj = instance.load(std::memory_order_acquire);
  if (obj == nullptr) {
    std::lock_guard<std::mutex> lock(mutex);
    obj = instance.load(std::memory_order_relaxed);
    if (obj == nullptr) {
      std::string policy("classic");
      std::string conf("conf/");
      conf.append(GlobalData::Instance()->ProcessGroup()).append(".conf");
      auto cfg_file = GetAbsolutePath(WorkRoot(), conf);
      apollo::cyber::proto::CyberConfig cfg;
      if (PathExists(cfg_file) && GetProtoFromFile(cfg_file, &cfg)) {
        policy = cfg.scheduler_conf().policy();
      } else {
        AWARN << "Scheduler conf named " << cfg_file
              << " not found, use default.";
      }
      if (!policy.compare("classic")) {
        obj = new SchedulerClassic();
      } else if (!policy.compare("choreography")) {
        obj = new SchedulerChoreography();
      } else {
        AWARN << "Invalid scheduler policy: " << policy;
        obj = new SchedulerClassic();
      }
      instance.store(obj, std::memory_order_release);
    }
  }
  return obj;
}
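Instance() uses double-checked locking: an acquire load on the fast path, a re-check under the mutex, and a release store of the newly created object. The sketch below keeps that structure but, as a simplifying assumption, takes the policy as a parameter instead of reading conf/<process_group>.conf; all class names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

class SchedulerSketch {
 public:
  virtual ~SchedulerSketch() = default;
  virtual std::string Policy() const = 0;
};
class ClassicSketch : public SchedulerSketch {
 public:
  std::string Policy() const override { return "classic"; }
};
class ChoreographySketch : public SchedulerSketch {
 public:
  std::string Policy() const override { return "choreography"; }
};

SchedulerSketch* SchedulerInstance(const std::string& policy) {
  static std::atomic<SchedulerSketch*> instance{nullptr};
  static std::mutex mutex;
  SchedulerSketch* obj = instance.load(std::memory_order_acquire);  // fast path
  if (obj == nullptr) {
    std::lock_guard<std::mutex> lock(mutex);
    obj = instance.load(std::memory_order_relaxed);  // re-check under the lock
    if (obj == nullptr) {
      if (policy == "choreography") {
        obj = new ChoreographySketch();
      } else {
        obj = new ClassicSketch();  // "classic" and unknown policies
      }
      instance.store(obj, std::memory_order_release);
    }
  }
  return obj;
}
```

Only the first call's policy matters; later calls return the same object. As in the original, the object is intentionally never deleted. In modern C++ a function-local static object would give the same thread-safe one-time initialization with less code.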

scheduler_classic.cc

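SchedulerClassic is a scheduler implementation that manages and schedules the execution of coroutines (CRoutine). It is based on the classic scheduling policy and provides a flexible task management and scheduling mechanism. Its constructor reads the scheduler configuration and calls CreateProcessor, which creates processors according to that configuration. DispatchTask assigns a task to a suitable processor: it sets the task's priority and group name from the task's configuration, adds the task to the corresponding processor group, and notifies the processors. NotifyProcessor wakes processors up as needed.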

class SchedulerClassic : public Scheduler {
 public:
  bool RemoveCRoutine(uint64_t crid) override;
  bool RemoveTask(const std::string& name) override;
  bool DispatchTask(const std::shared_ptr<CRoutine>&) override;

 private:
  friend Scheduler* Instance();
  SchedulerClassic();

  void CreateProcessor(); // create the processors
  bool NotifyProcessor(uint64_t crid) override; // notify (wake up) a processor

  std::unordered_map<std::string, ClassicTask> cr_confs_;

  ClassicConf classic_conf_;
};
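A hypothetical model of what DispatchTask does with cr_confs_: look up the task's configuration by name, apply its configured priority and group, and enqueue it so that the group's processors take the highest-priority task first. The real classic scheduler keeps per-group, per-priority run queues and wakes processors through NotifyProcessor; this sketch substitutes a std::priority_queue and has no processors at all.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct CRoutineSketch {
  uint64_t id;
  std::string name;
  uint32_t priority = 0;
  std::string group = "default_grp";
};

struct TaskConfSketch {
  uint32_t prio;
  std::string group;
};

class ClassicDispatcherSketch {
 public:
  explicit ClassicDispatcherSketch(
      std::unordered_map<std::string, TaskConfSketch> confs)
      : cr_confs_(std::move(confs)) {}

  void DispatchTask(CRoutineSketch cr) {
    auto it = cr_confs_.find(cr.name);
    if (it != cr_confs_.end()) {  // unconfigured tasks keep the defaults
      cr.priority = it->second.prio;
      cr.group = it->second.group;
    }
    queues_[cr.group].push(cr);
  }

  // Pops the highest-priority task of a group; returns false if it is empty.
  bool Next(const std::string& group, CRoutineSketch* out) {
    auto& q = queues_[group];
    if (q.empty()) return false;
    *out = q.top();
    q.pop();
    return true;
  }

 private:
  struct LowerPriority {
    bool operator()(const CRoutineSketch& a, const CRoutineSketch& b) const {
      return a.priority < b.priority;  // max-heap on priority
    }
  };
  using RunQueue = std::priority_queue<CRoutineSketch,
                                       std::vector<CRoutineSketch>, LowerPriority>;
  std::unordered_map<std::string, TaskConfSketch> cr_confs_;
  std::map<std::string, RunQueue> queues_;  // group name -> run queue
};
```

Task names such as `planning` and group names such as `compute` are illustrative only.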

scheduler_choreography.cc

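The functions have the same meaning as in SchedulerClassic; the difference is the scheduling strategy. Choreography uses an orchestrated approach that makes the scheduling configuration more flexible: tasks are assigned to specific processors according to the configuration, and processors schedule them according to their priority and configuration. Judging from its members, it maintains two sets of processors, choreography processors and a processor pool, each with its own priority, affinity, scheduling policy and cpuset. This design offers more flexibility and better resource utilization, and suits applications that need to adjust their scheduling strategy dynamically.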

class SchedulerChoreography : public Scheduler {
 public:
  bool RemoveCRoutine(uint64_t crid) override;
  bool RemoveTask(const std::string& name) override;
  bool DispatchTask(const std::shared_ptr<CRoutine>&) override;

 private:
  friend Scheduler* Instance();
  SchedulerChoreography();

  void CreateProcessor();
  bool NotifyProcessor(uint64_t crid) override;

  std::unordered_map<std::string, ChoreographyTask> cr_confs_;

  int32_t choreography_processor_prio_;
  int32_t pool_processor_prio_;

  std::string choreography_affinity_;
  std::string pool_affinity_;

  std::string choreography_processor_policy_;
  std::string pool_processor_policy_;

  std::vector<int> choreography_cpuset_;
  std::vector<int> pool_cpuset_;
};

MainBoard

To understand how Node and Scheduler are connected, we have to find where they meet, starting from the main function.

mainboard.cc

int main(int argc, char** argv) {
  // parse the argument
  ModuleArgument module_args;
  module_args.ParseArgument(argc, argv);

  auto dag_list = module_args.GetDAGConfList();

  std::string dag_info;
  for (auto&& i = dag_list.begin(); i != dag_list.end(); i++) {
    size_t pos = 0;
    for (size_t j = 0; j < (*i).length(); j++) {
      pos = ((*i)[j] == '/') ? j: pos;
    }
    if (i != dag_list.begin()) dag_info += "_";

    if (pos == 0) {
      dag_info += *i;
    } else {
      dag_info +=
        (pos == (*i).length()-1) ? (*i).substr(pos): (*i).substr(pos+1);
    }
  }

  // initialize cyber
  apollo::cyber::Init(argv[0], dag_info);

  // start module
  ModuleController controller(module_args);
  if (!controller.Init()) {
    controller.Clear();
    AERROR << "module start error.";
    return -1;
  }

  ...

  controller.Clear();
  AINFO << "exit mainboard.";

  return 0;
}
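The dag_info loop keeps the file name of each DAG path and joins the names with '_'. The same idea can be expressed with std::string::find_last_of, as sketched below. `BuildDagInfo` is a hypothetical name. It matches the original for ordinary paths and differs only in edge cases, such as a path whose only '/' is its first character or a path ending in '/'.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep the part after the last '/' of each DAG path and join with '_'.
std::string BuildDagInfo(const std::vector<std::string>& dag_list) {
  std::string dag_info;
  for (size_t i = 0; i < dag_list.size(); ++i) {
    const std::string& path = dag_list[i];
    if (i != 0) dag_info += "_";
    size_t pos = path.find_last_of('/');
    dag_info += (pos == std::string::npos) ? path : path.substr(pos + 1);
  }
  return dag_info;
}
```

The resulting string is what main passes to apollo::cyber::Init as the second argument.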

The ModuleController implementation shows that it ultimately calls LoadModule to create and initialize the components.

module_controller.cc

bool ModuleController::LoadModule(const DagConfig& dag_config) {
  for (auto module_config : dag_config.module_config()) {
    std::string load_path;
    if (!common::GetFilePathWithEnv(module_config.module_library(),
                                    "APOLLO_LIB_PATH", &load_path)) {
      AERROR << "no module library [" << module_config.module_library()
             << "] found!";
      return false;
    }
    AINFO << "mainboard: use module library " << load_path;

    class_loader_manager_.LoadLibrary(load_path);

    for (auto& component : module_config.components()) {
      const std::string& class_name = component.class_name();
      // construct the component
      std::shared_ptr<ComponentBase> base =
          class_loader_manager_.CreateClassObj<ComponentBase>(class_name);
      if (base == nullptr || !base->Initialize(component.config())) {
        return false;
      }
      component_list_.emplace_back(std::move(base));
    }

    for (auto& component : module_config.timer_components()) {
      const std::string& class_name = component.class_name();
      std::shared_ptr<ComponentBase> base =
          class_loader_manager_.CreateClassObj<ComponentBase>(class_name);
      if (base == nullptr || !base->Initialize(component.config())) {
        return false;
      }
      component_list_.emplace_back(std::move(base));
    }
  }
  return true;
}
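LoadModule boils down to: resolve each class name to a factory, construct the object, call Initialize with its config, and abort on the first failure. The sketch below replaces class_loader (which loads shared libraries in which components register themselves) with a hypothetical in-process name-to-factory map, and simplifies each config to a string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ComponentBaseSketch {
 public:
  virtual ~ComponentBaseSketch() = default;
  virtual bool Initialize(const std::string& config) = 0;
};

using Factory = std::function<std::shared_ptr<ComponentBaseSketch>()>;

// components: (class_name, config) pairs, as listed in a DAG's module_config.
bool LoadComponents(
    const std::map<std::string, Factory>& registry,
    const std::vector<std::pair<std::string, std::string>>& components,
    std::vector<std::shared_ptr<ComponentBaseSketch>>* component_list) {
  for (const auto& [class_name, config] : components) {
    std::shared_ptr<ComponentBaseSketch> base;
    auto it = registry.find(class_name);
    if (it != registry.end()) base = it->second();  // "CreateClassObj"
    if (base == nullptr || !base->Initialize(config)) {
      return false;  // fail fast, as LoadModule does
    }
    component_list->emplace_back(std::move(base));
  }
  return true;
}

// Example component: initialization fails on an empty config.
class EchoComponent : public ComponentBaseSketch {
 public:
  bool Initialize(const std::string& config) override { return !config.empty(); }
};
```

Like LoadModule, the sketch keeps every successfully initialized component in component_list, which is what keeps them alive after loading.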

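Inside Component, the scheduler is used to create a task named after the Node. The Scheduler links the node to a CRoutine and then links the CRoutine to a Processor. The key element is the cr, the coroutine unit, which Component creates through a RoutineFactory.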

component.h


  auto sched = scheduler::Instance();
  std::weak_ptr<Component<M0, M1, M2, M3>> self =
      std::dynamic_pointer_cast<Component<M0, M1, M2, M3>>(shared_from_this());
  // set up the processing callback (func)
  auto func =
      [self, role_attr](const std::shared_ptr<M0>& msg0,
                        const std::shared_ptr<M1>& msg1,
                        const std::shared_ptr<M2>& msg2,
                        const std::shared_ptr<M3>& msg3) {
        auto start_time = Time::Now().ToMicrosecond();
        auto ptr = self.lock();
        if (ptr) {
          // run the processing logic (Process -> Proc)
          ptr->Process(msg0, msg1, msg2, msg3);
          auto end_time = Time::Now().ToMicrosecond();
          // sampling proc latency and cyber latency in microsecond
          uint64_t process_start_time;
          statistics::Statistics::Instance()->SamplingProcLatency<
                            uint64_t>(*role_attr, end_time-start_time);
          if (statistics::Statistics::Instance()->GetProcStatus(
                *role_attr, &process_start_time) && (
                                      start_time-process_start_time) > 0) {
            statistics::Statistics::Instance()->SamplingCyberLatency(
                              *role_attr, start_time-process_start_time);
          }
        } else {
          AERROR << "Component object has been destroyed." << std::endl;
        }
      };

  std::vector<data::VisitorConfig> config_list;
  for (auto& reader : readers_) {
    config_list.emplace_back(reader->ChannelId(), reader->PendingQueueSize());
  }
  auto dv = std::make_shared<data::DataVisitor<M0, M1, M2, M3>>(config_list);
  // create the routine factory
  croutine::RoutineFactory factory =
      croutine::CreateRoutineFactory<M0, M1, M2, M3>(func, dv);
  return sched->CreateTask(factory, node_->Name());

routine_factory.h

template <typename M0, typename M1, typename M2, typename F>
RoutineFactory CreateRoutineFactory(
    F&& f, const std::shared_ptr<data::DataVisitor<M0, M1, M2>>& dv) {
  RoutineFactory factory;
  factory.SetDataVisitor(dv); // set the DataVisitor
  factory.create_routine = [=]() {
    return [=]() {
      std::shared_ptr<M0> msg0;
      std::shared_ptr<M1> msg1;
      std::shared_ptr<M2> msg2;
      for (;;) {
        CRoutine::GetCurrentRoutine()->set_state(RoutineState::DATA_WAIT);
        if (dv->TryFetch(msg0, msg1, msg2)) { // fetch messages via the DataVisitor
          f(msg0, msg1, msg2); // invoke the callback with the messages
          CRoutine::Yield(RoutineState::READY); // yield control back to the scheduler
        } else {
          CRoutine::Yield();
        }
      }
    };
  };
  return factory;
}
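The routine built by CreateRoutineFactory can be modeled without real coroutines. In this hypothetical sketch each call to the returned function performs one iteration of the `for (;;)` loop and returns the state the real code would pass to Yield: READY after handling a message, DATA_WAIT when no data is available. A single channel stands in for M0..M2, and `DataVisitorSketch` is a plain queue, not Cyber RT's DataVisitor.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>

enum class RoutineState { READY, DATA_WAIT };

template <typename M0>
struct DataVisitorSketch {
  std::deque<std::shared_ptr<M0>> buffer;
  bool TryFetch(std::shared_ptr<M0>& msg) {
    if (buffer.empty()) return false;
    msg = buffer.front();
    buffer.pop_front();
    return true;
  }
};

// Returns one "loop iteration" of the routine; the caller plays scheduler.
template <typename M0, typename F>
std::function<RoutineState()> CreateRoutineStep(
    F f, std::shared_ptr<DataVisitorSketch<M0>> dv) {
  return [f, dv]() {
    std::shared_ptr<M0> msg0;
    if (dv->TryFetch(msg0)) {
      f(msg0);                     // in Cyber RT this reaches Process -> Proc
      return RoutineState::READY;  // runnable again, more data may be waiting
    }
    return RoutineState::DATA_WAIT;  // park until new data arrives
  };
}
```

A scheduler would keep resuming a READY routine and leave a DATA_WAIT routine parked until the data layer signals new messages.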

f calls the Component's Process function, which in turn calls Proc; this is what drives the user's processing logic.

The above outlines how a Component is created and how its tasks are executed.

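Reference: Apollo Autonomous Driving Source Code Analysis Series, CyberRT Part 1: A Brief Introduction to the Basic Concepts of the CyberRT Framework - Tencent Cloud Developer Community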