Notes for understanding kata runtime-rs


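Overview

runtime-rs is the Rust rewrite of the Go runtime in kata-containers. These notes start from the premise that the Rust code is more readable, so the rs version is a better way to understand how the runtime is implemented. Time is short, so they do not go deep.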

details

shim-v2

bin

  • The Cargo manifest is at src/runtime-rs/crates/shim/Cargo.toml
  • The RPC implementation uses the proxy pattern plus procedural macros; see src/runtime-rs/crates/service/src/task_service.rs
[[bin]]
name = "containerd-shim-kata-v2"
path = "src/bin/main.rs"
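
The proxy-plus-macro idea mentioned in the list above can be illustrated with a minimal sketch. It uses a declarative `macro_rules!` macro instead of a procedural macro so that it stays self-contained, and every name in it (`Handler`, `TaskProxy`, `proxy_methods!`, `EchoHandler`, the `Request`/`Response` variants) is hypothetical rather than taken from task_service.rs.

```rust
use std::sync::Arc;

/// Hypothetical request and response types for this sketch.
#[derive(Debug, PartialEq)]
pub enum Request {
    Create(String),
    Start(String),
}

#[derive(Debug, PartialEq)]
pub enum Response {
    Created(String),
    Started(String),
}

/// The single entry point that every RPC is forwarded to.
pub trait Handler: Send + Sync {
    fn handle(&self, req: Request) -> Result<Response, String>;
}

/// The proxy exposes one method per RPC but holds no logic of its own.
pub struct TaskProxy {
    handler: Arc<dyn Handler>,
}

/// Generates one forwarding method per (method name => request variant) pair.
macro_rules! proxy_methods {
    ($($name:ident => $variant:ident),* $(,)?) => {
        impl TaskProxy {
            $(
                pub fn $name(&self, id: &str) -> Result<Response, String> {
                    self.handler.handle(Request::$variant(id.to_string()))
                }
            )*
        }
    };
}

proxy_methods! {
    create => Create,
    start => Start,
}

/// A trivial handler used only to exercise the proxy.
pub struct EchoHandler;

impl Handler for EchoHandler {
    fn handle(&self, req: Request) -> Result<Response, String> {
        match req {
            Request::Create(id) => Ok(Response::Created(id)),
            Request::Start(id) => Ok(Response::Started(id)),
        }
    }
}

/// Builds a proxy backed by the echo handler.
pub fn new_proxy() -> TaskProxy {
    TaskProxy { handler: Arc::new(EchoHandler) }
}
```

Adding another RPC in this sketch only requires a new line in the `proxy_methods!` invocation and a matching arm in the handler, which is the main appeal of generating the forwarding layer.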

api

#[async_trait]
pub trait ContainerManager: Send + Sync {
    // container lifecycle
    async fn create_container(&self, config: ContainerConfig, spec: oci::Spec) -> Result<PID>;
    async fn pause_container(&self, container_id: &ContainerID) -> Result<()>;
    async fn resume_container(&self, container_id: &ContainerID) -> Result<()>;
    async fn stats_container(&self, container_id: &ContainerID) -> Result<StatsInfo>;
    async fn update_container(&self, req: UpdateRequest) -> Result<()>;
    async fn connect_container(&self, container_id: &ContainerID) -> Result<PID>;

    // process lifecycle
    async fn close_process_io(&self, process_id: &ContainerProcess) -> Result<()>;
    async fn delete_process(&self, process_id: &ContainerProcess) -> Result<ProcessStateInfo>;
    async fn exec_process(&self, req: ExecProcessRequest) -> Result<()>;
    async fn kill_process(&self, req: &KillRequest) -> Result<()>;
    async fn resize_process_pty(&self, req: &ResizePTYRequest) -> Result<()>;
    async fn start_process(&self, process_id: &ContainerProcess) -> Result<PID>;
    async fn state_process(&self, process_id: &ContainerProcess) -> Result<ProcessStateInfo>;
    async fn wait_process(&self, process_id: &ContainerProcess) -> Result<ProcessExitStatus>;

    // utility
    async fn pid(&self) -> Result<PID>;
    async fn need_shutdown_sandbox(&self, req: &ShutdownRequest) -> bool;
    async fn is_sandbox_container(&self, process_id: &ContainerProcess) -> bool;
}

Startup flow

  • main: real_main -> shim.run
  • service_manager: ShimExecutor -> do_run -> get_server_fd -> service::ServiceManager::new
  • server: service_manager.run -> self.task_server.as_mut() -> t.start()

VirtContainerManager

  • VirtSandbox and VirtContainerManager together make up a RuntimeInstance (new_instance)
  • The RuntimeInstance is created in init_runtime_handler of RuntimeHandlerManagerInner
pub struct VirtContainerManager {
    sid: String,
    pid: u32,
    containers: Arc<RwLock<HashMap<String, Container>>>,
    resource_manager: Arc<ResourceManager>,
    agent: Arc<dyn Agent>,
    hypervisor: Arc<dyn Hypervisor>,
}
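
The `containers` field is the part of this struct that is easiest to reason about in isolation. Below is a minimal sketch of a shared container table behind `Arc<RwLock<HashMap<..>>>`. It is synchronous and uses `std::sync::RwLock` to stay dependency-free, whereas the real code is async; `MiniContainerManager` and the reduced `Container` are hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real Container type.
#[derive(Debug, Clone, PartialEq)]
pub struct Container {
    pub id: String,
    pub pid: u32,
}

/// Simplified manager: only the shared container table is modelled.
#[derive(Clone)]
pub struct MiniContainerManager {
    containers: Arc<RwLock<HashMap<String, Container>>>,
}

impl MiniContainerManager {
    pub fn new() -> Self {
        MiniContainerManager {
            containers: Arc::new(RwLock::new(HashMap::new())),
        }
    }

    /// Takes the write lock and registers a container; duplicate ids are rejected.
    pub fn create_container(&self, id: &str, pid: u32) -> Result<u32, String> {
        let mut containers = self.containers.write().map_err(|e| e.to_string())?;
        if containers.contains_key(id) {
            return Err(format!("container {} already exists", id));
        }
        containers.insert(
            id.to_string(),
            Container { id: id.to_string(), pid },
        );
        Ok(pid)
    }

    /// Takes the read lock and returns a copy of the container, if present.
    pub fn get(&self, id: &str) -> Option<Container> {
        let containers = self.containers.read().ok()?;
        containers.get(id).cloned()
    }
}
```

Cloning the manager only clones the `Arc`, so every clone sees the same table; lookups share the read lock while creation takes the write lock.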

Container

  • Container is the concrete realization of an oci::Spec and is managed by VirtContainerManager
pub async fn create(&self, mut spec: oci::Spec) -> Result<()> {
    // get mutable root from oci spec
    let root = match spec.root.as_mut() {
        Some(root) => root,
        None => return Err(anyhow!("spec miss root field")),
    };
    // handler rootfs
    let rootfs = self
        .resource_manager
        .handler_rootfs(
            &config.container_id,
            root,
            &config.bundle,
            &config.rootfs_mounts,
        )
        .await
        .context("handler rootfs")?;
    // update rootfs
    root.path = rootfs
        .get_guest_rootfs_path()
        .await
        .context("get guest rootfs path")?;
    // handler volumes
    let volumes = self
        .resource_manager
        .handler_volumes(&config.container_id, &spec)
        .await
        .context("handler volumes")?;
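    let mut oci_mounts = vec![];
    // ... excerpt abridged: the code that fills oci_mounts and storages from
    // volumes, and that sets up sandbox_pidns and devices_agent, is omitted
    spec.mounts = oci_mounts;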
    // create container
    let r = agent::CreateContainerRequest {
        process_id: agent::ContainerProcessID::new(&config.container_id, ""),
        storages,
        oci: Some(spec),
        sandbox_pidns,
        devices: devices_agent,
        ..Default::default()
    };
    self.agent
        .create_container(r)
        .await
        .context("agent create container")?;
    self.resource_manager.dump().await;
    Ok(())
}

// where the OCI mounts are actually handled
pub async fn handler_volumes(
    &self,
    share_fs: &Option<Arc<dyn ShareFs>>,
    cid: &str,
    spec: &oci::Spec,
    d: &RwLock<DeviceManager>,
    sid: &str,
    agent: Arc<dyn Agent>,
) -> Result<Vec<Arc<dyn Volume>>> {
    // ... body omitted in this excerpt
}

mount/storage

  • The OCI standard defines mounts; each mount has the fields destination, type, source and options
  • storage is the format exchanged with the agent; it has the fields driver, driver_options, source, fs_type, fs_group, options and mount_point
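
To make the two field sets concrete, here is a minimal sketch that maps one onto the other. The structs are simplified, hypothetical mirrors of the real types, the field types are guesses, and the mapping in `mount_to_storage` (destination becomes mount_point, type becomes fs_type, source is replaced by a guest-side path) is an assumption for illustration, not the conversion runtime-rs performs.

```rust
/// Hypothetical, simplified mirror of an OCI mount entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Mount {
    pub destination: String,
    pub r#type: String,
    pub source: String,
    pub options: Vec<String>,
}

/// Hypothetical, simplified mirror of the storage description sent to the agent.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Storage {
    pub driver: String,
    pub driver_options: Vec<String>,
    pub source: String,
    pub fs_type: String,
    pub fs_group: Option<u32>,
    pub options: Vec<String>,
    pub mount_point: String,
}

/// Assumed mapping: the host-side source is replaced by a path that is
/// visible inside the guest, and the remaining fields are carried over.
pub fn mount_to_storage(m: &Mount, driver: &str, guest_source: &str) -> Storage {
    Storage {
        driver: driver.to_string(),
        source: guest_source.to_string(),
        fs_type: m.r#type.clone(),
        options: m.options.clone(),
        mount_point: m.destination.clone(),
        // driver_options and fs_group keep their defaults in this sketch.
        ..Default::default()
    }
}
```

The point of the sketch is that a mount describes what the container should see, while a storage additionally tells the agent which driver to use to make the source available in the guest.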

ShimExecutor/ServiceManager

pub struct ShimExecutor {
    pub(crate) args: Args,
}

pub struct ServiceManager {
    receiver: Option<Receiver<Message>>,
    handler: Arc<RuntimeHandlerManager>,
    task_server: Option<Server>,
    binary: String,
    address: String,
    namespace: String,
}
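
One plausible reading of `receiver: Option<Receiver<Message>>` is that the receiver is taken out of the `Option` once, when the message loop starts. The sketch below shows that pattern with `std::sync::mpsc`. The real code is async and its `Message` type differs, so `MiniServiceManager`, the `Message` variants and the `run` behaviour are all assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical message type; the real Message carries more information.
#[derive(Debug, PartialEq)]
pub enum Message {
    Event(String),
    Shutdown,
}

/// Simplified manager that only models the receiving side.
pub struct MiniServiceManager {
    receiver: Option<Receiver<Message>>,
}

impl MiniServiceManager {
    /// Returns the manager together with the sender handed to the handler side.
    pub fn new() -> (Self, Sender<Message>) {
        let (tx, rx) = channel();
        (MiniServiceManager { receiver: Some(rx) }, tx)
    }

    /// Takes ownership of the receiver and drains it until Shutdown arrives
    /// or every sender has been dropped. Returns the events seen, in order.
    pub fn run(&mut self) -> Result<Vec<String>, String> {
        let receiver = self
            .receiver
            .take()
            .ok_or_else(|| "message loop already started".to_string())?;
        let mut events = Vec::new();
        while let Ok(msg) = receiver.recv() {
            match msg {
                Message::Event(e) => events.push(e),
                Message::Shutdown => break,
            }
        }
        Ok(events)
    }
}
```

Because `take()` leaves `None` behind, a second call to `run` fails instead of silently starting a second loop.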

RuntimeHandlerManager/RuntimeHandlerManagerInner

  • RuntimeHandlerManager is the actual implementation and executor of shim-v2; its core method, for example, is pub async fn handler_message(&self, req: Request) -> Result<Response>
pub struct RuntimeHandlerManager {
    inner: Arc<RwLock<RuntimeHandlerManagerInner>>,
}

struct RuntimeHandlerManagerInner {
    id: String,
    msg_sender: Sender<Message>,
    kata_tracer: Arc<Mutex<KataTracer>>,
    runtime_instance: Option<Arc<RuntimeInstance>>,
}
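
The outer/inner split combined with `runtime_instance: Option<Arc<RuntimeInstance>>` suggests lazy initialization behind a lock. Here is a minimal synchronous sketch of that shape. `MiniHandlerManager`, the reduced `RuntimeInstance` and the string-based `handler_message` are hypothetical, and `try_init` only borrows its name from the function in these notes.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, reduced runtime instance.
#[derive(Debug)]
pub struct RuntimeInstance {
    pub name: String,
}

/// State that lives behind the lock.
struct Inner {
    id: String,
    runtime_instance: Option<Arc<RuntimeInstance>>,
}

/// Thin outer handle that only wraps the locked inner state.
pub struct MiniHandlerManager {
    inner: Arc<RwLock<Inner>>,
}

impl MiniHandlerManager {
    pub fn new(id: &str) -> Self {
        MiniHandlerManager {
            inner: Arc::new(RwLock::new(Inner {
                id: id.to_string(),
                runtime_instance: None,
            })),
        }
    }

    /// Creates the runtime instance on the first call and reuses it afterwards.
    pub fn try_init(&self) -> Arc<RuntimeInstance> {
        let mut inner = self.inner.write().expect("lock poisoned");
        if let Some(instance) = &inner.runtime_instance {
            return Arc::clone(instance);
        }
        let instance = Arc::new(RuntimeInstance {
            name: format!("runtime-for-{}", inner.id),
        });
        inner.runtime_instance = Some(Arc::clone(&instance));
        instance
    }

    /// Every request first makes sure the instance exists, then uses it.
    pub fn handler_message(&self, req: &str) -> String {
        let instance = self.try_init();
        format!("{} handled {}", instance.name, req)
    }
}
```

Holding the write lock for the whole check-then-create step is what prevents two concurrent first requests from creating two instances in this sketch.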

kata-agent/VirtSandbox

  • kata-agent is likewise split into an outer type and an inner one; the inner type's main job is get_agent_client, and the client type is agent_ttrpc::AgentServiceClient

  • let agent = new_agent(&config).context("new agent")?;

  • let agent = KataAgent::new(agent_config.clone());

pub struct KataAgent {
    pub(crate) inner: Arc<RwLock<KataAgentInner>>,
}

pub fn new(config: AgentConfig) -> Self {
    KataAgent {
        inner: Arc::new(RwLock::new(KataAgentInner {
            client: None,
            client_fd: -1,
            socket_address: "".to_string(),
            config,
            log_forwarder: LogForwarder::new(),
        })),
    }
}

pub(crate) struct KataAgentInner {
    pub client: Option<Client>,
    pub client_fd: RawFd,
    pub socket_address: String,
    config: AgentConfig,
    log_forwarder: LogForwarder,
}

Sandbox/VirtSandbox

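  • In the Go version, a Sandbox is described as "composed of a set of containers and a runtime environment."
  • Broadly speaking, Sandbox is an abstraction over the whole environment; the complex logic lives in its create/start and similar methods
  • For example, runtime_instance.sandbox.start(dns, spec, state, network_env) is called from init_runtime_handler of RuntimeHandlerManagerInner, which in turn is called from try_init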
pub struct VirtSandbox {
    sid: String,
    msg_sender: Arc<Mutex<Sender<Message>>>,
    inner: Arc<RwLock<SandboxInner>>,
    resource_manager: Arc<ResourceManager>,
    agent: Arc<dyn Agent>,
    hypervisor: Arc<dyn Hypervisor>,
    monitor: Arc<HealthCheck>,
}

#[async_trait]
pub trait Sandbox: Send + Sync {
    async fn start(
        &self,
        dns: Vec<String>,
        spec: &oci::Spec,
        state: &oci::State,
        network_env: SandboxNetworkEnv,
    ) -> Result<()>;
    async fn stop(&self) -> Result<()>;
    async fn cleanup(&self) -> Result<()>;
    async fn shutdown(&self) -> Result<()>;

    // utils
    async fn set_iptables(&self, is_ipv6: bool, data: Vec<u8>) -> Result<Vec<u8>>;
    async fn get_iptables(&self, is_ipv6: bool) -> Result<Vec<u8>>;
    async fn direct_volume_stats(&self, volume_path: &str) -> Result<String>;
    async fn direct_volume_resize(&self, resize_req: agent::ResizeVolumeRequest) -> Result<()>;
    async fn agent_sock(&self) -> Result<String>;

    // metrics function
    async fn agent_metrics(&self) -> Result<String>;
    async fn hypervisor_metrics(&self) -> Result<String>;
}

Hypervisor

  • Hypervisors are abstracted in much the same way as in the Go version: each hypervisor implements the trait

#[async_trait]
pub trait Hypervisor: std::fmt::Debug + Send + Sync {
    // vm manager
    async fn prepare_vm(&self, id: &str, netns: Option<String>) -> Result<()>;
    async fn start_vm(&self, timeout: i32) -> Result<()>;
    async fn stop_vm(&self) -> Result<()>;
    async fn pause_vm(&self) -> Result<()>;
    async fn save_vm(&self) -> Result<()>;
    async fn resume_vm(&self) -> Result<()>;
    async fn resize_vcpu(&self, old_vcpus: u32, new_vcpus: u32) -> Result<(u32, u32)>; // returns (old_vcpus, new_vcpus)

    // device manager
    async fn add_device(&self, device: DeviceType) -> Result<()>;
    async fn remove_device(&self, device: DeviceType) -> Result<()>;

    // utils
    async fn get_agent_socket(&self) -> Result<String>;
    async fn disconnect(&self);
    async fn hypervisor_config(&self) -> HypervisorConfig;
    async fn get_thread_ids(&self) -> Result<VcpuThreadIds>;
    async fn get_pids(&self) -> Result<Vec<u32>>;
    async fn get_vmm_master_tid(&self) -> Result<u32>;
    async fn get_ns_path(&self) -> Result<String>;
    async fn cleanup(&self) -> Result<()>;
    async fn check(&self) -> Result<()>;
    async fn get_jailer_root(&self) -> Result<String>;
    async fn save_state(&self) -> Result<HypervisorState>;
    async fn capabilities(&self) -> Result<Capabilities>;
    async fn get_hypervisor_metrics(&self) -> Result<String>;
}
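
The "one trait, one implementation per hypervisor" design can be sketched with a much smaller synchronous trait. `MiniHypervisor`, `FakeVmmA`, `FakeVmmB`, the socket strings and the `new_hypervisor` factory are all hypothetical; the sketch only shows how callers can hold an `Arc<dyn ...>` without knowing which backend sits behind it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical, heavily reduced version of the Hypervisor trait.
pub trait MiniHypervisor: std::fmt::Debug + Send + Sync {
    fn start_vm(&self) -> Result<(), String>;
    fn stop_vm(&self) -> Result<(), String>;
    fn get_agent_socket(&self) -> Result<String, String>;
}

#[derive(Debug, Default)]
pub struct FakeVmmA {
    running: AtomicBool,
}

#[derive(Debug, Default)]
pub struct FakeVmmB {
    running: AtomicBool,
}

/// Shared helper: the socket is only available while the VM is running.
fn socket_if_running(running: &AtomicBool, socket: &str) -> Result<String, String> {
    if running.load(Ordering::SeqCst) {
        Ok(socket.to_string())
    } else {
        Err("vm is not running".to_string())
    }
}

impl MiniHypervisor for FakeVmmA {
    fn start_vm(&self) -> Result<(), String> {
        self.running.store(true, Ordering::SeqCst);
        Ok(())
    }
    fn stop_vm(&self) -> Result<(), String> {
        self.running.store(false, Ordering::SeqCst);
        Ok(())
    }
    fn get_agent_socket(&self) -> Result<String, String> {
        socket_if_running(&self.running, "fake-a://agent.sock")
    }
}

impl MiniHypervisor for FakeVmmB {
    fn start_vm(&self) -> Result<(), String> {
        self.running.store(true, Ordering::SeqCst);
        Ok(())
    }
    fn stop_vm(&self) -> Result<(), String> {
        self.running.store(false, Ordering::SeqCst);
        Ok(())
    }
    fn get_agent_socket(&self) -> Result<String, String> {
        socket_if_running(&self.running, "fake-b://agent.sock")
    }
}

/// Picks a backend by name and hides it behind the trait object.
pub fn new_hypervisor(name: &str) -> Result<Arc<dyn MiniHypervisor>, String> {
    match name {
        "fake-a" => Ok(Arc::new(FakeVmmA::default())),
        "fake-b" => Ok(Arc::new(FakeVmmB::default())),
        other => Err(format!("unsupported hypervisor: {}", other)),
    }
}

/// Caller-side code that works with any backend.
pub fn boot(h: &dyn MiniHypervisor) -> Result<String, String> {
    h.start_vm()?;
    h.get_agent_socket()
}
```

This mirrors how VirtSandbox and VirtContainerManager above both hold `Arc<dyn Hypervisor>` rather than a concrete hypervisor type.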

practice

build

make
./target/debug/containerd-shim-kata-v2 --version