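Overview
runtime-rs is the Rust rewrite of the Go runtime in kata-containers. Rust code tends to be easier to read, so these notes walk through the rs version to understand how the runtime is implemented. Time is limited, so the notes do not go deep.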
details
shim-v2
bin
- The Cargo manifest is at src/runtime-rs/crates/shim/Cargo.toml
- The RPC implementation uses the proxy pattern plus macros; see src/runtime-rs/crates/service/src/task_service.rs
[[bin]]
name = "containerd-shim-kata-v2"
path = "src/bin/main.rs"
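The proxy-plus-macro idea mentioned for the RPC layer can be sketched as follows. This is a minimal, std-only illustration and not the code in task_service.rs: the names Request, Response, Handler, TaskService, EchoHandler and impl_service are all assumptions made for the example, and the real service is async and speaks ttrpc.

```rust
use std::sync::Arc;

// Hypothetical request/response types, invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Create(String),
    Start(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Created(String),
    Started(String),
}

// The handler does the real work; the service only forwards to it.
pub trait Handler: Send + Sync {
    fn handle(&self, req: Request) -> Result<Response, String>;
}

// The proxy: it owns a handle to the real implementation.
pub struct TaskService {
    handler: Arc<dyn Handler>,
}

impl TaskService {
    pub fn new(handler: Arc<dyn Handler>) -> Self {
        Self { handler }
    }
}

// Generates one forwarding method per RPC, so the boilerplate is written once.
macro_rules! impl_service {
    ($($name:ident: $variant:ident),* $(,)?) => {
        impl TaskService {
            $(
                pub fn $name(&self, id: &str) -> Result<Response, String> {
                    self.handler.handle(Request::$variant(id.to_string()))
                }
            )*
        }
    };
}

impl_service!(create: Create, start: Start);

// A trivial handler used only to exercise the proxy.
pub struct EchoHandler;

impl Handler for EchoHandler {
    fn handle(&self, req: Request) -> Result<Response, String> {
        match req {
            Request::Create(id) => Ok(Response::Created(id)),
            Request::Start(id) => Ok(Response::Started(id)),
        }
    }
}
```

With this shape, adding an RPC means adding one entry to the macro invocation and one arm in the handler, while the service type itself stays a thin forwarding layer.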
api
#[async_trait]
pub trait ContainerManager: Send + Sync {
// container lifecycle
async fn create_container(&self, config: ContainerConfig, spec: oci::Spec) -> Result<PID>;
async fn pause_container(&self, container_id: &ContainerID) -> Result<()>;
async fn resume_container(&self, container_id: &ContainerID) -> Result<()>;
async fn stats_container(&self, container_id: &ContainerID) -> Result<StatsInfo>;
async fn update_container(&self, req: UpdateRequest) -> Result<()>;
async fn connect_container(&self, container_id: &ContainerID) -> Result<PID>;
// process lifecycle
async fn close_process_io(&self, process_id: &ContainerProcess) -> Result<()>;
async fn delete_process(&self, process_id: &ContainerProcess) -> Result<ProcessStateInfo>;
async fn exec_process(&self, req: ExecProcessRequest) -> Result<()>;
async fn kill_process(&self, req: &KillRequest) -> Result<()>;
async fn resize_process_pty(&self, req: &ResizePTYRequest) -> Result<()>;
async fn start_process(&self, process_id: &ContainerProcess) -> Result<PID>;
async fn state_process(&self, process_id: &ContainerProcess) -> Result<ProcessStateInfo>;
async fn wait_process(&self, process_id: &ContainerProcess) -> Result<ProcessExitStatus>;
// utility
async fn pid(&self) -> Result<PID>;
async fn need_shutdown_sandbox(&self, req: &ShutdownRequest) -> bool;
async fn is_sandbox_container(&self, process_id: &ContainerProcess) -> bool;
}
Startup flow
- main: real_main -> shim.run
- service_manager: ShimExecutor -> do_run -> get_server_fd -> service::ServiceManager::new
- server: service_manager.run -> self.task_server.as_mut() -> t.start()
VirtContainerManager
- VirtSandbox and VirtContainerManager together make up a RuntimeInstance (see new_instance)
- The RuntimeInstance is created in init_runtime_handler of RuntimeHandlerManagerInner
pub struct VirtContainerManager {
sid: String,
pid: u32,
containers: Arc<RwLock<HashMap<String, Container>>>,
resource_manager: Arc<ResourceManager>,
agent: Arc<dyn Agent>,
hypervisor: Arc<dyn Hypervisor>,
}
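The containers field above is a shared map guarded by a read-write lock. A minimal sketch of how such a table might drive lifecycle calls is shown below. It is an illustration under stated assumptions: ContainerTable, Status and the transition rules are hypothetical, and std::sync::RwLock stands in for the async lock that an async runtime would use.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical lifecycle states for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Created,
    Running,
    Paused,
}

#[derive(Debug)]
pub struct Container {
    pub id: String,
    pub status: Status,
}

// Simplified stand-in for a manager that owns a map of containers.
pub struct ContainerTable {
    containers: Arc<RwLock<HashMap<String, Container>>>,
}

impl ContainerTable {
    pub fn new() -> Self {
        Self { containers: Arc::new(RwLock::new(HashMap::new())) }
    }

    // Registers a new container; duplicate ids are rejected.
    pub fn create(&self, id: &str) -> Result<(), String> {
        let mut map = self.containers.write().map_err(|e| e.to_string())?;
        if map.contains_key(id) {
            return Err(format!("container {id} already exists"));
        }
        map.insert(id.to_string(), Container { id: id.to_string(), status: Status::Created });
        Ok(())
    }

    // Moves a container from one state to another, checking the current state.
    fn transition(&self, id: &str, from: Status, to: Status) -> Result<(), String> {
        let mut map = self.containers.write().map_err(|e| e.to_string())?;
        let c = map.get_mut(id).ok_or_else(|| format!("container {id} not found"))?;
        if c.status != from {
            return Err(format!("container {id} is {:?}, expected {:?}", c.status, from));
        }
        c.status = to;
        Ok(())
    }

    pub fn start(&self, id: &str) -> Result<(), String> {
        self.transition(id, Status::Created, Status::Running)
    }

    pub fn pause(&self, id: &str) -> Result<(), String> {
        self.transition(id, Status::Running, Status::Paused)
    }

    pub fn resume(&self, id: &str) -> Result<(), String> {
        self.transition(id, Status::Paused, Status::Running)
    }

    // Read-only lookup takes the read lock, so lookups do not block each other.
    pub fn status(&self, id: &str) -> Option<Status> {
        let map = self.containers.read().ok()?;
        map.get(id).map(|c| c.status)
    }
}
```

Keeping the map behind an Arc lets the manager be cloned into several tasks while all of them see the same set of containers.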
Container
- A Container is the concrete realization of an oci::Spec and is managed by VirtContainerManager
pub async fn create(&self, mut spec: oci::Spec) -> Result<()> {
// get mutable root from oci spec
let root = match spec.root.as_mut() {
Some(root) => root,
None => return Err(anyhow!("spec miss root field")),
};
// handler rootfs
let rootfs = self
.resource_manager
.handler_rootfs(
&config.container_id,
root,
&config.bundle,
&config.rootfs_mounts,
)
.await
.context("handler rootfs")?;
// update rootfs
root.path = rootfs
.get_guest_rootfs_path()
.await
.context("get guest rootfs path")?;
// handler volumes
let volumes = self
.resource_manager
.handler_volumes(&config.container_id, &spec)
.await
.context("handler volumes")?;
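// NOTE: the excerpt is abridged here; the code that fills oci_mounts, storages,
// sandbox_pidns and devices_agent is not shown in these notes
let mut oci_mounts = vec![];
spec.mounts = oci_mounts;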
// create container
let r = agent::CreateContainerRequest {
process_id: agent::ContainerProcessID::new(&config.container_id, ""),
storages,
oci: Some(spec),
sandbox_pidns,
devices: devices_agent,
..Default::default()
};
self.agent
.create_container(r)
.await
.context("agent create container")?;
self.resource_manager.dump().await;
Ok(())
}
// does the actual handling of the OCI mounts
pub async fn handler_volumes(
&self,
share_fs: &Option<Arc<dyn ShareFs>>,
cid: &str,
spec: &oci::Spec,
d: &RwLock<DeviceManager>,
sid: &str,
agent: Arc<dyn Agent>,
) -> Result<Vec<Arc<dyn Volume>>> {
}
mount/storage
- The OCI standard defines mounts; each mount has the fields destination, type, source and options
- storage is the format used to talk to the agent; each storage has the fields driver, driver_options, source, fs_type, fs_group, options and mount_point
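To make the difference between the two shapes concrete, here is a small sketch that models both records with the fields listed above and derives a storage from a mount. The field types, the storage_from_mount helper and its mapping rules are assumptions for illustration only; in particular fs_group is reduced to an optional number, and the driver name and guest-side source are passed in by the caller rather than computed.

```rust
// OCI-style mount: where something should appear inside the container.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Mount {
    pub destination: String,
    pub r#type: String,
    pub source: String,
    pub options: Vec<String>,
}

// Agent-style storage: how the guest should make something available.
// Types are simplified for the sketch.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Storage {
    pub driver: String,
    pub driver_options: Vec<String>,
    pub source: String,
    pub fs_type: String,
    pub fs_group: Option<u32>,
    pub options: Vec<String>,
    pub mount_point: String,
}

// Hypothetical conversion: the mount keeps describing the container view,
// while the storage carries the driver and the guest-visible source.
pub fn storage_from_mount(m: &Mount, driver: &str, guest_source: &str) -> Storage {
    Storage {
        driver: driver.to_string(),
        driver_options: Vec::new(),
        source: guest_source.to_string(),
        fs_type: m.r#type.clone(),
        fs_group: None,
        options: m.options.clone(),
        mount_point: m.destination.clone(),
    }
}
```

The point of the sketch is only that a storage carries extra, guest-facing information (driver, driver_options, fs_group) that an OCI mount has no place for.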
ShimExecutor/ServiceManager
pub struct ShimExecutor {
pub(crate) args: Args,
}
pub struct ServiceManager {
receiver: Option<Receiver<Message>>,
handler: Arc<RuntimeHandlerManager>,
task_server: Option<Server>,
binary: String,
address: String,
namespace: String,
}
RuntimeHandlerManager/RuntimeHandlerManagerInner
- RuntimeHandlerManager is what actually implements and executes shim-v2; its core method, for example, is:
pub async fn handler_message(&self, req: Request) -> Result<Response>
pub struct RuntimeHandlerManager {
inner: Arc<RwLock<RuntimeHandlerManagerInner>>,
}
struct RuntimeHandlerManagerInner {
id: String,
msg_sender: Sender<Message>,
kata_tracer: Arc<Mutex<KataTracer>>,
runtime_instance: Option<Arc<RuntimeInstance>>,
}
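The outer/inner split and the optional runtime_instance suggest lazy initialization behind a lock. The sketch below illustrates that pattern with std only. It assumes, as a simplification, that the first create request triggers initialization and that other requests fail until then; Manager, Inner, the two-variant Request and Response enums, and the pid field are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, heavily reduced message types.
#[derive(Debug, PartialEq)]
pub enum Request {
    CreateContainer(String),
    Pid,
}

#[derive(Debug, PartialEq)]
pub enum Response {
    CreateContainer,
    Pid(u32),
}

// Stand-in for the runtime instance (sandbox plus container manager).
pub struct RuntimeInstance {
    pub sandbox_id: String,
    pub pid: u32,
}

struct Inner {
    id: String,
    pid: u32,
    runtime_instance: Option<Arc<RuntimeInstance>>,
}

impl Inner {
    // Creates the instance on first use and returns the cached one afterwards.
    fn try_init(&mut self) -> Arc<RuntimeInstance> {
        if let Some(inst) = &self.runtime_instance {
            return inst.clone();
        }
        let inst = Arc::new(RuntimeInstance { sandbox_id: self.id.clone(), pid: self.pid });
        self.runtime_instance = Some(inst.clone());
        inst
    }
}

// Outer type: cheap to share, all state lives behind the lock.
pub struct Manager {
    inner: Arc<RwLock<Inner>>,
}

impl Manager {
    pub fn new(id: &str, pid: u32) -> Self {
        let inner = Inner { id: id.to_string(), pid, runtime_instance: None };
        Self { inner: Arc::new(RwLock::new(inner)) }
    }

    pub fn is_initialized(&self) -> bool {
        self.inner.read().map(|i| i.runtime_instance.is_some()).unwrap_or(false)
    }

    pub fn handler_message(&self, req: Request) -> Result<Response, String> {
        // Only a create request takes the write lock and may initialize.
        let instance = if matches!(req, Request::CreateContainer(_)) {
            let mut inner = self.inner.write().map_err(|e| e.to_string())?;
            let inst = inner.try_init();
            inst
        } else {
            let inner = self.inner.read().map_err(|e| e.to_string())?;
            let existing = inner.runtime_instance.clone();
            existing.ok_or_else(|| "runtime instance not initialized".to_string())?
        };
        match req {
            Request::CreateContainer(_) => Ok(Response::CreateContainer),
            Request::Pid => Ok(Response::Pid(instance.pid)),
        }
    }
}
```

Cloning the Arc out of the lock before dispatching keeps the critical section short, so a slow request does not hold the lock while it runs.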
kata-agent/VirtSandbox
- kata-agent is likewise split into an outer type and an Inner; the main job of the inner is get_agent_client, and the client type is agent_ttrpc::AgentServiceClient
- let agent = new_agent(&config).context("new agent")?;
- let agent = KataAgent::new(agent_config.clone());
pub struct KataAgent {
pub(crate) inner: Arc<RwLock<KataAgentInner>>,
}
pub fn new(config: AgentConfig) -> Self {
KataAgent {
inner: Arc::new(RwLock::new(KataAgentInner {
client: None,
client_fd: -1,
socket_address: "".to_string(),
config,
log_forwarder: LogForwarder::new(),
})),
}
}
pub(crate) struct KataAgentInner {
pub client: Option<Client>,
pub client_fd: RawFd,
pub socket_address: String,
config: AgentConfig,
log_forwarder: LogForwarder,
}
Sandbox/VirtSandbox
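- In the Go runtime, a Sandbox is described as "composed of a set of containers and a runtime environment".
- In short, Sandbox is the abstraction of the whole environment, and the complex logic lives in its create/start and similar methods
- For example, runtime_instance.sandbox.start(dns, spec, state, network_env) is called from init_runtime_handler of RuntimeHandlerManagerInner, which in turn is called from try_init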
pub struct VirtSandbox {
sid: String,
msg_sender: Arc<Mutex<Sender<Message>>>,
inner: Arc<RwLock<SandboxInner>>,
resource_manager: Arc<ResourceManager>,
agent: Arc<dyn Agent>,
hypervisor: Arc<dyn Hypervisor>,
monitor: Arc<HealthCheck>,
}
#[async_trait]
pub trait Sandbox: Send + Sync {
async fn start(
&self,
dns: Vec<String>,
spec: &oci::Spec,
state: &oci::State,
network_env: SandboxNetworkEnv,
) -> Result<()>;
async fn stop(&self) -> Result<()>;
async fn cleanup(&self) -> Result<()>;
async fn shutdown(&self) -> Result<()>;
// utils
async fn set_iptables(&self, is_ipv6: bool, data: Vec<u8>) -> Result<Vec<u8>>;
async fn get_iptables(&self, is_ipv6: bool) -> Result<Vec<u8>>;
async fn direct_volume_stats(&self, volume_path: &str) -> Result<String>;
async fn direct_volume_resize(&self, resize_req: agent::ResizeVolumeRequest) -> Result<()>;
async fn agent_sock(&self) -> Result<String>;
// metrics function
async fn agent_metrics(&self) -> Result<String>;
async fn hypervisor_metrics(&self) -> Result<String>;
}
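Since a sandbox holds both a hypervisor and an agent, its start and stop methods are a natural place to sequence calls on the two. The sketch below shows that orchestration idea only. The ordering (prepare the VM, boot it, then connect the agent), the idempotent start, and the FakeHypervisor, FakeAgent and CallLog names are all assumptions for illustration; the traits here are synchronous and much smaller than the real ones.

```rust
use std::sync::{Arc, Mutex};

// Shared log so the order of calls can be observed from outside.
pub type CallLog = Arc<Mutex<Vec<String>>>;

pub trait Hypervisor: Send + Sync {
    fn prepare_vm(&self, id: &str);
    fn start_vm(&self);
    fn stop_vm(&self);
}

pub trait Agent: Send + Sync {
    fn connect(&self);
    fn disconnect(&self);
}

pub struct FakeHypervisor(pub CallLog);
pub struct FakeAgent(pub CallLog);

fn record(log: &CallLog, entry: &str) {
    log.lock().unwrap().push(entry.to_string());
}

impl Hypervisor for FakeHypervisor {
    fn prepare_vm(&self, id: &str) {
        record(&self.0, &format!("prepare_vm {id}"));
    }
    fn start_vm(&self) {
        record(&self.0, "start_vm");
    }
    fn stop_vm(&self) {
        record(&self.0, "stop_vm");
    }
}

impl Agent for FakeAgent {
    fn connect(&self) {
        record(&self.0, "agent connect");
    }
    fn disconnect(&self) {
        record(&self.0, "agent disconnect");
    }
}

pub struct Sandbox {
    sid: String,
    hypervisor: Arc<dyn Hypervisor>,
    agent: Arc<dyn Agent>,
    started: Mutex<bool>,
}

impl Sandbox {
    pub fn new(sid: &str, hypervisor: Arc<dyn Hypervisor>, agent: Arc<dyn Agent>) -> Self {
        Self { sid: sid.to_string(), hypervisor, agent, started: Mutex::new(false) }
    }

    // Assumed order: the VM must exist and run before the agent is reachable.
    pub fn start(&self) {
        let mut started = self.started.lock().unwrap();
        if *started {
            return; // a second start is a no-op in this sketch
        }
        self.hypervisor.prepare_vm(&self.sid);
        self.hypervisor.start_vm();
        self.agent.connect();
        *started = true;
    }

    // Tear down in the reverse order of start.
    pub fn stop(&self) {
        let mut started = self.started.lock().unwrap();
        if !*started {
            return;
        }
        self.agent.disconnect();
        self.hypervisor.stop_vm();
        *started = false;
    }
}
```

Because the sandbox depends only on the two traits, the fakes can be swapped for real backends without touching the sequencing logic.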
Hypervisor
- The hypervisor is abstracted in much the same way as in Go: every hypervisor implements the trait
#[async_trait]
pub trait Hypervisor: std::fmt::Debug + Send + Sync {
// vm manager
async fn prepare_vm(&self, id: &str, netns: Option<String>) -> Result<()>;
async fn start_vm(&self, timeout: i32) -> Result<()>;
async fn stop_vm(&self) -> Result<()>;
async fn pause_vm(&self) -> Result<()>;
async fn save_vm(&self) -> Result<()>;
async fn resume_vm(&self) -> Result<()>;
async fn resize_vcpu(&self, old_vcpus: u32, new_vcpus: u32) -> Result<(u32, u32)>; // returns (old_vcpus, new_vcpus)
// device manager
async fn add_device(&self, device: DeviceType) -> Result<()>;
async fn remove_device(&self, device: DeviceType) -> Result<()>;
// utils
async fn get_agent_socket(&self) -> Result<String>;
async fn disconnect(&self);
async fn hypervisor_config(&self) -> HypervisorConfig;
async fn get_thread_ids(&self) -> Result<VcpuThreadIds>;
async fn get_pids(&self) -> Result<Vec<u32>>;
async fn get_vmm_master_tid(&self) -> Result<u32>;
async fn get_ns_path(&self) -> Result<String>;
async fn cleanup(&self) -> Result<()>;
async fn check(&self) -> Result<()>;
async fn get_jailer_root(&self) -> Result<String>;
async fn save_state(&self) -> Result<HypervisorState>;
async fn capabilities(&self) -> Result<Capabilities>;
async fn get_hypervisor_metrics(&self) -> Result<String>;
}
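The "one trait, many backends" arrangement can be sketched with a reduced, synchronous trait and a small factory that picks a backend by name. Everything here is illustrative: FakeQemu, FakeDragonball, VmState and new_hypervisor are hypothetical names, the backends only track state in memory, and the name strings are examples rather than a statement of what the real configuration accepts.

```rust
use std::sync::{Arc, Mutex};

// Reduced, synchronous version of the trait for the sketch.
pub trait Hypervisor: std::fmt::Debug + Send + Sync {
    fn name(&self) -> &'static str;
    fn prepare_vm(&self, id: &str) -> Result<(), String>;
    fn start_vm(&self) -> Result<(), String>;
    fn stop_vm(&self) -> Result<(), String>;
}

#[derive(Debug, Default)]
struct VmState {
    id: Option<String>,
    running: bool,
}

// Shared bookkeeping so both fake backends stay short.
fn prepare(state: &Mutex<VmState>, id: &str) -> Result<(), String> {
    let mut s = state.lock().map_err(|e| e.to_string())?;
    s.id = Some(id.to_string());
    Ok(())
}

fn start(state: &Mutex<VmState>) -> Result<(), String> {
    let mut s = state.lock().map_err(|e| e.to_string())?;
    if s.id.is_none() {
        return Err("prepare_vm must be called before start_vm".to_string());
    }
    s.running = true;
    Ok(())
}

fn stop(state: &Mutex<VmState>) -> Result<(), String> {
    let mut s = state.lock().map_err(|e| e.to_string())?;
    if !s.running {
        return Err("vm is not running".to_string());
    }
    s.running = false;
    Ok(())
}

#[derive(Debug, Default)]
pub struct FakeQemu {
    state: Mutex<VmState>,
}

#[derive(Debug, Default)]
pub struct FakeDragonball {
    state: Mutex<VmState>,
}

impl Hypervisor for FakeQemu {
    fn name(&self) -> &'static str { "qemu" }
    fn prepare_vm(&self, id: &str) -> Result<(), String> { prepare(&self.state, id) }
    fn start_vm(&self) -> Result<(), String> { start(&self.state) }
    fn stop_vm(&self) -> Result<(), String> { stop(&self.state) }
}

impl Hypervisor for FakeDragonball {
    fn name(&self) -> &'static str { "dragonball" }
    fn prepare_vm(&self, id: &str) -> Result<(), String> { prepare(&self.state, id) }
    fn start_vm(&self) -> Result<(), String> { start(&self.state) }
    fn stop_vm(&self) -> Result<(), String> { stop(&self.state) }
}

// Callers only ever see Arc<dyn Hypervisor>, never the concrete type.
pub fn new_hypervisor(name: &str) -> Result<Arc<dyn Hypervisor>, String> {
    let h: Arc<dyn Hypervisor> = match name {
        "qemu" => Arc::new(FakeQemu::default()),
        "dragonball" => Arc::new(FakeDragonball::default()),
        other => return Err(format!("unsupported hypervisor: {other}")),
    };
    Ok(h)
}
```

This is the same shape as the hypervisor: Arc<dyn Hypervisor> fields in the structs above: the sandbox and the container manager hold a trait object and never need to know which backend is behind it.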
practice
build
make
./target/debug/containerd-shim-kata-v2 --version