Using a Custom Snapshotter to Change a Container's rootfs Path

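Background

In docker/containerd, every container has a rootfs. A rootfs (root filesystem) is essentially a directory tree that contains the most basic files and directories a Linux system needs to run (for example /bin, /etc, /lib, /usr). A container shares the host's Linux kernel but has its own independent rootfs.

When you run a container with docker/containerd, the image has to be pulled first. The image supplies the rootfs: it contains all of the image's user-space files, but no kernel.
When the container starts, this rootfs is mounted as the root of the container's new mount namespace, so the process "believes" it is running on a standalone system.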

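rootfs characteristics

It is the minimal root filesystem a container runs on.
The image's copy cannot be written to directly, because multiple containers share the same image.
It is normally read-only; actual writes go through a union (stackable) filesystem.

Union/Stackable Filesystem

A union filesystem (Union FS) addresses the following container requirements: 1. images are read-only; 2. a running container needs a writable layer; 3. multiple containers share the same image layers to save storage; 4. the filesystem has to be efficient. Many storage backends fill this role today, such as aufs, unionfs, overlayfs, btrfs, and ZFS (the last two are copy-on-write filesystems rather than union filesystems in the strict sense). The most widely used one, and containerd's default, is overlayfs.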

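overlayfs

overlayfs merges several directories (called layers) into one unified view. The layers are usually divided into lowerdir and upperdir. lowerdir (read-only) is normally the image layers, for example the rootfs in the ubuntu:20.04 image. upperdir (writable) is the container's writable layer, which holds the modified content. After merging, the user sees a single directory tree (merged). When the container reads a file, overlayfs first checks whether upperdir has it; if not, it looks in lowerdir.
When the container writes to a file that lives in lowerdir, overlayfs first performs a copy-up (copies the file into upperdir) and then modifies the copy in upperdir; lowerdir stays read-only.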
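The read and copy-up rules can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not how the kernel implements overlayfs; the `overlay` type and its methods are names made up for this example.

```go
package main

import "fmt"

// overlay is a toy model of an overlayfs mount: lower stands for the
// read-only image layer, upper for the container's writable layer.
type overlay struct {
	lower map[string]string
	upper map[string]string
}

// read checks upper first and falls back to lower.
func (o *overlay) read(path string) (string, bool) {
	if v, ok := o.upper[path]; ok {
		return v, true
	}
	v, ok := o.lower[path]
	return v, ok
}

// appendTo models modifying a file in place: a file that only exists in
// lower is copied up first, then the copy in upper is changed.
func (o *overlay) appendTo(path, data string) {
	if _, ok := o.upper[path]; !ok {
		o.upper[path] = o.lower[path] // copy-up (empty for a new file)
	}
	o.upper[path] += data
}

func main() {
	o := &overlay{
		lower: map[string]string{"/etc/hostname": "image\n"},
		upper: map[string]string{},
	}
	o.appendTo("/etc/hostname", "container\n")

	merged, _ := o.read("/etc/hostname")
	fmt.Printf("merged=%q lower=%q\n", merged, o.lower["/etc/hostname"])
}
```

The merged view shows the modified file while the entry in `lower` is untouched, which is why several containers can share one image.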

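The four key overlayfs directories

A standard overlayfs mount needs four directories: lowerdir (the read-only image layers), upperdir (the container's writable layer), workdir (a working directory the kernel needs for operations such as copy-up), and merged (the final mount point, i.e. the filesystem the container sees). After a container starts, containerd's overlayfs snapshotter normally sets them as follows:
lowerdir: {containerd root directory}/io.containerd.snapshotter.v1.overlayfs/snapshots/{snapshot ID of the parent layer}/fs
upperdir: {containerd root directory}/io.containerd.snapshotter.v1.overlayfs/snapshots/{snapshot ID of the current layer}/fs
workdir: {containerd root directory}/io.containerd.snapshotter.v1.overlayfs/snapshots/{snapshot ID of the current layer}/work
merged: typically /run/containerd/io.containerd.runtime.v2.task/{namespace}/{container id}/rootfs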
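To make the relationship between these directories concrete, here is a small sketch that assembles the option string of an overlay mount. The helper name `overlayOptions`, the `/var/lib/containerd` root, and the snapshot IDs 1 to 3 are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the option list of an overlay mount. lowers must be
// ordered topmost layer first, which is the order lowerdir= expects.
func overlayOptions(lowers []string, upper, work string) []string {
	return []string{
		"lowerdir=" + strings.Join(lowers, ":"),
		"upperdir=" + upper,
		"workdir=" + work,
	}
}

func main() {
	// Assumed containerd root directory and snapshot IDs.
	root := "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
	opts := overlayOptions(
		[]string{root + "/2/fs", root + "/1/fs"},
		root+"/3/fs",
		root+"/3/work",
	)
	fmt.Printf("mount -t overlay overlay -o %s /path/to/merged\n", strings.Join(opts, ","))
}
```

Note that the kernel requires upperdir and workdir to live on the same filesystem, which matters once upperdir is moved elsewhere.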
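So here is the question: if we want to persist the writable layer in real time, so that data not yet committed is not lost when the container crashes unexpectedly, how do we do it?
One simple idea is to mount some path onto an external storage system (NFS, for example) and then point the container's writable layer, upperdir, at that path. Container data is then written to external storage as it changes. The rest of this article shows how to change the upperdir path.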

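Changing the overlayfs upperdir

There are two ways to change the overlayfs upperdir: 1. Modify the containerd source code so that the part of overlayfs that sets upperdir uses the directory you want. 2. Use a custom snapshotter plugin, which containerd supports, and change the path there. Compared with option 1, option 2 needs fewer changes and is easier to maintain. The following sections describe how to implement a custom snapshotter.

Custom snapshotter

A custom snapshotter implements the Snapshotter interface from github.com/containerd/containerd/snapshots, which includes the following methods: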

1. Prepare

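Interface: snapshotter.Prepare(ctx, key, parent, opts...)
When: before the container starts
Purpose: creates a temporary writable snapshot for the container on top of the read-only image layer (parent).
Returns: a set of mounts that containerd uses to mount the container's rootfs.
Notes:
- Creates the directory structure (upperdir, workdir, merged, and so on).
- Records the snapshot's metadata (for example labels).
At this stage you can:
- check whether this is the target Pod/container (via key and labels)
- replace upperdir with your own mount path (for example an EBS volume)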

2. Mounts

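Interface: snapshotter.Mounts(ctx, key)
When: right before the container actually starts (at mount time)
Purpose: returns the mount information (overlayfs options and so on) of the snapshot created earlier by Prepare.
It normally returns the same mounts that Prepare returned.
Notes:
- Usually read-only; it does not change the snapshot's state.
- If Prepare set up a wrong mount, the mistake is reused here.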

3. Commit

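Interface: snapshotter.Commit(ctx, name, key, opts...)
When: used in image build and caching scenarios (for example buildkit or an image pull)
Purpose: turns the temporary snapshot created by Prepare into a read-only layer.
Notes:
- Running a container normally does not trigger Commit unless image layers are being built. Creating a Pod generally does not trigger it.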

4. Remove

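Interface: snapshotter.Remove(ctx, key)
When: when the container is deleted (because the container exited or because the Pod was deleted)
Purpose: deletes the snapshot's metadata and its data directories.
Notes:
- If you use a custom upperdir (for example a mounted EBS volume), you can choose to keep the files here (for example by not deleting the directory).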

5. Stat

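Interface: snapshotter.Stat(ctx, key)
When: called whenever containerd needs to confirm a snapshot's state
Purpose: returns the snapshot's metadata, such as whether it exists and whether it has been committed.
It is usually an auxiliary call with little impact.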

6. Walk

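Interface: snapshotter.Walk(ctx, fn, filters...)
When: used when containerd starts up or cleans up.
Purpose: iterates over all snapshots for health checks or cleanup.
It is used for garbage collection, state recovery, and similar tasks.

Lifecycle of the snapshotter methods on a timeline

On Pod creation:
┌──────────────────────────────────┐
│ kubelet requests a container     │
│ containerd calls the snapshotter │
└──────────────────────────────────┘
        ↓
Prepare()   <-- creates the writable snapshot (upper/work/fs) and returns the mounts
        ↓
Mounts()    <-- fetches the mount information from the previous step and mounts it into the container
        ↓
Container starts
        ↓
Pod is deleted or the container exits
        ↓
Remove()    <-- deletes the snapshot contents (or keeps upperdir)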
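The state transitions behind this timeline can be modelled with a tiny in-memory store. This is a sketch under simplifying assumptions: the `store` type and its rules are invented for illustration and cover far less than a real snapshotter (no mounts, labels, or views).

```go
package main

import (
	"errors"
	"fmt"
)

type kind int

const (
	active    kind = iota // writable, created by Prepare
	committed             // read-only, created by Commit
)

// store is a toy stand-in for a snapshotter's metadata, keyed by snapshot key.
type store map[string]kind

// Prepare creates an active snapshot; a non-empty parent must be committed.
func (s store) Prepare(key, parent string) error {
	if _, exists := s[key]; exists {
		return errors.New("snapshot already exists: " + key)
	}
	if parent != "" {
		if k, ok := s[parent]; !ok || k != committed {
			return errors.New("parent is not a committed snapshot: " + parent)
		}
	}
	s[key] = active
	return nil
}

// Commit turns the active snapshot key into the committed snapshot name.
func (s store) Commit(name, key string) error {
	if k, ok := s[key]; !ok || k != active {
		return errors.New("no active snapshot: " + key)
	}
	delete(s, key)
	s[name] = committed
	return nil
}

// Remove deletes a snapshot of either kind.
func (s store) Remove(key string) error {
	if _, ok := s[key]; !ok {
		return errors.New("snapshot not found: " + key)
	}
	delete(s, key)
	return nil
}

func main() {
	s := store{}
	// Image pull: a layer is prepared, unpacked, then committed.
	fmt.Println("prepare layer:", s.Prepare("extract-layer1", ""))
	fmt.Println("commit layer:", s.Commit("layer1", "extract-layer1"))
	// Container start: a writable snapshot on top of the image layer.
	fmt.Println("prepare container:", s.Prepare("container-1", "layer1"))
	// Container deletion.
	fmt.Println("remove container:", s.Remove("container-1"))
}
```

The container's snapshot goes straight from Prepare to Remove without a Commit, which matches the timeline above.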

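Our requirement is simple: only upperdir changes. So apart from Prepare() and Mounts(), every other method can simply delegate to overlayfs.

Implementation steps

First, define a snapshotter struct.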

type customSnapshotter struct {
	base snapshots.Snapshotter
	root string
}

Then implement the remaining methods of the Snapshotter interface from github.com/containerd/containerd/snapshots

func (s *customSnapshotter) Stat(ctx context.Context, key string) (snapshots.Info, error) {
	return s.base.Stat(ctx, key)
}

func (s *customSnapshotter) Update(ctx context.Context, info snapshots.Info, fieldpaths ...string) (snapshots.Info, error) {
	return s.base.Update(ctx, info, fieldpaths...)
}

func (s *customSnapshotter) Remove(ctx context.Context, key string) error {
	return s.base.Remove(ctx, key)
}

func (s *customSnapshotter) Close() error {
	return s.base.Close()
}

func (s *customSnapshotter) Walk(ctx context.Context, fn snapshots.WalkFunc, filters ...string) error {
	return s.base.Walk(ctx, fn, filters...)
}

func (s *customSnapshotter) Usage(ctx context.Context, key string) (snapshots.Usage, error) {
	return s.base.Usage(ctx, key)
}

func (s *customSnapshotter) Commit(ctx context.Context, name, key string, opts ...snapshots.Opt) error {
	return s.base.Commit(ctx, name, key, opts...)
}

func (s *customSnapshotter) View(ctx context.Context, key, parent string, opts ...snapshots.Opt) ([]mount.Mount, error) {
	return s.base.View(ctx, key, parent, opts...)
}

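For our scenario none of the methods above needs to change; each one just calls the corresponding overlayfs method. The methods we need to modify are Prepare() and Mounts().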
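As a side note on the delegation boilerplate: Go's struct embedding can promote the unchanged methods automatically, so only the overridden ones need to be written out. The sketch below shows the pattern with a deliberately tiny stand-in interface; the `Snapshotter`, `base`, and `custom` types here are illustrative and are not containerd's definitions.

```go
package main

import "fmt"

// Snapshotter is a tiny stand-in for the real interface.
type Snapshotter interface {
	Prepare(key string) string
	Mounts(key string) string
	Remove(key string) string
}

// base plays the role of the wrapped overlayfs snapshotter.
type base struct{}

func (base) Prepare(key string) string { return "base.Prepare(" + key + ")" }
func (base) Mounts(key string) string  { return "base.Mounts(" + key + ")" }
func (base) Remove(key string) string  { return "base.Remove(" + key + ")" }

// custom embeds the interface, so every method is promoted from the wrapped
// value; only methods that need different behaviour are redefined.
type custom struct {
	Snapshotter
}

// Prepare overrides the promoted method and still calls the wrapped one.
func (c custom) Prepare(key string) string {
	return "custom(" + c.Snapshotter.Prepare(key) + ")"
}

func main() {
	var s Snapshotter = custom{Snapshotter: base{}}
	fmt.Println(s.Prepare("k")) // overridden
	fmt.Println(s.Remove("k"))  // delegated through embedding
}
```

Explicit forwarding, as in the listing above, is more verbose but makes every delegated call visible; which style fits better is a matter of taste.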

The Prepare method

// index remembers the first snapshot index seen; snapshots with that index
// keep the default overlayfs paths. It is assumed to be a package-level int.
var index int

func (s *customSnapshotter) Prepare(ctx context.Context, key, parent string, opts ...snapshots.Opt) ([]mount.Mount, error) {
	// Print what containerd passes in when it starts a container.
	fmt.Println("key:", key)       // snapshot key of the current container
	fmt.Println("parent:", parent) // snapshot key of the parent layer
	fmt.Println("opts:", opts)     // extra options passed through by containerd

	if opts != nil {
		getPodUIDFromOpts(opts)
	}

	// Keys without a numeric index fall back to plain overlayfs instead of
	// panicking on an out-of-range slice access.
	parts := extractContainerID(key)
	if len(parts) < 2 {
		return s.base.Prepare(ctx, key, parent, opts...)
	}
	snapKeyInt, err := strconv.Atoi(parts[1])
	if err != nil {
		return s.base.Prepare(ctx, key, parent, opts...)
	}

	if index == 0 || index == snapKeyInt {
		fmt.Println("index: ", index)
		index = snapKeyInt
		return s.base.Prepare(ctx, key, parent, opts...)
	}

	// Return errors instead of panicking, so one failed lsblk call does not
	// bring down the whole snapshotter process.
	cmd := exec.Command("lsblk", "-J", "-o", "NAME,MOUNTPOINT,TYPE")
	output, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("failed to execute lsblk: %w", err)
	}

	// Parse the JSON output.
	var lsblk LsblkOutput
	if err := json.Unmarshal(output, &lsblk); err != nil {
		return nil, fmt.Errorf("failed to parse lsblk output: %w", err)
	}

	// Take a mount point on this host as the directory for the rootfs.
	devMounts := findKubeMountpoints(lsblk.BlockDevices)
	if len(devMounts) == 0 {
		fmt.Println("no ebs mount")
		return s.base.Prepare(ctx, key, parent, opts...)
	}
	for _, m := range devMounts {
		fmt.Println(m)
	}

	// Simple strategy: take the first volume (a label could select the volume instead).
	volume := devMounts[0]
	work := filepath.Join(volume, "work")
	fs := filepath.Join(volume, "fs")

	log.G(ctx).Infof("Using volume: %s", volume)
	log.G(ctx).Infof("Using upperdir: %s", fs)
	log.G(ctx).Infof("Using workdir: %s", work)

	if err := os.MkdirAll(work, 0755); err != nil {
		return nil, fmt.Errorf("failed to create workdir: %w", err)
	}
	if err := os.MkdirAll(fs, 0755); err != nil {
		return nil, fmt.Errorf("failed to create fsdir: %w", err)
	}

	// Let overlayfs create the snapshot, then swap in the custom paths.
	mounts, err := s.base.Prepare(ctx, key, parent, opts...)
	if err != nil {
		return nil, err
	}
	customMounts := overrideMountPaths(mounts, fs, work)

	fmt.Println("final mounts:", customMounts)

	return customMounts, nil
}
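// extractContainerID splits the snapshot key on "/"; it returns an empty slice
// when the key has fewer than three segments.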
func extractContainerID(key string) []string {
	parts := strings.Split(key, "/")
	if len(parts) >= 3 {
		return parts
	}
	return []string{}
}
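Because extractContainerID can return an empty slice, indexing its result directly is unsafe. A minimal sketch of a safer helper follows; it assumes keys shaped like `<namespace>/<index>/<name>` (which is what the code above relies on), and the name `snapshotIndex` is made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// snapshotIndex returns the numeric second segment of a key shaped like
// "<namespace>/<index>/<name>". ok is false for any other shape.
func snapshotIndex(key string) (idx int, ok bool) {
	parts := strings.Split(key, "/")
	if len(parts) < 3 {
		return 0, false
	}
	n, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, key := range []string{"k8s.io/42/abc123", "abc123", "k8s.io/x/abc123"} {
		idx, ok := snapshotIndex(key)
		fmt.Printf("%q index=%d ok=%v\n", key, idx, ok)
	}
}
```

Callers can then fall back to the wrapped snapshotter whenever ok is false.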
// Purely out of curiosity: log the options containerd passes in
func getPodUIDFromOpts(opts []snapshots.Opt) {
	log.G(context.TODO()).Infof("start get pod uid from opts")
	for _, opt := range opts {
		info := &snapshots.Info{}
		opt(info)
		fmt.Println("info:", info)
		fmt.Println("opt:", opt)
		for k, v := range info.Labels {
			log.G(context.TODO()).Infof("snapshot label: %s=%s", k, v)
		}
	}
}

// Recursively walk the devices and collect the matching mount paths
func findKubeMountpoints(devices []BlockDevice) []string {
	var matches []string
	for _, dev := range devices {
		if dev.Mountpoint != "" && kubeMountRegex.MatchString(dev.Mountpoint) {
			matches = append(matches, dev.Mountpoint)
		}
		if len(dev.Children) > 0 {
			childMatches := findKubeMountpoints(dev.Children)
			matches = append(matches, childMatches...)
		}
	}
	return matches
}
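The listing uses LsblkOutput, BlockDevice, and kubeMountRegex without showing their definitions. One plausible shape, matching the JSON that `lsblk -J -o NAME,MOUNTPOINT,TYPE` prints, is sketched below. The field layout follows lsblk's output, but the regular expression and the sample data are assumptions for illustration only; adjust the pattern to wherever your volumes are mounted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// BlockDevice mirrors one entry of the lsblk JSON output.
type BlockDevice struct {
	Name       string        `json:"name"`
	Mountpoint string        `json:"mountpoint"` // JSON null leaves this ""
	Type       string        `json:"type"`
	Children   []BlockDevice `json:"children"`
}

// LsblkOutput is the top-level JSON object.
type LsblkOutput struct {
	BlockDevices []BlockDevice `json:"blockdevices"`
}

// Hypothetical pattern: treat anything mounted under the kubelet directory
// as a candidate volume.
var kubeMountRegex = regexp.MustCompile(`^/var/lib/kubelet/`)

// findKubeMountpoints is the recursive helper from the article.
func findKubeMountpoints(devices []BlockDevice) []string {
	var matches []string
	for _, dev := range devices {
		if dev.Mountpoint != "" && kubeMountRegex.MatchString(dev.Mountpoint) {
			matches = append(matches, dev.Mountpoint)
		}
		matches = append(matches, findKubeMountpoints(dev.Children)...)
	}
	return matches
}

// parseLsblk decodes raw lsblk JSON.
func parseLsblk(data []byte) (LsblkOutput, error) {
	var out LsblkOutput
	err := json.Unmarshal(data, &out)
	return out, err
}

// sample is made-up lsblk output with one matching mount point.
const sample = `{"blockdevices":[
  {"name":"nvme0n1","mountpoint":null,"type":"disk","children":[
    {"name":"nvme0n1p1","mountpoint":"/","type":"part"}]},
  {"name":"nvme1n1","mountpoint":"/var/lib/kubelet/pods/example/volumes/data","type":"disk"}
]}`

func main() {
	out, err := parseLsblk([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(findKubeMountpoints(out.BlockDevices))
}
```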
// Replace upperdir and workdir with the given paths
func overrideMountPaths(mounts []mount.Mount, newUpper, newWork string) []mount.Mount {
	var newMounts []mount.Mount

	for _, m := range mounts {
		if m.Type != "overlay" {
			newMounts = append(newMounts, m)
			continue
		}

		var newOptions []string
		for _, opt := range m.Options {
			if strings.HasPrefix(opt, "upperdir=") {
				newOptions = append(newOptions, "upperdir="+newUpper)
			} else if strings.HasPrefix(opt, "workdir=") {
				newOptions = append(newOptions, "workdir="+newWork)
			} else {
				newOptions = append(newOptions, opt)
			}
		}

		m.Options = newOptions
		newMounts = append(newMounts, m)
	}
	return newMounts
}
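A quick way to check the rewrite logic in isolation is to run it on a hand-written mount. The sketch below uses a local `Mount` struct as a stand-in for containerd's mount.Mount so that it runs without the containerd module; the paths are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a local stand-in for containerd's mount.Mount, reduced to the
// fields this example needs.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// overrideMountPaths follows the article's helper: overlay mounts get new
// upperdir/workdir options, all other mounts pass through unchanged.
func overrideMountPaths(mounts []Mount, newUpper, newWork string) []Mount {
	out := make([]Mount, 0, len(mounts))
	for _, m := range mounts {
		if m.Type == "overlay" {
			opts := make([]string, 0, len(m.Options))
			for _, opt := range m.Options {
				switch {
				case strings.HasPrefix(opt, "upperdir="):
					opt = "upperdir=" + newUpper
				case strings.HasPrefix(opt, "workdir="):
					opt = "workdir=" + newWork
				}
				opts = append(opts, opt)
			}
			m.Options = opts
		}
		out = append(out, m)
	}
	return out
}

func main() {
	in := []Mount{{
		Type:   "overlay",
		Source: "overlay",
		Options: []string{
			"workdir=/snapshots/3/work",
			"upperdir=/snapshots/3/fs",
			"lowerdir=/snapshots/2/fs:/snapshots/1/fs",
		},
	}}
	for _, m := range overrideMountPaths(in, "/mnt/ebs/fs", "/mnt/ebs/work") {
		fmt.Println(m.Type, strings.Join(m.Options, ","))
	}
}
```

Two caveats are worth keeping in mind. The new upperdir and workdir must be on the same filesystem, because overlayfs requires it. And a snapshot without a parent is, as far as I know, returned by the overlayfs snapshotter as a bind mount rather than an overlay mount, in which case this helper leaves it untouched.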

Modifying the Mounts() method

func (s *customSnapshotter) Mounts(ctx context.Context, key string) ([]mount.Mount, error) {
	// Keys without a numeric index keep the default overlayfs mounts instead
	// of panicking on an out-of-range slice access.
	parts := extractContainerID(key)
	if len(parts) < 2 {
		return s.base.Mounts(ctx, key)
	}
	snapKeyInt, err := strconv.Atoi(parts[1])
	if err != nil {
		return s.base.Mounts(ctx, key)
	}
	if index == 0 || index == snapKeyInt {
		return s.base.Mounts(ctx, key)
	}

	mounts, err := s.base.Mounts(ctx, key)
	if err != nil {
		return nil, err
	}

	// Return errors instead of panicking inside the gRPC handler.
	cmd := exec.Command("lsblk", "-J", "-o", "NAME,MOUNTPOINT,TYPE")
	output, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("failed to execute lsblk: %w", err)
	}

	// Parse the JSON output.
	var lsblk LsblkOutput
	if err := json.Unmarshal(output, &lsblk); err != nil {
		return nil, fmt.Errorf("failed to parse lsblk output: %w", err)
	}

	// Look up the candidate mount points; without one, keep the default mounts.
	devMounts := findKubeMountpoints(lsblk.BlockDevices)
	if len(devMounts) == 0 {
		fmt.Println("no ebs mount")
		return mounts, nil
	}

	// Must resolve to the same paths that Prepare used.
	volume := devMounts[0]
	work := filepath.Join(volume, "work")
	fs := filepath.Join(volume, "fs")

	return overrideMountPaths(mounts, fs, work), nil
}

Start a gRPC service in the main function

const (
	Name           = "custom-overlayfs"
	DefaultRootDir = "/data/container/root/io.containerd.snapshotter.v1." + Name
	SocksFileName  = Name + ".sock"
)

func main() {
	app := &cli.App{
		Name:  Name,
		Usage: "Run a custom overlay snapshotter for EBS persistent rootfs",
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:  "root-dir",
				Value: DefaultRootDir,
				Usage: "Snapshotter root directory",
			},
		},
		Action: func(ctx *cli.Context) error {
			root := ctx.String("root-dir")

			sn, err := snapshotter.NewSnapshotter(root)
			if err != nil {
				return err
			}

			service := snapshotservice.FromSnapshotter(sn)
			rpc := grpc.NewServer()
			snapshotsapi.RegisterSnapshotsServer(rpc, service)

			sock := path.Join(root, SocksFileName)
			if err := os.RemoveAll(sock); err != nil {
				return err
			}
			log.Printf("Starting snapshotter at socket: %s", sock)
			fmt.Println("------ebs-snapshotter-------")
			l, err := net.Listen("unix", sock)
			if err != nil {
				return err
			}
			return rpc.Serve(l)
		},
	}

	if err := app.Run(os.Args); err != nil {
		log.Fatal(err)
	}
}

Build

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -ldflags '-extldflags "-static"' -o ebs-snapshotter ./

Run

chmod +x ebs-snapshotter
./ebs-snapshotter

Once the plugin has started, it creates the file /data/container/root/io.containerd.snapshotter.v1.custom-overlayfs/custom-overlayfs.sock

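Deployment

1. Edit the containerd configuration file

vim /etc/containerd/config.toml

2. Add a proxy_plugins entry and set its address to /data/container/root/io.containerd.snapshotter.v1.custom-overlayfs/custom-overlayfs.sock

3. Restart containerd

4. Check the containerd logs, where the messages printed by our code should appear; then check the mounted path, where the fs and work directories should have been created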
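For step 2, the entry might look like the fragment below. This is a sketch assuming the version 2 config layout of containerd 1.x; the second table, which makes the CRI plugin use the new snapshotter by default, is an assumption about how you want Kubernetes pods to pick it up and is not part of the original steps.

```toml
# /etc/containerd/config.toml
[proxy_plugins]
  [proxy_plugins.custom-overlayfs]
    type = "snapshot"
    address = "/data/container/root/io.containerd.snapshotter.v1.custom-overlayfs/custom-overlayfs.sock"

# Assumption: use the proxy snapshotter for containers created through CRI.
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "custom-overlayfs"
```

The key under proxy_plugins is the name containerd registers the snapshotter under, so it has to match the value of `snapshotter`.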