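Harbor garbage collection (GC) code reading
Problem description
While GC was running, it failed when the deletion reached one particular file.
Error log: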
delete the manifest with registry v2 API: imi-inteleyes-ehospital/ncdos-health-archives, application/vnd.docker.distribution.manifest.v2+json, sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4
2021-08-16T00:07:31Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:271]: delete manifest from storage: sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4
2021-08-16T00:07:31Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:283]: delete artifact trash record from database: 43266, imi-inteleyes-ehospital/ncdos-health-archives, sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4
2021-08-16T00:07:31Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:295]: delete blob from storage: sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4
2021-08-16T00:07:31Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:165]: failed to execute GC job at sweep phase, error: failed to delete blob from storage: sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4, deletefailed: http status code: 500, body: {"errors":[{"code":"UNKNOWN","message":"internal server error"}]}
The failing step is "delete blob from storage xxx".
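Source file
Source file: \src\jobservice\job\impl\gc\garbage_collection.go
1. First, look at what calling the delete API does.
2. Then, look at the execution logic of the scheduled GC job.
Calling the API to delete a tag
API: /api/v2.0/projects/ + projectName + /repositories/ + imagesName + /artifacts/ + digest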
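The path concatenation above can be sketched as a small Go helper. This is only an illustration: the function name artifactDeletePath is hypothetical, and the double URL-encoding of repository names that contain a slash is an assumption based on how the Harbor v2.0 API describes the repository_name parameter, so check it against the API reference for the deployed version.

```go
package main

import (
	"fmt"
	"net/url"
)

// artifactDeletePath builds the path that would be sent with an HTTP DELETE.
// Hypothetical helper, not part of Harbor.
// Assumption: a repository name containing "/" must be URL-encoded twice
// (a/b -> a%2Fb -> a%252Fb); names without "/" are unaffected.
func artifactDeletePath(projectName, imagesName, digest string) string {
	repo := url.PathEscape(url.PathEscape(imagesName))
	return "/api/v2.0/projects/" + url.PathEscape(projectName) +
		"/repositories/" + repo +
		"/artifacts/" + digest
}

func main() {
	fmt.Println(artifactDeletePath(
		"imi-inteleyes-ehospital",
		"ncdos-health-archives",
		"sha256:3ee45f7ae3f09f1a9671eb103fa8ec3ac6b4bec51f67947d1a494890805459c4",
	))
}
```

The project, repository and digest in main are the ones from the error log at the top of these notes.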
GC run
registry API(--delete-untagged=false)?
clean keys of redis DB of registry, clean artifact trash and untagged from DB
1. Why is delete-untagged disabled when calling the registry API?
Mainly because Harbor v2.0 introduced its own tags: they live in the database and have no corresponding data in the registry.
One failure case as an example:
Putting a manifest into Harbor has two parts, writing the database and writing the storage, and the two are not in one transaction.
When images with the same tag but different digests are pushed in parallel, the data can end up mismatched: an artifact that is valid in the
Harbor DB could be an untagged one in the storage. With delete-untagged enabled, that valid data could be removed from the storage.
2. What gets cleaned?
The deleted artifacts, based on the tables artifact_trash and artifact.
The untagged artifacts (optional), based on the table artifact.
removeUntaggedBlobs is called first to remove these unused blobs from the table project_blob.
garbage_collection.Run() {
//mark
gc.mark()
//sweep
gc.sweep()
gc.cleanCache()
}
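To make the control flow of Run concrete, here is a minimal sketch in which the phases run in order and the first failure stops the job. The phase type, runGC and errStorage are hypothetical names for illustration; only the shape of the error message is taken from the log above ("failed to execute GC job at sweep phase, error: ...").

```go
package main

import (
	"errors"
	"fmt"
)

// phase is a hypothetical pairing of a phase name with the work it does.
type phase struct {
	name string
	run  func() error
}

// errStorage stands in for the HTTP 500 returned by the storage call.
var errStorage = errors.New("http status code: 500")

// runGC executes the phases in order and returns on the first error,
// so later phases are skipped once an earlier one fails.
func runGC(phases []phase) error {
	for _, p := range phases {
		if err := p.run(); err != nil {
			return fmt.Errorf("failed to execute GC job at %s phase, error: %w", p.name, err)
		}
	}
	return nil
}

func main() {
	err := runGC([]phase{
		{"mark", func() error { return nil }},
		{"sweep", func() error { return errStorage }},
		{"cleanCache", func() error { return nil }},
	})
	fmt.Println(err)
}
```

In this sketch a failure in sweep means cleanCache never runs; whether the real job behaves exactly this way should be confirmed in garbage_collection.go.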
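// mark sets the status of the candidate blobs to delete
func mark() {
// look up the artifacts marked as deleted in the table artifact_trash
gc.deletedArt()
// not executed on the current production Harbor, where every image is tagged
gc.removeUntaggedBlobs()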
blobs = gc.blobMgr.UselessBlobs()
}
// deletedArt contains the two parts of artifact
// 1. required part, the artifacts that were removed from Harbor
// 2. optional part, the untagged artifacts
func deletedArt() {
// delete untagged
// judging from the actual logs, this call was not executed
gc.artCtl.Delete()
// filter gets all of the deleted artifacts (artifact_trash); no time window is needed here, as the manifest candidate has to remove all of its references.
arts, err := gc.artrashMgr.Filter()
}
Steps
1. Query artifact_trash for the records that are marked as deleted.
2. Join blob with project_blob and pick the blob records that have no row in project_blob.
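Step 2 is an anti-join, which can be sketched in memory as follows. The blob type, uselessBlobs and the sample data are hypothetical; the two-hour window is taken from the SQL queries later in these notes and is assumed, not confirmed, to match what the blob manager uses.

```go
package main

import (
	"fmt"
	"time"
)

// blob mirrors only the columns of the blob table that this sketch needs.
type blob struct {
	ID         int64
	Digest     string
	UpdateTime time.Time
}

// Sample data (hypothetical) for the usage example below.
var sampleNow = time.Date(2021, 8, 16, 0, 0, 0, 0, time.UTC)

const sampleWindow = 2 * time.Hour

var sampleBlobs = []blob{
	{ID: 1, Digest: "sha256:aaa", UpdateTime: sampleNow.Add(-3 * time.Hour)}, // unreferenced, old
	{ID: 2, Digest: "sha256:bbb", UpdateTime: sampleNow.Add(-3 * time.Hour)}, // referenced
	{ID: 3, Digest: "sha256:ccc", UpdateTime: sampleNow.Add(-1 * time.Hour)}, // unreferenced, recent
}

// uselessBlobs returns the blobs that have no row in project_blob
// (referenced[id] is false) and whose update time is at or before
// now minus the window.
func uselessBlobs(blobs []blob, referenced map[int64]bool, now time.Time, window time.Duration) []blob {
	cutoff := now.Add(-window)
	var out []blob
	for _, b := range blobs {
		if referenced[b.ID] {
			continue // still referenced by a project
		}
		if b.UpdateTime.After(cutoff) {
			continue // updated inside the time window
		}
		out = append(out, b)
	}
	return out
}

func main() {
	for _, b := range uselessBlobs(sampleBlobs, map[int64]bool{2: true}, sampleNow, sampleWindow) {
		fmt.Println(b.ID, b.Digest)
	}
}
```

With the sample data only blob 1 qualifies: blob 2 is still referenced and blob 3 was updated within the window.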
//delete the manifest with registry v2 api:
//delete manifest from storage : xxxxx
// delete artifact trash record from database
func sweep() {
// loop over the blobs and handle them one at a time
// update the status to deleting
count, err := gc.blobMgr.UpdateBlobStatus()
// remove tags and revisions of a manifest
// check whether the blob marked as delete is also in artifact_trash and is a manifest
if _, exist := gc.trashedArts[blob.Digest]; exist && blob.IsManifest() {
for _, art := range gc.trashedArts[blob.Digest] {
// delete the manifest with registry v2 API
// delete manifest from storage xx
// baseUrl/api/registry/repositoryName/manifests/digest
// delete artifact trash record from database xx
}
}
if !blob.IsForeignLayer() {
// delete blob from storage xxx
}
// delete blob record from database
// The GC job actual frees up xxx MB space
}
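The per-blob order of operations in sweep can be sketched with a fake backend. Everything here is illustrative: candidate, store and memStore are hypothetical types, the status update to "deleting" is left out, and the error wording other than "failed to delete blob from storage" is invented for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// candidate is a hypothetical, trimmed-down view of a blob chosen by mark.
type candidate struct {
	Digest   string
	Manifest bool     // content type is one of the manifest types
	Foreign  bool     // content type is the foreign-layer type
	Repos    []string // repositories of matching artifact_trash rows
}

// store is a hypothetical stand-in for the registry and database clients.
type store interface {
	DeleteManifest(repo, digest string) error
	DeleteBlob(digest string) error
	DeleteBlobRecord(digest string) error
}

// sweep handles the blobs one by one and stops at the first failure.
func sweep(s store, blobs []candidate) error {
	for _, b := range blobs {
		if b.Manifest {
			for _, repo := range b.Repos {
				if err := s.DeleteManifest(repo, b.Digest); err != nil {
					return fmt.Errorf("failed to delete manifest: %s, %w", b.Digest, err)
				}
			}
		}
		if !b.Foreign { // foreign layers are not removed from storage
			if err := s.DeleteBlob(b.Digest); err != nil {
				return fmt.Errorf("failed to delete blob from storage: %s, %w", b.Digest, err)
			}
		}
		if err := s.DeleteBlobRecord(b.Digest); err != nil {
			return fmt.Errorf("failed to delete blob record: %s, %w", b.Digest, err)
		}
	}
	return nil
}

// memStore is an in-memory fake that records successful calls and fails
// the storage delete for one chosen digest.
type memStore struct {
	failBlob string
	calls    []string
}

func (m *memStore) DeleteManifest(repo, digest string) error {
	m.calls = append(m.calls, "manifest "+repo+" "+digest)
	return nil
}

func (m *memStore) DeleteBlob(digest string) error {
	if digest == m.failBlob {
		return errors.New("http status code: 500")
	}
	m.calls = append(m.calls, "blob "+digest)
	return nil
}

func (m *memStore) DeleteBlobRecord(digest string) error {
	m.calls = append(m.calls, "record "+digest)
	return nil
}

func main() {
	s := &memStore{failBlob: "sha256:bad"}
	err := sweep(s, []candidate{
		{Digest: "sha256:foreign", Foreign: true},
		{Digest: "sha256:bad", Manifest: true, Repos: []string{"p/r"}},
		{Digest: "sha256:never"},
	})
	fmt.Println(s.calls)
	fmt.Println(err)
}
```

The second candidate reproduces the pattern seen in the log: the manifest delete succeeds, the blob delete then fails, and the remaining blobs are never reached.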
// IsManifest returns true if the blob is manifest layer
func (b *Blob) IsManifest() bool {
return b.ContentType == schema2.MediaTypeManifest ||
b.ContentType == schema1.MediaTypeManifest || b.ContentType == schema1.MediaTypeSignedManifest ||
b.ContentType == v1.MediaTypeImageManifest || b.ContentType == v1.MediaTypeImageIndex ||
b.ContentType == manifestlist.MediaTypeManifestList
}
// IsForeignLayer returns true if the blob is foreign layer
func (b *Blob) IsForeignLayer() bool {
return b.ContentType == schema2.MediaTypeForeignLayer
//MediaTypeForeignLayer = "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip"
}
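For checking content_type values seen in the database against these two predicates, a standalone sketch with the media types written out as literal strings may help. The string values for the Docker and OCI constants are given from memory of those packages and should be treated as assumptions; only the schema2 manifest type and the foreign-layer type appear verbatim in these notes. The function names isManifest and isForeignLayer are local to the sketch.

```go
package main

import "fmt"

// manifestTypes lists the content types treated as manifests.
// Assumption: these literals equal the constants named in the comments.
var manifestTypes = map[string]bool{
	"application/vnd.docker.distribution.manifest.v2+json":      true, // schema2.MediaTypeManifest
	"application/vnd.docker.distribution.manifest.v1+json":      true, // schema1.MediaTypeManifest
	"application/vnd.docker.distribution.manifest.v1+prettyjws": true, // schema1.MediaTypeSignedManifest
	"application/vnd.oci.image.manifest.v1+json":                true, // v1.MediaTypeImageManifest
	"application/vnd.oci.image.index.v1+json":                   true, // v1.MediaTypeImageIndex
	"application/vnd.docker.distribution.manifest.list.v2+json": true, // manifestlist.MediaTypeManifestList
}

// foreignLayerType is the value quoted in the notes for MediaTypeForeignLayer.
const foreignLayerType = "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip"

// isManifest reports whether the content type is one of the manifest types.
func isManifest(contentType string) bool { return manifestTypes[contentType] }

// isForeignLayer reports whether the content type is the foreign-layer type.
func isForeignLayer(contentType string) bool { return contentType == foreignLayerType }

func main() {
	ct := "application/vnd.docker.distribution.manifest.v2+json" // from the error log
	fmt.Println(isManifest(ct), isForeignLayer(ct))
}
```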
Questions
select count(1) from blob where status = 'delete' and content_type != 'application/vnd.docker.image.rootfs.foreign.diff.tar.gzip';
select sum(size)/1024/1024/1024 from blob where status = 'delete' and content_type != 'application/vnd.docker.image.rootfs.foreign.diff.tar.gzip';
select count(1) from blob where status = 'delete';
select sum(size) from blob where status = 'delete';
select count(1) from blob where content_type = 'application/vnd.docker.image.rootfs.foreign.diff.tar.gzip';
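The first two queries count the blobs in status 'delete' that are not foreign layers and total their size in GiB. The same arithmetic, as an in-memory sketch with hypothetical names (blobRow, pendingDeletion) and made-up rows:

```go
package main

import "fmt"

// blobRow holds the blob columns used by the queries above.
type blobRow struct {
	Status      string
	ContentType string
	Size        int64 // bytes
}

const foreignLayerType = "application/vnd.docker.image.rootfs.foreign.diff.tar.gzip"

// pendingDeletion returns how many rows have status 'delete' and are not
// foreign layers, and their total size in GiB (bytes / 1024 / 1024 / 1024).
func pendingDeletion(rows []blobRow) (count int, gib float64) {
	var total int64
	for _, r := range rows {
		if r.Status == "delete" && r.ContentType != foreignLayerType {
			count++
			total += r.Size
		}
	}
	return count, float64(total) / (1 << 30)
}

func main() {
	rows := []blobRow{
		{"delete", "application/vnd.docker.image.rootfs.diff.tar.gzip", 1 << 30},
		{"delete", foreignLayerType, 1 << 30},
		{"none", "application/vnd.docker.image.rootfs.diff.tar.gzip", 1 << 30},
		{"delete", "application/vnd.docker.distribution.manifest.v2+json", 1 << 29},
	}
	count, gib := pendingDeletion(rows)
	fmt.Println(count, gib)
}
```

Only the first and last rows are counted, since the second is a foreign layer and the third is not in status 'delete'.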
What is a foreign layer?
SELECT aft.* FROM artifact_trash AS aft
LEFT JOIN artifact af ON (aft.repository_name=af.repository_name AND aft.digest=af.digest)
WHERE af.digest IS NULL AND af.repository_name IS NULL limit 1;
SELECT b.id, b.digest, b.content_type, b.status, b.version, b.size FROM blob AS b
LEFT JOIN project_blob pb ON b.id = pb.blob_id
WHERE pb.id IS NULL AND b.update_time <= now() - interval '2 hours';
SELECT count(1) FROM blob AS b
LEFT JOIN project_blob pb ON b.id = pb.blob_id
WHERE pb.id IS NULL AND b.update_time <= now() - interval '2 hours';
Try running the cleanup again
2021-08-16
ip: 30.147.77.107
Size: 5146G
Projects: 479 private, 1 public, 480 in total
Repositories: 2029 private, 2029 in total
select * from blob where status = 'delete' and content_type != 'application/vnd.docker.image.rootfs.foreign.diff.tar.gzip';
2021-08-16 select count(1) from blob where status = 'delete' and content_type != 'application/vnd.docker.image.rootfs.foreign.diff.tar.gzip';
Query result: 247921