本地无Docker 环境,Python拉取容器镜像

60 阅读6分钟

在工作的时候,公司的服务器各种限制,被限制了访问的IP。但是需要上传各种docker镜像上去,遇到版本问题,动不动就让我升级下docker容器版本吧。

我的想法:在我自己的工作电脑上面,下载好对应的镜像,然后上传到对应的服务器即可。

既然都是通过网络接口请求的,我想一定有python工具,能够直接跟dockerhub打交道,不需 Docker 客户端的支持。通过查找,github有个项目:docker-drag,start数量860个,还是很优秀的。地址:github.com/NotGlop/doc… 。里面就是想要的一个脚本,不过这个start很多的脚本我使用的时候有问题,下载报错KeyError: 'layers',不过还是找到了其他仓库解决了我遇到的问题,能够继续愉快使用,我能够成功使用的代码,我放在了最下面。

使用方式

将最后面的python代码,保存为docker_pull.py文件,文件名字随便,然后cmd执行python docker_pull.py mysql:5.7,即可愉快地下载镜像了。

使用的命令,与docke pull命令使用方法一致。其主要是借助 Python 的 Request 库和 HTTPS API 直接从仓库中拉取镜像,并保存为 TAR 文件。

PS:不管怎样,你的工作电脑先要开启魔法,能够正常访问dockerhub,才能使用哈。

# 下载方式:Pull a Docker image in HTTPS
python docker_pull.py hello-world
python docker_pull.py jenkins/jenkins
python docker_pull.py jenkins/jenkins:latest-jdk17
python docker_pull.py mysql/mysql-server:8.0
python docker_pull.py mcr.microsoft.com/mssql-tools
python docker_pull.py consul@sha256:6ba4bfe1449ad8ac5a76cb29b6c3ff54489477a23786afb61ae30fb3b1ac0ae9

image.png

下载结果:

image.png

当下载好镜像后,把镜像上传到对应的服务器上,就可以继续愉快地使用容器部署服务咯。

docker load -i mysql_mysql-server.tar

最终脚本

import gzip
import hashlib
import json
import os
import shutil
import sys
import tarfile

import requests
import urllib3
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

urllib3.disable_warnings()


def create_session():
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    # Check if proxy environment variables are set
    http_proxy = os.environ.get('HTTP_PROXY') or os.environ.get('http_proxy')
    https_proxy = os.environ.get('HTTPS_PROXY') or os.environ.get('https_proxy')

    if http_proxy or https_proxy:
        session.proxies = {
            'http': http_proxy,
            'https': https_proxy
        }
        print('[+] Using proxy settings from environment')

    return session


if len(sys.argv) != 2:
    print('Usage:\n\tdocker_pull.py [registry/][repository/]image[:tag|@digest]\n')
    exit(1)

# Create a session for all requests
session = create_session()

# Look for the Docker image to download
repo = 'library'
tag = 'latest'
imgparts = sys.argv[1].split('/')
try:
    img, tag = imgparts[-1].split('@')
except ValueError:
    try:
        img, tag = imgparts[-1].split(':')
    except ValueError:
        img = imgparts[-1]
# Docker client doesn't seem to consider the first element as a potential registry unless there is a '.' or ':'
if len(imgparts) > 1 and ('.' in imgparts[0] or ':' in imgparts[0]):
    registry = imgparts[0]
    repo = '/'.join(imgparts[1:-1])
else:
    registry = 'registry-1.docker.io'
    if len(imgparts[:-1]) != 0:
        repo = '/'.join(imgparts[:-1])
    else:
        repo = 'library'
repository = '{}/{}'.format(repo, img)

# Get Docker authentication endpoint when it is required
auth_url = 'https://auth.docker.io/token'
reg_service = 'registry.docker.io'

try:
    print('[+] Connecting to registry: {}'.format(registry))
    resp = session.get('https://{}/v2/'.format(registry), verify=False, timeout=30)
    if resp.status_code == 401:
        auth_url = resp.headers['WWW-Authenticate'].split('"')[1]
        try:
            reg_service = resp.headers['WWW-Authenticate'].split('"')[3]
        except IndexError:
            reg_service = ""
except requests.exceptions.RequestException as e:
    print('[-] Connection error:', str(e))
    print('[*] Troubleshooting tips:')
    print('    1. Check your internet connection')
    print('    2. If you are behind a proxy, set HTTP_PROXY and HTTPS_PROXY environment variables')
    print('    3. Try using a VPN if the registry is blocked')
    print('    4. Verify if the registry {} is accessible from your network'.format(registry))
    exit(1)


# Get Docker token (this function is useless for unauthenticated registries like Microsoft)
def get_auth_head(type):
    try:
        resp = session.get('{}?service={}&scope=repository:{}:pull'.format(auth_url, reg_service, repository),
                           verify=False, timeout=30)
        access_token = resp.json()['token']
        auth_head = {'Authorization': 'Bearer ' + access_token, 'Accept': type}
        return auth_head
    except requests.exceptions.RequestException as e:
        print('[-] Authentication error:', str(e))
        exit(1)


# Docker style progress bar
def progress_bar(ublob, nb_traits):
    sys.stdout.write('\r' + ublob[7:19] + ': Downloading [')
    for i in range(0, nb_traits):
        if i == nb_traits - 1:
            sys.stdout.write('>')
        else:
            sys.stdout.write('=')
    for i in range(0, 49 - nb_traits):
        sys.stdout.write(' ')
    sys.stdout.write(']')
    sys.stdout.flush()


# Fetch manifest v2 and get image layer digests
print('[+] Trying to fetch manifest for {}'.format(repository))
auth_head = get_auth_head(
    'application/vnd.docker.distribution.manifest.v2+json,application/vnd.docker.distribution.manifest.list.v2+json')
try:
    resp = session.get('https://{}/v2/{}/manifests/{}'.format(registry, repository, tag), headers=auth_head,
                       verify=False, timeout=30)
except requests.exceptions.RequestException as e:
    print('[-] Manifest fetch error:', str(e))
    exit(1)
print('[+] Response status code:', resp.status_code)
print('[+] Response headers:', resp.headers)

if (resp.status_code != 200):
    print('[-] Cannot fetch manifest for {} [HTTP {}]'.format(repository, resp.status_code))
    print(resp.content)
    exit(1)

content_type = resp.headers.get('content-type', '')
print('[+] Content type:', content_type)

try:
    resp_json = resp.json()
    print('[+] Response JSON structure:')
    print(json.dumps(resp_json, indent=2))

    # Handle manifest list (multi-arch images)
    if 'manifests' in resp_json:
        print('[+] This is a multi-arch image. Available platforms:')
        for m in resp_json['manifests']:
            if 'platform' in m:
                print('    - {}/{} ({})'.format(
                    m['platform'].get('os', 'unknown'),
                    m['platform'].get('architecture', 'unknown'),
                    m['digest']
                ))

        # Try to find linux/amd64 platform first, then fall back to windows/amd64
        selected_manifest = None
        for m in resp_json['manifests']:
            platform = m.get('platform', {})
            if platform.get('os') == 'linux' and platform.get('architecture') == 'amd64':
                selected_manifest = m
                break

        if not selected_manifest:
            for m in resp_json['manifests']:
                platform = m.get('platform', {})
                if platform.get('os') == 'windows' and platform.get('architecture') == 'amd64':
                    selected_manifest = m
                    break

        if not selected_manifest:
            # If no preferred platform found, use the first one
            selected_manifest = resp_json['manifests'][0]

        print('[+] Selected platform: {}/{}'.format(
            selected_manifest['platform'].get('os', 'unknown'),
            selected_manifest['platform'].get('architecture', 'unknown')
        ))

        # Fetch the specific manifest
        try:
            # Get fresh auth token for manifest
            manifest_auth_head = get_auth_head('application/vnd.docker.distribution.manifest.v2+json')
            manifest_resp = session.get(
                'https://{}/v2/{}/manifests/{}'.format(registry, repository, selected_manifest['digest']),
                headers=manifest_auth_head,  # 使用新的认证头
                verify=False,
                timeout=30
            )
            if manifest_resp.status_code != 200:
                print('[-] Failed to fetch specific manifest:', manifest_resp.status_code)
                print('[-] Response content:', manifest_resp.content)
                exit(1)
            resp_json = manifest_resp.json()
            print('[+] Successfully fetched specific manifest')
        except Exception as e:
            print('[-] Error fetching specific manifest:', e)
            exit(1)

    # Now we should have the actual manifest with layers
    if 'layers' not in resp_json:
        print('[-] Error: No layers found in manifest')
        print('[-] Available keys:', list(resp_json.keys()))
        exit(1)

    layers = resp_json['layers']

except KeyError as e:
    print('[-] Error: Could not find required key in response:', e)
    print('[-] Available keys:', list(resp_json.keys()))
    exit(1)
except Exception as e:
    print('[-] Unexpected error:', e)
    exit(1)

# Create tmp directory if it doesn't exist
imgdir = 'tmp'
if not os.path.exists(imgdir):
    print('[+] Creating temporary directory:', imgdir)
    os.makedirs(imgdir)

config = resp_json['config']['digest']
try:
    confresp = session.get('https://{}/v2/{}/blobs/{}'.format(registry, repository, config), headers=auth_head,
                           verify=False, timeout=30)
except requests.exceptions.RequestException as e:
    print('[-] Config fetch error:', str(e))
    exit(1)
file = open('{}/{}.json'.format(imgdir, config[7:]), 'wb')
file.write(confresp.content)
file.close()

content = [{
    'Config': config[7:] + '.json',
    'RepoTags': [],
    'Layers': []
}]
if len(imgparts[:-1]) != 0:
    content[0]['RepoTags'].append('/'.join(imgparts[:-1]) + '/' + img + ':' + tag)
else:
    content[0]['RepoTags'].append(img + ':' + tag)

empty_json = '{"created":"1970-01-01T00:00:00Z","container_config":{"Hostname":"","Domainname":"","User":"","AttachStdin":false, \
	"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false, "StdinOnce":false,"Env":null,"Cmd":null,"Image":"", \
	"Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":null}}'

# Build layer folders
parentid = ''
for layer in layers:
    ublob = layer['digest']
    # FIXME: Creating fake layer ID. Don't know how Docker generates it
    fake_layerid = hashlib.sha256((parentid + '\n' + ublob + '\n').encode('utf-8')).hexdigest()
    layerdir = imgdir + '/' + fake_layerid
    os.mkdir(layerdir)

    # Creating VERSION file
    file = open(layerdir + '/VERSION', 'w')
    file.write('1.0')
    file.close()

    # Creating layer.tar file
    sys.stdout.write(ublob[7:19] + ': Downloading...')
    sys.stdout.flush()
    auth_head = get_auth_head(
        'application/vnd.docker.distribution.manifest.v2+json')  # refreshing token to avoid its expiration
    try:
        bresp = session.get('https://{}/v2/{}/blobs/{}'.format(registry, repository, ublob), headers=auth_head,
                            stream=True, verify=False, timeout=30)
    except requests.exceptions.RequestException as e:
        print('[-] Layer fetch error:', str(e))
        exit(1)
    if (bresp.status_code != 200):  # When the layer is located at a custom URL
        try:
            bresp = session.get(layer['urls'][0], headers=auth_head, stream=True, verify=False, timeout=30)
        except requests.exceptions.RequestException as e:
            print('[-] Layer fetch error:', str(e))
            exit(1)
        if (bresp.status_code != 200):
            print('\rERROR: Cannot download layer {} [HTTP {}]'.format(ublob[7:19], bresp.status_code,
                                                                       bresp.headers['Content-Length']))
            print(bresp.content)
            exit(1)
    # Stream download and follow the progress
    bresp.raise_for_status()
    unit = int(bresp.headers['Content-Length']) / 50
    acc = 0
    nb_traits = 0
    progress_bar(ublob, nb_traits)
    with open(layerdir + '/layer_gzip.tar', "wb") as file:
        for chunk in bresp.iter_content(chunk_size=8192):
            if chunk:
                file.write(chunk)
                acc = acc + 8192
                if acc > unit:
                    nb_traits = nb_traits + 1
                    progress_bar(ublob, nb_traits)
                    acc = 0
    sys.stdout.write("\r{}: Extracting...{}".format(ublob[7:19], " " * 50))  # Ugly but works everywhere
    sys.stdout.flush()
    with open(layerdir + '/layer.tar', "wb") as file:  # Decompress gzip response
        unzLayer = gzip.open(layerdir + '/layer_gzip.tar', 'rb')
        shutil.copyfileobj(unzLayer, file)
        unzLayer.close()
    os.remove(layerdir + '/layer_gzip.tar')
    print("\r{}: Pull complete [{}]".format(ublob[7:19], bresp.headers['Content-Length']))
    content[0]['Layers'].append(fake_layerid + '/layer.tar')

    # Creating json file
    file = open(layerdir + '/json', 'w')
    # last layer = config manifest - history - rootfs
    if layers[-1]['digest'] == layer['digest']:
        # FIXME: json.loads() automatically converts to unicode, thus decoding values whereas Docker doesn't
        json_obj = json.loads(confresp.content)
        del json_obj['history']
        try:
            del json_obj['rootfs']
        except:  # Because Microsoft loves case insensitiveness
            del json_obj['rootfS']
    else:  # other layers json are empty
        json_obj = json.loads(empty_json)
    json_obj['id'] = fake_layerid
    if parentid:
        json_obj['parent'] = parentid
    parentid = json_obj['id']
    file.write(json.dumps(json_obj))
    file.close()

file = open(imgdir + '/manifest.json', 'w')
file.write(json.dumps(content))
file.close()

if len(imgparts[:-1]) != 0:
    content = {'/'.join(imgparts[:-1]) + '/' + img: {tag: fake_layerid}}
else:  # when pulling only an img (without repo and registry)
    content = {img: {tag: fake_layerid}}
file = open(imgdir + '/repositories', 'w')
file.write(json.dumps(content))
file.close()

# Create image tar and clean tmp folder
docker_tar = repo.replace('/', '_') + '_' + img + '.tar'
sys.stdout.write("Creating archive...")
sys.stdout.flush()
tar = tarfile.open(docker_tar, "w")
tar.add(imgdir, arcname=os.path.sep)
tar.close()
shutil.rmtree(imgdir)
print('\rDocker image pulled: ' + docker_tar)