A tragic tale caused by boto's AWS4 support


Environment

Software versions

root@demo:/home/demouser# ceph -v
ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
boto version: 2.46.1

rgw configuration

[client.radosgw.cn-zone1]
     rgw dns name = ceph.work
     rgw frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
     host = demo
     keyring = /etc/ceph/ceph.client.radosgw.keyring
     rgw socket path = /home/ceph/var/run/ceph-client.radosgw.cn-zone1.sock
     log file = /home/ceph/log/radosgw.cn-zone1.log
     rgw print continue = false
     rgw content length compat = true

Pitfalls in boto's region handling

boto example

from boto.s3.connection import S3Connection
import boto
import os
os.environ['S3_USE_SIGV4'] = 'True'  # enable AWS4 (SigV4) signing

endpoint = 'ceph.work'
bucket_name = 'test1'
access_key = ''
secret_key = ''

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host=endpoint,
    is_secure=False,
    calling_format=boto.s3.connection.SubdomainCallingFormat(),
    validate_certs=True,
)

bucket = conn.get_all_buckets()
print bucket

Exception

Traceback (most recent call last):
.....
  File "/Users/demouser/lwc/lib/python2.7/site-packages/boto/auth.py", line 690, in determine_region_name
    return region_name
UnboundLocalError: local variable 'region_name' referenced before assignment

How boto derives region_name

    def split_host_parts(self, host):
        return host.split('.') 

    def determine_region_name(self, host):
        # S3's different format(s) of representing region/service from the
        # rest of AWS makes this hurt too.
        #
        # Possible domain formats:
        # - s3.amazonaws.com (Classic)
        # - s3-us-west-2.amazonaws.com (Specific region)
        # - bukkit.s3.amazonaws.com (Vhosted Classic)
        # - bukkit.s3-ap-northeast-1.amazonaws.com (Vhosted specific region)
        # - s3.cn-north-1.amazonaws.com.cn - (Beijing region)
        # - bukkit.s3.cn-north-1.amazonaws.com.cn - (Vhosted Beijing region)
        parts = self.split_host_parts(host)

        if self.region_name is not None:
            region_name = self.region_name
        else:
            # Classic URLs - s3-us-west-2.amazonaws.com
            if len(parts) == 3:
                region_name = self.clean_region_name(parts[0])

                # Special-case for Classic.
                if region_name == 's3':
                    region_name = 'us-east-1'  # pitfall here, explained below
            else:
                # Iterate over the parts in reverse order.
                for offset, part in enumerate(reversed(parts)):
                    part = part.lower()

                    # Look for the first thing starting with 's3'.
                    # Until there's a ``.s3`` TLD, we should be OK. :P
                    if part == 's3':
                        # If it's by itself, the region is the previous part.
                        region_name = parts[-offset]

                        # Unless it's Vhosted classic
                        if region_name == 'amazonaws':
                            region_name = 'us-east-1'

                        break
                    elif part.startswith('s3-'):
                        region_name = self.clean_region_name(part)
                        break

        return region_name

Reading through the boto code, region_name is derived simply by splitting the host on "." and inspecting the leading labels. A host of ceph.work has only two labels and neither of them is "s3", so region_name is never assigned and you get the UnboundLocalError above. In other words, if you want boto to work with AWS4, your host must have three labels, e.g. "s3.ceph.work". If you try to paper over the problem by changing only the client code, as follows:

from boto.s3.connection import S3Connection
import boto
import os
os.environ['S3_USE_SIGV4'] = 'True'

endpoint = 's3.ceph.work'  # add an extra label
bucket_name = 'test1'
access_key = ''
secret_key = ''

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host=endpoint,
    is_secure=False,
    calling_format=boto.s3.connection.SubdomainCallingFormat(),
    validate_certs=True,
)

bucket = conn.get_all_buckets()
print bucket

then you will see the following exception:

Traceback (most recent call last):
...
  File "/Users/Diluga/lwc/lib/python2.7/site-packages/boto/s3/connection.py", line 444, in get_all_buckets
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidBucketName</Code><BucketName>s3</BucketName><RequestId>tx000000000000000000001-0058d4d9ad-85cc-default</RequestId><HostId>85cc-default-default</HostId></Error>

This 400 error means the request itself was malformed: on the gateway side rgw dns name is still ceph.work, so RGW strips that suffix from the Host header (s3.ceph.work) and treats the leftover "s3" label as a bucket name, which is why the response reports InvalidBucketName with BucketName "s3".
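As a quick illustration (this snippet is not from the original post; the hostnames are just the ones used above), SubdomainCallingFormat only ever prepends the bucket name to the endpoint, so for get_all_buckets the Host header is the bare endpoint and RGW has to guess the bucket from it:

# Illustration only: how boto builds the Host header with SubdomainCallingFormat.
from boto.s3.connection import SubdomainCallingFormat

cf = SubdomainCallingFormat()
print cf.build_host('s3.ceph.work', '')       # 's3.ceph.work' -> RGW (rgw dns name = ceph.work) parses bucket 's3'
print cf.build_host('s3.ceph.work', 'test1')  # 'test1.s3.ceph.work'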

[image.png: packet capture of the signed request]

The packet capture shows that this us-east-1 is the region_name that was actually submitted, and it exposes yet another buried pitfall: if the first label of the host is exactly "s3", region_name is hardcoded to "us-east-1"; if it is "s3-abc" or anything else starting with "s3-", region_name becomes whatever we named that label. Fortunately Ceph by default accepts an arbitrary region_name in the request header, otherwise this would be a real tragedy.
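To make the pitfall concrete, here is a tiny re-implementation (illustration only; a simplified mirror of the determine_region_name logic quoted above, covering just the endpoint shapes discussed here) showing what each host turns into:

def guess_region(host):
    # Simplified mirror of boto's determine_region_name() for the cases above.
    parts = host.split('.')
    if len(parts) == 3:
        region = parts[0]
        if region.startswith('s3-'):        # what clean_region_name() does
            region = region[3:]
        return 'us-east-1' if region == 's3' else region
    # Two labels and nothing starting with "s3": boto never assigns
    # region_name, which is exactly the UnboundLocalError seen earlier.
    raise UnboundLocalError("local variable 'region_name' referenced before assignment")

print guess_region('s3.ceph.work')      # 'us-east-1' (hardcoded special case)
print guess_region('s3-abc.ceph.work')  # 'abc'
print guess_region('ceph.work')         # raises, just like boto does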

The correct fix is to adjust the rgw configuration in ceph.conf as follows.

Option 1

[client.radosgw.cn-zone1]
     rgw dns name = s3.ceph.work  # add an extra label in front of the original host; boto will submit region_name "us-east-1"
     rgw frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
     host = demo
     keyring = /etc/ceph/ceph.client.radosgw.keyring
     rgw socket path = /home/ceph/var/run/ceph-client.radosgw.cn-zone1.sock
     log file = /home/ceph/log/radosgw.cn-zone1.log
     rgw print continue = false
     rgw content length compat = true

Option 2

[client.radosgw.cn-zone1]
     rgw dns name = s3-abc.ceph.work  # add an extra label in front of the original host; boto will submit region_name "abc"
     rgw frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
     host = demo
     keyring = /etc/ceph/ceph.client.radosgw.keyring
     rgw socket path = /home/ceph/var/run/ceph-client.radosgw.cn-zone1.sock
     log file = /home/ceph/log/radosgw.cn-zone1.log
     rgw print continue = false
     rgw content length compat = true
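For completeness, a minimal verification sketch, assuming Option 1 (rgw dns name = s3.ceph.work) and that DNS or /etc/hosts resolves both s3.ceph.work and <bucket>.s3.ceph.work to the gateway (SubdomainCallingFormat moves the bucket into the Host header); the credentials are placeholders:

import os
import boto
import boto.s3.connection

os.environ['S3_USE_SIGV4'] = 'True'        # keep SigV4 enabled

conn = boto.connect_s3(
    aws_access_key_id='',                  # fill in your access key
    aws_secret_access_key='',              # fill in your secret key
    host='s3.ceph.work',                   # now matches rgw dns name
    is_secure=False,
    calling_format=boto.s3.connection.SubdomainCallingFormat(),
)
print conn.get_all_buckets()               # SigV4 region is derived as 'us-east-1'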

Summary

With RGW versions before hammer, if you only ever used AWS2 signatures, the rgw host (rgw dns name) setting did not matter much. But once you upgrade to jewel or later and start using AWS4, these problems all surface at once. Planning a sensible host name when the cluster is first deployed therefore becomes very important; otherwise, by the time you go live on a new version, this pitfall will force you to change the configuration of every client.