Preface
This article walks through a problem we hit while actually using Amazon S3, and along the way introduces some rules of HTTP headers and a few problem-solving approaches.
The Problem
Requirement
The requirement itself was simple: upload a BLOB file to S3 and, at the same time, attach the file's various business attributes to the S3Object so they can be parsed directly after download.
Implementation
We upload the file with AmazonS3's putObject method, which has four overloads:
public PutObjectResult putObject(PutObjectRequest putObjectRequest)
        throws SdkClientException, AmazonServiceException;
public PutObjectResult putObject(String bucketName, String key, File file)
        throws SdkClientException, AmazonServiceException;
public PutObjectResult putObject(String bucketName, String key, InputStream input, ObjectMetadata metadata)
        throws SdkClientException, AmazonServiceException;
public PutObjectResult putObject(String bucketName, String key, String content)
        throws AmazonServiceException, SdkClientException;
We use the third overload; its full definition is:
/**
* <p>
* Uploads the specified input stream and object metadata to Amazon S3 under
* the specified bucket and key name.
* </p>
* <p>
* Amazon S3 never stores partial objects;
* if during this call an exception wasn't thrown,
* the entire object was stored.
* </p>
* <p>
* If you are uploading or accessing <a
* href="http://aws.amazon.com/kms/">AWS KMS</a>-encrypted objects, you need to
* specify the correct region of the bucket on your client and configure AWS
* Signature Version 4 for added security. For more information on how to do
* this, see
* http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html#
* specify-signature-version
* </p>
* <p>
* The client automatically computes
* a checksum of the file. This checksum is verified against another checksum
* that is calculated once the data reaches Amazon S3, ensuring the data
* has not corrupted in transit over the network.
* </p>
* <p>
* Using the file extension, Amazon S3 attempts to determine
* the correct content type and content disposition to use
* for the object.
* </p>
* <p>
* Content length <b>must</b> be specified before data can be uploaded to
* Amazon S3. If the caller doesn't provide it, the library will make a best
* effort to compute the content length by buffer the contents of the input
* stream into the memory because Amazon S3 explicitly requires that the
* content length be sent in the request headers before any of the data is
* sent. Please note that this operation is not guaranteed to succeed.
* </p>
* <p>
* When using an {@link java.io.BufferedInputStream} as data source,
* please remember to use a buffer of size no less than
* {@link com.amazonaws.RequestClientOptions#DEFAULT_STREAM_BUFFER_SIZE}
* while initializing the BufferedInputStream.
* This is to ensure that the SDK can correctly mark and reset the stream with
* enough memory buffer during signing and retries.
* </p>
* <p>
* If versioning is enabled for the specified bucket, this operation will
* never overwrite an existing object at the same key, but instead will keep
* the existing object around as an older version until that version is
* explicitly deleted (see
* {@link AmazonS3#deleteVersion(String, String, String)}.
* </p>
* <p>
* If versioning is not enabled,
* this operation will overwrite an existing object
* with the same key; Amazon S3 will store the last write request.
* Amazon S3 does not provide object locking.
* If Amazon S3 receives multiple write requests for the same object nearly
* simultaneously, all of the objects might be stored. However, a single
* object will be stored with the final write request.
* </p>
* <p>
* When specifying a location constraint when creating a bucket, all objects
* added to the bucket are stored in the bucket's region. For example, if
* specifying a Europe (EU) region constraint for a bucket, all of that
* bucket's objects are stored in EU region.
* </p>
* <p>
* The specified bucket must already exist and the caller must have
* {@link Permission#Write} permission to the bucket to upload an object.
* </p>
*
* @param bucketName
* The name of an existing bucket, to which you have
* {@link Permission#Write} permission.
* @param key
* The key under which to store the specified file.
* @param input
* The input stream containing the data to be uploaded to Amazon
* S3.
* @param metadata
* Additional metadata instructing Amazon S3 how to handle the
* uploaded data (e.g. custom user metadata, hooks for specifying
* content type, etc.).
*
* @return A {@link PutObjectResult} object containing the information
* returned by Amazon S3 for the newly created object.
*
* @throws SdkClientException
* If any errors are encountered in the client while making the
* request or handling the response.
* @throws AmazonServiceException
* If any errors occurred in Amazon S3 while processing the
* request.
*
* @see AmazonS3#putObject(String, String, File)
* @see AmazonS3#putObject(PutObjectRequest)
* @see <a href="http://docs.aws.amazon.com/goto/WebAPI/s3-2006-03-01/PutObject">AWS API Documentation</a>
*/
public PutObjectResult putObject(
        String bucketName, String key, InputStream input, ObjectMetadata metadata)
        throws SdkClientException, AmazonServiceException;
Given this definition, we can use ObjectMetadata to carry the attributes, uploading the file and attaching its attributes in a single call.
Looking inside the ObjectMetadata class, we can see that these attributes are stored in Maps:
/**
* Custom user metadata, represented in responses with the x-amz-meta-
* header prefix
*/
private Map<String, String> userMetadata = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
/**
* All other (non user custom) headers such as Content-Length, Content-Type,
* etc.
*/
private Map<String, Object> metadata = new TreeMap<String, Object>(String.CASE_INSENSITIVE_ORDER);
Putting this together, we arrived at the following abstracted implementation (note: the real business fields are replaced with generic ones, for demonstration only):
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.addUserMetadata("key_field", "value");
objectMetadata.addUserMetadata("content_type", "business");
amazonS3.putObject("bucket", "file.bin", new FileInputStream(new File("/data/s3/file.bin")), objectMetadata);
When the program actually ran, putObject threw an error.
Analysis
Seeing the exception, our first thought was that something in the S3 configuration was wrong, so we reviewed the configuration again, but found nothing amiss; that direction was ruled out.
We then suspected we were misusing addUserMetadata. ObjectMetadata also provides a setHeader method, so we tried passing the attributes through setHeader instead, and the file uploaded normally.
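Our workaround looked roughly like this (a sketch; setHeader takes the raw header name and an arbitrary value):
ObjectMetadata objectMetadata = new ObjectMetadata();
// Workaround: pass the attribute as a plain header rather than as user metadata.
objectMetadata.setHeader("key_field", "value");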
At this point the problem seemed solved, yet something still felt wrong. From the source we can see that setHeader puts entries into the map named metadata, which is documented as "All other (non user custom) headers", i.e. headers that are not user-defined; stuffing our custom attributes into metadata is not appropriate.
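Indeed, the source of setHeader (abridged from the SDK) is simply:
public void setHeader(String key, Object value) {
    metadata.put(key, value);
}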
To get to the bottom of it, we reread the javadoc of addUserMetadata. The method is defined as follows:
/**
* <p>
* Adds the key value pair of custom user-metadata for the associated
* object. If the entry in the custom user-metadata map already contains the
* specified key, it will be replaced with these new contents.
* </p>
* <p>
* Amazon S3 can store additional metadata on objects by internally
* representing it as HTTP headers prefixed with "x-amz-meta-".
* Use user-metadata to store arbitrary metadata alongside their data in
* Amazon S3. When setting user metadata, callers <i>should not</i> include
* the internal "x-amz-meta-" prefix; this library will handle that for
* them. Likewise, when callers retrieve custom user-metadata, they will not
* see the "x-amz-meta-" header prefix.
* </p>
* <p>
* Note that user-metadata for an object is limited by the HTTP request
* header limit. All HTTP headers included in a request (including user
* metadata headers and other standard HTTP headers) must be less than 8KB.
* </p>
*
* @param key
* The key for the custom user metadata entry. Note that the key
* should not include
* the internal S3 HTTP header prefix.
* @param value
* The value for the custom user-metadata entry.
*
* @see ObjectMetadata#setUserMetadata(Map)
* @see ObjectMetadata#getUserMetadata()
*/
public void addUserMetadata(String key, String value) {
this.userMetadata.put(key, value);
}
The addUserMetadata javadoc tells us that S3 stores the added attributes as HTTP headers prefixed with x-amz-meta-. From this we can infer that putObject actually transfers the file to S3 as a byte stream over HTTP.
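Roughly speaking, the PUT request the SDK builds looks like this on the wire (simplified; the real request also carries authorization, date, and checksum headers, and the exact host/path form depends on the client configuration):
PUT /file.bin HTTP/1.1
Host: bucket.s3.amazonaws.com
Content-Type: application/octet-stream
x-amz-meta-key_field: value
x-amz-meta-content_type: business

<file bytes>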
We then took another look at our code: the key passed to addUserMetadata was named key_field. That name looked nonstandard, because every HTTP header one sees in the wild separates words with hyphens (-); using an underscore (_) is odd.
To bring the code in line with convention, we renamed key_field to key-field. Magically, the file now uploaded without error.
Solution
We had stumbled into a fix without understanding why, so we went back to the HTTP specifications.
HTTP headers let the client and the server pass additional information with a request or a response. A request header consists of a case-insensitive name, followed by a colon (:), followed by its value (without line breaks); leading whitespace before the value is ignored.
Custom proprietary headers have historically been added with an "X-" prefix, but the IETF explicitly deprecated that convention in RFC 6648, published in June 2012, because of the inconvenience it causes when a nonstandard field later becomes standard. Other headers are listed in the IANA registry, whose original content was defined in RFC 4229; IANA also maintains a registry of proposed new HTTP headers.
Section 4.2 of the HTTP specification, RFC 2616, says:
Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message).
And RFC 822, section 3.1.2, says this about the message format:
The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon).
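Note that by this rule an underscore (decimal 95) is a perfectly legal field-name character. As a quick check, here is a hypothetical Java helper:
// Mirrors RFC 822's field-name rule: printable US-ASCII,
// decimal 33..126, excluding the colon.
static boolean isLegalFieldName(String name) {
    return !name.isEmpty()
            && name.chars().allMatch(c -> c >= 33 && c <= 126 && c != ':');
}
// isLegalFieldName("key_field") -> true: the underscore itself is fine.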
Root cause:
Using an underscore in a header field name is actually legal and compliant with the HTTP standard. The reason many servers disallow it by default is a CGI legacy issue: when header names are mapped to CGI environment variable names, both hyphens and underscores become underscores (for example, key-field and key_field would both map to HTTP_KEY_FIELD), which makes them easy to confuse.
Many articles also note that if Nginx sits in front as a proxy, you can explicitly set underscores_in_headers to on so that Nginx accepts headers containing underscores, as sketched below.
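A minimal sketch of that setting (server name and upstream address are placeholders):
server {
    listen 80;
    server_name s3-proxy.example.com;

    # By default Nginx treats header names containing underscores as
    # invalid and silently drops them; this switch accepts them instead.
    underscores_in_headers on;

    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}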
With that, the solution is clear: keep adding attributes through ObjectMetadata's addUserMetadata, as the Amazon S3 SDK intends, but change the underscores (_) in the keys to hyphens (-):
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.addUserMetadata("key-field", "value");
objectMetadata.addUserMetadata("content-type", "business");
amazonS3.putObject("bucket", "file.bin", new FileInputStream(new File("/data/s3/file.bin")), objectMetadata);
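To double-check, we can read the metadata back after the upload; per the addUserMetadata javadoc, the internal x-amz-meta- prefix is stripped on the way back (a sketch):
ObjectMetadata stored = amazonS3.getObjectMetadata("bucket", "file.bin");
// The key comes back without the "x-amz-meta-" prefix.
String value = stored.getUserMetadata().get("key-field");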
Summary
Working through this problem suggests a few simple takeaways:
- Avoid underscores (_) when naming HTTP header fields.
- Read the comments on a method's definition thoroughly; they go a long way toward understanding the method.
- Always follow widely accepted conventions, such as the HTTP header naming rules in this article. In Java, for example, camelCase naming deserves the same respect: many frameworks split identifiers into words by letter case when automatically parsing and converting properties, and breaking the convention can lead to all sorts of unexpected problems.
- Keep digging until you reach the root cause; don't let a problem go just because it seems fixed.