Thoughts Triggered by an AmazonS3 putObject Failure

Preface

This article dissects a problem we ran into while actually using AmazonS3, introduces some of the rules governing HTTP headers along the way, and offers a few thoughts on how to approach this kind of troubleshooting.

The Problem

Requirement

The requirement itself is very simple: upload a BLOB file to S3 and, at the same time, attach a number of business attributes to the S3Object so that the file can be parsed directly after download.

Implementation

We upload the file with AmazonS3's putObject method, which has four overloads:

public PutObjectResult putObject(PutObjectRequest putObjectRequest)
        throws SdkClientException, AmazonServiceException;

public PutObjectResult putObject(String bucketName, String key, File file)
        throws SdkClientException, AmazonServiceException;

public PutObjectResult putObject(String bucketName, String key, InputStream input, ObjectMetadata metadata)
        throws SdkClientException, AmazonServiceException;

public PutObjectResult putObject(String bucketName, String key, String content)
        throws AmazonServiceException, SdkClientException;

We use the third of these; its full definition is as follows:

/**
 * <p>
 * Uploads the specified input stream and object metadata to Amazon S3 under
 * the specified bucket and key name.
 * </p>
 * <p>
 * Amazon S3 never stores partial objects;
 * if during this call an exception wasn't thrown,
 * the entire object was stored.
 * </p>
 * <p>
 * If you are uploading or accessing <a
 * href="http://aws.amazon.com/kms/">AWS KMS</a>-encrypted objects, you need to
 * specify the correct region of the bucket on your client and configure AWS
 * Signature Version 4 for added security. For more information on how to do
 * this, see
 * http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html#
 * specify-signature-version
 * </p>
 * <p>
 * The client automatically computes
 * a checksum of the file. This checksum is verified against another checksum
 * that is calculated once the data reaches Amazon S3, ensuring the data
 * has not corrupted in transit over the network.
 * </p>
 * <p>
 * Using the file extension, Amazon S3 attempts to determine
 * the correct content type and content disposition to use
 * for the object.
 * </p>
 * <p>
 * Content length <b>must</b> be specified before data can be uploaded to
 * Amazon S3. If the caller doesn't provide it, the library will make a best
 * effort to compute the content length by buffer the contents of the input
 * stream into the memory because Amazon S3 explicitly requires that the
 * content length be sent in the request headers before any of the data is
 * sent. Please note that this operation is not guaranteed to succeed.
 * </p>
 * <p>
 * When using an {@link java.io.BufferedInputStream} as data source,
 * please remember to use a buffer of size no less than
 * {@link com.amazonaws.RequestClientOptions#DEFAULT_STREAM_BUFFER_SIZE}
 * while initializing the BufferedInputStream.
 * This is to ensure that the SDK can correctly mark and reset the stream with
 * enough memory buffer during signing and retries.
 * </p>
 * <p>
 * If versioning is enabled for the specified bucket, this operation will
 * never overwrite an existing object at the same key, but instead will keep
 * the existing object around as an older version until that version is
 * explicitly deleted (see
 * {@link AmazonS3#deleteVersion(String, String, String)}.
 * </p>

 * <p>
 * If versioning is not enabled,
 * this operation will overwrite an existing object
 * with the same key; Amazon S3 will store the last write request.
 * Amazon S3 does not provide object locking.
 * If Amazon S3 receives multiple write requests for the same object nearly
 * simultaneously, all of the objects might be stored.  However, a single
 * object will be stored with the final write request.
 * </p>

 * <p>
 * When specifying a location constraint when creating a bucket, all objects
 * added to the bucket are stored in the bucket's region. For example, if
 * specifying a Europe (EU) region constraint for a bucket, all of that
 * bucket's objects are stored in EU region.
 * </p>
 * <p>
 * The specified bucket must already exist and the caller must have
 * {@link Permission#Write} permission to the bucket to upload an object.
 * </p>
 *
 * @param bucketName
 *            The name of an existing bucket, to which you have
 *            {@link Permission#Write} permission.
 * @param key
 *            The key under which to store the specified file.
 * @param input
 *            The input stream containing the data to be uploaded to Amazon
 *            S3.
 * @param metadata
 *            Additional metadata instructing Amazon S3 how to handle the
 *            uploaded data (e.g. custom user metadata, hooks for specifying
 *            content type, etc.).
 *
 * @return A {@link PutObjectResult} object containing the information
 *         returned by Amazon S3 for the newly created object.
 *
 * @throws SdkClientException
 *             If any errors are encountered in the client while making the
 *             request or handling the response.
 * @throws AmazonServiceException
 *             If any errors occurred in Amazon S3 while processing the
 *             request.
 *
 * @see AmazonS3#putObject(String, String, File)
 * @see AmazonS3#putObject(PutObjectRequest)
 * @see <a href="http://docs.aws.amazon.com/goto/WebAPI/s3-2006-03-01/PutObject">AWS API Documentation</a>
 */
public PutObjectResult putObject(
        String bucketName, String key, InputStream input, ObjectMetadata metadata)
        throws SdkClientException, AmazonServiceException;

Given the method definition above, ObjectMetadata is the vehicle for attaching our attributes while uploading the file.

Inside the ObjectMetadata class we can see that the attributes are stored in Maps:

/**
 * Custom user metadata, represented in responses with the x-amz-meta-
 * header prefix
 */
private Map<String, String> userMetadata = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);

/**
 * All other (non user custom) headers such as Content-Length, Content-Type,
 * etc.
 */
private Map<String, Object> metadata = new TreeMap<String, Object>(String.CASE_INSENSITIVE_ORDER);

Putting this together, we ended up with the following abstracted implementation (note: the real business fields are replaced with generic ones here, purely for illustration):

ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.addUserMetadata("key_field", "value");
objectMetadata.addUserMetadata("content_type", "business");

amazonS3.putObject("bucket", "file.bin", new FileInputStream(new File("/data/s3/file.bin")), objectMetadata);

When the program actually ran, putObject failed with an error.

Analysis

When we first saw the exception, we assumed something was wrong with the S3 configuration, so we reviewed the whole configuration again, but found nothing amiss. That direction was ruled out.

Next we wondered whether we were using addUserMetadata incorrectly. Noticing that ObjectMetadata also provides a setHeader method, we tried passing the attributes through it instead, and the file uploaded normally.
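
A minimal sketch of that workaround, using the same placeholder field names as above (as explained next, this is not the right place for custom attributes):

// Workaround we tried: pass the custom attributes via setHeader instead of
// addUserMetadata. setHeader writes into the map reserved for standard
// (non user custom) headers, so this is really an abuse of the API.
ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.setHeader("key_field", "value");
objectMetadata.setHeader("content_type", "business");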

At this point the problem seemed solved, but something still felt wrong. From the source we can see that setHeader adds entries to the map named metadata, which is documented as "All other (non user custom) headers", i.e. headers that are not user-defined. Putting our custom attributes into metadata is simply not appropriate.

To resolve this properly, we went back and re-read the documentation of addUserMetadata. The full definition:

/**
 * <p>
 * Adds the key value pair of custom user-metadata for the associated
 * object. If the entry in the custom user-metadata map already contains the
 * specified key, it will be replaced with these new contents.
 * </p>
 * <p>
 * Amazon S3 can store additional metadata on objects by internally
 * representing it as HTTP headers prefixed with "x-amz-meta-".
 * Use user-metadata to store arbitrary metadata alongside their data in
 * Amazon S3. When setting user metadata, callers <i>should not</i> include
 * the internal "x-amz-meta-" prefix; this library will handle that for
 * them. Likewise, when callers retrieve custom user-metadata, they will not
 * see the "x-amz-meta-" header prefix.
 * </p>
 * <p>
 * Note that user-metadata for an object is limited by the HTTP request
 * header limit. All HTTP headers included in a request (including user
 * metadata headers and other standard HTTP headers) must be less than 8KB.
 * </p>
 *
 * @param key
 *            The key for the custom user metadata entry. Note that the key
 *            should not include
 *            the internal S3 HTTP header prefix.
 * @param value
 *            The value for the custom user-metadata entry.
 *
 * @see ObjectMetadata#setUserMetadata(Map)
 * @see ObjectMetadata#getUserMetadata()
 */
public void addUserMetadata(String key, String value) {
    this.userMetadata.put(key, value);
}

The Javadoc of addUserMetadata tells us that S3 attaches the added attributes to HTTP headers, prefixed with x-amz-meta-. From this we can infer that putObject itself also transfers the file to S3 as a byte stream over HTTP.
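
In other words, addUserMetadata("foo", "bar") goes over the wire as the request header x-amz-meta-foo: bar, and the SDK strips the prefix again on the way back. A quick sketch of the round trip, assuming an initialized amazonS3 client and an object that was uploaded with that entry:

// Download the object and read back the user metadata. The SDK removes the
// "x-amz-meta-" prefix, so the caller only ever sees the bare key.
S3Object s3Object = amazonS3.getObject("bucket", "file.bin");
Map<String, String> userMetadata = s3Object.getObjectMetadata().getUserMetadata();
String bar = userMetadata.get("foo"); // "bar" -- no prefix visible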

Then we took another look at our own code: the key we passed to addUserMetadata was named key_field. That name had always looked slightly off, because every HTTP header one normally sees separates words with hyphens -, while we were using an underscore _.

To bring the code in line with convention, we renamed key_field to key-field. And then something magical happened: the file uploaded normally.

Solution

We had stumbled into a fix without understanding why it worked, so we turned to the HTTP specifications.

HTTP headers let the client and the server pass additional information with a request or a response. A header consists of a case-insensitive name, followed by a colon (:), followed by its value (without line breaks); leading whitespace before the value is ignored.

Custom proprietary headers were traditionally added with an 'X-' prefix, but the IETF formally deprecated this practice in RFC 6648 (June 2012), because it causes trouble when a non-standard field later becomes standard. Other headers are listed in the IANA registry, whose original content was defined in RFC 4229; IANA also maintains a registry of proposed new HTTP headers.

RFC 2616 (the HTTP/1.1 specification), section 4.2, says:

Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message).

And RFC 822, section 3.1.2, says of the message format:

The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon).

Root cause:

Using underscores in header field names is in fact legal, standards-compliant HTTP. The reason many servers reject them by default is a CGI legacy: both hyphens and underscores in header names are mapped to underscores in CGI variable names, so, for example, Key-Field and Key_Field would both become the variable HTTP_KEY_FIELD, which is an easy source of confusion.

Many articles also point out that if Nginx sits in front as a proxy, you can explicitly set underscores_in_headers on; in its configuration to make Nginx accept header names containing underscores.

With that, the solution is clear: keep following the AmazonS3 convention and add the attributes through ObjectMetadata's addUserMetadata, but change the underscores _ in the keys to hyphens -:

ObjectMetadata objectMetadata = new ObjectMetadata();
objectMetadata.addUserMetadata("key-field", "value");
objectMetadata.addUserMetadata("content-type", "business");

amazonS3.putObject("bucket", "file.bin", new FileInputStream(new File("/data/s3/file.bin")), objectMetadata);
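
If the business field names themselves contain underscores, it may be cleaner to normalize them once rather than patching every call site. A minimal sketch; the helper below is our own, not part of the SDK:

// Hypothetical helper: turn a business field name into an HTTP-header-safe
// metadata key by replacing underscores with hyphens.
private static String toHeaderSafeKey(String businessField) {
    return businessField.replace('_', '-');
}

// Usage:
objectMetadata.addUserMetadata(toHeaderSafeKey("key_field"), "value");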

Summary

Working through this problem suggests a few simple lessons:

  1. Avoid underscores _ when naming HTTP header fields.
  2. Read the comments in a method's definition thoroughly; they go a long way toward understanding what the method really does.
  3. Stick to established conventions, such as the HTTP header naming rules in this article. In Java, the same goes for camelCase naming: many frameworks split words on case boundaries when converting property names automatically, and ignoring the convention can surface all kinds of unexpected problems.
  4. Keep digging until you reach the root cause; don't let go of a problem just because it stopped hurting.
