How to fix a problem when using Python generators with the OpenStack swift client

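I've run into a problem using Python generators together with the OpenStack swift client library. I need to fetch a large string (about 7MB) from a specific URL, split it into smaller pieces, and return a generator object in which each iteration yields one piece. In the test suite this content is just a string, which is passed to a monkeypatched swift client class for processing. The monkeypatching code looks like this: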

def monkeypatch_class(name, bases, namespace):
    '''Guido's monkeypatch metaclass.'''
    assert len(bases) == 1, "Exactly one base class required"
    base = bases[0]
    for name, value in namespace.iteritems():
        if name != "__metaclass__":
            setattr(base, name, value)
    return base
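
For readers on Python 3, here is a minimal sketch of the same monkeypatch-metaclass technique, assuming Python 3 syntax (`metaclass=` keyword) and using a hypothetical `Greeter` class in place of `client.Connection`. Because the "metaclass" returns the existing base class, no new class is created; the class body is simply copied onto the base. Python 3 inserts dunder entries such as `__module__` and `__qualname__` into the namespace automatically, so this variant skips dunder names instead of checking only for `__metaclass__`.

```python
def monkeypatch_class(name, bases, namespace):
    """Python 3 variant (an assumption) of the monkeypatch metaclass."""
    assert len(bases) == 1, "Exactly one base class required"
    base = bases[0]
    for attr, value in namespace.items():
        # Skip dunder names (__module__, __qualname__, ...) that Python 3
        # adds to every class namespace, so the base keeps its own.
        if not (attr.startswith("__") and attr.endswith("__")):
            setattr(base, attr, value)
    # Returning the base binds the "new" class name to the patched base.
    return base


class Greeter:  # hypothetical stand-in for client.Connection
    def greet(self):
        return "original"


class PatchedGreeter(Greeter, metaclass=monkeypatch_class):
    def greet(self):
        return "patched"


if __name__ == "__main__":
    print(PatchedGreeter is Greeter)  # True: no new class was created
    print(Greeter().greet())          # patched
```

One caveat of this design: patched methods should avoid zero-argument `super()`, because the class body never becomes a real class of its own.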

And in the test suite:

from swiftclient import client
import StringIO
import utils

class Connection(client.Connection):
    __metaclass__ = monkeypatch_class

    def get_object(self, path, obj, resp_chunk_size=None, ...):
        contents = None
        headers = {}

        # retrieve content from path and store it in 'contents'
        ...

        if resp_chunk_size is not None:
            # stream the string into chunks
            def _object_body():
                stream = StringIO.StringIO(contents)
                buf = stream.read(resp_chunk_size)
                while buf:
                    yield buf
                    buf = stream.read(resp_chunk_size)
            contents = _object_body()
        return headers, contents
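
Taken on its own, the intent of `_object_body` is simply to split a string into fixed-size pieces. A minimal Python 3 sketch of that step (the name `iter_chunks` is hypothetical) uses the two-argument `iter(callable, sentinel)` form instead of the manual `while` loop. Because `iter_chunks` is an ordinary function rather than a generator function, the `StringIO` is built from `data` as soon as it is called, not when iteration begins.

```python
import io
from functools import partial


def iter_chunks(data, chunk_size):
    """Return an iterator over successive chunk_size pieces of the string data."""
    stream = io.StringIO(data)  # built immediately, when iter_chunks() is called
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # until it returns the sentinel "" (end of stream).
    return iter(partial(stream.read, chunk_size), "")


if __name__ == "__main__":
    print(list(iter_chunks("abcdefgh", 3)))  # ['abc', 'def', 'gh']
```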

The returned generator object is then passed up through the streaming methods of the storage class:

class SwiftStorage(Storage):

    def get_content(self, path, chunk_size=None):
        path = self._init_path(path)
        try:
            _, obj = self._connection.get_object(
                self._container,
                path,
                resp_chunk_size=chunk_size)
            return obj
        except Exception:
            raise IOError("Could not get content: {}".format(path))

    def stream_read(self, path):
        try:
            return self.get_content(path, chunk_size=self.buffer_size)
        except Exception:
            raise OSError(
                "Could not read content from stream: {}".format(path))

Finally, in my test suite:

def test_stream(self):
    filename = self.gen_random_string()
    # test 7MB
    content = self.gen_random_string(7 * 1024 * 1024)
    io = StringIO.StringIO(content)
    self._storage.stream_write(filename, io)
    io.close()
    # test read / write
    data = ''
    for buf in self._storage.stream_read(filename):
        data += buf
    self.assertEqual(content,
                     data,
                     "stream read failed. output: {}".format(data))

The output is:

======================================================================
FAIL: test_stream (test_swift_storage.TestSwiftStorage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bacongobbler/git/github.com/bacongobbler/docker-registry/test/test_local_storage.py", line 46, in test_stream
    "stream read failed. output: {}".format(data))
AssertionError: stream read failed. output: <generator object _object_body at 0x2a6bd20>

I tried to isolate the problem with a simple Python script that follows the same flow as the code above, and it runs without any problem:

def gen_num():
    def _object_body():
        for i in range(10000000):
            yield i
    return _object_body()

def get_num():
    return gen_num()

def stream_read():
    return get_num()

def main():
    num = 0
    for i in stream_read():
        num += i
    print num

if __name__ == '__main__':
    main()

Any help with this would be appreciated.

2. Solution

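In get_object, you assign the return value of _object_body() to the variable contents. However, that same variable also holds your actual data, and _object_body reads it in its very first line (stream = StringIO.StringIO(contents)).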
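
The underlying rule is that a nested function looks up a variable from the enclosing scope when it runs, not when it is defined. A minimal sketch with an ordinary (non-generator) closure; `make_reader` and `read` are hypothetical names chosen for illustration:

```python
def make_reader():
    contents = "real data"

    def read():
        # 'contents' is looked up in make_reader's scope when read() runs,
        # not when read() is defined.
        return contents

    contents = "rebound later"  # the closure will see this value
    return read


if __name__ == "__main__":
    print(make_reader()())  # rebound later
```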

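The problem is that _object_body is a generator function (it uses yield). Calling it only creates a generator object; none of the function's code runs until you iterate over that generator. Because _object_body is a closure, it looks up contents in the enclosing scope only when its code actually runs. By the time that happens (the for loop in test_stream), you have long since rebound contents to the result of _object_body().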
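
The laziness is easy to observe with a side effect. In this minimal sketch (the names `events` and `gen` are hypothetical), calling `gen()` records nothing; the body starts only on the first `next()`:

```python
events = []


def gen():
    events.append("body started")  # runs only when iteration begins
    yield 1


if __name__ == "__main__":
    g = gen()
    print(events)  # [] : calling gen() did not execute its body
    next(g)
    print(events)  # ['body started']
```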

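As a result, stream = StringIO(contents) creates a StringIO object that wraps the generator object instead of your data. Python 2's StringIO.StringIO converts non-string input with str(), so the stream contains the generator's repr, which is exactly what your error message shows: <generator object _object_body at 0x2a6bd20>.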

Here is a minimal example that reproduces the problem:

def foo():
    contents = "Hello!"

    def bar():
        print contents
        yield 1

    # Only creates the generator; none of bar's code runs on this line.
    contents = bar()

    print "About to start running..."
    for i in contents:
        # Now bar's code runs, but contents is already bound to the generator
        # object, so it prints the generator instead of "Hello!"
        pass
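
One way to avoid the problem, sketched here as an assumption rather than as the answer's own fix, is to pass the data into the generator as an argument (or simply store the generator under a different name). Arguments are bound when the generator object is created, even though its body runs later. The `get_object` below is a hypothetical, trimmed-down Python 3 stand-in for the patched method, taking the data directly instead of a path:

```python
import io


def get_object(contents, resp_chunk_size=None):
    """Hypothetical stand-in for the patched get_object, minus the path lookup."""
    headers = {}
    if resp_chunk_size is not None:
        def _object_body(data):
            # 'data' is bound when _object_body(contents) is called, so
            # rebinding 'contents' on the next line cannot affect it.
            stream = io.StringIO(data)
            buf = stream.read(resp_chunk_size)
            while buf:
                yield buf
                buf = stream.read(resp_chunk_size)

        contents = _object_body(contents)
    return headers, contents


if __name__ == "__main__":
    _, body = get_object("Hello, world!", resp_chunk_size=5)
    print(list(body))  # ['Hello', ', wor', 'ld!']
```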