Reducing RAM usage in a Python script

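I wrote a small program that scrapes book data from the UNESCO website, which contains information about book translations. The code works as intended, but processing only about 20 countries already uses about 6 GB of RAM. I need to process around 200 countries, so this is not workable for me.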

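I'm not sure where all that RAM usage comes from, so I don't know how to reduce it. I assume the dictionary holding all the book information is responsible, but I'm not certain. Should I run the program once per country instead of processing all countries in one go? Or is there a better way to do this?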

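This is my first time writing a program like this. As a beginner, self-taught programmer, I'd appreciate it if you pointed out any major flaws in the code or suggested improvements, even ones unrelated to the specific problem at hand.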

2. Solution

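1) Optimize the code

Follow Python's naming conventions: use snake_case for variable and function names.
Use BeautifulSoup's SoupStrainer (passed as parse_only) so that only the HTML you need is parsed, which cuts parsing overhead and the size of the resulting tree.
Call decompose() on a BeautifulSoup object as soon as you are done with it, to release its parse tree.
In the Book class, write data to disk only when the caller invokes the export() method, not as a side effect of creating the object.
Instead of accumulating all the data in memory and writing it to disk in one go at the end, produce records with a generator and write each one as it is produced; this keeps memory usage roughly constant regardless of how many records there are.
Build file paths with the os.path module (for example os.path.join) so the code works on every platform.
Use the class_= keyword argument, introduced in Beautiful Soup 4.1.2, to search by CSS class more concisely.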
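A minimal, standard-library-only sketch of the generator, export-on-demand and os.path items above, assuming each record has already been reduced to a tuple of plain strings. The names `iter_records` and `write_records` are hypothetical and not part of the original program, and the output is written with the standard csv module for illustration. The generator hands over one record at a time and the writer appends it immediately, so only one record needs to be alive at any moment.

```python
import csv
import os
import tempfile


def iter_records(n):
    """Hypothetical stand-in for the scraper: yields one record at a time."""
    for i in range(n):
        # Each record is a tuple of plain strings, not a reference to a parse tree.
        yield ("author %d" % i, "title %d" % i, str(2000 + i))


def write_records(records, directory, country):
    """Append records to <directory>/<country>.csv and return how many were written."""
    # os.path.join inserts the correct separator for the current platform.
    file_name = os.path.join(directory, country + ".csv")
    count = 0
    # newline="" is what the csv module expects for files it writes to.
    with open(file_name, "a", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        for record in records:  # pulls one record from the generator per iteration
            writer.writerow(record)
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_records(iter_records(3), tmp, "AFG"))  # prints 3
```

Because `write_records` accepts any iterable, the same function works whether records come from a list or a generator; only the generator keeps memory flat.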

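2) Process the data in batches

Split the data into smaller batches and process only one batch at a time, which reduces memory usage.
Use multiple processes or threads to work on several batches concurrently, which can improve throughput. Keep in mind that every batch in flight occupies memory, so more workers means a higher peak.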
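A sketch of one-batch-at-a-time processing using only the standard library. `batched`, `process_batch` and `process_all` are hypothetical helpers written for illustration (Python 3.12 also ships `itertools.batched`, which yields tuples). `batched` pulls at most `size` items from the source on each step, so memory is bounded by one batch rather than by the whole data set.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, consuming the source lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Hypothetical per-batch work: here it just totals the item lengths."""
    return sum(len(item) for item in batch)


def process_all(items, size):
    """Process items one batch at a time; only the current batch is held in memory."""
    results = []
    for batch in batched(items, size):
        results.append(process_batch(batch))
        # `batch` is rebound on the next iteration, so the previous one can be freed.
    return results


if __name__ == "__main__":
    words = (w for w in ["ab", "cde", "f", "gh", "ijk"])
    print(process_all(words, 2))  # prints [5, 3, 3]
```

The loop is deliberately sequential. `concurrent.futures.Executor.map` submits all of its inputs up front, so feeding it a batch generator would materialize every batch at once; a concurrent version needs to cap the number of batches submitted at any time.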

The following code example shows how generators and batch processing can reduce memory usage:

```python
import os
from urllib.request import urlopen  # Python 3 (the original used Python 2's urllib2)

from bs4 import BeautifulSoup, SoupStrainer

base_url = "http://www.unesco.org/xtrans/bsresult.aspx?lg=0&c="
destination_directory = "/Users/robbie/Test/"
# Parse only elements with class="restable"; the rest of each page is skipped.
only_restable = SoupStrainer(class_="restable")
batch_size = 10  # results per page; the &fr= offset advances by this amount


def parse_page(url):
    """Download a page and parse only its "restable" elements."""
    with urlopen(url) as response:
        return BeautifulSoup(response, "html.parser", parse_only=only_restable)


def iter_books(country):
    """Yield a country's books one page (batch) at a time."""
    results_total = get_all_pages(country, base_url)
    for start in range(0, results_total, batch_size):
        page = parse_page(base_url + country + "&fr=" + str(start))
        books = [Book(book, country) for book in page.find_all("td", class_="res2")]
        # Free the parse tree before handing out the books. This assumes the
        # set_* methods copy plain strings out of the tag rather than keeping it.
        page.decompose()
        yield from books


def build_list(country_code_list, countries):
    for country in country_code_list:
        print("Processing %s now..." % countries[country])
        # Each book is written out as soon as it is produced.
        for book in iter_books(country):
            book.export(country)


def get_all_pages(country, base_url):
    """Return the total number of results for a country (0 if none are listed)."""
    page = parse_page(base_url + country)
    result_number = page.find_all("td", class_="res1", limit=1)
    results_total = 0
    if result_number:
        # The "res1" cell is expected to end with "/ <total>".
        results_total = int(result_number[0].get_text().split("/")[1].strip())
    page.decompose()
    return results_total


class Book(object):
    def __init__(self, book, country):
        # Each set_* method extracts one field from the <td class="res2"> tag.
        self.set_author(book)
        self.set_quality(book)
        self.set_target_title(book)
        self.set_target_language(book)
        self.set_translator_name(book)
        self.set_published_city(book)
        self.set_publisher(book)
        self.set_published_country(book)
        self.set_year(book)
        self.set_pages(book)
        self.set_edition(book)
        self.set_original_title(book)
        self.set_original_language(book)

    def export(self, country):
        """Append this book as one " & "-separated line to <country>.csv."""
        # Join the directory and file name properly instead of concatenating them.
        file_name = os.path.join(destination_directory, country + ".csv")
        fields = [
            self.author, self.quality, self.target_title, self.target_language,
            self.translators, self.published_city, self.publisher,
            self.published_country, self.year, self.pages, self.edition,
            self.original_title, self.original_languages,
        ]
        # Python 3 strings are Unicode: let the file handle the UTF-8 encoding
        # instead of calling .encode() on every field. The with block closes the file.
        with open(file_name, "a", encoding="utf-8") as by_country_csv:
            print(" & ".join(fields), file=by_country_csv)

    # The set_* method implementations are omitted


if __name__ == "__main__":
    country_code_list = ["AFG", "ALA", "DZA"]
    countries = {"AFG": "Afghanistan", "ALA": "Aland Islands", "DZA": "Algeria"}
    build_list(country_code_list, countries)
    print("Completed.")
```
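To check whether a change such as switching from a list to a generator actually lowers memory, the standard library's tracemalloc module can report the peak of Python-level allocations. The sketch below is a self-contained comparison on synthetic data; `make_record`, the two `consume_*` functions and the record count are assumptions made for illustration, not measurements of the UNESCO scraper. tracemalloc only traces memory allocated through Python's own allocator, so its numbers won't match the total RAM the operating system reports, but they are usually enough to show which approach keeps more data alive at once.

```python
import tracemalloc


def make_record(i):
    """Hypothetical record: a tuple of a few short strings."""
    return ("author %d" % i, "title %d" % i, "x" * 100)


def consume_list(n):
    """Build every record first, then process them (all records alive at once)."""
    records = [make_record(i) for i in range(n)]
    return sum(len(r[2]) for r in records)


def consume_generator(n):
    """Create and process records one at a time (one record alive at once)."""
    return sum(len(r[2]) for r in (make_record(i) for i in range(n)))


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 100_000
    print("list:      %d bytes" % peak_bytes(consume_list, n))
    print("generator: %d bytes" % peak_bytes(consume_generator, n))
```

Both functions compute the same result; only the amount of data held at the same time differs. To see which source lines allocate the most, `tracemalloc.take_snapshot().statistics("lineno")` can be used instead of the peak figure.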