使用 Gemini 及 Python 更新 Android 多语言 Excel 文件使用 Gemini 结合 Pytho

1. 概述

使用 Gemini 结合 Python 脚本从多份表中提取所需数据更新到对应表中。

2. 准备

将所有数据汇总到的 data.xlsx 文件，其中 Sheet0 是待处理的表，其余表为更新数据源。

Sheet0、Sheet1 中标红位置为待更新错误数据。

Sheet2 中包含剩余正确数据。

3. 向 Gemini 提问

Gemini

提问内容：

使用 Python 找出除了 Sheet0 之外其他所有 Sheet 中 zh 列中完全相同的字符串的项，最新相同项所在行的的其他列项数据与 Sheet0 中对应列名列项数据比对，不相同的项则直接覆盖 Sheet0 中对应项，更新过的项背景标记为黄色

得到的回答：

import openpyxl
from openpyxl.styles import PatternFill
import os


def sync_excel_data(file_path):
    file_dir, file_name = os.path.split(file_path)
    file_base, file_ext = os.path.splitext(file_name)
    # 保存更新后的 DataFrame 到新文件（或覆盖原文件）
    output_file_path = os.path.join(file_dir, f"{file_base}_updated{file_ext}")
    """
    同步 Excel 工作表数据到 Sheet0。
    
    参数:
    output_file_path (str): Excel 文件的路径。
    """
    try:
        workbook = openpyxl.load_workbook(file_path)
        sheet0 = workbook['Sheet0']
    except FileNotFoundError:
        print(f"错误：找不到文件 {file_path}")
        return
    except KeyError:
        print("错误: 找不到名为 'Sheet0' 的工作簿。")
        return

    # 定义黄色填充
    yellow_fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')

    # 获取 Sheet0 的列名和索引的映射
    sheet0_headers = {}
    if sheet0.max_row > 0:
        for col_index, cell in enumerate(sheet0[1], 1):
            if cell.value:
                sheet0_headers[cell.value] = col_index

    if 'zh' not in sheet0_headers:
        print("错误: Sheet0 中找不到名为 'zh' 的列。")
        return

    # 存储 Sheet0 中 "zh" 列的数据及其所在行号
    sheet0_zh_data = {}
    for row_index, row in enumerate(sheet0.iter_rows(min_row=2), 2):
        zh_value = row[sheet0_headers['zh'] - 1].value
        if zh_value is not None:
            # 存储 (值, 行号)
            sheet0_zh_data[zh_value] = row_index

    # 遍历所有工作表，除了 Sheet0
    sheet_names_to_process = [name for name in workbook.sheetnames if name.startswith('Sheet') and name != 'Sheet0']
    sheet_names_to_process.sort(key=lambda x: int(x[5:]) if x[5:].isdigit() else float('inf'))
    # 累计更新单元格个数
    updated_count = 0
    # 在 Sheet0 中查找并更新数据
    for sheet_name in sheet_names_to_process:
        print(f"正在处理工作表: {sheet_name}...")
        current_sheet = workbook[sheet_name]

        # 获取当前工作表的列名和索引的映射
        current_sheet_headers = {}
        if current_sheet.max_row > 0:
            for col_index, cell in enumerate(current_sheet[1], 1):
                if cell.value:
                    current_sheet_headers[cell.value] = col_index

        if 'zh' not in current_sheet_headers:
            print(f"警告: 在 {sheet_name} 中找不到名为 'zh' 的列，跳过。")
            continue

        # 遍历当前工作表的数据行
        for row_index, row in enumerate(current_sheet.iter_rows(min_row=2), 2):
            zh_value_current = row[current_sheet_headers['zh'] - 1].value

            if zh_value_current in sheet0_zh_data:
                row_to_update_index = sheet0_zh_data[zh_value_current]

                # 遍历当前工作表的所有列，用列名匹配来更新 Sheet0
                for col_name, current_col_index in current_sheet_headers.items():
                    # 确保在 Sheet0 中存在相同的列名，并且不是 "zh" 列
                    if col_name in sheet0_headers and col_name != 'zh':
                        # 获取当前工作表的数据
                        current_cell_value = current_sheet.cell(row=row_index, column=current_col_index).value

                        # 获取 Sheet0 中对应的单元格
                        sheet0_col_index = sheet0_headers[col_name]
                        cell_to_update = sheet0.cell(row=row_to_update_index, column=sheet0_col_index)
                        last_value = cell_to_update.value
                        # 如果值不同，则更新并标记
                        if cell_to_update.value != current_cell_value:
                            cell_to_update.value = current_cell_value
                            cell_to_update.fill = yellow_fill
                            updated_count += 1
                            print(f"更新单元格 {last_value} -> {current_cell_value}")
    # 保存更改
    try:
        workbook.save(output_file_path)
        print(f"数据更新完成，共更新了 {updated_count} 个单元格，新文件已保存为：{output_file_path} ")
    except Exception as e:
        print(f"保存文件时出错：{e}")


if __name__ == '__main__':
    input_path = 'data.xlsx'
    sync_excel_data(input_path)

优化提问：

优化上面的提问文案

得到的回答：

方案一：简洁明了型

使用 Python 找出并更新 Excel 文件中重复的数据。

具体需求：

在 Sheet0 之外的所有工作表中，找到与 Sheet0 中 "zh" 列完全相同的项。

将最新找到的相同项所在行的其他列数据，与 Sheet0 中对应的数据进行比对。

如果数据不一致，则用最新的数据覆盖 Sheet0 中的旧数据。

将所有更新过的单元格背景标记为黄色。

方案二：项目需求型

项目目标：Python 脚本自动同步 Excel 工作表数据。

需求详情：

数据比对：遍历 Excel 文件中除 Sheet0 以外的所有工作表。

匹配条件：以 "zh" 列作为唯一标识，找到与 Sheet0 中完全匹配的行。

内容同步：如果匹配行的其他列数据与 Sheet0 不符，则覆盖 Sheet0 中的数据。

可视化标记：自动将所有被更新的单元格填充为黄色，以便识别。

方案三：更口语化、直接型

帮我用 Python 写个脚本，来整理我的 Excel 表。

我想让它做三件事：

先看 Sheet0，然后去检查其他所有表格。

只要在其他表格里找到 "zh" 列和 Sheet0 里一模一样的项，就把那一行的数据拿过来。

对比一下，如果其他列的数据不一样，就用新数据把 Sheet0 里的旧数据换掉，并且把改过的地方涂成黄色。

4. 使用 Python 执行脚本

得到输出文件 data_updated.xlsx

5.可能遇到的问题

5.1. 出现多余 name 字符

多余字符.png

5.2. 空格符号转义问题

空格占位符.png

5.3. 出现多余<![CDATA[ 字符

多余符号.png

参考资料

Gemini
Python