使用 MySQL LOAD DATA INFILE 导入 CSV 时移除公式值

115 阅读2分钟

当使用 LOAD DATA INFILE 命令将 CSV 文件直接导入 MySQL 时,可能会遇到一些字段残留了 Excel 公式字段,例如值之前带有一个等号 (=)。当目标表结构设置允许第一个字段设置为 VARCHAR(100) 时,理想情况下希望将其设置为 INT。以下是一个 CSV 内容示例,其中第一个字段包含等号:

huake_00198_.jpg "MID","DBA Name","Partner ID","Partner Name","Sub Partner ID","Sub Partner Name","Active Months","Bonus Amount","Bonus Applied Date","Partner Percentage","Partner Share","Total Payment" ="0008788014065741","company2","7968","me,"11839","Joe Blow","0","50.00","","","","50.00","","","","350.64" ="0008788014065756","company2","7968","you","11839","Joe Blow","0","50.00","","","","50.00","","","","294.60"

以下是要使用的 MySQL 导入命令:

sql = """ LOAD DATA LOCAL INFILE '%(upload)s' IGNORE INTO TABLE %(table)s FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ; """ % {"upload": file, "table": report} self.db.query( sql )

任务是弄清楚如何在导入时处理这些公式值,以使其在导入到 MySQL 表时被正确解析。

2、解决方案

有两种方法可以使用 LOAD DATA INFILE 解决此问题:

方法一:使用 SET 子句 这种方法涉及在 LOAD DATA INFILE 命令中使用 SET 子句。SET 子句允许在导入过程中对列值进行转换和修改。在 SET 子句中,我们可以使用 TRIM 函数删除公式值开头的等号 (=) 和双引号 (")。我们还可以对其他列进行必要的转换,例如删除货币符号、将日期转换为正确的格式等。

以下是一个使用 SET 子句的示例:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n'
    IGNORE 1 LINES 
(@MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths,
 @BonusAmount, @BonusAppliedDate, @PartnerPercentage, @PartnerShare, @TotalPayment)
SET MID = TRIM(BOTH '"' FROM SUBSTR(@MID, 2)), -- here we get rid of equal sign and double quotes
    BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))

方法二:使用 LINES STARTING BY 子句 这种方法涉及使用 LINES STARTING BY 子句。LINES STARTING BY 子句允许指定导入文件中数据的开始行。我们可以使用 LINES STARTING BY 子句指定公式值所在行的开始标志,例如等号 (=)。然后,我们可以跳过这些行,只导入不包含公式值的数据。

以下是一个使用 LINES STARTING BY 子句的示例:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n' STARTING BY '='
    IGNORE 1 LINES 
(MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths, @BonusAmount, BonusAppliedDate, PartnerPercentage, PartnerShare, @TotalPayment)
SET BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))

这两种方法都可以有效地从 CSV 文件中删除公式值,并将其正确导入到 MySQL 表中。