CSV reader and DictReader turn numeric fields into strings

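When a CSV file is read with csv.reader() or csv.DictReader(), every field comes back as a string, including the ones that look numeric. The CSV format carries no type information, and Python's csv module does not guess types from the field contents, so any conversion has to be done by the caller.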
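A minimal sketch of that behaviour, using an in-memory file so it runs on its own. The column names and values are made up for illustration and are not taken from the question's data.

```python
import csv
import io

# A tiny in-memory stand-in for a CSV file (hypothetical sample data).
sample = io.StringIO("date,count,price\n2013-07-31,3,1.5\n")

rows = list(csv.reader(sample))
header, first = rows[0], rows[1]

# Every field is a str, even the ones that look numeric.
field_types = [type(value).__name__ for value in first]
print(first)
print(field_types)
```

csv.DictReader behaves the same way: it only changes the container (a dict per row keyed by header name), not the type of the values.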

Solution

Question: CSV reader and DictReader turn numeric fields into strings

The first row of the CSV has the headers. Here is a sample row of my CSV:

2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,KL0602130731,AIRFRANCE KLM,KLM,KLM,KLM,KL,KLM ROYAL DUTCH AIRLINES,,0602,,KL0602,KL,KLM ROYAL DUTCH AIRLINES,,,,KL,0602,,,LAX,AMS,,31-7-2013 0:00:00,2013-07-31,2013-07-31,2013-07-31,2013-07-31, 13:55:00,14:39:00,20:55:00,21:39:00,2013-08-01,2013-08-01,2013-08-01,2013-08-01, 09:05:00,09:45:00,07:05:00,07:45:00,2.0,,2,,,LAX,LOS ANGELES INTERNATIONAL AIRPORT, LAX,LAX,5.0,LAX,LOS ANGELES,US,UNITED STATES OF AMERICA,US,USA,NA8,NORTHERN AMERICA, AMERICAS,,,,AMS,SCHIPHOL I,F,OFFLINE,I,INDIRECT OFFLINE,14.0,3.0,FRONT,Business,2.0,nan, PLANNED,3.0,,2.0,2.0,34.0,4.0,400254887nan,1.0,2.0,2.0,2.0,1.0,2.0,6.0,3.0,1.0,3.0,1.0,1.0, nan,nan,nan,nan,nan,nan,nan,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan, nan,2.0,2.0,2.0,2.0,2.0,7.0,nan,2.0,3.0,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan, nan,nan,nan,nan,6.0,1.0,nan,nan,nan,nan,nan,2.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,2.0,2.0, nan,2.0,nan,3.0,nan,,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,13.7885862654653, 0.2, 34273499844164,nan,37.0,Booked,35.0,10.0,2.0,2.0,6.0,35.0,10.0,42.0,nan,nan,LAX,LAX,N

If I use either input_file = csv.DictReader(open("file.csv")) or input_file = csv.reader(open('file.csv')), all my objects will turn into strings. A piece of a row printed in Python:

'2013-08-31 00:00:00', '', '1.0', '2013.0', '8.0', 'Q3','C', '03J', '', '', '', '', 'nan', 'nan', '', 'NON-AIRPORT', 'SELF-SERVICE', 'ICI', '', '19.0', '20130819', '1.0', '19.0', '9.0', '20130901', '2.0', '1.0', '1.0', '1.0', '10.0', '5.0', '5.0', '3.0', '4.0', '4.0', '2.0', '2.0', '', 'nan', '2.0', '', '24854524', 'nan', 'nan', 'nan', 'nan', '1.0', 'nan', '5.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', '4.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', '2.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', '3.0', '5.0', '5.0'

As you can see, all dates, strings, floats and integers have been turned into strings. How can I import them correctly? Assume that we have 400 columns of data and that I cannot manually define the type of each column.

Answer 1: You're looking at this backwards. It's not that they're being turned into strings, it's that they are strings, in the sense that CSV isn't a format that preserves type information. You didn't do anything to turn them into anything else, and Python isn't going to guess. Is Nan a float, or an affectionate name for one's grandmother? Is 3.0 a float, or the name of an avant-garde nerdcore blues band? If you can think of an algorithm to guess the types, then you can apply that, of course:

    import ast
    import csv
    import datetime

    def guess_type(x):
        # Try each parser in turn; fall back to the original string.
        attempt_fns = [
            ast.literal_eval,
            float,
            lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S"),
        ]
        for fn in attempt_fns:
            try:
                return fn(x)
            except (ValueError, SyntaxError):
                pass
        return x

    with open("untyped.csv", newline="") as fp:
        reader = csv.reader(fp)
        for row in reader:
            row = [guess_type(x) for x in row]
            print(row)
            print([type(x) for x in row])

With the file

    2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,nan

the above code will produce

    [datetime.datetime(2013, 7, 31, 0, 0), '', 1.0, 2013.0, 7.0, 'Q3', 21160742, '32HHBS1307170203', nan]
    [<class 'datetime.datetime'>, <class 'str'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'float'>]

which isn't bad. PS: If you're going to be doing serious work with CSV files in Python, I strongly recommend checking out pandas-- you'll waste time reimplementing parts of its functionality otherwise.
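For the 400-column case in the question, one standard-library option is to decide the type once per column instead of once per value. The following is only a sketch of that idea, not the answer's method: the helper names infer_column_type and read_typed and the sample data are hypothetical, and the code assumes every row has the same number of fields as the header.

```python
import csv
import io

def infer_column_type(values):
    """Return int, float, or str: the first caster that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    for caster in (int, float):
        try:
            for v in non_empty:
                caster(v)
            return caster
        except ValueError:
            continue
    return str

def read_typed(fp):
    """Read a CSV with a header row and convert each column to its inferred type."""
    rows = list(csv.DictReader(fp))
    if not rows:
        return []
    casters = {
        name: infer_column_type([row[name] for row in rows])
        for name in rows[0]
    }
    # Empty fields become None rather than being forced through a caster.
    return [
        {name: (casters[name](value) if value != "" else None)
         for name, value in row.items()}
        for row in rows
    ]

# Hypothetical sample data: an integer column, a float column with gaps, a text column.
sample = io.StringIO("id,score,carrier\n21160742,1.0,KLM\n24854524,nan,KL\n7,,AF\n")
typed_rows = read_typed(sample)
print(typed_rows[0])
```

Because the whole file is read into memory before any column is typed, this suits files that fit in RAM. Note that float() accepts the string "nan", so a column mixing numbers and "nan" is treated as float here.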

Answer 2: They are not converted to strings, they already are strings to begin with. But you can try to convert them into floats after reading them. Assuming row contains a row of data, you can do:

    newrow = []
    for item in row:
        try:
            newrow.append(float(item))
        except ValueError:
            newrow.append(item)
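The same loop can be wrapped in a small helper and applied to each row from csv.reader. This sketch is a slight variation on the answer rather than a copy of it: it tries int before float so that whole numbers stay integers. The name to_number and the sample line are hypothetical.

```python
import csv
import io

def to_number(item):
    """Return item as an int or float when it parses as one, otherwise unchanged."""
    for caster in (int, float):
        try:
            return caster(item)
        except ValueError:
            pass
    return item

# Hypothetical one-line CSV with text, an integer, a float, "nan" and an empty field.
sample = io.StringIO("Q3,21160742,2.0,nan,\n")
for row in csv.reader(sample):
    newrow = [to_number(item) for item in row]
    print(newrow)
```

Text and empty fields pass through untouched. As with plain float(), strings such as "nan" and "inf" are converted to floats, which may or may not be what a given column needs.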

Two approaches are given below:

Method 1: use the ast.literal_eval() function