Python Pandas Shows All Text Columns as NaN: Problem and Solution

When reading a tab-delimited text file with the Python pandas library, every text column came out as NaN. A sample of the input file:

Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Period: Oct 28 2013 - Apr 27 2014
Note:
Brand Variant                               Industry                                    Major Category                              Market                                      Media Type                                  Parent Company                              Product Category                            Report Period (multiple)                    PCC Sub Group                               Subsidiary                                  Units   $$$ (000)
3 LADIES HAND-DIPPED CANDIES CANDY  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Columbus Combo  Local Newspaper     COTTAGE FOOD PRODUCTION OPERATION   CANDY   11/18/13 - 11/24/13     F211 CANDY & GUM    COTTAGE FOOD PRODUCTION OPERATION   1   0.286   
3 MUSKETEERS CANDY BAR  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Atlanta Combo   Spot Radio  MARS INC    CANDY BAR   11/04/13 - 11/10/13     F211 CANDY & GUM    MARS SNACKFOOD US LLC   22  1.403   

The file was read with the following Python code:

import pandas as pd

# csvFile holds the path to the tab-delimited report shown above
df = pd.read_csv(csvFile, delimiter='\t', header=[9])
print(df)

The output was:

Brand Variant                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Industry                                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Major Category                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Market                                    \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Media Type                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Parent Company                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Product Category                          \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Report Period (multiple)                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    PCC Sub Group                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Subsidiary                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Units $$$ (000)  
3 LADIES HAND-DIPPED CANDIES CANDY    NaN       NaN  
3 MUSKETEERS CANDY BAR                NaN       NaN  

As the output shows, every text column is displayed as NaN.
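One quick way to investigate a result like this is to compare the number of tab-separated fields in the header row with the number in each data row. The sketch below is only illustrative: `field_counts` is a hypothetical helper, and the two sample lines are a shortened stand-in for the real file, written on the assumption (suggested by the trailing whitespace in the sample rows) that each data row ends with an extra tab.

```python
def field_counts(lines, sep="\t"):
    """Return the number of sep-separated fields on each non-empty line."""
    return [len(line.rstrip("\n").split(sep)) for line in lines if line.strip()]


# Shortened, hypothetical stand-in for the header row and one data row.
# The data row is assumed to end with a trailing tab.
sample = [
    "Brand Variant\tIndustry\tUnits\t$$$ (000)\n",
    "3 MUSKETEERS CANDY BAR\tCONFECT., SNACKS & SOFT DRINKS\t22\t1.403\t\n",
]

counts = field_counts(sample)
print(counts)
```

If the data rows report one more field than the header, pandas will treat the first column as the index unless told otherwise.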

  1. Solution
import pandas as pd

# Skip the 9 preamble lines and keep the first column as data, not as the index
df = pd.read_csv(csvFile, sep='\t', skiprows=9, index_col=False)
print(df)

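Setting skiprows=9 skips the first 9 lines of the file (the seven "Blah Blah" lines plus the "Period:" and "Note:" lines), so the row of column names becomes the header. Setting index_col=False stops pandas from automatically using the first column as the index, which it does when the data rows contain one more field than the header, for example because each row ends with a trailing delimiter, as the sample rows here appear to. With both options set, the text columns are read correctly.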
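The effect of index_col=False can be reproduced without the original file. The following is a minimal sketch, not the original data: it builds a hypothetical four-column miniature of the report in memory (9 preamble lines, a header row, and two data rows that are assumed to end with a trailing tab) and reads it twice, once with and once without index_col=False.

```python
import io

import pandas as pd

# Hypothetical miniature of the report: 9 preamble lines, a header row,
# then data rows that each end with a trailing tab (an assumption).
preamble = ["Blah Blah"] * 7 + ["Period: Oct 28 2013 - Apr 27 2014", "Note:"]
header = "Brand Variant\tIndustry\tUnits\t$$$ (000)"
rows = [
    "3 LADIES HAND-DIPPED CANDIES CANDY\tCONFECT., SNACKS & SOFT DRINKS\t1\t0.286\t",
    "3 MUSKETEERS CANDY BAR\tCONFECT., SNACKS & SOFT DRINKS\t22\t1.403\t",
]
text = "\n".join(preamble + [header] + rows) + "\n"

# Without index_col=False, the extra trailing field makes pandas use the
# first column as the index, so the remaining values shift one column left.
shifted = pd.read_csv(io.StringIO(text), sep="\t", skiprows=9)

# With index_col=False, the first column stays a regular data column and
# the empty trailing field is dropped.
df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=9, index_col=False)

print(shifted)
print(df)
```

In the first read the brand names end up as row labels and the last column is empty; in the second read each value sits under its own column name.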

The output is now:

   Brand Variant                               Industry                                    Major Category                              Market                                      Media Type                                  Parent Company                              Product Category                            Report Period (multiple)                    PCC Sub Group                               Subsidiary                                  Units   $$$ (000)
0  3 LADIES HAND-DIPPED CANDIES CANDY  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Columbus Combo  Local Newspaper     COTTAGE FOOD PRODUCTION OPERATION   CANDY   11/18/13 - 11/24/13     F211 CANDY & GUM    COTTAGE FOOD PRODUCTION OPERATION   1   0.286   
1  3 MUSKETEERS CANDY BAR  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Atlanta Combo   Spot Radio  MARS INC    CANDY BAR   11/04/13 - 11/10/13     F211 CANDY & GUM    MARS SNACKFOOD US LLC   22  1.403   

The text columns are now displayed correctly.