Reading Files into a DataFrame with Pandas

  1. Reading a CSV file
name,address,phonenum
张三,南京市秦淮区,13978632453
李四,上海市徐汇区,12382738453
import pandas as pd

if __name__ == '__main__':
    file_path = "dataset/data1.csv"
    dataset = pd.read_csv(file_path)
    print(dataset)
    print(type(dataset))
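    # Preview the first few rows (head() returns a DataFrame; in a script,
    # wrap the call in print() to display it)
    dataset.head()
    # Shape of the table as (rows, columns)
    print(dataset.shape)
    # Column labels
    print(dataset.columns)
    # Row index
    print(dataset.index)
    # Data type of each column
    print(dataset.dtypes)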
  name address     phonenum
0   张三  南京市秦淮区  13978632453
1   李四  上海市徐汇区  12382738453
<class 'pandas.core.frame.DataFrame'>
(2, 3)
Index(['name', 'address', 'phonenum'], dtype='object')
RangeIndex(start=0, stop=2, step=1)
name        object
address     object
phonenum     int64
dtype: object
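In the output above, `phonenum` is inferred as `int64`. Phone numbers are identifiers rather than quantities, and a value with a leading zero would lose it if parsed as an integer. The following is a minimal sketch, assuming string storage is what you want here; the `load_contacts` helper and the inlined `CSV_TEXT` are illustrative names that do not appear in the original.

```python
import io

import pandas as pd

# Same content as dataset/data1.csv, inlined so the sketch runs on its own.
CSV_TEXT = """name,address,phonenum
张三,南京市秦淮区,13978632453
李四,上海市徐汇区,12382738453
"""


def load_contacts(source):
    """Read the contacts CSV, keeping phonenum as text instead of int64.

    `source` can be a file path or any file-like object.
    """
    return pd.read_csv(source, dtype={"phonenum": str})


contacts = load_contacts(io.StringIO(CSV_TEXT))
# phonenum is no longer an integer column
print(contacts.dtypes)
# String methods are now available on the column
print(contacts["phonenum"].str.len())
```

Passing `"dataset/data1.csv"` instead of the `StringIO` object should behave the same way on the real file.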
  2. Reading a TXT file
张三 南京市秦淮区 13978632453
李四 上海市徐汇区 12382738453
import pandas as pd

if __name__ == '__main__':
    file_path = "dataset/data2.txt"
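    # The file is space-delimited and has no header row,
    # so the column names are supplied explicitly.
    dataset = pd.read_csv(file_path,
                          sep=" ",
                          header=None,
                          names=["name", "addr", "pnum"])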
    print(dataset)
  name    addr         pnum
0   张三  南京市秦淮区  13978632453
1   李四  上海市徐汇区  12382738453
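With `sep=" "`, every single space counts as a delimiter, so a file padded with several spaces between columns would produce extra empty fields. The sketch below assumes such a file and uses the regular expression `\s+` to treat any run of whitespace as one separator. `TXT_TEXT` and `load_txt` are hypothetical names, and the irregular spacing is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical variant of data2.txt: the columns are separated by runs of
# spaces of different lengths.
TXT_TEXT = (
    "张三   南京市秦淮区  13978632453\n"
    "李四 上海市徐汇区    12382738453\n"
)


def load_txt(source):
    """Read a headerless, whitespace-delimited text file."""
    return pd.read_csv(
        source,
        sep=r"\s+",        # one or more whitespace characters
        header=None,       # the file has no header row
        names=["name", "addr", "pnum"],
    )


records = load_txt(io.StringIO(TXT_TEXT))
print(records)
```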
  3. Reading an Excel file
import pandas as pd

if __name__ == '__main__':
    file_path = "dataset/data4.xlsx"
    # The openpyxl engine must be installed separately (pip install openpyxl)
    dataset = pd.read_excel(file_path, engine='openpyxl')
    print(dataset)
  4. Reading a MySQL database table
import pandas as pd
import pymysql

if __name__ == '__main__':
    conn = pymysql.connect(
        host="localhost",
        user="root",
        password="",
        database="db_go_test"
    )
    dataset = pd.read_sql("select * from users", con=conn)
    print(dataset)

The code above triggers the following warning:

UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  dataset = pd.read_sql("select * from users", con=conn)

Passing a SQLAlchemy engine instead of the raw pymysql connection avoids the warning:

import pandas as pd
from sqlalchemy import create_engine

if __name__ == '__main__':
    # URL format: dialect+driver://user:password@host/database
    # For SQLite the equivalent would be:
    # engine = create_engine('sqlite:///your_database.db')
    engine = create_engine('mysql+pymysql://root:@localhost/db_go_test')
    dataset = pd.read_sql("select * from users", con=engine)
    print(dataset.head())
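To try `pd.read_sql` without a MySQL server, an in-memory SQLite database can stand in. The warning text quoted earlier lists a sqlite3 DBAPI2 connection as a supported input, so no SQLAlchemy engine is needed for this case. This is only a sketch: the `users` schema and its rows are assumptions made for illustration, not the actual `db_go_test` table.

```python
import sqlite3

import pandas as pd

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, standing in for the real users table.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, address) VALUES (?, ?)",
    [("张三", "南京市秦淮区"), ("李四", "上海市徐汇区")],
)
conn.commit()

# Read the whole table into a DataFrame.
all_users = pd.read_sql("SELECT * FROM users", con=conn)
print(all_users)

# Pass values through params instead of formatting them into the SQL string.
one_user = pd.read_sql(
    "SELECT * FROM users WHERE name = ?", con=conn, params=("张三",)
)
print(one_user)

conn.close()
```

The `?` placeholder is the sqlite3 driver's parameter style. Other drivers use different markers, so the query string may need adjusting when switching back to MySQL.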