Tonghuashun Supermind Quantitative Trading: Python Programming Basics -- pandas Basics


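pandas is built on NumPy and makes NumPy-centric applications easier to write. It is widely regarded as a go-to tool for data processing. This chapter focuses on the DataFrame data structure and on processing data with it. Besides DataFrame, pandas also provides Series and Panel.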

Section 6: pandas Basics


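Structure   Dimensions                   Description
Series      1-D array                    Similar to a one-dimensional NumPy array.
DataFrame   2-D tabular data structure   Can be thought of as a container of Series.
Panel       3-D array                    Can be thought of as a container of DataFrames.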

  Before we begin, import the pandas library as follows:

In [ ]:

import pandas as pd

  Note: all of the code below only runs after pandas has been imported.

I. Introduction to Series and DataFrame

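   1. Series
A Series consists of a sequence of values together with an associated index. You can create a Series by passing a list, in which case pandas creates a default integer index.
Create a Series: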

In [2]:

s = pd.Series([1,3,5,7,6,8])
print(s)
0    1
1    3
2    5
3    7
4    6
5    8
dtype: int64

  Get the index of the Series:

In [3]:

s.index

Out[3]:

RangeIndex(start=0, stop=6, step=1)
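Besides the default integer index, a Series can carry its own labels, which then allow lookup by label as well as by position. A minimal sketch; the labels 'a' to 'd' are arbitrary examples:

```python
import pandas as pd

# A Series with an explicit, non-integer index (labels chosen for illustration)
s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])

print(s.index)      # the custom labels
print(s['c'])       # access by label -> 5
print(s.iloc[0])    # access by position -> 1
print(s.values)     # the underlying NumPy array
```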

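  2. DataFrame
A DataFrame is a tabular data structure made up of an ordered set of columns. All values within one column share the same data type, while different columns may hold different types. Each row is a record, labeled by one element of the index, and each column is a field, i.e. one attribute of that record. A DataFrame therefore has both a row index and a column index.
Creating a DataFrame
Let's first look at how to create a DataFrame from a dict.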

In [6]:

d = {'one': [1, 2, 3], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)
   one  two
a    1    1
b    2    2
c    3    3

  Use df.index and df.columns to inspect the row and column labels of a DataFrame; df.values returns its elements as a NumPy array:

In [12]:

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.values)   # elements as a NumPy array
Index(['a', 'b', 'c'], dtype='object')
Index(['one', 'two'], dtype='object')
[[1 1]
 [2 2]
 [3 3]]
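Since a DataFrame can be viewed as a container of Series, selecting one column of the df above gives back a Series that keeps the DataFrame's row index. A small sketch of that relationship (in recent pandas versions, df.to_numpy() is the recommended alternative to df.values):

```python
import pandas as pd

d = {'one': [1, 2, 3], 'two': [1, 2, 3]}
df = pd.DataFrame(d, index=['a', 'b', 'c'])

col = df['one']                      # select one column
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True: the column keeps the row index
print(df.shape)                      # (3, 2): 3 rows, 2 columns
print(df.to_numpy().shape)           # (3, 2): the same data as a NumPy array
```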

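  When a DataFrame is created from a dict whose values are lists or arrays, all of them must have the same length (matching the index, if one is passed). The following example shows the error raised otherwise: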

In [13]:

d = {'one': [1, 2], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4308     try:
-> 4309         blocks = form_blocks(arrays, names, axes)
   4310         mgr = BlockManager(blocks, axes)

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in form_blocks(arrays, names, axes)
   4380     if len(int_items):
-> 4381         int_blocks = _multi_blockify(int_items)
   4382         blocks.extend(int_blocks)

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in _multi_blockify(tuples, dtype)
   4449 
-> 4450         values, placement = _stack_arrays(list(tup_block), dtype)
   4451 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in _stack_arrays(tuples, dtype)
   4494     for i, arr in enumerate(arrays):
-> 4495         stacked[i] = _asarray_compat(arr)
   4496 

ValueError: could not broadcast input array from shape (3) into shape (2)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-13-26bdeb89a47c> in <module>()
      1 d = {'one': [1, 2], 'two': [1, 2, 3]}
----> 2 df = pd.DataFrame(d,index=['a', 'b', 'c'])
      3 print(df)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5602     axes = [_ensure_index(columns), _ensure_index(index)]
   5603 
-> 5604     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5605 
   5606 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4312         return mgr
   4313     except ValueError as e:
-> 4314         construction_error(len(arrays), arrays[0].shape, axes, e)
   4315 
   4316 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4278         raise ValueError("Empty data passed with indices specified.")
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 
   4282 

ValueError: Shape of passed values is (2, 2), indices imply (2, 3)
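One way around this error, sketched below, is to wrap each list in a Series with its own index: pandas then aligns the values on the index labels and fills any label a Series lacks with NaN. The index labels given to each Series here are assumptions chosen for illustration:

```python
import pandas as pd

error_message = None
try:
    # Plain lists of unequal length cannot fill a 3-row index
    pd.DataFrame({'one': [1, 2], 'two': [1, 2, 3]}, index=['a', 'b', 'c'])
except ValueError as exc:
    error_message = str(exc)
print('ValueError:', error_message)

# Series are aligned on their index labels; 'one' has no 'c', so it gets NaN
d = {'one': pd.Series([1, 2], index=['a', 'b']),
     'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print(df)
```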

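  This restriction does not apply when the DataFrame is built from a list of dicts (one dict per row): keys missing from a row are filled with NaN automatically, as in the following example: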

In [14]:

d= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6, 'c': 9}]
df = pd.DataFrame(d)
print(df)
     a  b    c
0  1.6  2  NaN
1  3.0  6  9.0
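Why does column c show 9.0 rather than 9? NaN is a floating-point value, so a numeric column that contains missing values is stored as float64, while b, which has no gaps, stays an integer column. A quick check using the same data:

```python
import pandas as pd

d = [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6, 'c': 9}]
df = pd.DataFrame(d)

print(df.dtypes)              # a and c are float64, b is int64
print(df['c'].isna().sum())   # 1 missing value in column c
print(df.columns.tolist())    # ['a', 'b', 'c']: the union of all dict keys
```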

  In real data processing you sometimes need an empty DataFrame to start from. You can create one like this:

In [15]:

df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
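If you already know which columns you will fill in, you can also pass the column names up front; the empty attribute tells you whether a DataFrame holds any data. A small sketch (the column names are just examples):

```python
import pandas as pd

# Empty DataFrame with predefined column names and no rows yet
df = pd.DataFrame(columns=['open', 'close'])

print(df.empty)              # True
print(df.columns.tolist())   # ['open', 'close']
print(len(df))               # 0
```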

  Another very useful way to create a DataFrame is pd.concat, which joins DataFrames that share the same columns (or the same rows) into one.

In [16]:

a= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6}]
df1 = pd.DataFrame(a)
b= [{'a': 4, 'b': 5}]
df2 = pd.DataFrame(b)
df = pd.concat([df1, df2], axis=0)
print(df1)
print(df2)
print(df)
     a  b
0  1.6  2
1  3.0  6
   a  b
0  4  5
     a  b
0  1.6  2
1  3.0  6
0  4.0  5

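  Note: concat takes an axis parameter. axis=0 (the default) concatenates row-wise, stacking the frames vertically, while axis=1 concatenates column-wise, placing them side by side. The original row labels are kept, which is why index 0 appears twice in the result above.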
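The sketch below reuses the data of df1 and df2 from the example above and adds a hypothetical one-column frame df3. It shows ignore_index=True, which renumbers the stacked rows instead of keeping duplicate labels, and axis=1, which joins frames side by side, aligned on the row index:

```python
import pandas as pd

df1 = pd.DataFrame([{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6}])
df2 = pd.DataFrame([{'a': 4, 'b': 5}])

# axis=0: stack rows; ignore_index=True renumbers them 0..n-1
rows = pd.concat([df1, df2], axis=0, ignore_index=True)
print(rows)

# axis=1: place columns side by side, aligned on the row index
df3 = pd.DataFrame({'c': [7, 8]})   # hypothetical extra column
cols = pd.concat([df1, df3], axis=1)
print(cols)
```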

II. Viewing Data

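  Most data-retrieval functions on the MindGo quantitative trading platform return their results as a DataFrame or a dict. The rest of this chapter focuses on viewing and processing DataFrame data, using data retrieved on the MindGo platform as the example.
This part must be run in the MindGo research environment.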

In [18]:

# Fetch the open, high, low and close prices of Kweichow Moutai (600519.SH) for the 20 trading days up to 2018-01-25; the result is a DataFrame
price= get_price('600519.SH', None, '20180125', '1d', ['open', 'high', 'low', 'close'], False, 'pre', 20, is_panel=1)
print(price)
             close    high     low    open
2017-12-28  718.69  719.90  671.32  687.00
2017-12-29  697.49  726.50  691.60  718.00
2018-01-02  703.85  710.16  689.89  700.00
2018-01-03  715.86  721.40  699.74  701.50
2018-01-04  737.07  743.50  719.33  721.40
2018-01-05  738.36  746.03  728.22  741.00
2018-01-08  752.13  756.50  735.02  735.02
2018-01-09  782.52  783.00  752.21  752.21
2018-01-10  785.71  788.88  773.48  785.00
2018-01-11  774.81  788.00  772.00  787.00
2018-01-12  788.42  788.80  767.02  773.77
2018-01-15  785.37  799.06  779.02  793.46
2018-01-16  772.94  788.61  768.00  780.48
2018-01-17  747.93  774.00  738.51  770.00
2018-01-18  750.74  765.00  744.09  747.93
2018-01-19  750.18  758.90  739.02  752.90
2018-01-22  773.64  774.00  751.81  751.81
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00
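get_price is specific to the MindGo research environment. To practice the operations below elsewhere, one option is to build a DataFrame of the same shape by hand. The sketch below copies the first five rows of the output above; it is only an offline stand-in, not the platform API:

```python
import pandas as pd

# First five rows of the Kweichow Moutai (600519.SH) prices shown above
price = pd.DataFrame(
    {
        'close': [718.69, 697.49, 703.85, 715.86, 737.07],
        'high':  [719.90, 726.50, 710.16, 721.40, 743.50],
        'low':   [671.32, 691.60, 689.89, 699.74, 719.33],
        'open':  [687.00, 718.00, 700.00, 701.50, 721.40],
    },
    index=pd.to_datetime(['2017-12-28', '2017-12-29', '2018-01-02',
                          '2018-01-03', '2018-01-04']),
)
print(price)
```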

  Here are eight common operations for viewing data:

  1. View the first few rows:

In [19]:

price.head()

Out[19]:

             close    high     low   open
2017-12-28  718.69  719.90  671.32  687.0
2017-12-29  697.49  726.50  691.60  718.0
2018-01-02  703.85  710.16  689.89  700.0
2018-01-03  715.86  721.40  699.74  701.5
2018-01-04  737.07  743.50  719.33  721.4

  2. View the last few rows:

In [20]:

price.tail()

Out[20]:

             close    high     low    open
2018-01-19  750.18  758.90  739.02  752.90
2018-01-22  773.64  774.00  751.81  751.81
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00
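By default head() and tail() return 5 rows; pass n to choose how many. A small sketch on a hypothetical 10-row frame:

```python
import pandas as pd

# Hypothetical frame used only to illustrate head/tail
df = pd.DataFrame({'x': range(10)})

print(df.head())     # first 5 rows (default n=5)
print(df.head(3))    # first 3 rows
print(df.tail(2))    # last 2 rows
```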

  3. View the DataFrame's index:

In [21]:

price.index

Out[21]:

DatetimeIndex(['2017-12-28', '2017-12-29', '2018-01-02', '2018-01-03',
               '2018-01-04', '2018-01-05', '2018-01-08', '2018-01-09',
               '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15',
               '2018-01-16', '2018-01-17', '2018-01-18', '2018-01-19',
               '2018-01-22', '2018-01-23', '2018-01-24', '2018-01-25'],
              dtype='datetime64[ns]', freq=None)

  4. View the column names:

In [22]:

price.columns

Out[22]:

Index(['close', 'high', 'low', 'open'], dtype='object')

  5. View the values:

In [23]:

price.values

Out[23]:

array([[ 718.69,  719.9 ,  671.32,  687.  ],
       [ 697.49,  726.5 ,  691.6 ,  718.  ],
       [ 703.85,  710.16,  689.89,  700.  ],
       [ 715.86,  721.4 ,  699.74,  701.5 ],
       [ 737.07,  743.5 ,  719.33,  721.4 ],
       [ 738.36,  746.03,  728.22,  741.  ],
       [ 752.13,  756.5 ,  735.02,  735.02],
       [ 782.52,  783.  ,  752.21,  752.21],
       [ 785.71,  788.88,  773.48,  785.  ],
       [ 774.81,  788.  ,  772.  ,  787.  ],
       [ 788.42,  788.8 ,  767.02,  773.77],
       [ 785.37,  799.06,  779.02,  793.46],
       [ 772.94,  788.61,  768.  ,  780.48],
       [ 747.93,  774.  ,  738.51,  770.  ],
       [ 750.74,  765.  ,  744.09,  747.93],
       [ 750.18,  758.9 ,  739.02,  752.9 ],
       [ 773.64,  774.  ,  751.81,  751.81],
       [ 773.78,  780.  ,  768.6 ,  777.81],
       [ 764.46,  776.46,  758.6 ,  776.44],
       [ 769.16,  776.  ,  751.  ,  761.  ]])

  6. Get quick summary statistics with describe():

In [24]:

price.describe()

Out[24]:

            close        high         low        open
count   20.000000   20.000000   20.000000   20.000000
mean   754.155500  763.235000  739.924000  750.686500
std     28.005539   26.794003   31.251968   31.581411
min    697.490000  710.160000  671.320000  687.000000
25%    738.037500  745.397500  725.997500  731.615000
50%    758.295000  774.000000  747.545000  752.555000
75%    774.037500  784.250000  767.265000  776.782500
max    788.420000  799.060000  779.020000  793.460000
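Each row of describe() corresponds to an ordinary aggregation: std is the sample standard deviation (ddof=1) and 25%/50%/75% are quantiles. The sketch below compares them on a small hypothetical Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])   # hypothetical data
stats = s.describe()
print(stats)

# Each row of describe() matches an individual aggregation
print(stats['mean'], s.mean())
print(stats['std'], s.std(ddof=1))       # sample standard deviation
print(stats['25%'], s.quantile(0.25))    # quartiles use linear interpolation by default
print(stats['50%'], s.median())
```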

  7. Transpose the data:

In [25]:

price.T

Out[25]:

       2017-12-28 00:00:00  2017-12-29 00:00:00  2018-01-02 00:00:00  2018-01-03 00:00:00
close               718.69               697.49               703.85               715.86
high                719.90               726.50               710.16               721.40
low                 671.32               691.60               689.89               699.74
open                687.00               718.00               700.00               701.50

       2018-01-04 00:00:00  2018-01-05 00:00:00  2018-01-08 00:00:00  2018-01-09 00:00:00
close               737.07               738.36               752.13               782.52
high                743.50               746.03               756.50               783.00
low                 719.33               728.22               735.02               752.21
open                721.40               741.00               735.02               752.21

       2018-01-10 00:00:00  2018-01-11 00:00:00  2018-01-12 00:00:00  2018-01-15 00:00:00
close               785.71               774.81               788.42               785.37
high                788.88               788.00               788.80               799.06
low                 773.48               772.00               767.02               779.02
open                785.00               787.00               773.77               793.46

       2018-01-16 00:00:00  2018-01-17 00:00:00  2018-01-18 00:00:00  2018-01-19 00:00:00
close               772.94               747.93               750.74               750.18
high                788.61               774.00               765.00               758.90
low                 768.00               738.51               744.09               739.02
open                780.48               770.00               747.93               752.90

       2018-01-22 00:00:00  2018-01-23 00:00:00  2018-01-24 00:00:00  2018-01-25 00:00:00
close               773.64               773.78               764.46               769.16
high                774.00               780.00               776.46               776.00
low                 751.81               768.60               758.60               751.00
open                751.81               777.81               776.44               761.00

  8. Sort the DataFrame by a column:

In [26]:

print(price.sort_values(by='open' , ascending=False))
             close    high     low    open
2018-01-15  785.37  799.06  779.02  793.46
2018-01-11  774.81  788.00  772.00  787.00
2018-01-10  785.71  788.88  773.48  785.00
2018-01-16  772.94  788.61  768.00  780.48
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-12  788.42  788.80  767.02  773.77
2018-01-17  747.93  774.00  738.51  770.00
2018-01-25  769.16  776.00  751.00  761.00
2018-01-19  750.18  758.90  739.02  752.90
2018-01-09  782.52  783.00  752.21  752.21
2018-01-22  773.64  774.00  751.81  751.81
2018-01-18  750.74  765.00  744.09  747.93
2018-01-05  738.36  746.03  728.22  741.00
2018-01-08  752.13  756.50  735.02  735.02
2018-01-04  737.07  743.50  719.33  721.40
2017-12-29  697.49  726.50  691.60  718.00
2018-01-03  715.86  721.40  699.74  701.50
2018-01-02  703.85  710.16  689.89  700.00
2017-12-28  718.69  719.90  671.32  687.00

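  Note: sort_values takes the parameters by and ascending. by specifies the column to sort by, and ascending sets the order: False sorts from largest to smallest, True (the default) from smallest to largest.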
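sort_values also accepts a list of columns, with a matching list for ascending, to break ties, and sort_index sorts by the row labels instead. Both return a new DataFrame and leave the original untouched unless inplace=True is passed. A sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a tie in column 'open'
df = pd.DataFrame({'open': [3, 1, 3, 2], 'close': [5, 6, 4, 7]},
                  index=['w', 'x', 'y', 'z'])

print(df.sort_values(by='open'))                    # ascending (default)
print(df.sort_values(by=['open', 'close'],
                     ascending=[False, True]))      # open descending, ties by close ascending
print(df.sort_index(ascending=False))               # sort by row labels instead
print(df)                                           # the original order is unchanged
```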

III. Selecting Data

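  Continuing with the example from the previous section, here are eight basic operations for selecting data.
1. Select a single column, here the opening prices: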

In [28]:

price['open']

Out[28]:

2017-12-28    687.00
2017-12-29    718.00
2018-01-02    700.00
2018-01-03    701.50
2018-01-04    721.40
2018-01-05    741.00
2018-01-08    735.02
2018-01-09    752.21
2018-01-10    785.00
2018-01-11    787.00
2018-01-12    773.77
2018-01-15    793.46
2018-01-16    780.48
2018-01-17    770.00
2018-01-18    747.93
2018-01-19    752.90
2018-01-22    751.81
2018-01-23    777.81
2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64

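  Try price.open yourself: it is equivalent to price['open'], as long as the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.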
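The sketch below illustrates that caveat on a hypothetical frame: a column named max is shadowed by the DataFrame.max method, and a name containing a space cannot be written as an attribute at all, so bracket access is the safe default:

```python
import pandas as pd

# Hypothetical frame: one ordinary column name and two awkward ones
df = pd.DataFrame({'open': [1.0, 2.0], 'max': [3.0, 4.0], 'pct change': [0.1, 0.2]})

print(df.open.equals(df['open']))   # True: both forms return the same column
print(callable(df.max))             # True: df.max is the DataFrame.max method, not the column
print(df['max'])                    # brackets always reach the column
print(df['pct change'])             # names with spaces need brackets
```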

  2. Select multiple columns:

In [30]:

price[['open','close']]

Out[30]:

              open   close
2017-12-28  687.00  718.69
2017-12-29  718.00  697.49
2018-01-02  700.00  703.85
2018-01-03  701.50  715.86
2018-01-04  721.40  737.07
2018-01-05  741.00  738.36
2018-01-08  735.02  752.13
2018-01-09  752.21  782.52
2018-01-10  785.00  785.71
2018-01-11  787.00  774.81
2018-01-12  773.77  788.42
2018-01-15  793.46  785.37
2018-01-16  780.48  772.94
2018-01-17  770.00  747.93
2018-01-18  747.93  750.74
2018-01-19  752.90  750.18
2018-01-22  751.81  773.64
2018-01-23  777.81  773.78
2018-01-24  776.44  764.46
2018-01-25  761.00  769.16

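  Note: in price[['open','close']], the inner ['open','close'] is a list of two column names; pandas looks up each name and returns those columns as a new DataFrame.
Try price['open','close'] yourself and see whether it returns any data.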
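For the exercise above: without the inner brackets, pandas receives the tuple ('open', 'close') as one single column key. On a DataFrame with ordinary (non-MultiIndex) columns no column has that name, so a KeyError is raised. A small sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'open': [1.0, 2.0], 'close': [1.5, 2.5]})

print(df[['open', 'close']].shape)   # (2, 2): a list selects two columns

raised = False
try:
    df['open', 'close']              # a tuple is looked up as ONE column key
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```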

  3. Select multiple rows by position:

In [31]:

price[0:3]

Out[31]:

             close    high     low   open
2017-12-28  718.69  719.90  671.32  687.0
2017-12-29  697.49  726.50  691.60  718.0
2018-01-02  703.85  710.16  689.89  700.0

  4. Select multiple rows by index label:

In [32]:

price['2018-01-24':'2018-01-25']

Out[32]:

             close    high    low    open
2018-01-24  764.46  776.46  758.6  776.44
2018-01-25  769.16  776.00  751.0  761.00
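Note the difference between the two row-slicing styles above: a positional slice such as price[0:3] stops before the end position, whereas a label slice such as price['2018-01-24':'2018-01-25'] includes the end label. The sketch below uses the explicit .iloc/.loc forms on a small frame built from four of the closing prices shown earlier:

```python
import pandas as pd

# Four closing prices from the table above, on a DatetimeIndex
idx = pd.to_datetime(['2018-01-22', '2018-01-23', '2018-01-24', '2018-01-25'])
df = pd.DataFrame({'close': [773.64, 773.78, 764.46, 769.16]}, index=idx)

by_position = df.iloc[0:2]                      # positions 0 and 1; position 2 is excluded
by_label = df.loc['2018-01-22':'2018-01-23']    # the end label 2018-01-23 is included
print(len(by_position), len(by_label))          # 2 2
print(df.loc['2018-01-23':'2018-01-25'])        # three rows: the 23rd, 24th and 25th
```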

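  5. Select data by label:

   price.loc[row_labels, column_labels]
price.loc['a':'b'] # rows labeled 'a' through 'b', both included
price.loc[:,'open'] # all rows of the open column
The first argument of price.loc selects rows by label and the second selects columns by label. Each argument can be a single label, a list of labels or a slice. If both are single labels the result is a scalar; if exactly one of them is a single label the result is a Series; otherwise it is a DataFrame.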

In [33]:

price.loc['2018-01-24','open']

Out[33]:

776.44000000000005

In [34]:

price.loc['2018-01-24':'2018-01-25']

Out[34]:

             close    high    low    open
2018-01-24  764.46  776.46  758.6  776.44
2018-01-25  769.16  776.00  751.0  761.00

In [35]:

price.loc[:, 'open']

Out[35]:

2017-12-28    687.00
2017-12-29    718.00
2018-01-02    700.00
2018-01-03    701.50
2018-01-04    721.40
2018-01-05    741.00
2018-01-08    735.02
2018-01-09    752.21
2018-01-10    785.00
2018-01-11    787.00
2018-01-12    773.77
2018-01-15    793.46
2018-01-16    780.48
2018-01-17    770.00
2018-01-18    747.93
2018-01-19    752.90
2018-01-22    751.81
2018-01-23    777.81
2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64

In [36]:

price.loc['2018-01-24':'2018-01-25','open']

Out[36]:

2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64
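The sketch below checks the return-type rule for .loc on a two-row excerpt of price. It uses pd.Timestamp keys; date strings, as in the cells above, are an alternative on a DatetimeIndex:

```python
import pandas as pd

idx = pd.to_datetime(['2018-01-24', '2018-01-25'])
price = pd.DataFrame({'close': [764.46, 769.16], 'open': [776.44, 761.00]}, index=idx)
day = pd.Timestamp('2018-01-24')

scalar = price.loc[day, 'open']                # label, label -> scalar
row = price.loc[day, ['open', 'close']]        # label, list  -> Series
col = price.loc[:, 'open']                     # slice, label -> Series
block = price.loc[[day], ['open', 'close']]    # list,  list  -> DataFrame

for obj in (scalar, row, col, block):
    print(type(obj).__name__)
```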

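  6. Select data by position:

   price.iloc[row_positions, column_positions]
price.iloc[1,1] # value at the second row, second column; returns a scalar
price.iloc[[0,2],:] # first and third rows
price.iloc[0:2,:] # from the first row up to (but not including) the third row
price.iloc[:,1] # second column for all rows; returns a Series
price.iloc[1,:] # second row; returns a Series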

In [38]:

price.iloc[1,1] # value at the second row, second column; returns a scalar

Out[38]:

726.5

In [39]:

price.iloc[[0,2],:] # first and third rows

Out[39]:

             close    high     low   open
2017-12-28  718.69  719.90  671.32  687.0
2018-01-02  703.85  710.16  689.89  700.0

In [40]:

price.iloc[0:2,:] # from the first row up to (but not including) the third row

Out[40]:

             close   high     low   open
2017-12-28  718.69  719.9  671.32  687.0
2017-12-29  697.49  726.5  691.60  718.0

In [41]:

price.iloc[:,1] # second column for all rows; returns a Series

Out[41]:

2017-12-28    719.90
2017-12-29    726.50
2018-01-02    710.16
2018-01-03    721.40
2018-01-04    743.50
2018-01-05    746.03
2018-01-08    756.50
2018-01-09    783.00
2018-01-10    788.88
2018-01-11    788.00
2018-01-12    788.80
2018-01-15    799.06
2018-01-16    788.61
2018-01-17    774.00
2018-01-18    765.00
2018-01-19    758.90
2018-01-22    774.00
2018-01-23    780.00
2018-01-24    776.46
2018-01-25    776.00
Name: high, dtype: float64

In [42]:

price.iloc[1,:] # second row; returns a Series

Out[42]:

close    697.49
high     726.50
low      691.60
open     718.00
Name: 2017-12-29 00:00:00, dtype: float64
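Positions follow Python's usual rules, so negative positions count from the end, and lists or slices can be used on either axis. A small sketch on a hypothetical 3x3 frame:

```python
import pandas as pd

# Hypothetical frame used only to illustrate positional selection
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=['close', 'high', 'low'])

print(df.iloc[-1])            # last row, as a Series
print(df.iloc[:, [0, 2]])     # first and third columns
print(df.iloc[1:, 1:])        # rows 2-3, columns 2-3
```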

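  7. A more general way to slice is .ix, which decides automatically, based on the type of each key, whether to slice by position or by label. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0: the examples below only run in older pandas versions such as the one in this research environment, and new code should use .loc or .iloc instead.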

   price.ix[1,1]
price.ix['a':'b']

In [43]:

price.ix[1,1]

Out[43]:

726.5

In [44]:

price.ix['2018-01-24':'2018-01-25']

Out[44]:

             close    high    low    open
2018-01-24  764.46  776.46  758.6  776.44
2018-01-25  769.16  776.00  751.0  761.00

In [45]:

price.ix['2018-01-24','open']

Out[45]:

776.44000000000005

In [46]:

price.ix[1,'open']

Out[46]:

718.0

In [47]:

price.ix['2018-01-24',0]

Out[47]:

764.46000000000004
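Because .ix no longer exists in current pandas, the calls above need to be rewritten with .loc or .iloc. A sketch of the equivalents on a two-row excerpt of the price data: columns.get_loc converts a column label to its position, and price.columns[i] converts a position to a label. The last line uses 2017-12-29 instead of 2018-01-24 because the excerpt only has two rows:

```python
import pandas as pd

idx = pd.to_datetime(['2017-12-28', '2017-12-29'])
price = pd.DataFrame({'close': [718.69, 697.49], 'high': [719.90, 726.50],
                      'low': [671.32, 691.60], 'open': [687.00, 718.00]},
                     index=idx)

# price.ix[1, 1]: purely positional
a = price.iloc[1, 1]
# price.ix[1, 'open']: position for the row, label for the column
b = price.iloc[1, price.columns.get_loc('open')]
# price.ix['2017-12-29', 0]: label for the row, position for the column
c = price.loc[pd.Timestamp('2017-12-29'), price.columns[0]]
print(a, b, c)   # 726.5 718.0 697.49
```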

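  8. Slice data with boolean conditions:

   price[boolean_condition]
price[price.open >= 750] # single condition
price[(price.open >= 750) & (price.close < 770)] # several conditions combined, each in parentheses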

In [49]:

# Rows where open is greater than 750
price[price.open > 750]

Out[49]:

             close    high     low    open
2018-01-09  782.52  783.00  752.21  752.21
2018-01-10  785.71  788.88  773.48  785.00
2018-01-11  774.81  788.00  772.00  787.00
2018-01-12  788.42  788.80  767.02  773.77
2018-01-15  785.37  799.06  779.02  793.46
2018-01-16  772.94  788.61  768.00  780.48
2018-01-17  747.93  774.00  738.51  770.00
2018-01-19  750.18  758.90  739.02  752.90
2018-01-22  773.64  774.00  751.81  751.81
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00

In [50]:

# Rows where open is greater than 750 and close is less than 770
price[(price.open > 750) & (price.close < 770)]

Out[50]:

             close    high     low    open
2018-01-17  747.93  774.00  738.51  770.00
2018-01-19  750.18  758.90  739.02  752.90
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00
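When combining conditions, use & (and), | (or) and ~ (not), and wrap each condition in parentheses, because & and | bind more tightly than comparison operators. Python's and/or keywords do not work on a whole Series and raise a ValueError. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'open': [740.0, 760.0, 780.0], 'close': [750.0, 765.0, 775.0]})

# Each condition needs its own parentheses
both = df[(df.open > 750) & (df.close < 770)]     # AND
either = df[(df.open > 770) | (df.close < 760)]   # OR
neither = df[~(df.open > 750)]                    # NOT
print(len(both), len(either), len(neither))       # 1 2 1
```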

In [51]:

# Filter the whole DataFrame element-wise with a condition
price[price>780]

Out[51]:

             close    high  low    open
2017-12-28     NaN     NaN  NaN     NaN
2017-12-29     NaN     NaN  NaN     NaN
2018-01-02     NaN     NaN  NaN     NaN
2018-01-03     NaN     NaN  NaN     NaN
2018-01-04     NaN     NaN  NaN     NaN
2018-01-05     NaN     NaN  NaN     NaN
2018-01-08     NaN     NaN  NaN     NaN
2018-01-09  782.52  783.00  NaN     NaN
2018-01-10  785.71  788.88  NaN  785.00
2018-01-11     NaN  788.00  NaN  787.00
2018-01-12  788.42  788.80  NaN     NaN
2018-01-15  785.37  799.06  NaN  793.46
2018-01-16     NaN  788.61  NaN  780.48
2018-01-17     NaN     NaN  NaN     NaN
2018-01-18     NaN     NaN  NaN     NaN
2018-01-19     NaN     NaN  NaN     NaN
2018-01-22     NaN     NaN  NaN     NaN
2018-01-23     NaN     NaN  NaN     NaN
2018-01-24     NaN     NaN  NaN     NaN
2018-01-25     NaN     NaN  NaN     NaN

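  As you can see, every value less than or equal to 780 is shown as NaN. This expression returns a new DataFrame; price itself is not modified.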

In [54]:

# We can also set every value greater than 780 to 1 (this modifies price in place)
price[price > 780] = 1
price

Out[54]:

             close    high     low    open
2017-12-28  718.69  719.90  671.32  687.00
2017-12-29  697.49  726.50  691.60  718.00
2018-01-02  703.85  710.16  689.89  700.00
2018-01-03  715.86  721.40  699.74  701.50
2018-01-04  737.07  743.50  719.33  721.40
2018-01-05  738.36  746.03  728.22  741.00
2018-01-08  752.13  756.50  735.02  735.02
2018-01-09    1.00    1.00  752.21  752.21
2018-01-10    1.00    1.00  773.48    1.00
2018-01-11  774.81    1.00  772.00    1.00
2018-01-12    1.00    1.00  767.02  773.77
2018-01-15    1.00    1.00  779.02    1.00
2018-01-16  772.94    1.00  768.00    1.00
2018-01-17  747.93  774.00  738.51  770.00
2018-01-18  750.74  765.00  744.09  747.93
2018-01-19  750.18  758.90  739.02  752.90
2018-01-22  773.64  774.00  751.81  751.81
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00
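Because price[price > 780] = 1 overwrites price itself, the original prices are lost for the rest of the session. If you want to keep the original data, DataFrame.mask and DataFrame.where return a new object instead. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'close': [770.0, 785.0], 'open': [790.0, 760.0]})  # hypothetical

# mask: replace values where the condition is True; returns a new DataFrame
capped = df.mask(df > 780, 1)
# where: keep values where the condition is True, NaN elsewhere (same as df[df > 780])
kept = df.where(df > 780)

print(capped)
print(kept)
print(df)        # the original is unchanged
```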

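  Use the isin() method to filter rows by the values in a given column. This example continues with price as modified by the assignment above: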

In [55]:

# Select the rows whose high value is 1 or 774.00
price[price['high'].isin([1,774.00])]

Out[55]:

             close   high     low    open
2018-01-09    1.00    1.0  752.21  752.21
2018-01-10    1.00    1.0  773.48    1.00
2018-01-11  774.81    1.0  772.00    1.00
2018-01-12    1.00    1.0  767.02  773.77
2018-01-15    1.00    1.0  779.02    1.00
2018-01-16  772.94    1.0  768.00    1.00
2018-01-17  747.93  774.0  738.51  770.00
2018-01-22  773.64  774.0  751.81  751.81
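isin() can also be negated with ~ to keep only the rows whose value is not in the list. A sketch on a hypothetical column:

```python
import pandas as pd

df = pd.DataFrame({'high': [1.0, 774.0, 780.0, 765.0]})  # hypothetical

in_set = df[df['high'].isin([1, 774.00])]        # high is 1 or 774
not_in_set = df[~df['high'].isin([1, 774.00])]   # everything else
print(in_set.index.tolist(), not_in_set.index.tolist())   # [0, 1] [2, 3]
```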

IV. Panel

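  On the MindGo quantitative trading platform, get_price returns a pandas.Panel object when it fetches data for several stocks. A Panel is essentially a stack of DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so this only works in older pandas versions such as the one in this research environment.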

In [72]:

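# Fetch the open, high, low and close prices of Kweichow Moutai, China Merchants Bank and CITIC Securities for the 20 trading days up to 2018-01-25; the result is a Panel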
price= get_price(['600519.SH','600036.SH','600030.SH'], None, '20180125', '1d', ['open', 'high', 'low', 'close'], False, 'pre', 20,is_panel=1)
print(price)
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 20 (major_axis) x 3 (minor_axis)
Items axis: close to open
Major_axis axis: 2017-12-28 00:00:00 to 2018-01-25 00:00:00
Minor_axis axis: 600030.SH to 600519.SH

  Note: price is no longer a single DataFrame but four DataFrames, one per field. Index it by field name to get each DataFrame; from there you work with a single DataFrame as before.

In [73]:

price['close']  # closing prices of the three stocks; note that the result is a DataFrame

Out[73]:

            600030.SH  600036.SH  600519.SH
2017-12-28      18.12      28.63     718.69
2017-12-29      18.10      29.02     697.49
2018-01-02      18.44      29.62     703.85
2018-01-03      18.61      29.97     715.86
2018-01-04      18.67      29.65     737.07
2018-01-05      18.88      30.10     738.36
2018-01-08      19.54      29.47     752.13
2018-01-09      19.44      29.77     782.52
2018-01-10      19.61      30.53     785.71
2018-01-11      19.28      30.92     774.81
2018-01-12      19.33      31.51     788.42
2018-01-15      19.45      31.94     785.37
2018-01-16      20.25      31.89     772.94
2018-01-17      20.94      31.69     747.93
2018-01-18      21.41      32.32     750.74
2018-01-19      21.29      32.46     750.18
2018-01-22      21.20      33.08     773.64
2018-01-23      21.21      34.05     773.78
2018-01-24      22.92      33.85     764.46
2018-01-25      22.33      33.41     769.16

In [75]:

print(price['open'])  # opening prices; again the result is a DataFrame
            600030.SH  600036.SH  600519.SH
2017-12-28      18.06      28.75     687.00
2017-12-29      18.12      28.63     718.00
2018-01-02      18.13      29.02     700.00
2018-01-03      18.36      29.74     701.50
2018-01-04      18.64      30.28     721.40
2018-01-05      18.68      29.87     741.00
2018-01-08      19.00      29.92     735.02
2018-01-09      19.55      29.52     752.21
2018-01-10      19.47      29.66     785.00
2018-01-11      19.46      30.52     787.00
2018-01-12      19.25      31.12     773.77
2018-01-15      19.25      31.48     793.46
2018-01-16      19.26      31.80     780.48
2018-01-17      20.50      32.10     770.00
2018-01-18      21.15      32.10     747.93
2018-01-19      21.36      32.66     752.90
2018-01-22      21.10      32.18     751.81
2018-01-23      21.37      33.20     777.81
2018-01-24      21.40      34.25     776.44
2018-01-25      22.50      34.01     761.00
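Because Panel has been removed from current pandas, a common replacement is a dict of DataFrames, or a single DataFrame with two-level (MultiIndex) columns built from it with pd.concat. The sketch below rebuilds a two-day excerpt of the close and open tables above that way; it is an illustration of the layout, not what get_price returns:

```python
import pandas as pd

dates = pd.to_datetime(['2018-01-24', '2018-01-25'])
# One DataFrame per field (rows = dates, columns = stocks), like the Panel items
fields = {
    'close': pd.DataFrame({'600030.SH': [22.92, 22.33], '600036.SH': [33.85, 33.41],
                           '600519.SH': [764.46, 769.16]}, index=dates),
    'open':  pd.DataFrame({'600030.SH': [21.40, 22.50], '600036.SH': [34.25, 34.01],
                           '600519.SH': [776.44, 761.00]}, index=dates),
}

# Combine into one DataFrame with two-level columns: (field, stock)
data = pd.concat(fields, axis=1)
print(data['close'])     # like price['close'] on the Panel: a dates x stocks DataFrame
```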

In [59]:

price['2018-01-11']  # try to get all three stocks' data for 2018-01-11
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2392             try:
-> 2393                 return self._engine.get_loc(key)
   2394             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)()

KeyError: '2018-01-11'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-59-dddd7b500893> in <module>()
----> 1 price['2018-01-11']  # try to get all three stocks' data for 2018-01-11

/opt/conda/lib/python3.5/site-packages/pandas/core/panel.py in __getitem__(self, key)
    284             return self._getitem_multilevel(key)
    285         if not (is_list_like(key) or isinstance(key, slice)):
--> 286             return super(Panel, self).__getitem__(key)
    287         return self.loc[key]
    288 

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __getitem__(self, item)
   1525 
   1526     def __getitem__(self, item):
-> 1527         return self._get_item_cache(item)
   1528 
   1529     def _get_item_cache(self, item):

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1532         res = cache.get(item)
   1533         if res is None:
-> 1534             values = self._data.get(item)
   1535             res = self._box_item_values(item, values)
   1536             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2393                 return self._engine.get_loc(key)
   2394             except KeyError:
-> 2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2396 
   2397         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)()

KeyError: '2018-01-11'

  Note that this does not work. Think about why a date cannot be used here.
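A hint for the question: indexing a Panel with [] looks up its items axis, which here holds the fields close, high, low and open, so a date is not a valid key (old pandas versions offered Panel.major_xs for selecting along the date axis). With the MultiIndex-column layout as a modern replacement, a date is a row label, so .loc works. A sketch with a two-day excerpt of the close and open data for two of the stocks:

```python
import pandas as pd

dates = pd.to_datetime(['2018-01-10', '2018-01-11'])
close = pd.DataFrame({'600036.SH': [30.53, 30.92], '600519.SH': [785.71, 774.81]}, index=dates)
open_ = pd.DataFrame({'600036.SH': [29.66, 30.52], '600519.SH': [785.00, 787.00]}, index=dates)
data = pd.concat({'close': close, 'open': open_}, axis=1)

# A date selects a row: every field for every stock on that day
day = data.loc[pd.Timestamp('2018-01-11')]
# Reshape it into a fields x stocks table
table = day.unstack()
print(table)
```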

For more details, see the official Supermind quantitative trading website.