同花顺Supermind量化交易 Python基础编程--pandas基础pandas 是基于 Numpy 构建的，让以

pandas 是基于 Numpy 构建的，让以 Numpy 为中心的应用变得更加简单。pandas是公认的数据处理利器，本章内容主要介绍DataFrame数据结构，在此基础上进行数据处理。除了DataFrame格式，pandas 还包括series、Panel。

第六节：pandas基础

格式	数组	释义
Series	一维数组	与Numpy中的一维array类似。
DataFrame	二维的表格型数据结构	可以将DataFrame理解为Series的容器
Panel	三维的数组	可以理解为DataFrame的容器

　　开始之前，我们首先掌握导入pandas库，方式如下：

In [ ]:

import pandas as pd

　　注意：以下内容必须在导入pandas库之后才能运行。

一、Series和DataFrame介绍

　　　1.Series
由一组数据和与之相关的索引组成。可通过传递一个list对象来创建一个Series，pandas会默认创建整型索引。
创建一个Series：

In [2]:

s = pd.Series([1,3,5,7,6,8])
print(s)

0    1
1    3
2    5
3    7
4    6
5    8
dtype: int64

　　获取 Series 的索引:

In [3]:

s.index

Out[3]:

RangeIndex(start=0, stop=6, step=1)

　　2.DataFrame
DataFrame是一个表格型的数据结构，它含有一组有序的列，每一列的数据结构都是相同的，而不同的列之间则可以是不同的数据结构。DataFrame中的每一行是一个记录，名称为Index的一个元素，而每一列则为一个字段，是这个记录的一个属性，DataFrame既有行索引也有列索引。
创建DataFrame
首先来看如何从字典创建DataFrame。

In [6]:

d = {'one': [1, 2, 3], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)

   one  two
a    1    1
b    2    2
c    3    3

　　可以使用dataframe.index和dataframe.columns来查看DataFrame的行和列，dataframe.values则以数组的形式返回DataFrame的元素：

In [12]:

print(df.index) #查看行
print(df.columns) #查看列
print(df.values) #查看元素

Index(['a', 'b', 'c'], dtype='object')
Index(['one', 'two'], dtype='object')
[[1 1]
 [2 2]
 [3 3]]

　　DataFrame从值是数组的字典创建时，其各个数组的长度需要相同，加强印象，可参考以下报错的例子。

In [13]:

d = {'one': [1, 2], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4308     try:
-> 4309         blocks = form_blocks(arrays, names, axes)
   4310         mgr = BlockManager(blocks, axes)

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in form_blocks(arrays, names, axes)
   4380     if len(int_items):
-> 4381         int_blocks = _multi_blockify(int_items)
   4382         blocks.extend(int_blocks)

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in _multi_blockify(tuples, dtype)
   4449 
-> 4450         values, placement = _stack_arrays(list(tup_block), dtype)
   4451 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in _stack_arrays(tuples, dtype)
   4494     for i, arr in enumerate(arrays):
-> 4495         stacked[i] = _asarray_compat(arr)
   4496 

ValueError: could not broadcast input array from shape (3) into shape (2)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-13-26bdeb89a47c> in <module>()
      1 d = {'one': [1, 2], 'two': [1, 2, 3]}
----> 2 df = pd.DataFrame(d,index=['a', 'b', 'c'])
      3 print(df)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5602     axes = [_ensure_index(columns), _ensure_index(index)]
   5603 
-> 5604     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5605 
   5606 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4312         return mgr
   4313     except ValueError as e:
-> 4314         construction_error(len(arrays), arrays[0].shape, axes, e)
   4315 
   4316 

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4278         raise ValueError("Empty data passed with indices specified.")
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 
   4282 

ValueError: Shape of passed values is (2, 2), indices imply (2, 3)

　　如果DataFrame的值是非数组时，没有这一限制，且自动将缺失值补成NaN。如下示例

In [14]:

d= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6, 'c': 9}]
df = pd.DataFrame(d)
print(df)

     a  b    c
0  1.6  2  NaN
1  3.0  6  9.0

　　在实际处理数据时，有时需要创建一个空的DataFrame，可以这么做：

In [15]:

df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []

　　另一种创建DataFrame的方法十分有用，那就是使用concat函数创建DataFrame，其主要是通过两个行或列相同的DataFrame链接成一个。

In [16]:

a= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6}]
df1 = pd.DataFrame(a)
b= [{'a': 4, 'b': 5}]
df2 = pd.DataFrame(b)
df = pd.concat([df1, df2], axis=0)
print(df1)
print(df2)
print(df)

　　注意：concat函数内有axis参数，其中的axis=1表示按列进行合并，axis=0表示按行合并

二、数据查看

　　MindGO量化交易平台上大部分获取数据的函数，最终以DataFrame或Dict(字典)格式呈现。接下来重点介绍DataFrame格式的数据查看，数据处理。以MindGO平台获取的数据为例进行讲解：
该部分内容需在MindGo研究环境中练习。

In [18]:

# 获取贵州茅台近10个工作日的开盘价、最高价、最低价、收盘价，获取格式即为DataFrame
price= get_price('600519.SH', None, '20180125', '1d', ['open', 'high', 'low', 'close'], False, 'pre', 20, is_panel=1)
print(price)

             close    high     low    open
2017-12-28  718.69  719.90  671.32  687.00
2017-12-29  697.49  726.50  691.60  718.00
2018-01-02  703.85  710.16  689.89  700.00
2018-01-03  715.86  721.40  699.74  701.50
2018-01-04  737.07  743.50  719.33  721.40
2018-01-05  738.36  746.03  728.22  741.00
2018-01-08  752.13  756.50  735.02  735.02
2018-01-09  782.52  783.00  752.21  752.21
2018-01-10  785.71  788.88  773.48  785.00
2018-01-11  774.81  788.00  772.00  787.00
2018-01-12  788.42  788.80  767.02  773.77
2018-01-15  785.37  799.06  779.02  793.46
2018-01-16  772.94  788.61  768.00  780.48
2018-01-17  747.93  774.00  738.51  770.00
2018-01-18  750.74  765.00  744.09  747.93
2018-01-19  750.18  758.90  739.02  752.90
2018-01-22  773.64  774.00  751.81  751.81
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-25  769.16  776.00  751.00  761.00

　　以下为数据查看常用的八项操作：

　　1.查看前几条数据：

In [19]:

price.head()

Out[19]:

	close	high	low	open
2017-12-28	718.69	719.90	671.32	687.0
2017-12-29	697.49	726.50	691.60	718.0
2018-01-02	703.85	710.16	689.89	700.0
2018-01-03	715.86	721.40	699.74	701.5
2018-01-04	737.07	743.50	719.33	721.4

　　2.查看后几条数据：

In [20]:

price.tail()

Out[20]:

	close	high	low	open
2018-01-19	750.18	758.90	739.02	752.90
2018-01-22	773.64	774.00	751.81	751.81
2018-01-23	773.78	780.00	768.60	777.81
2018-01-24	764.46	776.46	758.60	776.44
2018-01-25	769.16	776.00	751.00	761.00

　　3.查看 DataFrame 的索引

In [21]:

price.index

Out[21]:

DatetimeIndex(['2017-12-28', '2017-12-29', '2018-01-02', '2018-01-03',               '2018-01-04', '2018-01-05', '2018-01-08', '2018-01-09',               '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15',               '2018-01-16', '2018-01-17', '2018-01-18', '2018-01-19',               '2018-01-22', '2018-01-23', '2018-01-24', '2018-01-25'],
              dtype='datetime64[ns]', freq=None)

　　4.查看 DataFrame 的列名

In [22]:

price.columns

Out[22]:

Index(['close', 'high', 'low', 'open'], dtype='object')

　　5.查看 DataFrame 的值

In [23]:

price.values

Out[23]:

array([[ 718.69,  719.9 ,  671.32,  687.  ],
       [ 697.49,  726.5 ,  691.6 ,  718.  ],
       [ 703.85,  710.16,  689.89,  700.  ],
       [ 715.86,  721.4 ,  699.74,  701.5 ],
       [ 737.07,  743.5 ,  719.33,  721.4 ],
       [ 738.36,  746.03,  728.22,  741.  ],
       [ 752.13,  756.5 ,  735.02,  735.02],
       [ 782.52,  783.  ,  752.21,  752.21],
       [ 785.71,  788.88,  773.48,  785.  ],
       [ 774.81,  788.  ,  772.  ,  787.  ],
       [ 788.42,  788.8 ,  767.02,  773.77],
       [ 785.37,  799.06,  779.02,  793.46],
       [ 772.94,  788.61,  768.  ,  780.48],
       [ 747.93,  774.  ,  738.51,  770.  ],
       [ 750.74,  765.  ,  744.09,  747.93],
       [ 750.18,  758.9 ,  739.02,  752.9 ],
       [ 773.64,  774.  ,  751.81,  751.81],
       [ 773.78,  780.  ,  768.6 ,  777.81],
       [ 764.46,  776.46,  758.6 ,  776.44],
       [ 769.16,  776.  ,  751.  ,  761.  ]])

　　6.使用 describe() 函数对于数据的快速统计汇总：

In [24]:

price.describe()

Out[24]:

	close	high	low	open
count	20.000000	20.000000	20.000000	20.000000
mean	754.155500	763.235000	739.924000	750.686500
std	28.005539	26.794003	31.251968	31.581411
min	697.490000	710.160000	671.320000	687.000000
25%	738.037500	745.397500	725.997500	731.615000
50%	758.295000	774.000000	747.545000	752.555000
75%	774.037500	784.250000	767.265000	776.782500
max	788.420000	799.060000	779.020000	793.460000

　　7.对数据的转置：

In [25]:

price.T

Out[25]:

	2017-12-28 00:00:00	2017-12-29 00:00:00	2018-01-02 00:00:00	2018-01-03 00:00:00	2018-01-04 00:00:00	2018-01-05 00:00:00	2018-01-08 00:00:00	2018-01-09 00:00:00	2018-01-10 00:00:00	2018-01-11 00:00:00	2018-01-12 00:00:00	2018-01-15 00:00:00	2018-01-16 00:00:00	2018-01-17 00:00:00	2018-01-18 00:00:00	2018-01-19 00:00:00	2018-01-22 00:00:00	2018-01-23 00:00:00	2018-01-24 00:00:00	2018-01-25 00:00:00
close	718.69	697.49	703.85	715.86	737.07	738.36	752.13	782.52	785.71	774.81	788.42	785.37	772.94	747.93	750.74	750.18	773.64	773.78	764.46	769.16
high	719.90	726.50	710.16	721.40	743.50	746.03	756.50	783.00	788.88	788.00	788.80	799.06	788.61	774.00	765.00	758.90	774.00	780.00	776.46	776.00
low	671.32	691.60	689.89	699.74	719.33	728.22	735.02	752.21	773.48	772.00	767.02	779.02	768.00	738.51	744.09	739.02	751.81	768.60	758.60	751.00
open	687.00	718.00	700.00	701.50	721.40	741.00	735.02	752.21	785.00	787.00	773.77	793.46	780.48	770.00	747.93	752.90	751.81	777.81	776.44	761.00

　　8.按列对 DataFrame 进行排序

In [26]:

print(price.sort_values(by='open' , ascending=False))

             close    high     low    open
2018-01-15  785.37  799.06  779.02  793.46
2018-01-11  774.81  788.00  772.00  787.00
2018-01-10  785.71  788.88  773.48  785.00
2018-01-16  772.94  788.61  768.00  780.48
2018-01-23  773.78  780.00  768.60  777.81
2018-01-24  764.46  776.46  758.60  776.44
2018-01-12  788.42  788.80  767.02  773.77
2018-01-17  747.93  774.00  738.51  770.00
2018-01-25  769.16  776.00  751.00  761.00
2018-01-19  750.18  758.90  739.02  752.90
2018-01-09  782.52  783.00  752.21  752.21
2018-01-22  773.64  774.00  751.81  751.81
2018-01-18  750.74  765.00  744.09  747.93
2018-01-05  738.36  746.03  728.22  741.00
2018-01-08  752.13  756.50  735.02  735.02
2018-01-04  737.07  743.50  719.33  721.40
2017-12-29  697.49  726.50  691.60  718.00
2018-01-03  715.86  721.40  699.74  701.50
2018-01-02  703.85  710.16  689.89  700.00
2017-12-28  718.69  719.90  671.32  687.00

　　注意：sort_values函数内置参数有by和ascending，by参数是排序指定列，ascending是排序顺序，False是从大到小，True是从小到大。

三、选择数据

　　依旧采用上个小节案例，继续讲述选择数据的八项基本操作。
1.选择一列数据，选取开盘价这列数据：

In [28]:

price['open']

Out[28]:

2017-12-28    687.00
2017-12-29    718.00
2018-01-02    700.00
2018-01-03    701.50
2018-01-04    721.40
2018-01-05    741.00
2018-01-08    735.02
2018-01-09    752.21
2018-01-10    785.00
2018-01-11    787.00
2018-01-12    773.77
2018-01-15    793.46
2018-01-16    780.48
2018-01-17    770.00
2018-01-18    747.93
2018-01-19    752.90
2018-01-22    751.81
2018-01-23    777.81
2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64

　　同学们动手试试price.open~
它与price['open']是等效的！

　　2.选择多列数据：

In [30]:

price[['open','close']]

Out[30]:

	open	close
2017-12-28	687.00	718.69
2017-12-29	718.00	697.49
2018-01-02	700.00	703.85
2018-01-03	701.50	715.86
2018-01-04	721.40	737.07
2018-01-05	741.00	738.36
2018-01-08	735.02	752.13
2018-01-09	752.21	782.52
2018-01-10	785.00	785.71
2018-01-11	787.00	774.81
2018-01-12	773.77	788.42
2018-01-15	793.46	785.37
2018-01-16	780.48	772.94
2018-01-17	770.00	747.93
2018-01-18	747.93	750.74
2018-01-19	752.90	750.18
2018-01-22	751.81	773.64
2018-01-23	777.81	773.78
2018-01-24	776.44	764.46
2018-01-25	761.00	769.16

　　注意：price[['open','close']]中['open','close']是一个由两个字符串（列名）组成的列表，会自动对应到整个DataFrame表结构中，获取到相应的数据。
同学们试试price['open','close']，看看能不能获取到数据~

　　3.选择多行：

In [31]:

price[0:3]

Out[31]:

	close	high	low	open
2017-12-28	718.69	719.90	671.32	687.0
2017-12-29	697.49	726.50	691.60	718.0
2018-01-02	703.85	710.16	689.89	700.0

　　4.按index选取多行:

In [32]:

price['2018-01-24':'2018-01-25']

Out[32]:

	close	high	low	open
2018-01-24	764.46	776.46	758.6	776.44
2018-01-25	769.16	776.00	751.0	761.00

　　5.使用标签选取数据：

　　　price.loc[行标签,列标签]
price.loc['a':'b'] #选取 ab 两行数据
price.loc[:,'open'] #选取 open 列的数据
price.loc 的第一个参数是行标签，第二个参数为列标签，两个参数既可以是列表也可以是单个字符，如果两个参数都为列表则返回的是 DataFrame，否则，则为 Series。

In [33]:

price.loc['2018-01-24','open']

Out[33]:

776.44000000000005

In [34]:

price.loc['2018-01-24':'2018-01-25']

Out[34]:

	close	high	low	open
2018-01-24	764.46	776.46	758.6	776.44
2018-01-25	769.16	776.00	751.0	761.00

In [35]:

price.loc[:, 'open']

Out[35]:

2017-12-28    687.00
2017-12-29    718.00
2018-01-02    700.00
2018-01-03    701.50
2018-01-04    721.40
2018-01-05    741.00
2018-01-08    735.02
2018-01-09    752.21
2018-01-10    785.00
2018-01-11    787.00
2018-01-12    773.77
2018-01-15    793.46
2018-01-16    780.48
2018-01-17    770.00
2018-01-18    747.93
2018-01-19    752.90
2018-01-22    751.81
2018-01-23    777.81
2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64

In [36]:

price.loc['2018-01-24':'2018-01-25','open']

Out[36]:

2018-01-24    776.44
2018-01-25    761.00
Name: open, dtype: float64

　　6..使用位置选取数据：

　　　price.iloc[行位置,列位置]
price.iloc[1,1] #选取第二行，第二列的值，返回的为单个值
price.iloc[[0,2],:] #选取第一行及第三行的数据
price.iloc[0:2,:] #选取第一行到第三行（不包含）的数据
price.iloc[:,1] #选取所有记录的第二列的值，返回的为一个Series
price.iloc[1,:] #选取第一行数据，返回的为一个Series

In [38]:

price.iloc[1,1] # 选取第二行，第二列的值，返回的为单个值

Out[38]:

726.5

In [39]:

price.iloc[[0,2],:] # 选取第一行及第三行的数据

Out[39]:

	close	high	low	open
2017-12-28	718.69	719.90	671.32	687.0
2018-01-02	703.85	710.16	689.89	700.0

In [40]:

price.iloc[0:2,:] # 选取第一行到第三行（不包含）的数据

Out[40]:

	close	high	low	open
2017-12-28	718.69	719.9	671.32	687.0
2017-12-29	697.49	726.5	691.60	718.0

In [41]:

price.iloc[:,1] # 选取所有记录的第一列的值，返回的为一个Series

Out[41]:

2017-12-28    719.90
2017-12-29    726.50
2018-01-02    710.16
2018-01-03    721.40
2018-01-04    743.50
2018-01-05    746.03
2018-01-08    756.50
2018-01-09    783.00
2018-01-10    788.88
2018-01-11    788.00
2018-01-12    788.80
2018-01-15    799.06
2018-01-16    788.61
2018-01-17    774.00
2018-01-18    765.00
2018-01-19    758.90
2018-01-22    774.00
2018-01-23    780.00
2018-01-24    776.46
2018-01-25    776.00
Name: high, dtype: float64

In [42]:

price.iloc[1,:] # 选取第一行数据，返回的为一个Series

Out[42]:

close    697.49
high     726.50
low      691.60
open     718.00
Name: 2017-12-29 00:00:00, dtype: float64

　　7.更广义的切片方式是使用.ix，它自动根据给到的索引类型判断是使用位置还是标签进行切片

　　　price.ix[1,1]
price.ix['a':'b']

In [43]:

price.ix[1,1]

Out[43]:

726.5

In [44]:

price.ix['2018-01-24':'2018-01-25']

Out[44]:

	close	high	low	open
2018-01-24	764.46	776.46	758.6	776.44
2018-01-25	769.16	776.00	751.0	761.00

In [45]:

price.ix['2018-01-24','open']

Out[45]:

776.44000000000005

In [46]:

price.ix[1,'open']

Out[46]:

718.0

In [47]:

price.ix['2018-01-24',0]

Out[47]:

764.46000000000004

　　8.通过逻辑指针进行数据切片：

　　　price[逻辑条件]
price[price.one >= 2] #单个逻辑条件
price[(price.one >=1 ) & (df.one < 3) ] #多个逻辑条件组合

In [49]:

#筛选出 open 大于 750的数据
price[price.open > 750]

Out[49]:

	close	high	low	open
2018-01-09	782.52	783.00	752.21	752.21
2018-01-10	785.71	788.88	773.48	785.00
2018-01-11	774.81	788.00	772.00	787.00
2018-01-12	788.42	788.80	767.02	773.77
2018-01-15	785.37	799.06	779.02	793.46
2018-01-16	772.94	788.61	768.00	780.48
2018-01-17	747.93	774.00	738.51	770.00
2018-01-19	750.18	758.90	739.02	752.90
2018-01-22	773.64	774.00	751.81	751.81
2018-01-23	773.78	780.00	768.60	777.81
2018-01-24	764.46	776.46	758.60	776.44
2018-01-25	769.16	776.00	751.00	761.00

In [50]:

#筛选出 open 大于 750 的数据，并且 close 小于 770 的数据
price[(price.open > 750) & (price.close < 770)]

Out[50]:

	close	high	low	open
2018-01-17	747.93	774.00	738.51	770.00
2018-01-19	750.18	758.90	739.02	752.90
2018-01-24	764.46	776.46	758.60	776.44
2018-01-25	769.16	776.00	751.00	761.00

In [51]:

#使用 条件过来更改数据。
price[price>780]

Out[51]:

	close	high	low	open
2017-12-28	NaN	NaN	NaN	NaN
2017-12-29	NaN	NaN	NaN	NaN
2018-01-02	NaN	NaN	NaN	NaN
2018-01-03	NaN	NaN	NaN	NaN
2018-01-04	NaN	NaN	NaN	NaN
2018-01-05	NaN	NaN	NaN	NaN
2018-01-08	NaN	NaN	NaN	NaN
2018-01-09	782.52	783.00	NaN	NaN
2018-01-10	785.71	788.88	NaN	785.00
2018-01-11	NaN	788.00	NaN	787.00
2018-01-12	788.42	788.80	NaN	NaN
2018-01-15	785.37	799.06	NaN	793.46
2018-01-16	NaN	788.61	NaN	780.48
2018-01-17	NaN	NaN	NaN	NaN
2018-01-18	NaN	NaN	NaN	NaN
2018-01-19	NaN	NaN	NaN	NaN
2018-01-22	NaN	NaN	NaN	NaN
2018-01-23	NaN	NaN	NaN	NaN
2018-01-24	NaN	NaN	NaN	NaN
2018-01-25	NaN	NaN	NaN	NaN

　　观察可以发现，price 中小于等于 780 的数都变为 NaN。

In [54]:

#我们还可以把大于 780 的数赋值为1.
price[price > 780] = 1
price

Out[54]:

	close	high	low	open
2017-12-28	718.69	719.90	671.32	687.00
2017-12-29	697.49	726.50	691.60	718.00
2018-01-02	703.85	710.16	689.89	700.00
2018-01-03	715.86	721.40	699.74	701.50
2018-01-04	737.07	743.50	719.33	721.40
2018-01-05	738.36	746.03	728.22	741.00
2018-01-08	752.13	756.50	735.02	735.02
2018-01-09	1.00	1.00	752.21	752.21
2018-01-10	1.00	1.00	773.48	1.00
2018-01-11	774.81	1.00	772.00	1.00
2018-01-12	1.00	1.00	767.02	773.77
2018-01-15	1.00	1.00	779.02	1.00
2018-01-16	772.94	1.00	768.00	1.00
2018-01-17	747.93	774.00	738.51	770.00
2018-01-18	750.74	765.00	744.09	747.93
2018-01-19	750.18	758.90	739.02	752.90
2018-01-22	773.64	774.00	751.81	751.81
2018-01-23	773.78	780.00	768.60	777.81
2018-01-24	764.46	776.46	758.60	776.44
2018-01-25	769.16	776.00	751.00	761.00

　　使用isin()方法来过滤在指定列中的数据,案例延续上面赋值后的price

In [55]:

# 选取 high 列中数为 1 和 774.00的数。
price[price['high'].isin([1,774.00])]

Out[55]:

	close	high	low	open
2018-01-09	1.00	1.0	752.21	752.21
2018-01-10	1.00	1.0	773.48	1.00
2018-01-11	774.81	1.0	772.00	1.00
2018-01-12	1.00	1.0	767.02	773.77
2018-01-15	1.00	1.0	779.02	1.00
2018-01-16	772.94	1.0	768.00	1.00
2018-01-17	747.93	774.0	738.51	770.00
2018-01-22	773.64	774.0	751.81	751.81

四、 Panel

　　MindGo量化交易平台的get_price函数，如果是获取多支股票数据, 则返回pandas.Panel对象。pane其实就是一张一张DataFrame整合。

In [72]:

# 获取贵州茅台，招商银行，中信证券这三只股票近10个工作日的开盘价、最高价、最低价、收盘价，获取格式即为DataFrame
price= get_price(['600519.SH','600036.SH','600030.SH'], None, '20180125', '1d', ['open', 'high', 'low', 'close'], False, 'pre', 20,is_panel=1)
print(price)

<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 20 (major_axis) x 3 (minor_axis)
Items axis: close to open
Major_axis axis: 2017-12-28 00:00:00 to 2018-01-25 00:00:00
Minor_axis axis: 600030.SH to 600519.SH

　　注意：现在这个price不是一张DataFrame，而是四个对应字段的DataFrame了，那么我们需要通过数据字段下标，来分别获取多张DataFrame，之后的操作就是操作单张DataFrame了。

In [73]:

price['close']#获取三个股票的收盘价，注意获取后是个DataFrame

Out[73]:

	600030.SH	600036.SH	600519.SH
2017-12-28	18.12	28.63	718.69
2017-12-29	18.10	29.02	697.49
2018-01-02	18.44	29.62	703.85
2018-01-03	18.61	29.97	715.86
2018-01-04	18.67	29.65	737.07
2018-01-05	18.88	30.10	738.36
2018-01-08	19.54	29.47	752.13
2018-01-09	19.44	29.77	782.52
2018-01-10	19.61	30.53	785.71
2018-01-11	19.28	30.92	774.81
2018-01-12	19.33	31.51	788.42
2018-01-15	19.45	31.94	785.37
2018-01-16	20.25	31.89	772.94
2018-01-17	20.94	31.69	747.93
2018-01-18	21.41	32.32	750.74
2018-01-19	21.29	32.46	750.18
2018-01-22	21.20	33.08	773.64
2018-01-23	21.21	34.05	773.78
2018-01-24	22.92	33.85	764.46
2018-01-25	22.33	33.41	769.16

In [75]:

print(price['open'])#获取开盘价股票数据，注意获取的还是DataFrame

            600030.SH  600036.SH  600519.SH
2017-12-28      18.06      28.75     687.00
2017-12-29      18.12      28.63     718.00
2018-01-02      18.13      29.02     700.00
2018-01-03      18.36      29.74     701.50
2018-01-04      18.64      30.28     721.40
2018-01-05      18.68      29.87     741.00
2018-01-08      19.00      29.92     735.02
2018-01-09      19.55      29.52     752.21
2018-01-10      19.47      29.66     785.00
2018-01-11      19.46      30.52     787.00
2018-01-12      19.25      31.12     773.77
2018-01-15      19.25      31.48     793.46
2018-01-16      19.26      31.80     780.48
2018-01-17      20.50      32.10     770.00
2018-01-18      21.15      32.10     747.93
2018-01-19      21.36      32.66     752.90
2018-01-22      21.10      32.18     751.81
2018-01-23      21.37      33.20     777.81
2018-01-24      21.40      34.25     776.44
2018-01-25      22.50      34.01     761.00

In [59]:

price['2018-01-11']#获取2018-01-11日期的三个股票的数据

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2392             try:
-> 2393                 return self._engine.get_loc(key)
   2394             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)()

KeyError: '2018-01-11'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-59-dddd7b500893> in <module>()
----> 1 price['2018-01-11']#获取2018-01-11日期的三个股票的数据

/opt/conda/lib/python3.5/site-packages/pandas/core/panel.py in __getitem__(self, key)
    284             return self._getitem_multilevel(key)
    285         if not (is_list_like(key) or isinstance(key, slice)):
--> 286             return super(Panel, self).__getitem__(key)
    287         return self.loc[key]
    288 

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __getitem__(self, item)
   1525 
   1526     def __getitem__(self, item):
-> 1527         return self._get_item_cache(item)
   1528 
   1529     def _get_item_cache(self, item):

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1532         res = cache.get(item)
   1533         if res is None:
-> 1534             values = self._data.get(item)
   1535             res = self._box_item_values(item, values)
   1536             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2393                 return self._engine.get_loc(key)
   2394             except KeyError:
-> 2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2396 
   2397         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5239)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20405)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20359)()

KeyError: '2018-01-11'

　　注意这是不可行的，思考下为什么无法指定日期？

查看以上策略详情请到supermind量化交易官网查看同花顺Supermind量化交易 Python基础编程--pandas基础