Automatically extracting technical-analysis features with pandas_ta


TA-Lib is a technical-analysis library covering more than 150 indicators commonly used in stock and futures trading software, such as MACD, RSI, KDJ, momentum indicators, and Bollinger Bands. pandas-ta is a higher-level technical-analysis toolkit built on pandas (with optional TA-Lib support), offering 130+ indicators and utility functions plus more than 60 candlestick patterns from TA-Lib. This section documents how to use pandas-ta to quickly extract technical features.

Installation

Installing pandas-ta itself is trivial: a plain pip install is enough. If you also want the TA-Lib-backed features (for example its candlestick patterns), you additionally need to install TA-Lib itself.
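
A minimal install sketch, assuming a standard pip environment (the TA-Lib Python wrapper also requires the underlying TA-Lib C library to be present on the system):

pip install pandas_ta
# optional: only needed for the TA-Lib backed indicators and candle patterns
pip install TA-Lib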

The main groups of functions pandas-ta provides are listed below (a short usage sketch follows the list):

  • Candlestick patterns (Candles): pattern recognition on K-line charts, e.g. Three Black Crows
  • Cycle features (Cycles): e.g. sine-wave fitting
  • Momentum features (Momentum): e.g. KDJ, RSI
  • Overlap features (Overlap): mainly the moving-average family, e.g. exponential smoothing (EMA)
  • Performance features (Performance): e.g. percent return, log return
  • Statistical features (Statistics): e.g. entropy, median, quantiles, standard deviation
  • Trend features (Trend): e.g. the Aroon indicator
  • Volatility features (Volatility): e.g. Bollinger Bands
  • Volume features (Volume): e.g. the Money Flow Index
  • Miscellaneous features: e.g. TD Sequential (td_seq, the "Magic Nine Turns")
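
As a quick orientation, most indicators are available both as plain functions and through the df.ta DataFrame accessor. Below is a minimal sketch on synthetic OHLCV data (not part of the original pipeline), only to show the calling convention used throughout this article:

import numpy as np
import pandas as pd
import pandas_ta as ta  # importing pandas_ta registers the .ta accessor on DataFrames

# Synthetic 1-minute OHLCV bars, only to demonstrate the accessor
idx = pd.date_range('2022-01-03 09:31', periods=120, freq='1min')
close = pd.Series(100 + np.cumsum(np.random.randn(len(idx)) * 0.05), index=idx)
df = pd.DataFrame({'open': close, 'high': close + 0.1, 'low': close - 0.1,
                   'close': close, 'volume': np.random.randint(1000, 5000, len(idx))}, index=idx)

rsi = df.ta.rsi(length=14)           # momentum: RSI, returned as a Series
kdj = df.ta.kdj(length=9, signal=3)  # momentum: K/D/J, returned as a DataFrame
bbands = df.ta.bbands(length=20)     # volatility: Bollinger Bands
logret = df.ta.log_return(length=5)  # performance: 5-bar log return
print(rsi.tail())
print(kdj.tail())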

Preparing the K-line data

We extract features at two granularities, 1-minute and 1-day bars. The intraday 1-minute data is unadjusted, while the daily data is backward-adjusted (后复权) so that the price series stays continuous across corporate actions.

Loading intraday K-line data (unadjusted)

Using JoinQuant (聚宽) data as an example, we read the 1-minute bars for a given date range and generate the corresponding intraday return labels; see the comments in the code for details.

The labels are intraday return measures of two kinds, both computed from the per-minute average price (the avg column):

  • the return from the current bar to the 25%/50%/75% quantile of the average price over the remaining bars of the day
  • the return from the current bar to the median average price over the following 5 and 10 minutes
# Intraday 1-minute bars: JoinQuant (jqdata)
def load_jqdata_kline(code='510050.SH', start_date='2021-11-01', end_date='2022-01-01'):
    # Convert the ticker suffix to JoinQuant exchange codes (.XSHG / .XSHE)
    if not code.endswith(('.XSHG', '.XSHE')):
        ticker, exchange  = code.split('.')
        code = ticker + '.' + {'SZ': 'XSHE', 'BJ': 'XSHE', 'SH': 'XSHG'}.get(exchange.upper())
    # Pull the 1-minute bars from stock_kline (`client` is a ClickHouse client created elsewhere)
    query = "select * from jqdata.stock_kline where code='{}' and volume !=0.0  and trade_day >='{}' and trade_day <'{}' and toDayOfWeek(trade_day) < 6".format(code, start_date, end_date)
    data = client.execute(query)

    # Pull the daily bars from stock_dayline, mainly for the adjustment factor
    day_query = "select trade_day, pre_close, factor from jqdata.stock_dayline where code='{}' and volume !=0.0  and trade_day >='{}' and trade_day <'{}' and toDayOfWeek(trade_day) < 6".format(code, start_date, end_date)
    day_data = client.execute(day_query)
    # Previous close divided by the adjustment factor, i.e. put on the same unadjusted scale as the 1-minute bars
    pre_close = {i[0]: np.round(i[1] / i[2] , 3) for i in  day_data}

    df = pd.DataFrame(data, columns=['code', 'dtype', 'trade_day', 'trade_time', 'open', 'close', 'low', 'high', 'volume', 'money', 'avg'])
    df = df.sort_values(by='trade_time').drop_duplicates(subset=['trade_time'], keep='first').reset_index(drop=True)
    # Attach the previous day's close to every bar of that trading day
    df['pre_close'] = df['trade_day'].apply(lambda x: pre_close.get(x, 0))
    df['trade_day'] = df['trade_day'].astype(str)

    # Current bar's close relative to the previous day's close (fractional change)
    df['daily_percent'] = (df['close'] - df['pre_close']) / df['pre_close']
    df['minutes_of_day'] = df.trade_time.dt.hour * 60 + df.trade_time.dt.minute

    # Drop the closing call-auction bars (14:58 and 14:59)
    df = df[~df.minutes_of_day.isin([898, 899])]

    # Buy at this bar: 25% quantile of the return over the remaining bars of the day (in percent)
    df['label_d0_25'] = ((df.groupby(['trade_day'])['avg'].transform(lambda row: row[::-1].shift(1).expanding(min_periods=1).quantile(0.25)[::-1]) - df['avg']) * 100).div(df['avg'])
    # Buy at this bar: 50% quantile (median) of the return over the remaining bars of the day (in percent)
    df['label_d0_50'] = ((df.groupby(['trade_day'])['avg'].transform(lambda row: row[::-1].shift(1).expanding(min_periods=1).quantile(0.5)[::-1]) - df['avg']) * 100).div(df['avg'])
    # Buy at this bar: 75% quantile of the return over the remaining bars of the day (in percent)
    df['label_d0_75'] = ((df.groupby(['trade_day'])['avg'].transform(lambda row: row[::-1].shift(1).expanding(min_periods=1).quantile(0.75)[::-1]) - df['avg']) * 100).div(df['avg'])

    # Buy at this bar: median price change over the next 5 minutes, as an absolute move scaled by 100
    df['label_rolling_5_50_bps'] = (df.groupby(['trade_day'])['avg'].transform(lambda row: row[::-1].shift(1).rolling(5, min_periods=1).quantile(0.5)[::-1]) - df['avg']) * 100
    # Buy at this bar: the same 5-minute median move relative to the current average price
    df['label_rolling_5_50'] = df['label_rolling_5_50_bps'].div(df['avg'])
    # Buy at this bar: median price change over the next 10 minutes (note: not scaled by 100, unlike the 5-minute version)
    df['label_rolling_10_50_bps'] = df.groupby(['trade_day'])['avg'].transform(lambda row: row[::-1].shift(1).rolling(10, min_periods=1).quantile(0.5)[::-1]) - df['avg']
    # Buy at this bar: the same 10-minute median move relative to the current average price
    df['label_rolling_10_50'] = df['label_rolling_10_50_bps'].div(df['avg'])

    return df.reset_index(drop=True)
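
The row[::-1].shift(1).expanding(...)[::-1] pattern above deserves a note: reversing the series turns "future bars" into "past bars", so an expanding (or rolling) quantile computed on the reversed, shifted series and then reversed back gives, for each bar, the quantile over the remaining bars of the day. A minimal sketch on a toy one-day price series (the real code applies this per trade_day inside a groupby transform):

import pandas as pd

# One day's per-minute average prices (toy values)
avg = pd.Series([10.0, 11.0, 9.0, 12.0, 8.0])

# For each bar: median of the avg prices of all *later* bars of the same day
future_median = avg[::-1].shift(1).expanding(min_periods=1).quantile(0.5)[::-1]
print(future_median.tolist())  # [10.0, 9.0, 10.0, 8.0, nan]

# The label is then the move to that median, relative to the current price
label_d0_50 = (future_median - avg) * 100 / avg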

Loading daily K-line data (backward-adjusted)

There are two daily-bar sources: JoinQuant (聚宽) for the domestic A-share market and HSTONG (华盛通) for the Hong Kong and US markets; both are read from a ClickHouse database.

# Daily bars: HSTONG (Hong Kong and US markets)
def load_hstong_dayline(market='cn', code='300369.SZ', start_date='2018-11-01', end_date='2022-01-01'):
    query = "select code, data_date, price_open, price_close, price_low, price_high, price_last, volume, deal  from hstong.dayline_hfq_{} where code='{}' and volume !=0.0  and data_date >='{}' and data_date <='{}' and toDayOfWeek(data_date) < 6".format(market, code, start_date, end_date)
    data = client.execute(query)
    df = pd.DataFrame(data, columns=['code', 'trade_day', 'open', 'close', 'low', 'high', 'pre_close', 'volume', 'money'])
    df = df.sort_values(by='trade_day').drop_duplicates(subset=['trade_day'], keep='first').reset_index(drop=True)
    return df

# Daily bars: JoinQuant (jqdata)
def load_jqdata_dayline(code='510050.SH', start_date='2018-11-01', end_date='2022-01-01'):
    if not code.endswith(('.XSHG', '.XSHE')):
        ticker, exchange  = code.split('.')
        code = ticker + '.' + {'SZ': 'XSHE', 'BJ': 'XSHE', 'SH': 'XSHG'}.get(exchange.upper())
    query = "select code, trade_day, open, close, low, high, pre_close, volume, money, factor  from jqdata.stock_dayline where code='{}' and volume !=0.0  and trade_day >='{}' and trade_day <'{}' and toDayOfWeek(trade_day) < 6".format(code, start_date, end_date)
    data = client.execute(query)
    df = pd.DataFrame(data, columns=['code', 'trade_day', 'open', 'close', 'low', 'high', 'pre_close', 'volume', 'money', 'factor'])
    df = df.sort_values(by='trade_day').drop_duplicates(subset=['trade_day'], keep='first').reset_index(drop=True)
    df['trade_day'] = df['trade_day'].astype(str)
    return df
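
The daily bars can be fed to pandas-ta in exactly the same way. Below is a minimal sketch of building a few day-level indicators on the frame returned by load_jqdata_dayline (the indicator choice and column names here are illustrative, not the feature set used later in this article):

import pandas as pd
import pandas_ta as ta

def make_tech_feature_daily(day_df):
    # pandas-ta looks for open/high/low/close/volume columns, which load_jqdata_dayline provides
    feature = pd.DataFrame(index=day_df.index)
    feature['daily_RSI_14'] = day_df.ta.rsi(length=14) / 100
    feature['daily_NATR_14'] = day_df.ta.natr(length=14)
    feature['daily_LOGRET_5'] = day_df.ta.log_return(length=5)
    feature = feature.join(day_df.ta.macd(fast=12, slow=26, signal=9))  # MACD line / signal / histogram
    feature = feature.join(day_df.ta.bbands(length=20))                 # Bollinger Bands
    return pd.concat([day_df[['code', 'trade_day']], feature], axis=1)

day_df = load_jqdata_dayline(code='600519.SH', start_date='2020-01-01', end_date='2022-07-07')
print(make_tech_feature_daily(day_df).tail())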

Take Kweichow Moutai (600519.SH) as an example and load the data from '2020-01-01' to '2022-07-07':

kline_df = load_jqdata_kline(code='600519.SH', start_date='2020-01-01', end_date='2022-07-07')
print(kline_df.iloc[-5:])
The last five rows (shown transposed, one column per 1-minute bar; all bars are from trade_day 2022-07-07, times in Asia/Shanghai):

| column | 14:54 | 14:55 | 14:56 | 14:57 | 15:00 |
| code | 600519.XSHG | 600519.XSHG | 600519.XSHG | 600519.XSHG | 600519.XSHG |
| dtype | STOCK | STOCK | STOCK | STOCK | STOCK |
| open | 1994.86 | 1993.58 | 1993.99 | 1993.29 | 1993.01 |
| close | 1993.66 | 1994.04 | 1993.68 | 1993 | 1990 |
| low | 1993.66 | 1993.57 | 1993.51 | 1993 | 1990 |
| high | 1994.86 | 1994.5 | 1993.99 | 1993.29 | 1993.01 |
| volume | 10500 | 8900 | 18700 | 15900 | 47800 |
| money | 2.09366e+07 | 1.77452e+07 | 3.72816e+07 | 3.16902e+07 | 9.51379e+07 |
| avg | 1993.99 | 1993.83 | 1993.69 | 1993.06 | 1990 |
| pre_close | 2002 | 2002 | 2002 | 2002 | 2002 |
| daily_percent | -0.00416583 | -0.00397602 | -0.00415584 | -0.0044955 | -0.00599401 |
| minutes_of_day | 894 | 895 | 896 | 897 | 900 |
| label_d0_25 | -0.0850054 | -0.115356 | -0.146713 | -0.153533 | nan |
| label_d0_50 | -0.0308427 | -0.0386191 | -0.108342 | -0.153533 | nan |
| label_d0_75 | -0.0132899 | -0.0228204 | -0.0699708 | -0.153533 | nan |
| label_rolling_5_50_bps | -61.5 | -77 | -216 | -306 | nan |
| label_rolling_5_50 | -0.0308427 | -0.0386191 | -0.108342 | -0.153533 | nan |
| label_rolling_10_50_bps | -0.615 | -0.77 | -2.16 | -3.06 | nan |
| label_rolling_10_50 | -0.000308427 | -0.000386191 | -0.00108342 | -0.00153533 | nan |
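
As a sanity check on the labels, take the 14:54 bar (minutes_of_day = 894, avg = 1993.99): the remaining bars of the day have average prices 1993.83, 1993.69, 1993.06 and 1990, whose median is 1993.375. That gives label_rolling_5_50_bps = (1993.375 - 1993.99) × 100 = -61.5 and label_rolling_5_50 = -61.5 / 1993.99 ≈ -0.0308, matching the values printed above; since fewer than five bars remain at that point, label_d0_50 coincides with the 5-minute label here.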

Building intraday features (minute level)

With the K-line data loaded, we move on to the core step, feature extraction, generating the corresponding labels at the same time.

def make_tech_feature_intraday_jqdata(kline_df, index_col='trade_time'):
    df_grouped = kline_df.set_index(index_col).groupby('trade_day', as_index=False)
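    # Every indicator below is computed within a single trading day (groupby on trade_day),
    # so values never leak across the overnight gap; most outputs are rescaled (/100, *50, ...)
    # and clipped so the resulting features stay roughly within [-10, 10].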

    rsi_5 = (df_grouped.apply(lambda row: row.ta.rsi(length=5) / 100)).fillna(0.5).to_frame().reset_index(drop=True)
    cmo_3 =  (df_grouped.apply(lambda row: row.ta.cmo(length=3) / 100)).fillna(0.0).to_frame().reset_index(drop=True)
    natr_3 =  (df_grouped.apply(lambda row: row.ta.natr(length=3))).fillna(0.0).to_frame().reset_index(drop=True)
    rvi_5 =  (df_grouped.apply(lambda row: row.ta.rvi(length=5) / 100)).fillna(0.5).to_frame().reset_index(drop=True)
    try:
        kdj_9_3 = (df_grouped.apply(lambda row: row.ta.kdj(9, 3) / 100)).fillna(0.5).reset_index(drop=True)
    except Exception:  # fall back when the day has fewer bars than the KDJ window
        kdj_9_3 = (df_grouped.apply(lambda row: row.ta.kdj(min(9, len(row)), 3).T.reset_index(drop=True).T / 100)).fillna(0.5).reset_index(drop=True)
        kdj_9_3.columns = ['K_9_3', 'D_9_3', 'J_9_3']
    adx_5 = (df_grouped.apply(lambda row: row.ta.adx(length=5) / 100)).clip(-10, 10).fillna(0.0).reset_index(drop=True)
    try:
        mfi_5 = (df_grouped.apply(lambda row: row.ta.mfi(length=5) / 100)).clip(-10, 10).fillna(0.5).to_frame().reset_index(drop=True)
    except Exception:
        mfi_5 = (df_grouped.apply(lambda row: row.ta.mfi(length=min(5, len(row))).rename('MFI_5') / 100)).clip(-10, 10).fillna(0.5).to_frame().reset_index(drop=True)
    pvi_5 = (df_grouped.apply(lambda row: row.ta.pvi(length=5) / 1000)).clip(-10, 10).fillna(1).to_frame().reset_index(drop=True)
    nvi_5 = (df_grouped.apply(lambda row: row.ta.nvi(length=5) / 1000)).clip(-10, 10).fillna(1).to_frame().reset_index(drop=True)
    willr_5 = (df_grouped.apply(lambda row: row.ta.willr(length=5) / 100)).clip(-1, 1).fillna(-0.5).to_frame().reset_index(drop=True)
    try:
        willr_10 = (df_grouped.apply(lambda row: row.ta.willr(length=10) / 100)).clip(-1, 1).fillna(-0.5).to_frame().reset_index(drop=True)
    except Exception:
        willr_10 = (df_grouped.apply(lambda row: row.ta.willr(length=min(10, len(row))).rename('WILLR_10') / 100)).clip(-1, 1).fillna(-0.5).to_frame().reset_index(drop=True)
    try:
        cmf_10 = (df_grouped.apply(lambda row: row.ta.cmf(length=10) )).clip(-10, 10).fillna(0.0).to_frame().reset_index(drop=True)
    except Exception:
        cmf_10 = (df_grouped.apply(lambda row: row.ta.cmf(length=min(10, len(row))).rename('CMF_10') )).clip(-10, 10).fillna(0.0).to_frame().reset_index(drop=True)
    dpo_5 = (df_grouped.apply(lambda row: row.ta.dpo(length=5, lookahead=False) * 10 )).clip(-10, 10).fillna(0.0).to_frame().reset_index(drop=True)
    log_return_5 = (df_grouped.apply(lambda row: row.ta.log_return(length=5) * 50 )).clip(-10, 10).fillna(0.0).to_frame().reset_index(drop=True) 
    zscore_5 = (df_grouped.apply(lambda row: row.ta.zscore(length=5) )).clip(-10, 10).fillna(0.0).to_frame().reset_index(drop=True) 
    pct_change_3 = (df_grouped.apply(lambda row: row['avg'].pct_change(periods=3).fillna(0) * 50 / 3)).clip(-10, 10).to_frame().rename(columns={'avg': 'pct_change_3'}).reset_index(drop=True) 
    pct_change_6 = (df_grouped.apply(lambda row: row['avg'].pct_change(periods=6).fillna(0) * 50 / 6)).clip(-10, 10).to_frame().rename(columns={'avg': 'pct_change_6'}).reset_index(drop=True) 
    
    rolling_std_5 = (df_grouped.apply(lambda row: row['avg'].rolling(5, min_periods=1).std(ddof=0))).clip(-10, 10).to_frame().rename(columns={'avg': 'rolling_std_5'}).reset_index(drop=True)
    rolling_money_5 = (df_grouped.apply(lambda row: np.log1p(row['money'].rolling(5, min_periods=1).mean()) - np.log1p(row['money']) ) / 10.0).clip(-10, 10).to_frame().rename(columns={'money': 'rolling_money_5'}).reset_index(drop=True)
    rolling_money_6_3 = (df_grouped.apply(lambda row: np.log1p(row['money'].rolling(6, min_periods=1).mean()) - np.log1p(row['money'].rolling(3, min_periods=1).mean()) ) / 10.0 ).clip(-10, 10).to_frame().rename(columns={'money': 'rolling_money_6_3'}).reset_index(drop=True)
    rolling_volume_5 = (df_grouped.apply(lambda row: np.log1p(row['volume'].rolling(5, min_periods=1).mean()) - np.log1p(row['volume']) )/ 10.0).clip(-10, 10).to_frame().rename(columns={'volume': 'rolling_volume_5'}).reset_index(drop=True)
    rolling_volume_6_3 = (df_grouped.apply(lambda row: np.log1p(row['volume'].rolling(6, min_periods=1).mean()) - np.log1p(row['volume'].rolling(3, min_periods=1).mean()) )/ 10.0).clip(-10, 10).to_frame().rename(columns={'volume': 'rolling_volume_6_3'}).reset_index(drop=True)
    pct_volatility = (df_grouped.apply(lambda row: (row['high'] - row['low']) * 50 / row['avg'])).clip(-10, 10).fillna(0.0).to_frame().rename(columns={0: 'pct_volatility'}).reset_index(drop=True) 
    rolling_pct_volatility_5 = (df_grouped.apply(lambda row: (row['high'].rolling(5, min_periods=1).max() - row['low'].rolling(5, min_periods=1).min())* 20 / row['avg'])).clip(-10, 10).fillna(0.0).to_frame().rename(columns={0: 'rolling_pct_volatility_5'}).reset_index(drop=True) 
    rolling_pct_volatility_10 = (df_grouped.apply(lambda row: (row['high'].rolling(10, min_periods=1).max() - row['low'].rolling(10, min_periods=1).min()) * 20 / row['avg'])).clip(-10, 10).fillna(0.0).to_frame().rename(columns={0: 'rolling_pct_volatility_10'}).reset_index(drop=True) 
    pct_vwap_low = (df_grouped.apply(lambda row: (row.ta.vwap() - row['low']) * 50 / row['avg'])).clip(-10, 10).fillna(0.0).to_frame().rename(columns={0: 'pct_vwap_low'}).reset_index(drop=True) 
    pct_vwap_high = (df_grouped.apply(lambda row: (row['high'] - row.ta.vwap()) * 50 / row['avg'])).clip(-10, 10).fillna(0.0).to_frame().rename(columns={0: 'pct_vwap_high'}).reset_index(drop=True) 
    
    feature = pd.concat([rsi_5, cmo_3, natr_3, rvi_5, kdj_9_3,  adx_5, mfi_5, pvi_5, nvi_5, willr_5, willr_10, cmf_10, dpo_5, log_return_5, zscore_5, pct_change_3, pct_change_6, rolling_std_5, rolling_money_5, rolling_money_6_3, rolling_volume_5, rolling_volume_6_3, pct_volatility, rolling_pct_volatility_5, rolling_pct_volatility_10, pct_vwap_low, pct_vwap_high, kline_df[['daily_percent']] * 50], axis=1) 
    feature.columns = ['intraday_%s' % i for i in feature.columns]

    index = kline_df[index_col].apply(lambda x: int(x.timestamp()))
    
    introday_data = pd.concat([kline_df[['code', 'trade_day', 'minutes_of_day']], feature], axis=1)
    introday_data.index = index
    

    if 'label_d0_25' in kline_df.columns:
        label_data = kline_df[['code', 'trade_day', 'minutes_of_day', 'label_d0_25', 'label_d0_50', 'label_d0_75',
           'label_rolling_5_50', 'label_rolling_10_50', 'label_rolling_5_50_bps', 'label_rolling_10_50_bps']]
        label_data.index = index
    else:
        # Fall back to an empty frame so the filtering below still works
        label_data = pd.DataFrame(columns=['code', 'trade_day', 'minutes_of_day'])

    # Drop the 15:00 bar: no intraday horizon remains, so all of its labels are NaN
    introday_data = introday_data[introday_data.minutes_of_day != 900]
    label_data = label_data[label_data.minutes_of_day != 900]
    return introday_data.reset_index(), label_data.reset_index()

Likewise, here are the pandas-ta features generated from the Moutai data above:

introday_data, label_data = make_tech_feature_intraday_jqdata(kline_df, index_col='trade_time')
print(introday_data.iloc[-5:])
print(label_data.iloc[-5:])

  • Intraday feature data (shown transposed, one column per bar; the index is trade_time as a Unix timestamp):
| column | 1657176780 | 1657176840 | 1657176900 | 1657176960 | 1657177020 |
| code | 600519.XSHG | 600519.XSHG | 600519.XSHG | 600519.XSHG | 600519.XSHG |
| trade_day | 2022-07-07 | 2022-07-07 | 2022-07-07 | 2022-07-07 | 2022-07-07 |
| minutes_of_day | 893 | 894 | 895 | 896 | 897 |
| intraday_RSI_5 | 0.4868 | 0.377828 | 0.428086 | 0.390711 | 0.323935 |
| intraday_CMO_3 | -0.064103 | -0.418172 | -0.203659 | -0.344547 | -0.563402 |
| intraday_NATR_3 | 0.077846 | 0.0721596 | 0.0636436 | 0.051298 | 0.0455835 |
| intraday_RVI_5 | 0.472697 | 0.306806 | 0.536757 | 0.344622 | 0.243084 |
| intraday_K_9_3 | 0.536684 | 0.382032 | 0.336506 | 0.25161 | 0.16774 |
| intraday_D_9_3 | 0.41818 | 0.406131 | 0.382923 | 0.339152 | 0.282015 |
| intraday_J_9_3 | 0.773693 | 0.333835 | 0.243674 | 0.0765271 | -0.0608087 |
| intraday_ADX_5 | 0.404289 | 0.33549 | 0.269736 | 0.222581 | 0.242891 |
| intraday_DMP_5 | 0.170589 | 0.144056 | 0.125327 | 0.114705 | 0.100978 |
| intraday_DMN_5 | 0.053766 | 0.127672 | 0.123655 | 0.122769 | 0.197831 |
| intraday_MFI_5 | 0.589231 | 0.353767 | 0.199639 | 0.165302 | 2.9012e-16 |
| intraday_PVI_5 | 0.999701 | 0.999656 | 0.999656 | 0.999611 | 0.999611 |
| intraday_NVI_5 | 1.00016 | 1.00016 | 1.00009 | 1.00009 | 0.999951 |
| intraday_WILLR_5 | -0.377273 | -0.971429 | -0.779343 | -0.922374 | -1 |
| intraday_WILLR_10 | -0.452 | -0.931624 | -0.754545 | -0.918182 | -1 |
| intraday_CMF_10 | 0.197077 | -0.0242737 | 0.0696795 | 0.0739293 | -0.143719 |
| intraday_DPO_5 | 4.1 | -10 | -7.7 | -10 | -10 |
| intraday_LOGRET_5 | 0.0310894 | -0.0225665 | -0.0391013 | -0.0223156 | -0.0674407 |
| intraday_ZS_5 | -0.381427 | -1.63965 | -0.75092 | -0.902093 | -1.39213 |
| intraday_pct_change_3 | 0.00802214 | -0.00860476 | -0.0102754 | -0.0146167 | -0.00777336 |
| intraday_pct_change_6 | 0.00689641 | -0.00167135 | 0.000668783 | -0.00330078 | -0.00818705 |
| intraday_rolling_std_5 | 0.613697 | 0.50677 | 0.638041 | 0.709123 | 0.785147 |
| intraday_rolling_money_5 | -0.010953 | -0.0136592 | 0.00777374 | -0.0476034 | -0.0210747 |
| intraday_rolling_money_6_3 | -0.0104473 | -0.006727 | -0.00850169 | -0.0131696 | -0.0162005 |
| intraday_rolling_volume_5 | -0.011178 | -0.0136515 | 0.00777846 | -0.0475763 | -0.0210629 |
| intraday_rolling_volume_6_3 | -0.0104575 | -0.00678157 | -0.00857828 | -0.0131647 | -0.0161831 |
| intraday_pct_volatility | 0.03508 | 0.0300904 | 0.0233219 | 0.012038 | 0.00727525 |
| intraday_rolling_pct_volatility_5 | 0.0220503 | 0.0210633 | 0.0213659 | 0.0219693 | 0.027094 |
| intraday_rolling_pct_volatility_10 | 0.0250571 | 0.0234705 | 0.0220681 | 0.0220696 | 0.027094 |
| intraday_pct_vwap_low | 0.0622761 | 0.0780107 | 0.0799701 | 0.0807789 | 0.0928892 |
| intraday_pct_vwap_high | -0.0271961 | -0.0479203 | -0.0566481 | -0.0687409 | -0.0856139 |
| intraday_daily_percent | -0.178072 | -0.208292 | -0.198801 | -0.207792 | -0.224775 |
  • Intraday label data:
| trade_time | code | trade_day | minutes_of_day | label_d0_25 | label_d0_50 | label_d0_75 | label_rolling_5_50 | label_rolling_10_50 | label_rolling_5_50_bps | label_rolling_10_50_bps |
| 1657176780 | 600519.XSHG | 2022-07-07 | 893 | -0.119272 | -0.0877 | -0.080684 | -0.0877 | -0.000877 | -175 | -1.75 |
| 1657176840 | 600519.XSHG | 2022-07-07 | 894 | -0.0850054 | -0.0308427 | -0.0132899 | -0.0308427 | -0.000308427 | -61.5 | -0.615 |
| 1657176900 | 600519.XSHG | 2022-07-07 | 895 | -0.115356 | -0.0386191 | -0.0228204 | -0.0386191 | -0.000386191 | -77 | -0.77 |
| 1657176960 | 600519.XSHG | 2022-07-07 | 896 | -0.146713 | -0.108342 | -0.0699708 | -0.108342 | -0.00108342 | -216 | -2.16 |
| 1657177020 | 600519.XSHG | 2022-07-07 | 897 | -0.153533 | -0.153533 | -0.153533 | -0.153533 | -0.00153533 | -306 | -3.06 |

A quick look at the correlation between the technical factors and the intraday return labels:

corr_data = []
for col in introday_data.columns.tolist()[4:]:
    feature = introday_data[col]
    label_d0_50 = feature.corr(label_data['label_d0_50'])
    label_rolling_5_50 = feature.corr(label_data['label_rolling_5_50'])
    label_rolling_10_50 = feature.corr(label_data['label_rolling_10_50'])
    corr_data.append([col, label_d0_50, label_rolling_5_50, label_rolling_10_50])
corr_data = pd.DataFrame(corr_data, columns=['feature', 'd0_50', 'rolling_5_50', 'rolling_10_50'])
print(corr_data)
| feature | d0_50 | rolling_5_50 | rolling_10_50 |
| intraday_RSI_5 | 0.0402828 | 0.134321 | 0.103664 |
| intraday_CMO_3 | 0.0460717 | 0.163593 | 0.124605 |
| intraday_NATR_3 | -0.0044438 | 0.0184311 | 0.0159059 |
| intraday_RVI_5 | 0.0384155 | 0.139606 | 0.106858 |
| intraday_K_9_3 | 0.0281801 | 0.0653667 | 0.0561566 |
| intraday_D_9_3 | 0.019375 | 0.0322949 | 0.0315525 |
| intraday_J_9_3 | 0.0305751 | 0.0899986 | 0.0733925 |
| intraday_ADX_5 | 0.00178353 | 0.00119938 | -0.00362684 |
| intraday_DMP_5 | 0.023113 | 0.0693602 | 0.0517322 |
| intraday_DMN_5 | -0.0211941 | -0.0660362 | -0.0478488 |
| intraday_MFI_5 | 0.0187832 | 0.0573829 | 0.0454739 |
| intraday_PVI_5 | -0.00631178 | 0.00495645 | 0.0021088 |
| intraday_NVI_5 | -0.0138721 | 0.0016334 | -0.00108481 |
| intraday_WILLR_5 | 0.0508993 | 0.174907 | 0.135297 |
| intraday_WILLR_10 | 0.0437711 | 0.133309 | 0.106649 |
| intraday_CMF_10 | 0.0310067 | 0.0720305 | 0.060817 |
| intraday_DPO_5 | 0.0322019 | 0.101203 | 0.0789424 |
| intraday_LOGRET_5 | 0.0282654 | 0.101385 | 0.0758353 |
| intraday_ZS_5 | 0.048193 | 0.189238 | 0.140897 |
| intraday_pct_change_3 | 0.0111402 | 0.0316397 | 0.0211543 |
| intraday_pct_change_6 | 0.00922718 | 0.0199352 | 0.0154407 |
| intraday_rolling_std_5 | -0.00929529 | 0.0246064 | 0.0264212 |
| intraday_rolling_money_5 | -0.00175834 | -0.0044877 | -0.00455773 |
| intraday_rolling_money_6_3 | -0.00369373 | -0.00865403 | -0.009164 |
| intraday_rolling_volume_5 | -0.00169721 | -0.00431745 | -0.00442322 |
| intraday_rolling_volume_6_3 | -0.00365949 | -0.00860967 | -0.00912756 |
| intraday_pct_volatility | 0.00960336 | 0.0396089 | 0.0383 |
| intraday_rolling_pct_volatility_5 | 0.0118225 | 0.0395367 | 0.0425203 |
| intraday_rolling_pct_volatility_10 | 0.00827556 | 0.0335309 | 0.0334182 |
| intraday_pct_vwap_low | 0.0097716 | -0.00987224 | -0.00514639 |
| intraday_pct_vwap_high | -0.00828736 | 0.015894 | 0.0109779 |
| intraday_daily_percent | 0.0473394 | 0.025758 | 0.0216949 |

The correlations with the 5-minute and 10-minute labels are non-trivial: several factors (ZS_5, WILLR_5, CMO_3, RSI_5) reach roughly 0.13-0.19 against the 5-minute label, suggesting that technical indicators carry some short-horizon predictive power. This also points to an approach for intraday high-frequency convertible-bond trading.
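
A natural follow-up (a small sketch using the corr_data frame built above) is to rank the factors by absolute correlation with the 5-minute label before feeding them into a model:

# Rank features by |corr| against the 5-minute label
top = corr_data.assign(abs_rolling_5_50=corr_data['rolling_5_50'].abs()) \
               .sort_values('abs_rolling_5_50', ascending=False)
print(top[['feature', 'd0_50', 'rolling_5_50', 'rolling_10_50']].head(10))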

Persisting the features to the database

Taking the intraday features as an example, we create the target tables, apply the feature-extraction function above to each stock, and store the results in ClickHouse.

import pandas as pd
from tqdm import tqdm
import numpy as np
from clickhouse_driver import Client

storage_client = Client('10.0.16.11', password='******', settings={'use_numpy': True})

def make_cn_intraday_set(start_date='2018-01-01', end_date='2022-07-01'):
    '''
    Main stocks: intraday features
    '''
    create_table_intraday_cn_data = '''
        create table if not exists feature.intraday_cn_data
        (
            trade_time DateTime('Asia/Shanghai'), code String, trade_day Date, minutes_of_day Int16,
            intraday_RSI_5 Float32, intraday_CMO_3 Float32, intraday_NATR_3 Float32, intraday_RVI_5 Float32, intraday_K_9_3 Float32, intraday_D_9_3 Float32, intraday_J_9_3 Float32, intraday_ADX_5 Float32, intraday_DMP_5 Float32, intraday_DMN_5 Float32, intraday_MFI_5 Float32, intraday_PVI_5 Float32, intraday_NVI_5 Float32, intraday_WILLR_5 Float32, intraday_WILLR_10 Float32, intraday_CMF_10 Float32, intraday_DPO_5 Float32, intraday_LOGRET_5 Float32, intraday_ZS_5 Float32, intraday_pct_change_3 Float32, intraday_pct_change_6 Float32, intraday_rolling_std_5 Float32, intraday_rolling_money_5 Float32, intraday_rolling_money_6_3 Float32, intraday_rolling_volume_5 Float32, intraday_rolling_volume_6_3 Float32, intraday_pct_volatility Float32, intraday_rolling_pct_volatility_5 Float32, intraday_rolling_pct_volatility_10 Float32, intraday_pct_vwap_low Float32, intraday_pct_vwap_high Float32, intraday_daily_percent Float32
        )
        ENGINE = ReplacingMergeTree()
        ORDER BY (trade_day, trade_time, code)
    '''
    storage_client.execute(create_table_intraday_cn_data)

    create_table_intraday_cn_label = '''
        create table if not exists feature.intraday_cn_label
        (
            trade_time DateTime('Asia/Shanghai'), code String, trade_day Date, minutes_of_day Int16,
            label_d0_25 Float32, label_d0_50 Float32, label_d0_75 Float32,  label_rolling_5_50 Float32,  label_rolling_10_50 Float32,  label_rolling_5_50_bps Float32,  label_rolling_10_50_bps Float32
        )
        ENGINE = ReplacingMergeTree()
        ORDER BY (trade_day, trade_time, code)
    '''
    storage_client.execute(create_table_intraday_cn_label)

    def make(code):
        # Load the 1-minute K-line data
        kline_df = load_jqdata_kline(code=code, start_date=start_date, end_date=end_date)
        if len(kline_df) < 1:
            return
        # Build features and labels
        introday_data, label_data = make_tech_feature_intraday_jqdata(kline_df)
        # Write in batches of 2000 rows to limit memory and network pressure
        if len(introday_data) > 0:
            for i in range(int(np.ceil(len(introday_data) / 2000))):
                storage_client.insert_dataframe('INSERT INTO feature.intraday_cn_data  VALUES', introday_data.iloc[i * 2000: (i+1)*2000])
                storage_client.insert_dataframe('INSERT INTO feature.intraday_cn_label VALUES', label_data.iloc[i * 2000: (i+1)*2000])
    
    # Get the list of tickers to process (get_jq_ticker is defined elsewhere)
    tickers = get_jq_ticker(start_date, end_date)

    # Loop over all tickers and generate features
    for ticker in tqdm(tickers):
        make(ticker)
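
Once stored, the features can be read back from ClickHouse for model training. A minimal sketch (query_dataframe is available in clickhouse-driver when pandas support is installed; the join keys follow the table definitions above):

def load_intraday_feature_set(code='600519.XSHG', start_date='2020-01-01', end_date='2022-07-01'):
    # Read features and labels back and align them on (code, trade_day, trade_time, minutes_of_day)
    feat = storage_client.query_dataframe(
        "SELECT * FROM feature.intraday_cn_data WHERE code = '{}' "
        "AND trade_day >= '{}' AND trade_day < '{}'".format(code, start_date, end_date))
    label = storage_client.query_dataframe(
        "SELECT * FROM feature.intraday_cn_label WHERE code = '{}' "
        "AND trade_day >= '{}' AND trade_day < '{}'".format(code, start_date, end_date))
    return feat.merge(label, on=['code', 'trade_day', 'trade_time', 'minutes_of_day'])

dataset = load_intraday_feature_set()
print(dataset.shape)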

With that, the extraction and storage of the technical-feature data is complete.


Feel free to follow my WeChat official account "量化实战" (Quant in Practice), where original technical articles are published first.
