多因子专题(一)-- 截面回归 附源代码

367 阅读9分钟

通常在Barra 模型中,因子暴露直接来自基本面或者技术面数据本身,而在学术界通常使用时序回归得到因子暴露,在进行截面回归得到因子收益率。

截面回归

1.先对时序时间求个股收益均值

In [13]:

start_date='2018-01-10'
end_date='2019-01-10'

start_date=datetime.datetime.strptime(start_date,'%Y-%m-%d')
end_date=datetime.datetime.strptime(end_date,'%Y-%m-%d')+datetime.timedelta(days=1)

stock_list2=list(stock_params.index)
price = get_price(stock_list2, start_date,end_date, '1d', ['quote_rate'], True, None, is_panel=1)['quote_rate']
return_df=price.apply(lambda x:x.mean(),axis=0) 
return_df.head()  

Out[13]:

000001.SZ   -0.072040
000002.SZ   -0.091996
000063.SZ   -0.196550
000069.SZ   -0.129263
000100.SZ   -0.125031
dtype: float64

In [14]:

from statsmodels import regression
import statsmodels.api as sm

stock_params['return_ave_stock']=return_df

x=np.array(stock_params['coef'])
x = x.astype(float)
y=np.array(stock_params['return_ave_stock'])
y = y.astype(float)
#X = sm.add_constant(x) 

est=sm.OLS(y,x)
est=est.fit()
est.summary()

Out[14]:

Dep. Variable:yR-squared:0.114
Model:OLSAdj. R-squared:0.109
Method:Least SquaresF-statistic:27.30
Date:Wed, 20 Feb 2019Prob (F-statistic):4.15e-07
Time:09:58:20Log-Likelihood:74.090
No. Observations:214AIC:-146.2
Df Residuals:213BIC:-142.8
Df Model:1
Covariance Type:nonrobust

| | coef | std err | t | P>|t| | [0.025 | 0.975] | | -- | ------- | ------- | ------ | ------- | ------ | ------ | | x1 | -0.0874 | 0.017 | -5.224 | 0.000 | -0.120 | -0.054 |

Omnibus:141.322Durbin-Watson:1.507
Prob(Omnibus):0.000Jarque-Bera (JB):2215.624
Skew:-2.231Prob(JB):0.00
Kurtosis:18.118Cond. No.1.00

因子收益率

In [15]:

second_params2 = pd.DataFrame(columns = ["coef"],index=['factor'])

x=stock_params['coef'].reshape(-1,1)
y=stock_params['return_ave_stock'].reshape(-1,1)
  
    
model = LinearRegression(fit_intercept=None)
model.fit(x,y)

second_params2.ix[0,'coef']=model.coef_[0][0]
second_params2

Out[15]:

coef
factor-0.0874333

2.残差的概率密度分布

In [16]:

pd.Series(est.resid).plot.density(figsize = (20, 8))
plt.title('resid_density',size=25)
#fig = plt.figure(figsize = (20, 8))
#ax = est.resid.plot.kde(label = 'Beta')
#ax.set_title('Beta',size=25)
#ax.legend()    

Out[16]:

<matplotlib.text.Text at 0x7fa6bc2d3630>

转存失败,建议直接上传图片文件

3. 对残差进行t检验

In [17]:

from scipy import stats
stat_val, p_value = stats.ttest_1samp(est.resid, 0)
p_value

Out[17]:

1.5905611338293164e-14

In [18]:

stat_val

Out[18]:

-8.254519611155741

In [19]:

ttest_df1= pd.DataFrame(columns = ["stats_val","p_value"],index=['factor_residual'])
ttest_df1.ix[0,"stats_val"]=stat_val
ttest_df1.ix[0,"p_value"]=p_value
ttest_df1

Out[19]:

stats_valp_value
factor_residual-8.254521.59056e-14

截面回归下三因子模型

1.第一步,时序回归

In [20]:

flag1=True
while start_date<=end_date: 
    next_date=start_date+datetime.timedelta(days=1)
    #print(next_date)
    
    #取全市场股票各股票市值因子
    df = get_fundamentals(query(asharevalue.symbol, asharevalue.total_mv).filter(\
                        asharevalue.symbol.in_(stock_list1)).order_by(asharevalue.total_mv.asc()),date=start_date)
    df.set_index('asharevalue_symbol',inplace=True)
    df.index.name='symbol'
    
    df2=get_fundamentals(query(asharevalue.symbol,balance.total_equity/asharevalue.total_mv).filter(\
                        asharevalue.symbol.in_(stock_list1)),date=start_date)
    df2.set_index('asharevalue_symbol',inplace=True)
    df2.index.name='symbol'
    df2.columns=['BM']
    df2.sort_values(by=['BM'],inplace=True)  
    
    stocklist_cap_top=list(df[:int(len(df)*0.3)].index)
    stocklist_cap_bottom=list(df[int(len(df)*0.7):].index)
    
    stocklist_BM_bottom=list(df2[:int(len(df)*0.3)].index)
    stocklist_BM_top=list(df2[int(len(df)*0.7):].index)
    
    
   
    try:
        value_top = get_price(stocklist_cap_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top=value_top.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom = get_price(stocklist_cap_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom=value_bottom.apply(lambda x:x.mean(),axis=1) 
        return_factor=return_sum_top-return_sum_bottom
        
        
        
        value_top_BM = get_price(stocklist_BM_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top_BM=value_top_BM.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom_BM = get_price(stocklist_BM_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom_BM=value_bottom_BM.apply(lambda x:x.mean(),axis=1) 
        return_factor_BM=return_sum_top_BM-return_sum_bottom_BM
        
    except:
        
        start_date=start_date+datetime.timedelta(days=1)
        continue
  
    if flag1:
        flag1=False
        return_date_total_factor = return_factor
        return_date_total_factor_BM = return_factor_BM
   
    else:
        return_date_total_factor = return_date_total_factor.append(return_factor)
        return_date_total_factor_BM = return_date_total_factor_BM.append(return_factor_BM)
        
    start_date=start_date+datetime.timedelta(days=1)

    
start_date_market='2018-01-10'
end_date_market='2019-01-10'

start_date=datetime.datetime.strptime(start_date_market,'%Y-%m-%d')
end_date=datetime.datetime.strptime(end_date_market,'%Y-%m-%d')+datetime.timedelta(days=1)
return_market= get_price('000300.SH', start_date_market,end_date_market, '1d', ['quote_rate'], True, None, is_panel=1)['quote_rate']   


return_date_total_factor_market=pd.DataFrame(return_market)    
return_date_total_factor=pd.DataFrame(return_date_total_factor)    
return_date_total_factor_BM=pd.DataFrame(return_date_total_factor_BM)

In [21]:

return_date_total_factor_df= pd.concat([return_date_total_factor_market,return_date_total_factor,return_date_total_factor_BM],axis=1)

return_date_total_factor_df.columns=['market_factor_return','cap_factor_return','BM_factor_return']
return_date_total_factor_df.fillna(0,inplace=True)
return_date_total_factor_df.head()

Out[21]:

market_factor_returncap_factor_returnBM_factor_return
2018-01-100.4420-1.2013220.040815
2018-01-11-0.05290.445029-0.470802
2018-01-120.4616-0.906181-0.052597
2018-01-150.0056-2.7117590.527817
2018-01-160.7866-0.3149211.009860

In [22]:

stock_params = pd.DataFrame(columns = ["intercept", "coef_market","coef_cap","coef_BM"],index=stock_list)

i=0

for stock in price.columns:  
    
    x=return_date_total_factor_df.ix[:,('market_factor_return','cap_factor_return','BM_factor_return')]
    y=price[stock].reshape(-1,1)
    
    model = LinearRegression()
    
    try:
        model.fit(x,y)
    
        stock_params.ix[i,'intercept']  = model.intercept_[0]
        
        stock_params.ix[i,'coef_market'] = model.coef_[0][0]
       
        stock_params.ix[i,'coef_cap'] = model.coef_[0][1]
        stock_params.ix[i,'coef_BM'] = model.coef_[0][2]
        
    except:
        stock_params.ix[i,'intercept'] =np.nan
        stock_params.ix[i,'coef_market'] =np.nan
        stock_params.ix[i,'coef_cap'] = np.nan
        stock_params.ix[i,'coef_BM'] = np.nan
    
    i=i+1
    
stock_params.dropna(inplace=True)    
stock_params.head()

Out[22]:

interceptcoef_marketcoef_capcoef_BM
601088.SH0.06426321.31645-0.4226330.640151
002624.SZ0.06350111.5009-0.4305190.612894
600583.SH0.04982741.256570.07524620.415214
600383.SH0.08425461.020560.6546380.151889
601633.SH0.07372990.7317080.4061550.151578

In [23]:

plt.figure(figsize=(20,16))
plt.subplot(311)
stock_params['coef_market'].plot.kde(label = 'Beta_market',color='g')
plt.title('Beta_market',size=25)

plt.subplot(312)
stock_params['coef_cap'].plot.kde(label = 'Beta_cap',color='r')
plt.title('Beta_cap',size=25)

plt.subplot(313)
stock_params['coef_BM'].plot.kde(label = 'Beta_BM')
plt.title('Beta_BM',size=25)

Out[23]:

<matplotlib.text.Text at 0x7fa6b8478d30>

转存失败,建议直接上传图片文件

2.第二步,截面回归

In [24]:

stock_params['return_ave_stock']=return_df
stock_params.dropna(inplace=True)

x=stock_params.ix[:,('coef_market','coef_cap','coef_BM')]
x = x.astype(float)
y=np.array(stock_params['return_ave_stock'])
y = y.astype(float)
#X = sm.add_constant(x) 

est=sm.OLS(y,x)
est=est.fit()
est.summary()

Out[24]:

Dep. Variable:yR-squared:0.419
Model:OLSAdj. R-squared:0.408
Method:Least SquaresF-statistic:37.56
Date:Wed, 20 Feb 2019Prob (F-statistic):2.51e-18
Time:10:01:41Log-Likelihood:82.155
No. Observations:159AIC:-158.3
Df Residuals:156BIC:-149.1
Df Model:3
Covariance Type:nonrobust

| | coef | std err | t | P>|t| | [0.025 | 0.975] | | ----------- | ------- | ------- | ------ | ------- | ------ | ------ | | coef_market | -0.1139 | 0.013 | -8.889 | 0.000 | -0.139 | -0.089 | | coef_cap | -0.0192 | 0.021 | -0.902 | 0.368 | -0.061 | 0.023 | | coef_BM | 0.0332 | 0.022 | 1.511 | 0.133 | -0.010 | 0.077 |

Omnibus:137.536Durbin-Watson:2.075
Prob(Omnibus):0.000Jarque-Bera (JB):2463.668
Skew:-2.982Prob(JB):0.00
Kurtosis:21.339Cond. No.2.15

因子收益率

In [25]:

second_params3 = pd.DataFrame(columns = ["coef_market","coef_cap","coef_BM"],index=['factor_return'])
x=stock_params.ix[:,('coef_market','coef_cap','coef_BM')]
y=stock_params['return_ave_stock'].reshape(-1,1)
  
    
model = LinearRegression(fit_intercept=None)
model.fit(x,y)

second_params3.ix[0,'coef_market']=model.coef_[0][0]
second_params3.ix[0,'coef_cap']=model.coef_[0][1]
second_params3.ix[0,'coef_BM']=model.coef_[0][2]
second_params3

Out[25]:

coef_marketcoef_capcoef_BM
factor_return-0.113863-0.0192380.0332294

In [26]:

pd.Series(est.resid).plot.density(figsize = (20, 8))
plt.title('resid_density_3factor',size=25)

Out[26]:

<matplotlib.text.Text at 0x7fa6b83849e8>

转存失败,建议直接上传图片文件

In [28]:

stat_val, p_value = stats.ttest_1samp(est.resid, 0)
ttest_df2= pd.DataFrame(columns = ["stats_val","p_value"],index=['factor_residual'])
ttest_df2.ix[0,"stats_val"]=stat_val
ttest_df2.ix[0,"p_value"]=p_value
ttest_df2

Out[28]:

stats_valp_value
factor_residual-0.2346370.814794

截面回归下五因子模型

In [36]:

flag1=True
while start_date<=end_date: 
    next_date=start_date+datetime.timedelta(days=1)
    #print(next_date)
    
    #取全市场股票各股票市值因子
    df = get_fundamentals(query(asharevalue.symbol, asharevalue.total_mv).filter(\
                        asharevalue.symbol.in_(stock_list1)).order_by(asharevalue.total_mv.asc()),date=start_date)
    df.set_index('asharevalue_symbol',inplace=True)
    df.index.name='symbol'
    
    df2=get_fundamentals(query(asharevalue.symbol,balance.total_equity/asharevalue.total_mv).filter(\
                        asharevalue.symbol.in_(stock_list1)),date=start_date)
    df2.set_index('asharevalue_symbol',inplace=True)
    df2.index.name='symbol'
    df2.columns=['BM']
    df2.sort_values(by=['BM'],inplace=True)  
    
    df3 = get_fundamentals(query(asharevalue.symbol, ashareprofit.roe_ttm).filter(\
                        asharevalue.symbol.in_(stock_list1)).order_by(ashareprofit.roe_ttm.desc()),date=start_date)
    df3.set_index('asharevalue_symbol',inplace=True)
    df3.index.name='symbol'
    
    df4 = get_fundamentals(query(asharevalue.symbol, asharevalue.pe_ttm).filter(\
                        asharevalue.symbol.in_(stock_list1)).order_by(asharevalue.pe_ttm.asc()),date=start_date)
    df4.set_index('asharevalue_symbol',inplace=True)
    df4.index.name='symbol'
    
    
    
    stocklist_cap_top=list(df[:int(len(df)*0.3)].index)
    stocklist_cap_bottom=list(df[int(len(df)*0.7):].index)
    
    stocklist_BM_bottom=list(df2[:int(len(df)*0.3)].index)
    stocklist_BM_top=list(df2[int(len(df)*0.7):].index)
    
    stocklist_roe_top=list(df3[:int(len(df)*0.3)].index)
    stocklist_roe_bottom=list(df3[int(len(df)*0.7):].index)
    
    stocklist_invest_top=list(df4[:int(len(df)*0.3)].index)
    stocklist_invest_bottom=list(df4[int(len(df)*0.7):].index)
    
   
    try:
        #cap因子收益
        
        value_top = get_price(stocklist_cap_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top=value_top.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom = get_price(stocklist_cap_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom=value_bottom.apply(lambda x:x.mean(),axis=1) 
        return_factor=return_sum_top-return_sum_bottom
        
        #BM因子收益
        
        value_top_BM = get_price(stocklist_BM_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top_BM=value_top_BM.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom_BM = get_price(stocklist_BM_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom_BM=value_bottom_BM.apply(lambda x:x.mean(),axis=1) 
        return_factor_BM=return_sum_top_BM-return_sum_bottom_BM
        
        #roe因子收益
        
        value_top_roe = get_price(stocklist_roe_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top_roe=value_top_roe.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom_roe = get_price(stocklist_roe_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom_roe=value_bottom_roe.apply(lambda x:x.mean(),axis=1) 
        return_factor_roe=return_sum_top_roe-return_sum_bottom_roe
        
        #投资效率因子
        
        value_top_invest = get_price(stocklist_invest_top,None, start_date, '1d', \
                              ['quote_rate'],True, None,1,is_panel=1)['quote_rate']
        return_sum_top_invest=value_top_invest.apply(lambda x:x.mean(),axis=1)    
        
        value_bottom_invest = get_price(stocklist_invest_bottom,None, start_date, '1d', \
                                 ['quote_rate'], True, None,1,is_panel=1)['quote_rate']
        return_sum_bottom_invest=value_bottom_invest.apply(lambda x:x.mean(),axis=1) 
        return_factor_invest=return_sum_top_invest-return_sum_bottom_invest
        
        
    except:
        
        start_date=start_date+datetime.timedelta(days=1)
        continue
  
    if flag1:
        flag1=False
        return_date_total_factor = return_factor
        return_date_total_factor_BM = return_factor_BM
        return_date_total_factor_roe = return_factor_roe
        return_date_total_factor_invest = return_factor_invest
   
    else:
        return_date_total_factor = return_date_total_factor.append(return_factor)
        return_date_total_factor_BM = return_date_total_factor_BM.append(return_factor_BM)
        return_date_total_factor_roe = return_date_total_factor_roe.append(return_factor_roe)
        return_date_total_factor_invest = return_date_total_factor_invest.append(return_factor_invest)
        
    start_date=start_date+datetime.timedelta(days=1)

    
start_date_market='2018-01-10'
end_date_market='2019-01-10'

start_date=datetime.datetime.strptime(start_date_market,'%Y-%m-%d')
end_date=datetime.datetime.strptime(end_date_market,'%Y-%m-%d')+datetime.timedelta(days=1)
return_market= get_price('000300.SH', start_date_market,end_date_market, '1d', ['quote_rate'], True, None, is_panel=1)['quote_rate']   


return_date_total_factor_market=pd.DataFrame(return_market)    
return_date_total_factor=pd.DataFrame(return_date_total_factor)    
return_date_total_factor_BM=pd.DataFrame(return_date_total_factor_BM)
return_date_total_factor_roe=pd.DataFrame(return_date_total_factor_roe)
return_date_total_factor_invest=pd.DataFrame(return_date_total_factor_invest)

In [37]:

return_date_total_factor_df= pd.concat([return_date_total_factor_market,return_date_total_factor,return_date_total_factor_BM,return_date_total_factor_roe,return_date_total_factor_invest],axis=1)

return_date_total_factor_df.columns=['market_factor_return','cap_factor_return','BM_factor_return','roe_factor_return','invest_factor_return']
return_date_total_factor_df.fillna(0,inplace=True)
return_date_total_factor_df.head()

Out[37]:

market_factor_returncap_factor_returnBM_factor_returnroe_factor_returninvest_factor_return
2018-01-100.4420-1.2013220.0408150.5106920.761227
2018-01-11-0.05290.445029-0.470802-0.749758-0.708931
2018-01-120.4616-0.906181-0.0525970.7137790.727008
2018-01-150.0056-2.7117590.5278171.1804341.751605
2018-01-160.7866-0.3149211.009860-0.7236690.532430

In [38]:

stock_params = pd.DataFrame(columns = ["intercept", "coef_market","coef_cap","coef_BM","coef_roe","coef_invest"],index=stock_list)

i=0

for stock in price.columns:  
    
    x=return_date_total_factor_df.ix[:,('market_factor_return','cap_factor_return','BM_factor_return','roe_factor_return','invest_factor_return')]
    y=price[stock].reshape(-1,1)
    
    model = LinearRegression()
    
    try:
        model.fit(x,y)
    
        stock_params.ix[i,'intercept']  = model.intercept_[0]
        
        stock_params.ix[i,'coef_market'] = model.coef_[0][0]
       
        stock_params.ix[i,'coef_cap'] = model.coef_[0][1]
        stock_params.ix[i,'coef_BM'] = model.coef_[0][2]
        stock_params.ix[i,'coef_roe'] = model.coef_[0][3]
        stock_params.ix[i,'coef_invest'] = model.coef_[0][4]
        
        
    except:
        stock_params.ix[i,'intercept'] =np.nan
        stock_params.ix[i,'coef_market'] =np.nan
        stock_params.ix[i,'coef_cap'] = np.nan
        stock_params.ix[i,'coef_BM'] = np.nan
        stock_params.ix[i,'coef_roe'] = np.nan
        stock_params.ix[i,'coef_invest'] = np.nan
        
    
    i=i+1
    
stock_params.dropna(inplace=True)    
stock_params.head()

Out[38]:

interceptcoef_marketcoef_capcoef_BMcoef_roecoef_invest
601088.SH0.05573431.31861-0.46870.635007-0.0442248-0.032073
002624.SZ0.1625431.47490.1204370.7411340.581480.310008
600583.SH0.1735521.215080.9230751.258121.40324-0.23489
600383.SH0.07301381.025840.550943-0.0387876-0.2406040.125318
601633.SH0.06704190.7279270.467020.5625130.376727-0.403318

In [39]:

stock_params['return_ave_stock']=return_df
stock_params.dropna(inplace=True)

x=stock_params.ix[:,('coef_market','coef_cap','coef_BM','coef_roe','coef_invest')]
x = x.astype(float)
y=np.array(stock_params['return_ave_stock'])
y = y.astype(float)
#X = sm.add_constant(x) 

est=sm.OLS(y,x)
est=est.fit()
est.summary()

Out[39]:

Dep. Variable:yR-squared:0.420
Model:OLSAdj. R-squared:0.401
Method:Least SquaresF-statistic:22.29
Date:Wed, 20 Feb 2019Prob (F-statistic):9.04e-17
Time:10:29:01Log-Likelihood:82.217
No. Observations:159AIC:-154.4
Df Residuals:154BIC:-139.1
Df Model:5
Covariance Type:nonrobust

| | coef | std err | t | P>|t| | [0.025 | 0.975] | | ----------- | ------- | ------- | ------ | ------- | ------ | ------ | | coef_market | -0.1141 | 0.013 | -8.496 | 0.000 | -0.141 | -0.088 | | coef_cap | -0.0219 | 0.025 | -0.862 | 0.390 | -0.072 | 0.028 | | coef_BM | 0.0348 | 0.023 | 1.504 | 0.135 | -0.011 | 0.080 | | coef_roe | -0.0095 | 0.022 | -0.436 | 0.663 | -0.053 | 0.034 | | coef_invest | 0.0194 | 0.023 | 0.861 | 0.391 | -0.025 | 0.064 |

Omnibus:137.364Durbin-Watson:2.067
Prob(Omnibus):0.000Jarque-Bera (JB):2452.053
Skew:-2.977Prob(JB):0.00
Kurtosis:21.294Cond. No.2.90

In [40]:

second_params3 = pd.DataFrame(columns = ["coef_market","coef_cap","coef_BM","coef_roe","coef_invest"],index=['factor_return'])
x=stock_params.ix[:,('coef_market','coef_cap','coef_BM',"coef_roe","coef_invest")]
y=stock_params['return_ave_stock'].reshape(-1,1)
  
    
model = LinearRegression(fit_intercept=None)
model.fit(x,y)

second_params3.ix[0,'coef_market']=model.coef_[0][0]
second_params3.ix[0,'coef_cap']=model.coef_[0][1]
second_params3.ix[0,'coef_BM']=model.coef_[0][2]
second_params3.ix[0,'coef_roe']=model.coef_[0][3]
second_params3.ix[0,'coef_invest']=model.coef_[0][4]
second_params3

Out[40]:

coef_marketcoef_capcoef_BMcoef_roecoef_invest
factor_return-0.114142-0.0219430.0347699-0.009507220.019449

In [ ]:

 

查看以上策略详情请到supermind量化交易官网查看:多因子专题(一)-- 截面回归 附源代码