pandas - Python correlation matrix 3d dataframe -
i have in sql server historical return table date , asset id this:
[date] [asset] [1dret] jan asset1 0.52 jan asset2 0.12 jan asset3 0.07 feb asset1 0.41 feb asset2 0.33 feb asset3 0.21
...
so need calculate correlation matrix given date range assets combinations: a1,a2 ; a1,a3 ; a2,a3
im using pandas , in sql select i'm filtering tha date range , ordering date.
i'm trying using pandas df.corr(), numpy.corrcoef , scipy not able n-variable dataframe
i see example it's dataframe have asset per column , 1 row per day.
this code block i'm doing it:
qryret = "select * indexesvalue date > '20100901' , date < '20150901' order date" result = conn.execute(qryret) df = pd.dataframe(data=list(result),columns=result.keys()) df1d = df[['date','id_riskfactor','1dreturn']] corr = df1d.set_index(['date','id_riskfactor']).unstack().corr() corr.columns = corr.columns.droplevel() corr.index = corr.columns.tolist() corr.index.name = 'symbol_1' corr.columns.name = 'symbol_2' print(corr) conn.close()
for i'm reciving msg:
corr.columns = corr.columns.droplevel() attributeerror: 'index' object has no attribute 'droplevel' **print(df1d.head())** date id_riskfactor 1dreturn 0 2010-09-02 149 0e-12 1 2010-09-02 150 -0.004242875148 2 2010-09-02 33 0.000590000011 3 2010-09-02 28 0.000099999997 4 2010-09-02 34 -0.000010000000 **print(df.head())** date id_riskfactor value 1dreturn 5dreturn 0 2010-09-02 149 0.040096000000 0e-12 0e-12 1 2010-09-02 150 1.736700000000 -0.004242875148 -0.013014321215 2 2010-09-02 33 2.283000000000 0.000590000011 0.001260000048 3 2010-09-02 28 2.113000000000 0.000099999997 0.000469999999 4 2010-09-02 34 0.615000000000 -0.000010000000 0.000079999998 **print(corr.columns)** index([], dtype='object')
create sample dataframe:
import pandas pd import numpy np df = pd.dataframe({'daily_return': np.random.random(15), 'symbol': ['a'] * 5 + ['b'] * 5 + ['c'] * 5, 'date': np.tile(pd.date_range('1-1-2015', periods=5), 3)}) >>> df daily_return date symbol 0 0.011467 2015-01-01 1 0.613518 2015-01-02 2 0.334343 2015-01-03 3 0.371809 2015-01-04 4 0.169016 2015-01-05 5 0.431729 2015-01-01 b 6 0.474905 2015-01-02 b 7 0.372366 2015-01-03 b 8 0.801619 2015-01-04 b 9 0.505487 2015-01-05 b 10 0.946504 2015-01-01 c 11 0.337204 2015-01-02 c 12 0.798704 2015-01-03 c 13 0.311597 2015-01-04 c 14 0.545215 2015-01-05 c
i'll assume you've filtered dataframe relevant dates. want pivot table have unique dates index , symbols separate columns, daily returns values. finally, call corr()
on result.
corr = df.set_index(['date','symbol']).unstack().corr() corr.columns = corr.columns.droplevel() corr.index = corr.columns.tolist() corr.index.name = 'symbol_1' corr.columns.name = 'symbol_2' >>> corr symbol_2 b c symbol_1 1.000000 0.188065 -0.745115 b 0.188065 1.000000 -0.688808 c -0.745115 -0.688808 1.000000
you can select subset of dataframe based on dates follows:
start_date = pd.timestamp('2015-1-4') end_date = pd.timestamp('2015-1-5') >>> df.loc[df.date.between(start_date, end_date), :] daily_return date symbol 3 0.371809 2015-01-04 4 0.169016 2015-01-05 8 0.801619 2015-01-04 b 9 0.505487 2015-01-05 b 13 0.311597 2015-01-04 c 14 0.545215 2015-01-05 c
if want flatten correlation matrix:
corr.stack().reset_index() symbol_1 symbol_2 0 0 1.000000 1 b 0.188065 2 c -0.745115 3 b 0.188065 4 b b 1.000000 5 b c -0.688808 6 c -0.745115 7 c b -0.688808 8 c c 1.000000
Comments
Post a Comment