pandas - Python correlation matrix 3d dataframe -


i have in sql server historical return table date , asset id this:

[date] [asset] [1dret] jan   asset1   0.52 jan   asset2   0.12 jan   asset3   0.07 feb   asset1   0.41 feb   asset2   0.33 feb   asset3   0.21 

...

so need calculate correlation matrix given date range assets combinations: a1,a2 ; a1,a3 ; a2,a3

im using pandas , in sql select i'm filtering tha date range , ordering date.

i'm trying using pandas df.corr(), numpy.corrcoef , scipy not able n-variable dataframe

i see example it's dataframe have asset per column , 1 row per day.

this code block i'm doing it:

qryret = "select * indexesvalue date > '20100901' , date < '20150901' order date"  result = conn.execute(qryret)  df = pd.dataframe(data=list(result),columns=result.keys())  df1d = df[['date','id_riskfactor','1dreturn']]  corr = df1d.set_index(['date','id_riskfactor']).unstack().corr() corr.columns = corr.columns.droplevel() corr.index = corr.columns.tolist() corr.index.name = 'symbol_1' corr.columns.name = 'symbol_2' print(corr)  conn.close() 

for i'm reciving msg:

corr.columns = corr.columns.droplevel() attributeerror: 'index' object has no attribute 'droplevel'  **print(df1d.head())**          date  id_riskfactor         1dreturn 0  2010-09-02            149            0e-12 1  2010-09-02            150  -0.004242875148 2  2010-09-02             33   0.000590000011 3  2010-09-02             28   0.000099999997 4  2010-09-02             34  -0.000010000000  **print(df.head())**          date  id_riskfactor           value         1dreturn         5dreturn 0  2010-09-02            149  0.040096000000            0e-12            0e-12 1  2010-09-02            150  1.736700000000  -0.004242875148  -0.013014321215 2  2010-09-02             33  2.283000000000   0.000590000011   0.001260000048 3  2010-09-02             28  2.113000000000   0.000099999997   0.000469999999 4  2010-09-02             34  0.615000000000  -0.000010000000   0.000079999998  **print(corr.columns)** index([], dtype='object') 

create sample dataframe:

import pandas pd import numpy np  df = pd.dataframe({'daily_return': np.random.random(15),                     'symbol': ['a'] * 5 + ['b'] * 5 + ['c'] * 5,                     'date': np.tile(pd.date_range('1-1-2015', periods=5), 3)})  >>> df     daily_return       date symbol 0       0.011467 2015-01-01      1       0.613518 2015-01-02      2       0.334343 2015-01-03      3       0.371809 2015-01-04      4       0.169016 2015-01-05      5       0.431729 2015-01-01      b 6       0.474905 2015-01-02      b 7       0.372366 2015-01-03      b 8       0.801619 2015-01-04      b 9       0.505487 2015-01-05      b 10      0.946504 2015-01-01      c 11      0.337204 2015-01-02      c 12      0.798704 2015-01-03      c 13      0.311597 2015-01-04      c 14      0.545215 2015-01-05      c 

i'll assume you've filtered dataframe relevant dates. want pivot table have unique dates index , symbols separate columns, daily returns values. finally, call corr() on result.

corr = df.set_index(['date','symbol']).unstack().corr() corr.columns = corr.columns.droplevel() corr.index = corr.columns.tolist()   corr.index.name = 'symbol_1' corr.columns.name = 'symbol_2' >>> corr symbol_2                 b         c symbol_1                                       1.000000  0.188065 -0.745115 b         0.188065  1.000000 -0.688808 c        -0.745115 -0.688808  1.000000 

you can select subset of dataframe based on dates follows:

start_date = pd.timestamp('2015-1-4') end_date = pd.timestamp('2015-1-5') >>> df.loc[df.date.between(start_date, end_date), :]     daily_return       date symbol 3       0.371809 2015-01-04      4       0.169016 2015-01-05      8       0.801619 2015-01-04      b 9       0.505487 2015-01-05      b 13      0.311597 2015-01-04      c 14      0.545215 2015-01-05      c 

if want flatten correlation matrix:

corr.stack().reset_index()   symbol_1 symbol_2         0 0                1.000000 1               b  0.188065 2               c -0.745115 3        b         0.188065 4        b        b  1.000000 5        b        c -0.688808 6        c        -0.745115 7        c        b -0.688808 8        c        c  1.000000 

Comments

Popular posts from this blog

java - Date formats difference between yyyy-MM-dd'T'HH:mm:ss and yyyy-MM-dd'T'HH:mm:ssXXX -

c# - Get rid of xmlns attribute when adding node to existing xml -