Transition matrix calculation
Transition Matrix
What is a transition matrix?
A transition matrix gives the probability of moving from one state to another, estimated from historical data.
Given historical data labelled with different market regimes, we can calculate the probability of one regime moving to another.
How to calculate a transition matrix?
Let us see a simple example.
from collections import Counter
lst = [0,1,1,0,0,1]
# Count each (current, next) pair of consecutive observations
c = Counter()
for i,j in zip(lst[:-1], lst[1:]):
    c[(i,j)] += 1
print(c)
# n observations give n-1 transitions
total = len(lst) - 1
for k,v in c.items():
    print(f"Probability of transition {k} is {v/total}")
There are six observations in the above example (of two states, 0 and 1), so there are 5 transitions. The transitions are
- 0->1
- 1->1
- 1->0
- 0->0
- 0->1
If you count them, you get the above values, and dividing each count by 5 gives the respective probabilities.
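As a quick check on the counting above, the same pairs can be laid out as a square table with one row per current state and one column per next state. This is only a sketch; the names `states`, `count_matrix` and `prob_matrix` are illustrative and not part of the original listing.

```python
from collections import Counter

import numpy as np

lst = [0, 1, 1, 0, 0, 1]

# Count each (current, next) pair of consecutive observations.
counts = Counter(zip(lst[:-1], lst[1:]))

# Lay the counts out as a square array: rows = current state, columns = next state.
states = sorted(set(lst))
count_matrix = np.zeros((len(states), len(states)), dtype=int)
for (current, nxt), n in counts.items():
    count_matrix[states.index(current), states.index(nxt)] = n

# Divide by the number of transitions to get the share of each pair.
prob_matrix = count_matrix / count_matrix.sum()

print(count_matrix)
print(prob_matrix)
```

The shares in `prob_matrix` add up to 1 over the whole table, which matches dividing every count by the five transitions.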
We need not only the probability but also the payoff at each transition. We assume our dataframe has columns for the regime and the returns. Let us write a general transition matrix function that takes a dataframe with the necessary columns and produces the desired output.
# %load ../../listings/transition_matrix.py
import pandas as pd
from collections import Counter
from typing import Dict,Union
def transition_matrix(data:pd.DataFrame, state:str="state", prob:bool=True)->Union[Dict,Counter]:
    """
    Compute the transition matrix
    data
        a pandas dataframe
    state
        column name containing the state
        default "state"
    prob
        return the result as probabilities
        if false, returns the actual count
    """
    values = data[state].values
    # Count each (current, next) pair of consecutive rows
    c = Counter()
    for i,j in zip(values[:-1], values[1:]):
        c[(i,j)] +=1
    if prob:
        dct = {}
        # n rows give n-1 transitions
        total = len(data)-1
        for k,v in c.items():
            dct[k] = v/total
        return dct
    else:
        return c
df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(
    columns = lambda x:x.lower()).sort_values(by='date').set_index('date')
df['ret'] = df.close.pct_change()
# Month-end OHLC bars of the close ('M' is deprecated in favour of 'ME' since pandas 2.2)
monthly = df.resample('M').close.ohlc()
monthly['ret'] = monthly.close.pct_change()
monthly['ma'] = monthly.ret.rolling(12).mean()
# State is 1 when the monthly return beats its 12-month average, else 0;
# months that have no 12-month average yet compare as False and get state 0
monthly['state'] = monthly.eval('(ret>ma)+0')
print("Printing transition counts and probabilities")
print(sorted(transition_matrix(monthly, prob=False).items()))
sorted(transition_matrix(monthly).items())
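The function above returns only counts or probabilities. The payoff at each transition, mentioned earlier, could be sketched as the average return of the current month for every (previous, current) pair of states. The example below runs on synthetic returns rather than the S&P 500 file, and the names `demo`, `prev_state`, `pairs` and `payoff` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly returns, for illustration only (not the S&P 500 data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"ret": rng.normal(0.005, 0.04, size=120)})
demo["ma"] = demo.ret.rolling(12).mean()
demo["state"] = (demo.ret > demo.ma).astype(int)

# Previous month's state; the first row has no predecessor and is dropped.
demo["prev_state"] = demo.state.shift(1)
pairs = demo.dropna(subset=["prev_state"])

# Number of occurrences and mean current-month return for each (previous, current) pair.
payoff = pairs.groupby(["prev_state", "state"]).ret.agg(["count", "mean"])
print(payoff)
```

The `mean` column here is one simple definition of payoff; a median or a cumulative return would fit the same groupby.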
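Calculating with pandas
We can do the same with pandas by shifting the state column by one. Here state1 holds the previous month's state, so each group is a (current, previous) pair, and the first month, which has no previous state, is left out by groupby.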
monthly['state1'] = monthly.state.shift(1)
prob = monthly.groupby(['state', 'state1']).size()
print(prob)
print(prob/(sum(prob)))
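Dividing by the grand total gives the share of each pair among all transitions. A transition matrix in the Markov-chain sense instead conditions on the previous state, so that each row sums to 1. A minimal sketch of that variant with `pd.crosstab` and `normalize="index"`, reusing the toy sequence from the simple example (the names `prev`, `valid` and `tm` are illustrative):

```python
import pandas as pd

# Toy state sequence from the simple example, for illustration only.
state = pd.Series([0, 1, 1, 0, 0, 1], name="state")
prev = state.shift(1).rename("prev_state")

# Drop the first observation, which has no previous state.
valid = prev.notna()

# Rows: previous state, columns: current state; each row is divided by its own total.
tm = pd.crosstab(prev[valid].astype(int), state[valid], normalize="index")
print(tm)
```

Read row-wise, `tm.loc[0, 1]` is the estimated probability of moving to state 1 given that the previous state was 0.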
So, the state transitions look pretty much random, as the probabilities are spread more or less equally over all the transitions. A strong month (return above its 12-month average) doesn't mean that the next month will be strong too, and the same holds for weak months.
Just to make things more interesting, let us add another state by lagging the state1 variable. We now have 3 states in each row: this month, last month, and the month before last.
monthly['state2'] = monthly.state1.shift(1)
prob = monthly.groupby(['state', 'state1', 'state2']).size()
prob/(sum(prob))
Again, there is no big edge to investigate further. Two consecutive losing months don't mean the next month will be positive, or vice versa. This is expected, as returns do not have a large autocorrelation. So a streak of positive or negative months does not necessarily mean we can predict the next month's return with greater accuracy.
monthly.ret.autocorr()
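For reference, `Series.autocorr` with its default lag of 1 is the Pearson correlation between the series and a copy of itself shifted by one period. The sketch below illustrates this on synthetic, independently drawn returns (not the S&P 500 data), for which the value should be close to zero.

```python
import numpy as np
import pandas as pd

# Synthetic, independent returns, for illustration only.
rng = np.random.default_rng(1)
ret = pd.Series(rng.normal(0.0, 0.04, size=240))

# Lag-1 autocorrelation, computed two equivalent ways.
lag1 = ret.autocorr(lag=1)
manual = ret.corr(ret.shift(1))
print(lag1, manual)
```

A value near zero means last month's return carries little linear information about this month's, which is consistent with the flat transition probabilities above.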