Transition matrix calculation

Transition Matrix

What is a transition matrix?

A transition matrix gives the probability of moving from one state to another, estimated from historical data.

Given historical data labelled with different market regimes, we can estimate the probability of one regime being followed by another.

How to calculate a transition matrix?

Let us see a simple example.

In [1]:
from collections import Counter, defaultdict
import pandas as pd
import numpy as np
In [2]:
lst = [0,1,1,0,0,1]
c = Counter()
for i,j in zip(lst[:-1], lst[1:]):
    c[(i,j)] += 1
print(c)
for k,v in c.items():
    print(f"Probability of transition {k} is {v/(len(lst)-1)}")
Counter({(0, 1): 2, (1, 1): 1, (1, 0): 1, (0, 0): 1})
Probability of transition (0, 1) is 0.4
Probability of transition (1, 1) is 0.2
Probability of transition (1, 0) is 0.2
Probability of transition (0, 0) is 0.2

The list in the above example has six observations, so there are 5 transitions. The transitions are

  • 0->1
  • 1->1
  • 1->0
  • 0->0
  • 0->1

Counting them gives the values above, and dividing each count by the total number of transitions (5) gives the respective probabilities.
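
The probabilities above are each transition's share of all five transitions. A transition matrix in the Markov-chain sense usually conditions on the starting state instead, so that the probabilities out of each state sum to one. A minimal sketch of that variant on the same list (the names `pair_counts`, `from_counts` and `conditional` are illustrative, not part of the original listing):

```python
from collections import Counter

lst = [0, 1, 1, 0, 0, 1]

# Count each (from, to) pair of consecutive observations
pair_counts = Counter(zip(lst[:-1], lst[1:]))

# Count how often each state appears as the starting point of a transition
from_counts = Counter(lst[:-1])

# Conditional probability: P(to | from) = count(from, to) / count(from)
conditional = {
    (i, j): n / from_counts[i] for (i, j), n in pair_counts.items()
}

for (i, j), p in sorted(conditional.items()):
    print(f"P({j} | {i}) = {p:.3f}")
```

Here state 0 starts three transitions and state 1 starts two, so for example the two 0->1 transitions get probability 2/3 rather than 2/5.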

We need not only the probability but also the payoff at each transition. We assume our dataframe has columns for regime and returns. Let us write a general transition matrix function that takes a dataframe with the necessary columns and produces the desired output.

In [3]:
# %load ../../listings/transition_matrix.py
import pandas as pd
import numpy as np
from collections import Counter, defaultdict
from typing import Dict,Union,Callable

def transition_matrix(data: pd.DataFrame, state: str = "state", prob: bool = True) -> Union[Dict, Counter]:
    """
    Compute the transition matrix
    data
        a pandas dataframe
    state
        name of the column containing the state
        default state
    prob
        if True, return each transition's share of all transitions
        if False, return the raw counts
    """
    values = data[state].values
    # Count each (previous state, current state) pair of consecutive rows
    c = Counter(zip(values[:-1], values[1:]))
    if prob:
        # The number of transitions is len(data) - 1
        total = sum(c.values())
        return {k: v / total for k, v in c.items()}
    return c
In [4]:
df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(
columns = lambda x:x.lower()).sort_values(by='date').set_index('date')
df['ret'] = df.close.pct_change()
monthly = df.resample('M').close.ohlc()
In [5]:
monthly['ret'] = monthly.close.pct_change()
monthly['ma'] = monthly.ret.rolling(12).mean()
monthly['state'] = monthly.eval('(ret>ma)+0')
print("Printing transition counts and probabilities")
print(sorted(transition_matrix(monthly, prob=False).items()))
sorted(transition_matrix(monthly).items())
Printing transition counts and probabilities
[((0.0, 0.0), 65), ((0.0, 1.0), 63), ((1.0, 0.0), 63), ((1.0, 1.0), 73)]
Out[5]:
[((0.0, 0.0), 0.24621212121212122),
 ((0.0, 1.0), 0.23863636363636365),
 ((1.0, 0.0), 0.23863636363636365),
 ((1.0, 1.0), 0.2765151515151515)]
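
The function above returns only counts or probabilities. For the payoff mentioned earlier, one possible sketch groups the returns by the (previous state, current state) pair. It assumes that "payoff" means the average return in the period the transition arrives at, which is an interpretation rather than something the original listing defines. The toy values and the names `toy`, `prev_state` and `payoff` are made up for illustration.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "state": [0, 1, 1, 0, 0, 1],
    "ret": [-0.01, 0.02, 0.03, -0.02, -0.01, 0.04],
})

# The previous row's state; the first row has no predecessor (NaN)
toy["prev_state"] = toy["state"].shift(1)

# Number of occurrences and mean return for each (previous, current) pair
payoff = (
    toy.dropna(subset=["prev_state"])
       .astype({"prev_state": int})
       .groupby(["prev_state", "state"])["ret"]
       .agg(["count", "mean"])
)
print(payoff)
```

With a real dataframe, the same grouping could be applied to the `state` and `ret` columns directly.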

Calculating with pandas

We can do the same with pandas by shifting the state column by one, so that state1 holds the previous month's state.

In [6]:
monthly['state1'] = monthly.state.shift(1)
prob = monthly.groupby(['state', 'state1']).size()
print(prob)
print(prob/(sum(prob)))
state  state1
0.0    0.0       65
       1.0       63
1.0    0.0       63
       1.0       73
dtype: int64
state  state1
0.0    0.0       0.246212
       1.0       0.238636
1.0    0.0       0.238636
       1.0       0.276515
dtype: float64

So the state transitions look close to random, as the probabilities are more or less equally distributed over the four transitions. A month in state 1 (return above its 12-month average) does not mean the next month will be in state 1 as well, and the same holds for months in state 0.
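
To read the result as a matrix whose rows are the previous state and whose columns are the current state, one option is `pd.crosstab` with `normalize="index"`, which turns each row into conditional probabilities. A small sketch on a toy series (the data and the names `prev` and `matrix` are assumptions for illustration, not taken from the S&P 500 data above):

```python
import pandas as pd

# Toy state sequence, for illustration only
states = pd.Series([0, 1, 1, 0, 0, 1], name="state")

# Previous state; the first observation has none
prev = states.shift(1).rename("prev_state")
valid = prev.notna()

# Rows: previous state, columns: current state, each row sums to 1
matrix = pd.crosstab(
    prev[valid].astype(int), states[valid], normalize="index"
)
print(matrix)
```

Applying the same row normalisation to the counts printed earlier (65 and 63 out of state 0, 63 and 73 out of state 1) gives roughly 0.51/0.49 and 0.46/0.54, which is consistent with the near-random reading.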

Just to make things entertaining, let us add another state by lagging the state1 variable. Each row now has three states: this month, last month, and the month before last.

In [7]:
monthly['state2'] = monthly.state1.shift(1)
prob = monthly.groupby(['state', 'state1', 'state2']).size()
prob/(sum(prob))
Out[7]:
state  state1  state2
0.0    0.0     0.0       0.121673
               1.0       0.121673
       1.0     0.0       0.121673
               1.0       0.117871
1.0    0.0     0.0       0.121673
               1.0       0.117871
       1.0     0.0       0.117871
               1.0       0.159696
dtype: float64

Again, there is no big edge to investigate further. Two consecutive losing months do not mean the next month will be positive, or vice versa. This is expected, as returns do not have much autocorrelation. So a streak of positive or negative months does not necessarily let us predict the next month's return with greater accuracy.

In [8]:
monthly.ret.autocorr()
Out[8]:
0.058071093402516094
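
For reference, `Series.autocorr()` with its default lag of 1 is the Pearson correlation between the series and a copy of itself shifted by one period. A short sketch on simulated returns (the random data and the names `ret`, `lag1` and `manual` are assumptions for illustration, not the S&P 500 series):

```python
import numpy as np
import pandas as pd

# Simulated monthly returns, for illustration only
rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0, 0.04, 240))

# Built-in lag-1 autocorrelation
lag1 = ret.autocorr()

# The same quantity computed by hand: correlate the series with its lag
manual = ret.corr(ret.shift(1))

print(lag1, manual)
```

A value near zero, like the 0.058 above, means last month's return explains very little of this month's return.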

Footnotes

  • A much better and clearer explanation of transition matrices is here
  • Adding more regimes may provide a statistical edge