{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0bb053b6",
   "metadata": {},
   "source": [
    "# Transition Matrix\n",
    "\n",
    "\n",
    "## What is a transition matrix?\n",
    "\n",
    "A transition matrix calculates the probability of one state moving to an another state based on historical data.\n",
    "\n",
    "Given, historical data with different market regimes, we can calculate the probability of one regime moving to another.\n",
    "\n",
    "## How to calculate a transition matrix?\n",
    "\n",
    "Let us see a simple example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5e711c4b-ed89-4806-bb0f-c84a7d35aed2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter, defaultdict\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1002c8f8-b467-4947-a126-d37e7e8e7e28",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Counter({(0, 1): 2, (1, 1): 1, (1, 0): 1, (0, 0): 1})\n",
      "Probability of state (0, 1) is 0.4\n",
      "Probability of state (1, 1) is 0.2\n",
      "Probability of state (1, 0) is 0.2\n",
      "Probability of state (0, 0) is 0.2\n"
     ]
    }
   ],
   "source": [
    "lst = [0,1,1,0,0,1]\n",
    "c = Counter()\n",
    "for i,j in zip(lst[:-1], lst[1:]):\n",
    "    c[(i,j)] += 1\n",
    "print(c)\n",
    "for k,v in c.items():\n",
    "    print(f\"Probability of state {k} is {v/5}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "881a88fb-dde7-436e-b34e-c4e4560a9b45",
   "metadata": {},
   "source": [
    "There are six states in the above example, so there would be 5 transitions. The transitions are\n",
    "\n",
    "* 0->1\n",
    "* 1->1\n",
    "* 1->0\n",
    "* 0->0\n",
    "* 1->0\n",
    "\n",
    "If you count them, you get the above values and then you can calculate the respective probabilities.\n",
    "\n",
    "We not only need the probability but also the payoff at each transition. We assume our dataframe has columns for regime and returns. Let us write a general transition matrix program that takes a dataframe having the necessary columns and produces the desired output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8c3383d8-d557-4b74-af0f-17d3e8fe2f8d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load ../../listings/transition_matrix.py\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from collections import Counter, defaultdict\n",
    "from typing import Dict,Union,Callable\n",
    "\n",
    "def transition_matrix(data:pd.DataFrame, state:str=\"state\", prob:bool=True)->Union[Dict,Counter]:\n",
    "    \"\"\"\n",
    "    Compute the transition matrix\n",
    "    data\n",
    "        a pandas dataframe\n",
    "    state\n",
    "        column name containing the state\n",
    "        default state\n",
    "    prob\n",
    "        return the result as probabilities\n",
    "        if false, returns the actual count\n",
    "    \"\"\"\n",
    "    values = data[state].values\n",
    "    c = Counter()\n",
    "    for i,j in zip(values[:-1], values[1:]):\n",
    "        c[(i,j)] +=1\n",
    "    if prob:\n",
    "        dct = {}\n",
    "        total = len(data)-1\n",
    "        for k,v in c.items():\n",
    "            dct[k] = v/total\n",
    "        return dct\n",
    "    else:\n",
    "        return c"
   ]
  },
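  {
   "cell_type": "markdown",
   "id": "7d2e9a41-5b1c-4e0a-9c3f-payoffsketch",
   "metadata": {},
   "source": [
    "The function above returns only counts or frequencies. The payoff at each transition could be summarised in a similar way. The sketch below assumes a dataframe with a `state` column and a `ret` column and runs on made-up numbers; `transition_payoff` is a hypothetical helper and not part of the listing.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def transition_payoff(data, state='state', ret='ret'):\n",
    "    # Pair each row's state with the state of the row before it.\n",
    "    frame = pd.DataFrame({\n",
    "        'prev': data[state].shift(1),\n",
    "        'curr': data[state],\n",
    "        'ret': data[ret],\n",
    "    }).dropna(subset=['prev'])\n",
    "    # Mean return and number of observations for each (prev, curr) pair.\n",
    "    # The return is that of the row the transition arrives in.\n",
    "    return frame.groupby(['prev', 'curr'])['ret'].agg(['mean', 'count'])\n",
    "\n",
    "# Made-up data, for illustration only.\n",
    "toy = pd.DataFrame({\n",
    "    'state': [0, 1, 1, 0, 0, 1],\n",
    "    'ret': [-0.01, 0.02, 0.03, -0.02, -0.01, 0.04],\n",
    "})\n",
    "payoff = transition_payoff(toy)\n",
    "print(payoff)\n",
    "```\n",
    "\n",
    "Keeping the count next to the mean makes it easier to see when an average rests on only a handful of transitions."
   ]
  },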
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ff4d2846-03e6-43de-8417-9badc0027b8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(\n",
    "columns = lambda x:x.lower()).sort_values(by='date').set_index('date')\n",
    "df['ret'] = df.close.pct_change()\n",
    "monthly = df.resample('M').close.ohlc()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "616139e4-97c5-41da-9740-577301245152",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Printing transition counts and probabilities\n",
      "[((0.0, 0.0), 65), ((0.0, 1.0), 63), ((1.0, 0.0), 63), ((1.0, 1.0), 73)]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[((0.0, 0.0), 0.24621212121212122),\n",
       " ((0.0, 1.0), 0.23863636363636365),\n",
       " ((1.0, 0.0), 0.23863636363636365),\n",
       " ((1.0, 1.0), 0.2765151515151515)]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "monthly['ret'] = monthly.close.pct_change()\n",
    "monthly['ma'] = monthly.ret.rolling(12).mean()\n",
    "monthly['state'] = monthly.eval('(ret>ma)+0')\n",
    "print(\"Printing transition counts and probabilities\")\n",
    "print(sorted(transition_matrix(monthly, prob=False).items()))\n",
    "sorted(transition_matrix(monthly).items())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9363de8-deb2-49f9-ae3d-6ce4aec9f5ce",
   "metadata": {},
   "source": [
    "## Calculating with pandas\n",
    "\n",
    "We can do the same with pandas by shifting the state column by one. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bcfa587b-bf6c-4f49-b4a9-15b6ad1a0dfb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "state  state1\n",
      "0.0    0.0       65\n",
      "       1.0       63\n",
      "1.0    0.0       63\n",
      "       1.0       73\n",
      "dtype: int64\n",
      "state  state1\n",
      "0.0    0.0       0.246212\n",
      "       1.0       0.238636\n",
      "1.0    0.0       0.238636\n",
      "       1.0       0.276515\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "monthly['state1'] = monthly.state.shift(1)\n",
    "prob = monthly.groupby(['state', 'state1']).size()\n",
    "print(prob)\n",
    "print(prob/(sum(prob)))"
   ]
  },
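  {
   "cell_type": "markdown",
   "id": "3b8f6c12-0d4e-4a77-8f21-crosstabsketch",
   "metadata": {},
   "source": [
    "The grouped counts above are divided by the grand total, so they are joint frequencies. If conditional probabilities are wanted instead (the chance of each current state given the previous one), one option is `pd.crosstab` with `normalize='index'`. A minimal sketch on a made-up series; `previous` and `matrix` are illustrative names:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up state series, for illustration only.\n",
    "state = pd.Series([0, 1, 1, 0, 0, 1], name='state')\n",
    "previous = state.shift(1).rename('previous')\n",
    "\n",
    "# Drop the first row, which has no previous state.\n",
    "valid = previous.notna()\n",
    "\n",
    "# Rows: previous state, columns: current state, each row sums to 1.\n",
    "matrix = pd.crosstab(previous[valid], state[valid], normalize='index')\n",
    "print(matrix)\n",
    "```\n",
    "\n",
    "Putting the previous state on the rows keeps the usual reading of a transition matrix: pick the row for where you are, then read across for where you go next."
   ]
  },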
  {
   "cell_type": "markdown",
   "id": "3270864d-ddfe-4b32-b240-a035b5b1f510",
   "metadata": {},
   "source": [
    "So, the state transitions are pretty much random as the values are more or less equally distributed over all the states. So, a profitable month doesn't mean that the next month would be positive and same is the case if returns are negative.\n",
    "\n",
    "Just to make things entertaining, let us add an another state by lagging the state1 variable. We would now have 3 regimes for this month, last month and the month preceding the last month\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4b7b79ad-a041-49a4-93c8-a44be9d12468",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state  state1  state2\n",
       "0.0    0.0     0.0       0.121673\n",
       "               1.0       0.121673\n",
       "       1.0     0.0       0.121673\n",
       "               1.0       0.117871\n",
       "1.0    0.0     0.0       0.121673\n",
       "               1.0       0.117871\n",
       "       1.0     0.0       0.117871\n",
       "               1.0       0.159696\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "monthly['state2'] = monthly.state1.shift(1)\n",
    "prob = monthly.groupby(['state', 'state1', 'state2']).size()\n",
    "prob/(sum(prob))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2461185a-b39e-4f27-96ce-b9a2e04feb71",
   "metadata": {},
   "source": [
    "Again, no big edge to investigate further. Thus, 2 consecutive losing months doesn't mean the next month would be positive or vice-versa. This is expected as returns do not have a big auto-correlation. So, a consecutive streak of positive or negative months do not necessarily mean, we can predict the next month returns with greater accuracy.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c60b0dad-e732-470f-8bce-2dfd5bca1d9e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.058071093402516094"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "monthly.ret.autocorr()"
   ]
  },
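  {
   "cell_type": "markdown",
   "id": "9e4a1d57-62c8-4b3e-a5d0-autocorrsketch",
   "metadata": {},
   "source": [
    "`Series.autocorr` with its default lag of 1 is the Pearson correlation between a series and a copy of itself shifted by one period. A small sketch on made-up returns, only to show what the number above measures:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up monthly returns, for illustration only.\n",
    "ret = pd.Series([0.01, -0.02, 0.03, 0.00, -0.01, 0.02])\n",
    "\n",
    "# Correlate each return with the return one period earlier.\n",
    "manual = ret.corr(ret.shift(1))\n",
    "builtin = ret.autocorr(lag=1)\n",
    "print(manual, builtin)\n",
    "```\n",
    "\n",
    "A value near zero, like the one above, means last month's return explains very little of this month's return."
   ]
  },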
  {
   "cell_type": "markdown",
   "id": "c6b85d15-d9c3-48d5-a7fb-c476a8222e5d",
   "metadata": {},
   "source": [
    "## Footnotes\n",
    "\n",
    "*  A much better and clear explanation for transition matrix is [here](https://www.probabilitycourse.com/chapter11/11_2_1_introduction.php)\n",
    "*  Adding more regimes may provide a statistical edge\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "nikola": {
   "category": "",
   "date": "2021-12-17 04:03:08 UTC",
   "description": "",
   "link": "",
   "slug": "transition-matrix-calculation",
   "tags": "probability, transition matrix",
   "title": "Transition matrix calculation",
   "type": "text"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}