Stock returns and risk metrics

Stock returns and risk estimation

One of the common methods to estimate risk is drawdown. Drawdown¹ is the decline from a high to a subsequent low before the price rises above that high again. The underwater period is the time taken from a high to reach the next high.

Though drawdown is a common measure of risk, it depends on the exact historical path taken by the stock. Even if we optimistically assume that future returns will reflect historical returns², the stock may not take the same path it took previously.
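To make the definitions concrete, here is a minimal sketch of computing the maximum drawdown and the longest underwater stretch from a price series. This is an illustrative implementation, not the one used by the libraries below (quantstats and empyrical ship tested versions):

import pandas as pd

def drawdown_stats(prices: pd.Series):
    """Maximum drawdown and longest underwater run of a price series."""
    running_max = prices.cummax()    # highest price seen so far
    dd = prices / running_max - 1    # decline from that running high
    underwater = dd < 0              # True while below the last high
    # longest consecutive underwater run, counted in observations
    run_lengths = underwater.astype(int).groupby((~underwater).cumsum()).cumsum()
    return dd.min(), int(run_lengths.max())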

Let us look at some data and do some simulations.

In [1]:
# Necessary imports

import pandas_datareader.data as web
import pandas as pd
import numpy as np
import random
import yfinance as yf
import quantstats as qs
import empyrical as ep # For faster processing of portfolio stats and metrics
yf.pdr_override() # Fix to make pandas_datareader work properly in latest version
import seaborn as sns
sns.set()
In [2]:
# parameters - change the parameters here and run the notebook again

symbol: str = 'AAPL'
start_date: str = '2016-01-04'
num_simulations: int = 1000  # number of simulations to run
In [3]:
# Download data and print some stats

df = web.get_data_yahoo([symbol], start=start_date)
df['ret'] = df['Adj Close'].pct_change()
df['Adj Close'].plot()
Out[3]:
<AxesSubplot:xlabel='Date'>
In [4]:
qs.reports.basic(df['Adj Close'])
Performance Metrics
                    Strategy
------------------  ----------
Start Period        2016-01-04
End Period          2023-01-26
Risk-Free Rate      0.0%
Time in Market      100.0%

Cumulative Return   491.09%
CAGR %              28.59%

Sharpe              0.98
Prob. Sharpe Ratio  99.53%
Sortino             1.44
Sortino/√2          1.02
Omega               1.2

Max Drawdown        -38.52%
Longest DD Days     387

Gain/Pain Ratio     0.2
Gain/Pain (1M)      1.02

Payoff Ratio        1.04
Profit Factor       1.2
Common Sense Ratio  1.18
CPC Index           0.67
Tail Ratio          0.98
Outlier Win Ratio   3.85
Outlier Loss Ratio  4.01

MTD                 9.69%
3M                  -6.29%
6M                  -6.54%
YTD                 9.69%
1Y                  -10.28%
3Y (ann.)           27.14%
5Y (ann.)           28.27%
10Y (ann.)          28.59%
All-time (ann.)     28.59%

Avg. Drawdown       -4.17%
Avg. Drawdown Days  26
Recovery Factor     12.75
Ulcer Index         0.12
Serenity Index      3.77
Strategy Visualization (plots omitted)

Shuffled drawdown

When we look at stock returns, we see the specific path the stock took to reach its current level. But the stock could have taken any number of paths to get there. Think of it as a route to a destination: there can be multiple routes, and each route can be different.

We create a shuffled_prices function that shuffles the daily returns and generates a new price series. Note that the starting and ending values stay the same, since the source and destination are fixed; only the routes differ.

In [5]:
def shuffled_prices(start_price, returns, index):
    """Generate a new price path by shuffling the order of the daily returns."""
    rets = np.random.permutation(returns)  # shuffle a copy, leaving the input array intact
    s = start_price * (1 + rets).cumprod()
    prices = np.hstack([start_price, s])  # prepend the start price so the first value always matches
    return pd.Series(prices, index=index)
In [6]:
daily_returns = df.dropna()['ret'].values
start_value = df.iloc[0]['Adj Close']
index = df.index
df['Adj Close'].plot(figsize=(10,6))
for i in range(5):
    s = shuffled_prices(start_value, daily_returns, index)
    s.plot()

Let us simulate 1000 different paths and look at the drawdown distribution.

In [7]:
%%time
sharpe = []
returns = []
drawdowns = []

for i in range(num_simulations):
    s = shuffled_prices(start_value, daily_returns, index)
    rets = s.pct_change()
    sharpe.append(ep.sharpe_ratio(rets))
    returns.append(ep.cum_returns_final(rets))
    drawdowns.append(ep.max_drawdown(rets))
CPU times: user 4.33 s, sys: 4.77 ms, total: 4.34 s
Wall time: 4.34 s

The cumulative return and the Sharpe ratio must be the same for every path, since we are only shuffling the order of the returns: the product of the (1 + r) terms and the mean and standard deviation of the returns are invariant under reordering.

Let us check and confirm this is the case.

In [8]:
pd.DataFrame({
    'sharpe': sharpe,
    'returns': returns
}).plot(subplots=True)
Out[8]:
array([<AxesSubplot:>, <AxesSubplot:>], dtype=object)

Now we can plot the distribution of drawdowns.

In [9]:
dds = pd.Series(drawdowns)
print(f"Maximum expected drawdown is {dds.min()*100 :.2f}% and Minimum expected drawdown={dds.max()*100 :.2f}%")
sns.histplot(dds).set_title('Drawdowns distribution')
Maximum expected drawdown is -66.83% and Minimum expected drawdown=-23.59%
Out[9]:
Text(0.5, 1.0, 'Drawdowns distribution')
In [10]:
dds.describe()
Out[10]:
count    1000.000000
mean       -0.401853
std         0.075064
min        -0.668313
25%        -0.451122
50%        -0.392851
75%        -0.346064
max        -0.235889
dtype: float64

Despite the same returns and the same Sharpe ratio, we see a wide variation in the drawdown percentages. I have purposely not set a random seed, so the notebook produces different results each time it is run; the results shown here come from a run of 1000 simulations.

You can see drawdowns in excess of 50% quite often. In about half of the simulations, the drawdown exceeded the historical drawdown. This is expected, since the historical path is just one of many routes by which the stock could have arrived at the same final return. The worst case is around 60%, way off the historical 38%, yet it was produced with the same returns and the same volatility.

Try this with different stocks, different periods, and different numbers of simulations. The higher the volatility and the larger the number of simulations, the wider you can expect the drawdown distribution to be.

Always look at the drawdown distribution, not just the single historical value, when estimating risk with drawdowns.

Sampled drawdown

The second way to estimate drawdowns is to assume that the distribution of returns will persist in the future and simulate how the returns would behave when randomly sampled from this distribution.

To do this, we randomly sample returns with replacement and estimate the resulting stock returns. This gives us an estimate of how the stock price could have evolved given the same underlying distribution of returns. We create a function sampled_prices that draws randomly with replacement from the given stock returns.

We repeat the same process we used for the shuffled drawdown.

Note that since we are sampling with replacement, all metrics will now vary across simulations.

In [11]:
def sampled_prices(start_price, returns, index):
    """Generate a new price path by sampling daily returns with replacement."""
    rets = np.random.choice(returns, size=len(returns))  # sampling with replacement is the default
    s = start_price * (1 + rets).cumprod()
    prices = np.hstack([start_price, s])  # prepend the start price so the first value always matches
    return pd.Series(prices, index=index)
In [12]:
daily_returns = df.dropna()['ret'].values
start_value = df.iloc[0]['Adj Close']
index = df.index
df['Adj Close'].plot(figsize=(10,6))
for i in range(5):
    s = sampled_prices(start_value, daily_returns, index)
    s.plot()
In [13]:
%%time
sharpe = []
returns = []
drawdowns = []

for i in range(num_simulations):
    s = sampled_prices(start_value, daily_returns, index)
    rets = s.pct_change()
    sharpe.append(ep.sharpe_ratio(rets))
    returns.append(ep.cum_returns_final(rets))
    drawdowns.append(ep.max_drawdown(rets))
CPU times: user 3.77 s, sys: 11.3 ms, total: 3.78 s
Wall time: 3.81 s
In [14]:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(10,6))
sns.histplot(returns, ax=axes[0]).set(title='Returns Distribution')
sns.histplot(drawdowns, ax=axes[1]).set(title='Drawdown Distribution')
Out[14]:
[Text(0.5, 1.0, 'Drawdown Distribution')]
In [15]:
sns.histplot(sharpe).set(title='Sharpe Ratio distribution')
print(f"Maximum expected drawdown is {min(drawdowns)*100 :.2f}% and Minimum expected drawdown= {max(drawdowns)*100 :.2f}%")
print(f"Maximum expected returns is {max(returns)*100 :.2f}% and Minimum expected returns = {min(returns)*100 :.2f}%")
Maximum expected drawdown is -81.87% and Minimum expected drawdown= -19.37%
Maximum expected returns is 7118.02% and Minimum expected returns = -70.14%
In [16]:
pd.DataFrame({
    'returns': returns,
    'drawdown': drawdowns,
    'sharpe': sharpe
}).describe()
Out[16]:
           returns     drawdown       sharpe
count  1000.000000  1000.000000  1000.000000
mean      7.065627    -0.412985     0.972503
std       7.426793     0.104811     0.388797
min      -0.701385    -0.818685    -0.393827
25%       2.245840    -0.479273     0.707935
50%       4.879541    -0.403131     0.979291
75%       9.276520    -0.334050     1.246753
max      71.180200    -0.193679     2.147071

The returns are all over the place, though the drawdown distribution behaves more or less the same as in the previous shuffling-based simulation. Even with a stock like Apple, the returns could have been very minimal even though everything is drawn from the same underlying distribution of returns. And if you were lucky enough, you could have ended up on the other side, easily making more than 1000% returns and looking like a rockstar.

This brings to mind the quote often attributed to Keynes:

Markets can stay irrational longer than you can stay solvent

Bottom line

  • Using drawdown as the sole criterion for estimating risk is not advisable
  • Volatility plays an important role in estimating future returns

Footnotes

  1. A better and more detailed explanation of [drawdown](https://www.investopedia.com/terms/d/drawdown.asp) can be found at Investopedia.
  2. Future returns may or may not reflect historical returns. For individual stocks they often do not, and depend on many other factors.

Transition matrix calculation

Transition Matrix

What is a transition matrix?

A transition matrix gives the probability of one state moving to another state, estimated from historical data.

Given historical data with different market regimes, we can calculate the probability of one regime moving to another.

How to calculate a transition matrix?

Let us see a simple example

In [1]:
from collections import Counter, defaultdict
import pandas as pd
import numpy as np
In [2]:
lst = [0,1,1,0,0,1]
c = Counter()
for i,j in zip(lst[:-1], lst[1:]):
    c[(i,j)] += 1
print(c)
for k,v in c.items():
    print(f"Probability of state {k} is {v/5}")  # 5 transitions in a 6-element list
Counter({(0, 1): 2, (1, 1): 1, (1, 0): 1, (0, 0): 1})
Probability of state (0, 1) is 0.4
Probability of state (1, 1) is 0.2
Probability of state (1, 0) is 0.2
Probability of state (0, 0) is 0.2

There are six observations in the above example, so there are 5 transitions. The transitions are

  • 0->1
  • 1->1
  • 1->0
  • 0->0
  • 0->1

If you count them, you get the above values and then you can calculate the respective probabilities.

We need not only the probability but also the payoff at each transition. We assume our dataframe has columns for the regime and the returns. Let us write a general transition matrix function that takes a dataframe with the necessary columns and produces the desired output.

In [3]:
# %load ../../listings/transition_matrix.py
import pandas as pd
import numpy as np
from collections import Counter, defaultdict
from typing import Dict,Union,Callable

def transition_matrix(data:pd.DataFrame, state:str="state", prob:bool=True)->Union[Dict,Counter]:
    """
    Compute the transition matrix
    data
        a pandas dataframe
    state
        column name containing the state
        default state
    prob
        return the result as probabilities
        if false, returns the actual count
    """
    values = data[state].values
    c = Counter()
    for i,j in zip(values[:-1], values[1:]):
        c[(i,j)] +=1
    if prob:
        dct = {}
        total = len(data)-1
        for k,v in c.items():
            dct[k] = v/total
        return dct
    else:
        return c
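The function above only counts transitions. The payoff mentioned earlier can be estimated separately; here is a minimal sketch, assuming the dataframe carries the state and returns columns described above (the column names are just the defaults used in this notebook):

import pandas as pd

def transition_payoff(data: pd.DataFrame, state: str = "state", ret: str = "ret") -> pd.Series:
    """Average return realized on each (previous state -> current state) move."""
    return data.groupby([data[state].shift(1), data[state]])[ret].mean()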
In [4]:
df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(
    columns=lambda x: x.lower()).sort_values(by='date').set_index('date')
df['ret'] = df.close.pct_change()
monthly = df.resample('M').close.ohlc()
In [5]:
monthly['ret'] = monthly.close.pct_change()
monthly['ma'] = monthly.ret.rolling(12).mean()
monthly['state'] = monthly.eval('(ret>ma)+0')
print("Printing transition counts and probabilities")
print(sorted(transition_matrix(monthly, prob=False).items()))
sorted(transition_matrix(monthly).items())
Printing transition counts and probabilities
[((0.0, 0.0), 65), ((0.0, 1.0), 63), ((1.0, 0.0), 63), ((1.0, 1.0), 73)]
Out[5]:
[((0.0, 0.0), 0.24621212121212122),
 ((0.0, 1.0), 0.23863636363636365),
 ((1.0, 0.0), 0.23863636363636365),
 ((1.0, 1.0), 0.2765151515151515)]
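Note that these are joint probabilities over all observed transitions. A transition matrix in the usual Markov sense conditions on the current state, so that each row sums to 1. A minimal sketch of that normalization:

import pandas as pd

def conditional_transition_matrix(data: pd.DataFrame, state: str = "state") -> pd.DataFrame:
    """Row-stochastic matrix of P(next state | current state)."""
    counts = pd.crosstab(data[state], data[state].shift(-1))  # rows: current state, columns: next state
    return counts.div(counts.sum(axis=1), axis=0)             # normalize each row by its total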

Calculating with pandas

We can do the same with pandas by shifting the state column by one.

In [6]:
monthly['state1'] = monthly.state.shift(1)
prob = monthly.groupby(['state', 'state1']).size()
print(prob)
print(prob/(sum(prob)))
state  state1
0.0    0.0       65
       1.0       63
1.0    0.0       63
       1.0       73
dtype: int64
state  state1
0.0    0.0       0.246212
       1.0       0.238636
1.0    0.0       0.238636
       1.0       0.276515
dtype: float64

The state transitions look pretty much random, as the probabilities are more or less equally distributed over all the state pairs. A profitable month does not mean the next month will be positive, and the same holds when returns are negative.

Just to make things entertaining, let us add another state by lagging the state1 variable. We now have three states: this month, last month, and the month preceding last month.

In [7]:
monthly['state2'] = monthly.state1.shift(1)
prob = monthly.groupby(['state', 'state1', 'state2']).size()
prob/(sum(prob))
Out[7]:
state  state1  state2
0.0    0.0     0.0       0.121673
               1.0       0.121673
       1.0     0.0       0.121673
               1.0       0.117871
1.0    0.0     0.0       0.121673
               1.0       0.117871
       1.0     0.0       0.117871
               1.0       0.159696
dtype: float64

Again, no big edge to investigate further. Two consecutive losing months do not mean the next month will be positive, or vice versa. This is expected, as the returns have little auto-correlation, so a streak of positive or negative months does not let us predict the next month's return with any greater accuracy.

In [8]:
monthly.ret.autocorr()
Out[8]:
0.058071093402516094

Footnotes

  • A much better and clearer explanation of the transition matrix can be found here
  • Adding more regimes may provide a statistical edge

A simple strategy based on autocorrelation in volatility

Introduction

We saw some monthly auto-correlation in volatility here. Let us try to create a strategy out of it and see if it holds an edge. Our strategy is to hold the stock when the volatility is low.

Strategy

  1. Calculate the monthly volatility of the stock
  2. Calculate the 12-period rolling monthly volatility
  3. If the present month's volatility is less than the rolling volatility, hold the stock

Volatility is simply the standard deviation of the daily returns.
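Here is a minimal sketch of the signal, assuming a dataframe df indexed by date with a close column of daily prices (the names are illustrative, not taken from the full post):

import pandas as pd

df['ret'] = df.close.pct_change()
monthly_vol = df.ret.resample('M').std()       # 1. monthly volatility: std of the daily returns
rolling_vol = monthly_vol.rolling(12).mean()   # 2. 12-period rolling monthly volatility
hold = monthly_vol < rolling_vol               # 3. hold the stock while volatility is below it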


Price filter major indices

Price filter for major indices

We already looked at a price filter for the S&P500 index here.

Let us try it for some other common indices and see how it works on them.

List of indices to test

  • Dow Jones - United States
  • Russell 2000 - United States
  • Nifty 50 - India
  • S&P 500 - United States
  • DAX - Germany
  • Nikkei 225 - Japan
  • Hang Seng - Hong Kong
  • MOEX - Russia
  • SSE - China
  • ASX 200 - Australia
  • Euronext 100 - Europe
  • TSX Composite - Canada

I have downloaded the data from Yahoo from the start of 2000 and applied the moving-average price filter to each of the indices. The get_data function does all the data preparation. Finally, I group by each index and the price filter value and take the mean of the daily returns.

In [15]:
import pandas as pd
import pandas_datareader as web
import seaborn as sns
sns.set()
sns.set(rc={'figure.figsize':(12,6)})
In [16]:
# parameters
ma = 60

# tickers map a short name to the corresponding Yahoo Finance symbol
tickers = {
    'dow': '^DJI',
    'russell': '^RUT',
    'nifty': '^NSEI',
    'snp500': '^GSPC',
    'dax': '^GDAXI',
    'nikkei': '^N225',
    'hangseng': '^HSI',
    'moex': 'IMOEX.me',
    'sse': '000001.SS',
    'asx': '^AXAT',
    'euronext': '^N100',
    'tsx comp': '^GSPTSE'
}
In [17]:
def get_data(index, name):
    """
    Download the data for the given index and compute the price filter
    """
    df = web.DataReader(index, 'yahoo', start='2000-01-01').rename(columns=lambda x: x.lower())
    df = df.sort_index()
    df['year'] = df.index.year
    df['ret'] = df.close.pct_change()
    df['ma_price'] = df.close.rolling(ma).median().shift(1)  # rolling median of the close, excluding today
    df['is_price'] = df.eval('close > ma_price') + 0
    df['is_price'] = df.is_price.shift(1)  # Shift the signal since we can only trade on it the next day
    df['name'] = name
    return df
In [18]:
collect = []
for k,v in tickers.items():
    try:
        tmp = get_data(v, name=k)
        collect.append(tmp)
    except Exception as e:
        print(e)
df = pd.concat(collect)
    
In [19]:
mean_returns = df.groupby(['name', 'is_price']).ret.mean().reset_index()
sns.barplot(data=mean_returns, x='name', y='ret', hue='is_price')
Out[19]:
<AxesSubplot:xlabel='name', ylabel='ret'>

Looks like there is no common pattern among the indices.

And all except hangseng and sse provide positive returns for both filter values.

On a second look, snp500, dow, russell, and euronext do well with a filter value of 0, while nifty, nikkei, sse, and tsx do well with a filter value of 1.

Maybe:

  • the American and European markets exhibit different behaviour than their Asian counterparts (but TSX is from Canada)
  • the older exchanges may exhibit different behaviour than the newer ones
  • a timezone effect
  • or it is just plain randomness

I cannot come up with any conclusive explanation. This is a good topic to explore further with a whole lot of other indices.

There is no substantial edge in practically trading this filter. Also, we have not run any statistical tests to validate significance.
In [20]:
# Starting date from which data is computed
df.reset_index().groupby('name').Date.min().sort_values()
Out[20]:
name
dow        1999-12-31
euronext   1999-12-31
russell    1999-12-31
snp500     1999-12-31
tsx comp   1999-12-31
dax        2000-01-03
hangseng   2000-01-03
nikkei     2000-01-04
sse        2000-01-04
asx        2007-06-15
nifty      2007-09-17
moex       2013-03-05
Name: Date, dtype: datetime64[ns]

Autocorrelation in volatility

Auto-correlation of volatility

In this workbook, we look at the auto-correlation of returns and of volatility and see whether it could be used to add some predictive edge.

Auto-correlation is the correlation of a series with itself over a specified time lag.

A simple definition can be found here
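In pandas terms, the auto-correlation at lag k is simply the correlation of a series with its k-lagged self; a tiny illustration with synthetic returns:

import numpy as np
import pandas as pd

rets = pd.Series(np.random.normal(0, 0.01, 500))  # stand-in for daily returns
print(rets.autocorr(lag=1))                       # pandas built-in
print(rets.corr(rets.shift(1)))                   # the same thing, spelled out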

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
In [2]:
df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(
    columns=lambda x: x.lower()).sort_values(by='date').set_index('date')
df['ret'] = df.close.pct_change()
df.tail()
Out[2]:
                   open         high          low        close      volume    adj close       ret
date
2021-11-24  4675.779785  4702.870117  4659.890137  4701.459961  2464040000  4701.459961  0.002294
2021-11-26  4664.629883  4664.629883  4585.430176  4594.620117  2676740000  4594.620117 -0.022725
2021-11-29  4628.750000  4672.950195  4625.259766  4655.270020  3471380000  4655.270020  0.013200
2021-11-30  4640.250000  4646.020020  4560.000000  4567.000000  4950190000  4567.000000 -0.018961
2021-12-01  4602.819824  4652.939941  4510.270020  4513.040039  4078260000  4513.040039 -0.011815

Let us calculate the monthly returns and the volatility.

  • For volatility, I calculate a simple standard deviation of the daily returns (sketched below)
  • I use the close price instead of the adjusted close price. (Use the adjusted close for stocks, since they are prone to splits and other corporate actions that have a big impact on the price.)
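A minimal sketch of that aggregation, assuming df as loaded above:

monthly = pd.DataFrame({
    'ret': df.close.resample('M').last().pct_change(),  # monthly returns from month-end closes
    'vol': df.ret.resample('M').std(),                  # volatility: std of the daily returns
})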

    Read more... 5 minute read

A simple regime filter

Regime filters

A regime filter is a classification that indicates the market condition we are in. Such filters can provide useful pointers to improve trading results. Though we cannot predict market movements with certainty, we can at least get an edge.

We should keep these filters simple so that they are easy to understand and do not overfit too much to the available data. Boolean filters are a good choice: since they take only a true or false value, they are not prone to too much overfitting, though they may offer only a slight edge.

A simple price filter

Let us build a simple price filter.

  1. Create a moving average of the price for the last 60 days.
  2. If the close price is greater than the average price, give it a value of 1.
  3. If the close price is less than the average price, give it a value of 0.

We shift the average price by 1 day so that we do not include today's close price.
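A minimal sketch of the filter, assuming a dataframe df with a close column indexed by date (the names are illustrative):

import pandas as pd

df['ma'] = df.close.rolling(60).mean().shift(1)  # 60-day average, shifted to exclude today's close
df['filter'] = (df.close > df.ma).astype(int)    # 1 when above the average, 0 when below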
