Autocorrelation in volatility

Auto-correlation of volatility

In this workbook, we look at the auto-correlation between returns and volatility and see whether it could be used to add some predictive edge.

Auto-correlation is the correlation of a series with itself over a specified time lag.

A simple definition could be found here

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
In [2]:
df = pd.read_csv('/home/pi/data/sp500.csv', parse_dates=['Date']).rename(
columns = lambda x:x.lower()).sort_values(by='date').set_index('date')
df['ret'] = df.close.pct_change()
df.tail()
Out[2]:
open high low close volume adj close ret
date
2021-11-24 4675.779785 4702.870117 4659.890137 4701.459961 2464040000 4701.459961 0.002294
2021-11-26 4664.629883 4664.629883 4585.430176 4594.620117 2676740000 4594.620117 -0.022725
2021-11-29 4628.750000 4672.950195 4625.259766 4655.270020 3471380000 4655.270020 0.013200
2021-11-30 4640.250000 4646.020020 4560.000000 4567.000000 4950190000 4567.000000 -0.018961
2021-12-01 4602.819824 4652.939941 4510.270020 4513.040039 4078260000 4513.040039 -0.011815

Let us calculate the monthly returns and the volatility.

  • For volatility, I would be calculating a simple standard deviation of the daily returns
  • I would be using the close price instead of the adjusted close price. (Use adjusted close for stocks since they are prone to splits and other corporate actions that has a big impact on price)
In [3]:
monthly_returns = df.resample('M').close.ohlc().close.pct_change()
monthly_volatility = df.resample('M').ret.std()
In [4]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,4))

monthly_returns.plot(title='Monthly Returns (in %)', ax=axes[0])
monthly_volatility.plot(title='Monthly volatility (in %)', ax=axes[1])
Out[4]:
<AxesSubplot:title={'center':'Monthly volatility (in %)'}, xlabel='date'>

So, we could see a big spike in volatility in the expected periods , the 2008 financial crisis and the 2020 covid crisis. We can also observe the big dip in returns.

In [5]:
print(f"Autocorrelation for returns = {monthly_returns.autocorr() :.4f}")
print(f"Autocorrelation for volatility = {monthly_volatility.autocorr() :.4f}")
Autocorrelation for returns = 0.0581
Autocorrelation for volatility = 0.6581

Hurrah! There is a big correlation between this month and next month's volatility.

Let us do the auto-correlation plot for different months.

In [6]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,4))
pd.plotting.autocorrelation_plot(monthly_returns.dropna(), ax=axes[0])
pd.plotting.autocorrelation_plot(monthly_volatility.dropna(),ax=axes[1])
Out[6]:
<AxesSubplot:xlabel='Lag', ylabel='Autocorrelation'>
In [7]:
for i in range(1,7):
    print(f"Auto-correlation for month {i} = {monthly_volatility.autocorr(lag=i) :.4f}")
    
Auto-correlation for month 1 = 0.6581
Auto-correlation for month 2 = 0.4646
Auto-correlation for month 3 = 0.3669
Auto-correlation for month 4 = 0.2495
Auto-correlation for month 5 = 0.2351
Auto-correlation for month 6 = 0.2306

From the above plots, it is clear that returns are random and previous 1 month or 2 months returns doesn't have a big impact in the forthcoming months. So, if you see some positive returns for some months, the next month may not necessarily have positive return and vice-versa; same with the second or third month. The unconditional probability of returns being positive or negative is 50% - equivalent to coin toss.

But the volatility has a good auto-correlation that extends to even 6 months, which suggests that volatility even though it shoots up suddenly has a lasting effect. The volatility of this month is going to be likely that of the previous month and there is going to be no sudden decline or increase so as to make it entirely random.

In [8]:
# Let us plot the rolling monthly volatility

monthly_volatility.rolling(12).mean().plot()

# So, the rolling volatility in 2021 is less than that of 2008
Out[8]:
<AxesSubplot:xlabel='date'>

Is there a predictive edge?

Let us try a simple filter based on volatility.

In [9]:
vol_df = pd.DataFrame({
    'returns': monthly_returns,
    'volatility': monthly_volatility
})
# I am shifting this by one row so that the previous monthly volatility is looked into
vol_df['rolling_vol'] = vol_df.volatility.rolling(12).median().shift(1)
vol_df['is_volatile'] = vol_df.eval('(volatility > rolling_vol)+0')
vol_df.groupby('is_volatile').returns.describe()
Out[9]:
count mean std min 25% 50% 75% max
is_volatile
0.0 152.0 0.016845 0.034129 -0.092291 0.000111 0.018150 0.035785 0.107546
1.0 112.0 -0.010584 0.049312 -0.169425 -0.034967 -0.005509 0.014216 0.126844

Looks like there is an edge to exploit that need to be investigated further.

We may try to develop a strategy based on this

Footnotes

  • For a detailed description of autocorrelation, you could look at the wikipedia page

  • This kaggle notebook has a rich set of python code on this topic

  • Though returns don't have autocorrelation and so the probability of positive or negative returns is 50%, there is a slight edge if the probability is conditioned on some variable.