Moving average indicators are commonly used to give traders a general idea about the direction of the trend by smoothing the price series. One of the big drawbacks of most common moving averages is the lag with which they operate. A strong trend up or down may take a long time to be confirmed by the moving average, leading to lost profit.

In 2005, Alan Hull devised the Hull Moving Average (HMA) to address this problem.

The calculation is relatively straightforward and can be done in 4 steps after choosing the number of periods, N, to use in the calculation:

1. Calculate the simple moving average over the past N periods.
• SMA1 = SMA(price, N)
2. Calculate the simple moving average over the past N/2 periods, rounded to the nearest whole value.
• SMA2 = SMA(price, int(N/2))
3. Multiply the shorter moving average by 2 and then subtract the first moving average from this.
• SMA_diff = 2 * SMA2 - SMA1
4. Take the moving average of this value over a period length equal to the square root of N, rounded to the nearest whole number.
• HMA = SMA(SMA_diff, int(sqrt(N)))
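To make the four steps concrete, here is a minimal sketch on a made-up price series with N = 4, so the half-length and the square-root length are both 2. The prices and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up prices, purely for illustration
price = pd.Series([10.0, 11.0, 12.0, 13.0, 15.0, 14.0, 16.0])
N = 4

sma1 = price.rolling(N).mean()                   # step 1: SMA over N periods
sma2 = price.rolling(int(N / 2)).mean()          # step 2: SMA over N/2 periods
sma_diff = 2 * sma2 - sma1                       # step 3: double the short SMA, subtract the long
hma = sma_diff.rolling(int(np.sqrt(N))).mean()   # step 4: smooth over sqrt(N) periods

steps = pd.DataFrame({'price': price, 'SMA1': sma1, 'SMA2': sma2,
                      'SMA_diff': sma_diff, 'HMA': hma})
print(steps)
```

The first four rows of the HMA column are NaN because the indicator needs N + sqrt(N) - 1 = 5 prices before it can produce a value.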

This winds up being more responsive to recent changes in price because we’re taking the most recent half of our data and multiplying it by 2. This provides an additional weighting on those values before we smooth things out again with the final moving average calculation. Many descriptions of the HMA, including Hull’s own formulation, use linearly weighted moving averages (WMAs) at each step. To keep things simple, this post uses simple moving averages throughout; the structure of the calculation is otherwise the same.
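One way to see the reduced lag is to run the calculation on a straight-line "price" that rises by 1 each period. The sketch below does this for the SMA-based steps listed above; the helper name `hull_ma_sma` and the ramp data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def hull_ma_sma(price: pd.Series, N: int) -> pd.Series:
    """SMA-based HMA following the four steps described above."""
    sma1 = price.rolling(N).mean()
    sma2 = price.rolling(int(N / 2)).mean()
    return (2 * sma2 - sma1).rolling(int(np.sqrt(N))).mean()

N = 50
ramp = pd.Series(np.arange(200, dtype=float))  # price rises by 1 each period

# How far each indicator trails the price once it has warmed up
sma_lag = (ramp - ramp.rolling(N).mean()).dropna()
hma_lag = (ramp - hull_ma_sma(ramp, N)).dropna()

print(f'SMA lag: {sma_lag.iloc[-1]:.1f} price units')
print(f'HMA lag: {hma_lag.iloc[-1]:.1f} price units')
```

On this idealized ramp, the 50-period SMA trails the price by (N - 1) / 2 = 24.5 units while the HMA trails by 2.5. Real prices are noisier, so this only illustrates the mechanism.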

For completeness, we can also write this out mathematically.

If we are calculating the SMA at time t over the last N periods, we’re going to call this SMA_t^N. For moving averages, we’re just summing the last N prices (we’ll use P for prices) and dividing by N like so:

$$SMA_t^N = \frac{1}{N}\sum_{i=1}^N P_{t-N+i}$$

$$SMA_t^M = \frac{1}{M}\sum_{i=1}^M P_{t-M+i}$$

$$HMA_t^N = \frac{1}{H}\sum_{i=1}^H \big(2SMA_{t-H+i}^M - SMA_{t-H+i}^N\big)$$

where M and H are N/2 and the square root of N, each rounded to the nearest integer:

$$M = \bigg\lfloor \frac{N}{2} \bigg\rceil$$

$$H = \bigg\lfloor \sqrt{N} \bigg\rceil$$
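Note that nearest-integer rounding and the `int(...)` calls shown in the steps do not always agree, because `int()` truncates. The sketch below compares the two; `hull_periods` is a hypothetical helper, not part of any library.

```python
import math

def hull_periods(N: int, nearest: bool = True):
    """Return (M, H): the half-length and square-root window lengths for N."""
    if nearest:
        # floor(x + 0.5) rounds halves up; Python's round() rounds halves to even
        return math.floor(N / 2 + 0.5), math.floor(math.sqrt(N) + 0.5)
    # int() simply drops the fractional part
    return int(N / 2), int(math.sqrt(N))

for N in (50, 15):
    print(N, hull_periods(N, nearest=True), hull_periods(N, nearest=False))
```

For N = 50 both conventions give (25, 7), but for N = 15 nearest-integer rounding gives (8, 4) while truncation gives (7, 3), so it is worth picking one convention and sticking with it.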

## Hull Moving Average in Python

As usual, let’s grab a few packages.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

From here, we can write a function to calculate the HMA in just three lines of code, corresponding to the three equations we showed above.

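def calcHullMA(price: pd.Series, N=50):
    # Long and short simple moving averages (int() truncates N/2)
    SMA1 = price.rolling(N).mean()
    SMA2 = price.rolling(int(N/2)).mean()
    # Smooth the weighted difference over int(sqrt(N)) periods
    return (2 * SMA2 - SMA1).rolling(int(np.sqrt(N))).mean()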

We have our two moving averages, take the weighted difference, and then smooth out the results with a third moving average. This function assumes we’re working with a pandas Series and relies on its rolling method, so be careful not to pass it a list or a NumPy array!
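If list or array input is likely, one option is to coerce it to a Series first. The wrapper below is a hypothetical sketch (the name `hull_ma_any` is made up) that repeats the same calculation after the conversion.

```python
import numpy as np
import pandas as pd

def hull_ma_any(price, N: int = 50) -> pd.Series:
    """Hypothetical wrapper: coerce list/array input to a Series, then compute the HMA."""
    series = price if isinstance(price, pd.Series) else pd.Series(price, dtype=float)
    sma1 = series.rolling(N).mean()
    sma2 = series.rolling(int(N / 2)).mean()
    return (2 * sma2 - sma1).rolling(int(np.sqrt(N))).mean()

prices = [10.0, 11.0, 12.0, 13.0, 15.0, 14.0, 16.0]
print(hull_ma_any(prices, N=4))            # plain list
print(hull_ma_any(np.array(prices), N=4))  # NumPy array
```

Lists and arrays carry no dates, so the result gets a default integer index; passing a Series with a date index keeps the dates intact.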

### Getting Some Data

Let’s illustrate how this works on some historical data. I’m just getting a year’s worth from a common stock, DAL.

ticker = 'DAL'
start = '2014-01-01'
end = '2015-01-01'
yfObj = yf.Ticker(ticker)
data = yfObj.history(start=start, end=end)
data.drop(['Open', 'High', 'Low', 'Volume', 'Dividends', 'Stock Splits'],
          axis=1, inplace=True)

# Applying our function
N = 50
data[f'HMA_{N}'] = calcHullMA(data['Close'], N)

Take a look to see how it behaves:

plt.figure(figsize=(12, 8))
plt.plot(data['Close'], label='Close')
plt.plot(data[f'HMA_{N}'], label='HMA')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.title(f'HMA and Price for {ticker}')
plt.legend()
plt.show()

As you can see, the HMA follows pretty closely. Of course, there is a lag, as can be seen with some of the larger peaks and valleys over this time frame. Does it smooth well and with a lower lag than other moving averages, as Hull intends?

To find out, let’s compare it to a typical, simple moving average and an exponential moving average (EMA). Like the HMA, the EMA is designed to be more responsive to recent price changes.

The code for the EMA calculation below was taken from a previous post you can dive into for further details.

def _calcEMA(P, last_ema, N):
    return (P - last_ema) * (2 / (N + 1)) + last_ema

def calcEMA(data: pd.DataFrame, N: int, key: str = 'Close'):
    # Initialize series
    sma = data[key].rolling(N).mean()
    ema = np.zeros(len(data)) + np.nan
    for i, _row in enumerate(data.iterrows()):
        row = _row[1]  # iterrows yields (index, row) pairs
        if np.isnan(ema[i-1]):
            # Seed with the SMA until an EMA value exists
            ema[i] = sma.iloc[i]
        else:
            ema[i] = _calcEMA(row[key], ema[i-1], N)
    return ema

Plotting the results:

data[f'EMA_{N}'] = calcEMA(data, N)
data[f'SMA_{N}'] = data['Close'].rolling(N).mean()

plt.figure(figsize=(12, 8))
plt.plot(data['Close'], label='Close', linewidth=0.5)
plt.plot(data[f'HMA_{N}'], label='HMA')
plt.plot(data[f'EMA_{N}'], label='EMA')
plt.plot(data[f'SMA_{N}'], label='SMA')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.title('Comparing 50-Day Moving Averages to Price')
plt.legend()
plt.show()

The plot looks pretty good. The HMA seems to track the price more closely than the other indicators while providing some good smoothing. However, we aren’t technical traders here at Raposa, so we need to do more than just look at a chart. We want to see the data!
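As an aside on the EMA used in the comparison, the same SMA-seeded recursion can be sketched without an explicit loop using pandas’ `ewm`. This is an alternative under stated assumptions, not the code from the earlier post; the name `ema_sma_seeded` is made up for illustration.

```python
import numpy as np
import pandas as pd

def ema_sma_seeded(price: pd.Series, N: int) -> pd.Series:
    """EMA with smoothing factor 2 / (N + 1), seeded with the first N-period SMA."""
    seeded = price.astype(float).copy()
    seeded.iloc[:N - 1] = np.nan                 # no value until N prices exist
    seeded.iloc[N - 1] = price.iloc[:N].mean()   # seed with the SMA
    # adjust=False gives the recursive form: ema = a * P + (1 - a) * last_ema
    return seeded.ewm(span=N, adjust=False).mean()

price = pd.Series([10.0, 11.0, 12.0, 13.0, 15.0, 14.0, 16.0])
print(ema_sma_seeded(price, N=3))
```

With N = 3 the smoothing factor is 0.5, so the series starts at the SMA of the first three prices (11.0) and each later value is the midpoint of the new price and the previous EMA.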

To get an idea for the tracking error, we’re going to use the root mean square error (RMSE) to measure the difference between the indicator value and the price.

The RMSE is a common error metric that punishes large deviations by squaring the error term. This means an error of 2 is penalized 4 times as heavily as an error of 1! These squared errors are summed and divided by the number of observations, n, and then we take the square root of the result.

$$RMSE = \sqrt{\frac{\sum_t \big(\hat{P}_t - P_t \big)^2}{n}}$$
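A tiny worked case shows how the squaring plays out. The two prices and indicator values below are made up so that the errors are exactly 1 and 2.

```python
import numpy as np

price = np.array([10.0, 12.0])
indicator = np.array([9.0, 10.0])   # errors of 1 and 2

sq_errors = (indicator - price) ** 2            # [1., 4.]
rmse = np.sqrt(sq_errors.sum() / len(price))    # sqrt(5 / 2)
print(sq_errors, round(rmse, 4))
```

The mean absolute error here would be 1.5, while the RMSE comes out at about 1.58 because the larger miss counts for more.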

We'll write a quick RMSE function, run each indicator through it, and see the results.

# Calculate tracking error
def calcRMSE(price, indicator):
    # NaNs from the indicator's warm-up period are skipped by sum()
    sq_error = np.power(indicator - price, 2).sum()
    n = len(indicator.dropna())
    return np.sqrt(sq_error / n)

hma_error = calcRMSE(data['Close'], data[f'HMA_{N}'])
ema_error = calcRMSE(data['Close'], data[f'EMA_{N}'])
sma_error = calcRMSE(data['Close'], data[f'SMA_{N}'])

print('Lag Error')
print(f'\tHMA = \t{hma_error:.2f}')
print(f'\tEMA = \t{ema_error:.2f}')
print(f'\tSMA = \t{sma_error:.2f}')

Lag Error
HMA = 	1.65
EMA = 	1.24
SMA = 	1.53

Whoa! The HMA actually has greater error vs the price it’s tracking than the EMA and the SMA. This seems to cut against the intent of the HMA.

This is a small sample size, however, so maybe it really does have less lag than the other indicators and we just chose a bad stock and/or time frame.
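One detail worth checking in a comparison like this is that each indicator becomes valid on a different date (the HMA needs N + sqrt(N) - 1 prices, the SMA only N), so the errors are computed over slightly different windows. The sketch below restricts the calculation to rows where every indicator has a value. The helper name `rmse_common_window` and the synthetic random-walk prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rmse_common_window(price: pd.Series, indicators: dict) -> pd.Series:
    """RMSE of each indicator vs price, using only rows where all are valid."""
    frame = pd.DataFrame(indicators)
    valid = frame.notna().all(axis=1) & price.notna()
    errors = frame.loc[valid].sub(price.loc[valid], axis=0)
    return np.sqrt((errors ** 2).mean())

# Synthetic random walk standing in for real closing prices
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

N = 50
sma = close.rolling(N).mean()
hma = (2 * close.rolling(N // 2).mean() - sma).rolling(int(np.sqrt(N))).mean()
print(rmse_common_window(close, {'HMA': hma, 'SMA': sma}))
```

With a year of daily data the difference in window is only a handful of rows, so this is unlikely to change the ranking on its own, but it removes one source of doubt.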

## Testing the Hull Moving Average

Let’s test this by calculating the RMSE for all of the stocks in the S&P 500 over the course of a year. Additionally, we’ll do this for different values of N to see if there’s any relationship between shorter or longer term values and the error.

Below, we have a helper function to calculate these values for us.

def calcErrors(data: pd.DataFrame, N: list):
    hma_error, sma_error, ema_error = [], [], []
    for n in N:
        hma = calcHullMA(data['Close'], n)
        ema = pd.Series(calcEMA(data, n), index=data.index)
        sma = data['Close'].rolling(n).mean()
        hma_error.append(calcRMSE(data['Close'], hma))
        ema_error.append(calcRMSE(data['Close'], ema))
        sma_error.append(calcRMSE(data['Close'], sma))

    return hma_error, ema_error, sma_error

The calcErrors function takes our data and a list of time periods to calculate the HMA, EMA, and SMA. From there, we calculate the RMSE for each series versus our closing price and return lists of each.

Next, we’ll loop over all the stocks in the S&P 500 and get the data for each. We’ll pass this to our error calculation function and collect the errors for each symbol.

We’re relying on the list of stocks on Wikipedia, which doesn’t necessarily correspond to how the symbols are represented in yfinance (e.g. Berkshire Hathaway has two classes of shares, A and B, which cause issues), so we need to wrap this in a try-except statement for those edge cases. We’ll still get enough data to make a decent estimate.
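Some of those edge cases can be avoided by normalizing the symbols first. To the best of my knowledge, Wikipedia writes share classes with a dot (BRK.B) while Yahoo Finance uses a hyphen (BRK-B). The helper below is a hypothetical sketch of that mapping, not something yfinance provides.

```python
def to_yf_symbol(symbol: str) -> str:
    """Hypothetical helper: map a Wikipedia-style ticker to Yahoo's format."""
    # Share classes use a dot on Wikipedia (BRK.B) but a hyphen on Yahoo (BRK-B)
    return symbol.strip().upper().replace('.', '-')

print([to_yf_symbol(s) for s in ['BRK.B', 'BF.B', 'AAPL']])
```

Keeping the try-except as a fallback still makes sense, since delisted or renamed symbols can fail for other reasons.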

# Get the S&P 500 constituents from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
# Assumption: the table-loading call is missing from this listing;
# pd.read_html returns a list of the tables found on the page
table = pd.read_html(url)
df = table[0]
syms = df['Symbol']
start = '2019-01-01'
end = '2020-01-01'
N = [5, 10, 15, 20, 30, 50, 100, 150, 200]
hma_rows, ema_rows, sma_rows = [], [], []
for s in syms:
    try:
        yfObj = yf.Ticker(s)
        data = yfObj.history(start=start, end=end)
        he, ee, se = calcErrors(data, N)
    except Exception:
        # Skip symbols that yfinance can't handle
        continue

    hma_rows.append(he)
    ema_rows.append(ee)
    sma_rows.append(se)

# One row per ticker, one column per value of N
hma_error = np.array(hma_rows)
ema_error = np.array(ema_rows)
sma_error = np.array(sma_rows)

# Drop rows with missing values
hma_error = hma_error[~np.isnan(hma_error).any(axis=1)]
ema_error = ema_error[~np.isnan(ema_error).any(axis=1)]
sma_error = sma_error[~np.isnan(sma_error).any(axis=1)]

After a few minutes, we can take a look at the mean tracking error across all of our metrics and tickers below:

Here we see that the HMA does track the price much better than the other moving averages. There’s much less difference over short time frames, but the values start to diverge from one another fairly quickly, and the gap becomes more pronounced as N increases.
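For reference, the mean tracking error per value of N could be computed from the error arrays along these lines. The helper name `summarize_errors` is made up, and the numbers below are small placeholders for illustration, not real results.

```python
import numpy as np
import pandas as pd

def summarize_errors(errors: dict, periods: list) -> pd.DataFrame:
    """Mean RMSE per period length, with one column per indicator."""
    return pd.DataFrame(
        {name: np.asarray(arr).mean(axis=0) for name, arr in errors.items()},
        index=pd.Index(periods, name='N'))

# Placeholder numbers (3 tickers x 3 period lengths), NOT real results
periods = [5, 10, 20]
errors = {
    'HMA': np.array([[0.5, 0.8, 1.0], [0.7, 1.0, 1.4], [0.6, 0.9, 1.2]]),
    'SMA': np.array([[0.9, 1.5, 2.0], [1.1, 1.7, 2.6], [1.0, 1.6, 2.3]]),
}
summary = summarize_errors(errors, periods)
print(summary)
# summary.plot(marker='o') would draw one line per indicator (needs matplotlib)
```

With the arrays from the loop above, the call would look like `summarize_errors({'HMA': hma_error, 'EMA': ema_error, 'SMA': sma_error}, N)`.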

## Trading with the Hull Moving Average

We could be more rigorous by tracking the deviation of our error measurements and getting more data. For most purposes, however, it does seem as if the HMA delivers on its promise of reducing lag. How do you trade it, though?

The nice thing about the HMA is that you can use it anywhere you’d use a moving average of any variety. You could build a whole new strategy around it, or just plug it into an existing system to see if you get any boost in your results.
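As one illustration of plugging it in, here is a sketch of a simple trend filter that is long when the close is above its HMA and flat otherwise. The rule, the function name `hma_trend_signal`, and the made-up prices are all assumptions for the example, not a tested strategy.

```python
import numpy as np
import pandas as pd

def hma_trend_signal(close: pd.Series, N: int = 50) -> pd.Series:
    """Hypothetical rule: 1 (long) when the close is above its HMA, else 0."""
    sma1 = close.rolling(N).mean()
    sma2 = close.rolling(int(N / 2)).mean()
    hma = (2 * sma2 - sma1).rolling(int(np.sqrt(N))).mean()
    signal = (close > hma).astype(int)   # comparisons with NaN count as 0
    # Act on the next bar so the signal never uses the close it trades on
    return signal.shift(1).fillna(0).astype(int)

close = pd.Series([10.0, 11.0, 12.0, 13.0, 15.0, 14.0, 16.0])
print(hma_trend_signal(close, N=4).tolist())
```

Shifting the signal by one bar is a deliberate choice: without it, a backtest would act on information that is only known once the bar has closed.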

We make all of that as easy as possible at Raposa, where we’re building a platform to allow you to backtest your ideas in seconds, without writing a single line of code. You can check out our free demo here!