How to Backtest your First Trading Strategy in Python

Vectorized Simple Moving Average Strategy to beat the S&P 500

Anyone who follows the financial media has likely seen headlines like “The S&P 500 just formed a ‘Golden Cross,’ a bullish chart pattern with a solid track record” or “A bearish ‘death cross’ has appeared in the Dow’s chart.” These headlines refer to a simple moving average (SMA) indicator we can use to trade.

TL;DR

We construct a simple moving average strategy in Python and backtest the results.

Cross of Gold

Simple Moving Average (SMA) strategies are the bread and butter of algorithmic trading. At their most basic level, traders look at a short-term moving price average and a longer-term average (say, the 50-day and 200-day moving averages) and buy when the short-term value is greater than the long-term value. They’ll often sell or go short when that trend reverses. Amazingly, something as simple as this can generate a profitable return!

In its most basic application, a trader will select a single contract or security to track over time. They then calculate their averages over a given number of time periods — these can be days, weeks, months, minutes, seconds, whatever interval suits your trading speed and style — and make the comparison. We’re going to implement this on daily price charts and show how to use it to build a basic trading strategy in Python.

Let’s start by importing a few packages.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

Above, we have some Python basics like numpy, pandas, and matplotlib. Additionally, we import yfinance, a package that gives us access to Yahoo! Finance data. The Yahoo! Finance API has changed a lot over the years, so a number of packages have been deprecated and new ones have emerged to offer consistent access to this free data source. We'll use it to pull our data and set up our strategy.

To do so, select a ticker symbol and pass it to the Ticker class. I'm going to start at the beginning of the alphabet with 'A', which is Agilent Technologies, a healthcare company with a long price history for backtesting.

ticker = "A"
yfObj = yf.Ticker(ticker)
data = yfObj.history(start="2000-01-01", end="2020-12-31")
name = yfObj.info['shortName']
plt.figure(figsize=(10, 5))
plt.plot(data['Close'])
plt.title(f'Price Chart for {name}')
plt.grid()
plt.show()

Looking at this price chart, we can see that Agilent had a huge run-up in price during the Dot Com Bubble, then crashed hard and only reached its 2000–2001 peak again 20 years later. A long-term, buy-and-hold investor who got in before the peak in early 2000 would still have waited over 15 years to recoup their investment (Agilent has paid a regular, quarterly dividend since 2012, which would have shortened this by a few quarters).

Let’s see if we can do better with an SMA strategy using the standard 50- and 200-day moving averages.

def SMABacktest(ticker, short_term_sma, long_term_sma, 
    shorts=False, start_date='2000-01-01', end_date='2020-12-31'):
    yfObj = yf.Ticker(ticker)
    data = yfObj.history(start=start_date, end=end_date)
    
    data['SMA1'] = data['Close'].rolling(short_term_sma).mean()
    data['SMA2'] = data['Close'].rolling(long_term_sma).mean()
    if shorts:
        data['position'] = np.where(
            data['SMA1'] > data['SMA2'], 1, -1)
    else:
        data['position'] = np.where(
            data['SMA1'] > data['SMA2'], 1, 0)
    
    # Calculate returns
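    data['returns'] = data['Close'] / data['Close'].shift(1)
    data['log_returns'] = np.log(data['returns'])
    # 'returns' is a gross ratio (P_t / P_{t-1}), so subtract 1 to get
    # the simple return before scaling by the previous day's position
    data['strat_returns'] = data['position'].shift(1) * \
        (data['returns'] - 1)
    data['strat_log_returns'] = data['position'].shift(1) * \
        data['log_returns']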
    data['cum_returns'] = np.exp(data['log_returns'].cumsum())
    data['strat_cum_returns'] = np.exp(
        data['strat_log_returns'].cumsum())
    data['peak'] = data['cum_returns'].cummax()
    data['strat_peak'] = data['strat_cum_returns'].cummax()
    
    return data

In the SMABacktest function above, we supply the stock ticker, get the data from Yahoo! Finance, and then calculate the moving averages for each of our SMA indicators. We included a shorts option that allows the algorithm to go short when the SMA crossover reverses. Otherwise, it just exits the position and waits for the faster SMA to poke above the slower SMA before going long again.
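To make the position logic concrete, here is a minimal sketch on a made-up price series rather than Yahoo! Finance data. The prices and the short 2- and 4-day windows are assumptions chosen only so the crossover shows up in a few rows. It shows why the position is shifted by one row before being multiplied by the returns: the signal on a given day is only known at that day's close, so it can only earn the following day's return.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (made up for illustration, not market data)
close = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 9.0, 10.0, 12.0])

# Toy windows (2 and 4 rows) so the crossover is visible on a short series
sma_fast = close.rolling(2).mean()
sma_slow = close.rolling(4).mean()

# 1 when the fast SMA is above the slow SMA, else 0 (long-only case).
# Comparisons against NaN are False, so the warm-up rows get 0.
position = pd.Series(np.where(sma_fast > sma_slow, 1, 0), index=close.index)

# Shift the position one row so that the signal from day t is applied
# to the return earned between day t and day t+1 (no look-ahead).
log_returns = np.log(close / close.shift(1))
strat_log_returns = position.shift(1) * log_returns

# Log returns add up over time; exponentiating the running sum gives
# the growth of one unit of capital.
strat_cum_returns = np.exp(strat_log_returns.cumsum())

print(pd.DataFrame({'close': close, 'position': position,
                    'strat_log_returns': strat_log_returns,
                    'strat_cum_returns': strat_cum_returns}))
```

In this toy series the signal turns on at row 3, so the strategy is exposed to the return on row 4 only, which happens to be a loss. Without the shift, the backtest would credit the strategy with returns it could not have captured.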

We can plot the strategy as well as the cumulative returns below.

short_term_sma = 50
long_term_sma = 200
data = SMABacktest(ticker, short_term_sma, long_term_sma)
fig, ax = plt.subplots(2, figsize=(10, 5), sharex=True)
ax[0].plot(data['Close'], label=ticker)
ax[0].plot(data['SMA1'], label=f"{short_term_sma}-Day SMA")
ax[0].plot(data['SMA2'], label=f"{long_term_sma}-Day SMA")
ax[0].set_ylabel('Price ($)')
ax[0].set_title(f'{ticker} Price with {short_term_sma}-Day SMA and {long_term_sma}-Day SMA')
ax[0].legend(bbox_to_anchor=[1, 0.75])
ax[0].grid()
ax[1].plot((data['strat_cum_returns'] - 1) * 100, label='SMA Strategy')
ax[1].plot((data['cum_returns'] - 1) * 100, label='Buy and Hold Strategy')
ax[1].set_ylabel('Returns (%)')
ax[1].set_xlabel('Date')
ax[1].set_title('Cumulative Returns for SMA and Buy and Hold Strategy')
ax[1].legend(bbox_to_anchor=[1.25, 0.75])
ax[1].grid()
plt.show()

Let’s also put another function together so that we can evaluate the performance of our model using some standard metrics.

def getStratStats(data, risk_free_rate=0.02):
    sma_strat, buy_hold_strat = {}, {}
    
    # Total Returns
    sma_strat['tot_returns'] = np.exp(data['strat_log_returns'].sum()) - 1
    buy_hold_strat['tot_returns'] = np.exp(data['log_returns'].sum()) - 1
    
    # Mean Annual Returns
    sma_strat['annual_returns'] = np.exp(data['strat_log_returns'].mean() * 252) - 1
    buy_hold_strat['annual_returns'] = np.exp(data['log_returns'].mean() * 252) - 1
    
    # Annual Volatility
    sma_strat['annual_volatility'] = data['strat_log_returns'].std() * np.sqrt(252)
    buy_hold_strat['annual_volatility'] = data['log_returns'].std() * np.sqrt(252)
    
    # Sharpe Ratio
    sma_strat['sharpe_ratio'] = (sma_strat['annual_returns'] - risk_free_rate) \
        / sma_strat['annual_volatility']
    buy_hold_strat['sharpe_ratio'] = (
        buy_hold_strat['annual_returns'] - risk_free_rate) \
        / buy_hold_strat['annual_volatility']
    
    # Max Drawdown
    _strat_dd = data['strat_peak'] - data['strat_cum_returns']
    _buy_hold_dd = data['peak'] - data['cum_returns']
    sma_strat['max_drawdown'] = _strat_dd.max()
    buy_hold_strat['max_drawdown'] = _buy_hold_dd.max()
    
    # Max Drawdown Duration
    strat_dd = _strat_dd[_strat_dd==0]
    strat_dd_diff = strat_dd.index[1:] - strat_dd.index[:-1]
    strat_dd_days = strat_dd_diff.map(lambda x: x.days).values
    strat_dd_days = np.hstack([strat_dd_days, 
        (_strat_dd.index[-1] - strat_dd.index[-1]).days])
    
    buy_hold_dd = _buy_hold_dd[_buy_hold_dd==0]
    buy_hold_diff = buy_hold_dd.index[1:] - buy_hold_dd.index[:-1]
    buy_hold_days = buy_hold_diff.map(lambda x: x.days).values
    buy_hold_days = np.hstack([buy_hold_days,
        (_buy_hold_dd.index[-1] - buy_hold_dd.index[-1]).days])
    sma_strat['max_drawdown_duration'] = strat_dd_days.max()
    buy_hold_strat['max_drawdown_duration'] = buy_hold_days.max()
    
    stats_dict = {'strat_stats': sma_strat,
                  'base_stats': buy_hold_strat}
    
    return stats_dict

The getStratStats function above returns a number of important statistics to evaluate the strategy's performance. It also compares the results to the buy-and-hold strategy as a baseline to show how some simple trading can yield additional returns.

The stats we show above are fairly standard. We work off of the log returns and convert them to simple returns for easier interpretation. We also show drawdown stats, which are important when selecting a strategy that suits your personal temperament. This metric is a bit more involved to calculate than the others. It requires us to get the cumulative returns and compare them with the peak returns up to that point in time. A new peak will equal the cumulative returns up to that point, so we can pull out new all-time highs by finding where the difference between the peak and cumulative returns is 0. To get the duration, we then look at the datetime indices of the new peaks and extract the days between them. It’s also important to get the time since the last peak in case the strategy is currently in a long drawdown.
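The drawdown steps can be traced on a tiny example. The sketch below uses a hypothetical cumulative-return series (the dates and values are invented) and follows the same logic as getStratStats: running peak, difference from the peak, then the gaps between new highs. The percentage drawdown line is an addition not present in the function above; it is a common alternative, included here only for comparison.

```python
import pandas as pd

# Hypothetical cumulative returns (growth of $1), indexed by date
dates = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                        '2020-01-06', '2020-01-07', '2020-01-08'])
cum_returns = pd.Series([1.00, 1.10, 0.99, 1.05, 1.21, 1.15], index=dates)

peak = cum_returns.cummax()      # running all-time high
drawdown = peak - cum_returns    # same units as in getStratStats
drawdown_pct = drawdown / peak   # assumption: percentage variant

# New highs are the rows where the drawdown is exactly zero
new_highs = drawdown[drawdown == 0].index
gaps = [(b - a).days for a, b in zip(new_highs[:-1], new_highs[1:])]
# Also count the time from the last high to the end of the data
gaps.append((cum_returns.index[-1] - new_highs[-1]).days)

print(f'Max drawdown (absolute): {drawdown.max():.3f}')
print(f'Max drawdown (percent):  {drawdown_pct.max():.1%}')
print(f'Max drawdown duration:   {max(gaps)} days')
```

Here the series sets highs on January 1, 2, and 7, so the longest stretch between highs is the five calendar days from January 2 to January 7. Note that the duration is measured in calendar days, not trading days.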

Let’s see how this strategy performed.

stats_dict = getStratStats(data, risk_free_rate=0.02)
pd.DataFrame(stats_dict).round(3)

We see that the annualized returns are a healthy 6.7% with the SMA strategy versus 4.7% with buy and hold (again, ignoring dividends). The volatility for the SMA strategy is significantly lower than buy and hold, which can be seen in the plot above where the SMA strategy exits all positions and flatlines, waiting for a new golden cross to appear. Both strategies, however, suffer from long drawdowns spanning multiple years. While the results look solid, would you be able to stay disciplined and stick with the strategy while waiting over 2,400 days to reach a new high?

What about an Index Fund?

Warren Buffett frequently advises the average investor to just buy an index fund and forget about it. How does this strategy compare with buying the SPY in 2000? Well, that’s easy enough to check!

spyObj = yf.Ticker('SPY')
spy_data = spyObj.history(start='2000-01-01', end='2020-12-31')
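# Use .iloc for positional access; plain [-1] on a date-indexed
# Series is deprecated in recent pandas versions
spy_ratio = spy_data['Close'].iloc[-1] / spy_data['Close'].iloc[0]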
spy_ret = spy_ratio - 1
years = (spy_data.index[-1] - spy_data.index[0]).days / 365
spy_ann_ret = (spy_ratio) ** (1 / years) - 1
print(f'SPY Total Returns:\t{spy_ret*100:.2f}%' + 
      f'\nSPY Annual Returns:\t{spy_ann_ret*100:.2f}%')
      
print(f'SMA Total Returns:\t{stats_dict["strat_stats"]["tot_returns"]*100:.2f}%' +
      f'\nSMA Annual Returns:\t{stats_dict["strat_stats"]["annual_returns"]*100:.2f}%')

SPY Total Returns:	279.02%
SPY Annual Returns:	6.55%
SMA Total Returns:	290.69%
SMA Annual Returns:	6.72%

Our golden cross strategy on a stock with a decade-long bear market outperforms buying and holding the S&P 500. The big reason: our model avoids the largest losses by sitting on the sideline and waiting for a signal.
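The annualized SPY figure comes from a compound annual growth rate: the total price ratio raised to one over the number of years, minus one. As a minimal sketch of that arithmetic, the helper below (annualized_return is a hypothetical name, and the prices are invented) repeats the calculation with a 365-day year, as in the SPY snippet.

```python
from datetime import date

def annualized_return(start_price, end_price, start_date, end_date):
    """Compound annual growth rate between two dates (365-day year)."""
    years = (end_date - start_date).days / 365
    return (end_price / start_price) ** (1 / years) - 1

# Hypothetical prices: an investment that doubles over roughly 10 years
cagr = annualized_return(100.0, 200.0, date(2010, 1, 1), date(2020, 1, 1))
print(f'{cagr:.2%}')
```

The SMA figure is annualized differently, from the mean daily log return scaled by 252 trading days, so the two numbers are close but not computed on an identical basis.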

We can also run this strategy on the SPY itself to see if we can outperform the tried-and-true, long-only index fund strategy. Additionally, we can add shorts into our strategy to further boost our returns.

sma_spy_data = SMABacktest('spy', short_term_sma, 
    long_term_sma, shorts=False)
sma_spy_data_shorts = SMABacktest('spy', short_term_sma,
    long_term_sma, shorts=True)
fig, ax = plt.subplots(figsize=(10, 5), sharex=True)
ax.plot((sma_spy_data['strat_cum_returns'] - 1) * 100, 
    label='SMA SPY')
ax.plot((sma_spy_data_shorts['strat_cum_returns'] - 1) * 100, 
    label='SMA SPY with Shorts')
ax.plot((data['strat_cum_returns'] - 1) * 100, 
    label='SMA A')
ax.plot((sma_spy_data['cum_returns'] - 1) * 100, 
    label='Buy and Hold SPY')
ax.set_ylabel('Returns (%)')
ax.set_xlabel('Date')
ax.set_title('Cumulative Returns for SMA and Buy and Hold Strategy')
ax.legend(bbox_to_anchor=[1, 0.5])
ax.grid()
plt.show()

spy_stats = getStratStats(sma_spy_data, risk_free_rate=0.02)
spy_stats_shorts = getStratStats(sma_spy_data_shorts, risk_free_rate=0.02)
df0 = pd.DataFrame(stats_dict)
df0.columns = ['A SMA', 'A Buy and Hold']
df1 = pd.DataFrame(spy_stats)
df1.columns = ['SPY SMA', 'base_strat']
df2 = pd.DataFrame(spy_stats_shorts)
df2.columns = ['SPY SMA with Shorts', 'SPY Buy and Hold']
df = pd.concat([df0, df1, df2], axis=1)
df.drop('base_strat', axis=1, inplace=True)
df.round(3)

Looking at the values above, we can see that overlaying an SMA strategy helps boost both absolute returns and risk-adjusted returns beyond what we could get from most of the baselines.

There are a lot of ways we could develop this further. First, we just chose 50- and 200-day time periods for our moving averages. Those likely aren’t the best values, so there is work left on the table in optimizing these parameters. Second, there’s no risk control in this strategy. This could be introduced by adding other markets, instruments, or volatility overlays to adjust position size, reducing the pain of losses and potentially leading to higher returns. There are no taxes or transaction costs taken into account here either. We also didn’t take into account dividends because we just ran this on standard, daily OHLC data rather than total return data or explicitly adjusting for dividend payments.
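As one illustration of the transaction cost point, the sketch below charges a proportional cost every time the position changes. The function name apply_costs and the 10 basis point default are assumptions for illustration, not figures from this article, and real costs (spreads, commissions, borrow fees on shorts) are more complicated than a single flat rate.

```python
import numpy as np
import pandas as pd

def apply_costs(position, strat_log_returns, cost_per_trade=0.001):
    """Subtract a proportional cost each time the position changes.

    cost_per_trade is a hypothetical fraction of traded value
    (0.001 = 10 basis points).
    """
    # Size of each position change: 1 for entering or exiting,
    # 2 for flipping between long and short
    trades = position.diff().abs().fillna(0)
    # A proportional cost c scales wealth by (1 - c), i.e. adds log(1 - c)
    cost = trades * np.log(1 - cost_per_trade)
    # The backtest applies position[t] to the return on row t+1,
    # so the cost of a change on row t is charged on row t+1 as well
    return strat_log_returns + cost.shift(1).fillna(0)

# Toy inputs: flat prices, so any loss comes from costs alone
position = pd.Series([0, 1, 1, 0, 1])
gross = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0])
net = apply_costs(position, gross)
print(np.exp(net.sum()) - 1)
```

With the backtest output from earlier, the same function could be called as apply_costs(data['position'], data['strat_log_returns']) and the result passed through cumsum and np.exp to get cost-adjusted cumulative returns.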

You could buy the data, computers, and code all of this yourself, or you could join us at Raposa and get access to professional backtests and signals to generate your own strategies without a single line of code.