Why you need more data than you think in your Backtest

How many years does it take before you can be confident in a trading strategy?

Does one great year mean you have a tremendous strategy? Does one bad year mean you should pack it up and try something else? How soon can you tell that a system is flawed and needs changing?

These aren’t easy questions, but they’re incredibly important to any investor, whether you’re systematic or not!

While we can’t give hard and fast rules – because there aren’t any – we can outline a series of principles based on your trading and explain why we try to provide as much high-quality data as possible in our platforms.

How significant are your results?

Although it is not without its flaws, the Sharpe Ratio remains the standard measure of risk-adjusted returns. A high Sharpe means you have good returns against your baseline with little volatility, something most traders crave. A Sharpe of 0 indicates that your strategy isn’t providing any real value, and a negative Sharpe means you’re destroying value.
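As a rough illustration, here is how an annualized Sharpe Ratio might be computed from a list of daily returns. The function name `annualized_sharpe`, the constant risk-free rate used as the baseline, and the 252-trading-day convention are all assumptions made for this sketch, not something prescribed above.

```python
import math
import statistics


def annualized_sharpe(daily_returns, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio from simple daily returns.

    Assumes a constant annual risk-free rate (the baseline) and
    252 trading days per year; both are illustrative defaults.
    """
    rf_daily = risk_free_annual / periods_per_year
    excess = [r - rf_daily for r in daily_returns]
    vol = statistics.stdev(excess)  # sample standard deviation of excess returns
    if vol == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    # Scale the daily ratio by sqrt(periods) to annualize it.
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)
```

A series whose gains and losses cancel out scores 0, while a series with a positive average excess return scores above 0, matching the interpretation above.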

With that in mind, Rob Carver laid out an experiment in which he asked: how long would you have to run a daily strategy with a given Sharpe Ratio before you could tell whether it is genuinely profitable or just noise?

His results were surprising.

Results taken from Systematic Trading.

Lower Sharpe Ratio strategies took decades to distinguish from noise! Consider the implications.

Many retail traders don’t run proper backtests – even those trying to be systematic – so they jump into a strategy based on vague ideas about what might or might not work, picked up from a guru, a message board, a YouTube video, or who knows where. They start trading and maybe they’re doing well enough with a Sharpe Ratio of 0.4 after their first year. Does that mean they have an edge? Well, maybe! But they’re going to need to continue that for 32 more years before they can be sure!

The other thing to note is that strategies with higher Sharpe Ratios are easier to distinguish from noise. They stand out more, so if you have a system that turns in some excellent years (e.g. Sharpe > 1) then it’s likely that you’ve got something great.
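To get a feel for why low Sharpe Ratios take so long to confirm, here is a back-of-the-envelope sketch. It assumes returns are independent and roughly normal, in which case the t-statistic of an annualized Sharpe Ratio after T years is approximately SR × √T, so reaching a threshold of about 2 takes roughly (2 / SR)² years. This is a textbook approximation, not necessarily the method behind the figures quoted above, so don’t expect the numbers to match them exactly. The function name `years_to_significance` is made up for this example.

```python
def years_to_significance(sharpe, t_threshold=2.0):
    """Approximate years of data needed before an annualized Sharpe Ratio
    is distinguishable from zero.

    Assumes i.i.d., roughly normal returns so that t-stat ~ sharpe * sqrt(years).
    """
    if sharpe <= 0:
        raise ValueError("a non-positive Sharpe never becomes significantly positive")
    return (t_threshold / sharpe) ** 2


for sr in (0.2, 0.4, 0.5, 1.0, 2.0):
    print(f"Sharpe {sr:.1f}: roughly {years_to_significance(sr):.0f} years")
```

Because the required history scales with 1 / SR², halving the Sharpe Ratio quadruples the wait: under these assumptions a Sharpe of 1.0 needs about 4 years, while a Sharpe of 0.4 needs about 25.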

If you’re trading daily signals, it seems that you’re going to want at least 20 years of history to properly test a strategy.

Test in multiple regimes

Never confuse genius with luck and a bull market

John Bogle

Markets cycle through bull and bear phases, moving up, down, and sideways, with periods of high volatility and periods of low volatility. If you build a strategy and only test it on one type of regime, then you’re setting yourself up for some nasty surprises when the market inevitably turns!

Extending your backtest and getting more historical data is key to ensuring that your strategy can hold up in these different environments.

What if you are trading something that doesn’t have a long history?

In a case like this, you have a few options:

  1. Find a comparable proxy that does have a long history to see how your model performs. Are there correlated instruments such as equities in the same industry (e.g. energy companies) or commodities with similar drivers (e.g. gold and silver)?
  2. Generate simulated data to test your ideas. This requires you to generalize from the statistics of the data you’re working with to simulate additional time series data to see how your strategy performs. An advantage of this approach is that you can tweak some of those statistics or create different trends and scenarios to build broader tests. Caution needs to be exercised in this approach because you may be fitting a system on something that has no link to reality.
  3. Don’t trade it.
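The second option can be sketched in a few lines. The example below estimates the mean and standard deviation of daily log returns from a short price history and draws new returns from a normal distribution with those statistics. That is a strong simplifying assumption (no fat tails, no volatility clustering), and the function name `simulate_prices` and its `drift_shift` and `vol_scale` knobs are hypothetical, chosen only to show how the statistics could be tweaked to build different scenarios.

```python
import math
import random
import statistics


def simulate_prices(prices, n_days, drift_shift=0.0, vol_scale=1.0, seed=None):
    """Simulate a price path from the log-return statistics of `prices`.

    Assumes i.i.d. normal log returns, which real markets violate.
    drift_shift and vol_scale stress the estimated mean and volatility.
    Needs at least three historical prices to estimate a volatility.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mu = statistics.mean(log_returns) + drift_shift
    sigma = statistics.stdev(log_returns) * vol_scale
    rng = random.Random(seed)  # seeded generator for reproducible scenarios
    path = [prices[-1]]  # continue from the last observed price
    for _ in range(n_days):
        path.append(path[-1] * math.exp(rng.gauss(mu, sigma)))
    return path[1:]  # return only the simulated days
```

Running the same strategy over many seeds, and over scenarios with a lower drift or higher volatility, gives a broader test than the short real history alone. It still only tells you how the strategy behaves under the assumptions you baked in.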

Use High-Quality Data

For most of our tutorials, we rely on the yfinance Python API because it’s free, easy to use, and generally has reliable data. Except when it doesn’t.

In fact, a lot of free data sources are mostly reliable, but occasionally, you’ll run into big problems.

Take the first 5 years of this single-instrument backtest below:

The high-level statistics looked tremendous, too good to be true in fact. Plotting the equity curve shows why.

Early in the backtest, there’s a major jump in returns as a leveraged short position makes an absolute killing. Looking at the data, we see an overnight drop of 94% that never actually happened.

Simply adjusting the starting point to begin after the anomaly shows that the strategy doesn’t add much above the baseline.
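A simple sanity check can catch anomalies like this before they reach a backtest. The sketch below flags any one-day move larger than a chosen threshold so you can inspect it by hand. The 50% threshold and the function name `flag_suspect_moves` are arbitrary assumptions for illustration, and a flagged move is not necessarily an error: genuine crashes and unadjusted splits will show up too.

```python
def flag_suspect_moves(closes, threshold=0.5):
    """Return (index, one_day_return) pairs where the absolute one-day
    return exceeds `threshold`. Assumes all prices are positive."""
    suspects = []
    for i in range(1, len(closes)):
        ret = closes[i] / closes[i - 1] - 1.0
        if abs(ret) > threshold:
            suspects.append((i, ret))
    return suspects


# Hypothetical closing prices with a bad print on the third day.
closes = [100.0, 101.0, 6.0, 6.1]
for index, ret in flag_suspect_moves(closes):
    print(f"day {index}: one-day return of {ret:.1%} needs a closer look")
```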

Data quality makes a big difference, but even paid sources aren’t perfect – although they tend to be much better.

To increase your confidence in your data quality, you could cross-check multiple independent sources. With three sources, if two agree on a given price and the third differs, take the value the two agree on. If all three differ, average them. If two of the three are missing a value, fall back on the single source that has one.

It won’t ensure your data is flawless, but will greatly reduce the odds of a data error being introduced.
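The reconciliation rule could look something like this for a single price point. The function name `reconcile` and the tolerance used to decide that two prices “agree” are assumptions for this sketch. One case the rule doesn’t spell out is two sources that are present but disagree while the third is missing; here that case is averaged, which is just one reasonable choice.

```python
import math


def reconcile(values, rel_tol=1e-6):
    """Combine one price reported by several sources (None means missing)."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nobody has a value for this date
    if len(present) == 1:
        return present[0]  # rely on the single available source
    # If any two sources agree (within tolerance), trust that pair.
    for i in range(len(present)):
        for j in range(i + 1, len(present)):
            if math.isclose(present[i], present[j], rel_tol=rel_tol):
                return present[i]
    # No agreement at all: fall back to the average of what we have.
    return sum(present) / len(present)
```

Applied date by date across three price series, `reconcile([10.0, 10.0, 12.0])` keeps the agreed value of 10.0, while three different values are averaged.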

Optimization

Best practice dictates that only some of your data be used for fitting your parameters (e.g. tweaking lookback periods, stop loss levels, etc.) and the remainder be used for testing. The first portion is called in-sample data while the latter is out-of-sample.

The idea is that the out-of-sample data provides you a chance to see what your system is going to do on new data that it has not been calibrated to trade. The stats should be worse, but not significantly (unless you over-fit on your in-sample data). This is designed to give you a better estimate for future performance.

Unfortunately, this requires even more data to complete effectively.

Frequently you’ll see recommendations to use 70-80% of your data as in-sample data, with the remaining 20-30% held back as out-of-sample data.
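For time series, the split should be chronological, with the earlier portion used for fitting and the later portion held back, so that the test data really is “new” to the system. A minimal sketch, with a made-up function name and a 75% default chosen from the middle of the range above:

```python
def split_in_out_of_sample(series, in_sample_frac=0.75):
    """Split a time-ordered sequence into (in_sample, out_of_sample).

    The earliest observations are used for fitting; the most recent
    ones are held back for testing. No shuffling is done.
    """
    if not 0 < in_sample_frac < 1:
        raise ValueError("in_sample_frac must be strictly between 0 and 1")
    cut = int(len(series) * in_sample_frac)
    return series[:cut], series[cut:]
```

With 20 years of daily data, this leaves about 15 years for fitting and only 5 for testing, which is why optimization pushes your data requirements up even further.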

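Another way to deal with this is by using cross-validation techniques like walk-forward optimization, in which you optimize on one window of data, test on the period that immediately follows, then roll both windows forward and repeat. This lets you optimize and test on many subsets of your data rather than relying on a single split.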
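One common way to set up walk-forward windows is sketched below: fit on a fixed-length training window, test on the block that immediately follows it, then slide everything forward by the length of the test block. The function name and the fixed (rather than expanding) training window are assumptions for this example; other variants exist.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_range, test_range) index pairs that roll forward in time.

    Each test block starts right after its training window, and the
    windows advance by test_size so the test blocks never overlap.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Example: 10 observations, fit on 4, test on the next 2.
for train, test in walk_forward_windows(10, 4, 2):
    print(f"fit on {list(train)}, test on {list(test)}")
```

You would re-optimize your parameters on each training window and record performance only on the matching test block, then stitch the test blocks together to get an out-of-sample track record.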

Long Data Bias

Can we ever have enough data?

On the extreme end, we have funds that go back to the 1880s to better understand their strategies, or Renaissance Technologies, which collected price data from the 1700s.

How much is enough is going to depend on your goals and whether you’re really getting value from adding 1959 to your time series that already goes back to 1960. There is a law of diminishing returns that will eventually kick in for most investors.

Regardless, data is our raw material and we frequently need more of this resource than we think.
