How to Trade like a Turtle without $1,000,000

A simple job ad was placed in a handful of major newspapers calling for participants to be trained as traders. Of the applicants, a total of 23 individuals were chosen to become Turtle Traders: systematic trend followers who simply followed rules and made millions in the process.

These were average, ordinary people taught a system in order to settle a question: could trading be taught, or did it rely on some innate skill? The system is perfect for algorithmic trading – just give a computer the rules and let it run.

The turtles were given a list of markets to trade: US treasury bonds of different maturities, cocoa, coffee, cotton, oil, sugar, gold, silver, copper, gasoline, and various global currencies. They were also given $1 million to work with.

If you’re a typical retail investor who wants to implement a Turtle trading strategy, you’re going to have a hard time working across all of those markets without that kind of capital.

Consider a typical oil contract. It consists of 1,000 barrels, so at, say, $70/barrel, a single contract represents $70,000 of exposure. Most retail traders (especially younger ones) aren’t going to be able to afford that, and even if they can, it doesn’t leave much for other trades.

Commodities do provide significant leverage, so you may only need to put up 30-60% of the capital for a single contract (depending on the contract and exchange rules – which can change). But that still doesn’t leave most people with smaller accounts many options.
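
To make the arithmetic concrete, here’s a quick back-of-the-envelope sketch (the oil price and the 40% margin rate are illustrative assumptions, not current exchange requirements):

```python
# Back-of-the-envelope futures sizing with illustrative numbers.
barrels_per_contract = 1_000
price_per_barrel = 70.0   # assumed oil price
margin_rate = 0.40        # assumed margin requirement; varies by contract and exchange

notional = barrels_per_contract * price_per_barrel
margin = notional * margin_rate

print(f"Notional value:  ${notional:,.0f}")   # $70,000
print(f"Margin required: ${margin:,.0f}")     # $28,000
```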

Enter the wide world of ETFs!

There are ETFs for just about everything under the sun which trade like stocks. These tend to be much more affordable than commodity contracts, and you don’t have to worry about rolling futures as the contract comes closer to expiration.

With this in mind, the question is – can we build a portfolio of ETFs to replicate the Turtle trading portfolio that was so successful?

Building the Turtle Portfolio

To start, we need a list of the instruments that the Turtles traded. Covel gives us a breakdown in his book:

  • 30-yr US Treasury
  • 10-yr US Treasury
  • 3-month US Treasury
  • Gold
  • Copper
  • Silver
  • Oil
  • Cotton
  • Coffee
  • Cocoa
  • Sugar
  • Gasoline
  • Swiss Franc
  • Deutschmark
  • French Franc
  • Japanese Yen
  • Canadian Dollar
  • Eurodollar
  • S&P 500 futures

The idea is to have a diverse set of uncorrelated markets, so if trends die down in one, they might be emerging elsewhere (long or short). There’s a slight problem with some of these markets: two of them no longer exist.

The Turtles plied their trade in the 1980s, which was a very different world. The USSR still existed and the EU with its single currency zone was still a ways off. Any modern analysis has to do something about the French Franc and Deutschmark – swap in the Euro or drop them. For our analysis, we’re going to simply drop these dormant currencies.

We want to see how well some modern ETFs stack up, so it’s time to go grab a list of ETFs for each of these instruments. Some are only going to have one, others will have multiple. Let’s go through each of these groups.

Bonds

There are dozens of bond ETFs on the market to choose from, but I wasn’t able to find any that are designed to track our specific US treasuries, so I grabbed a few different ones that are described as short, medium, and long-term to see what we can use.

Commodities

Many major commodities (e.g. oil, gold) have multiple ETFs that track their performance. Some of the less common markets have very young ETFs, while others have been taken off the market in recent years. For example, coffee – despite being one of the world’s most commonly traded commodities – has a few recently defunct ETFs and a new one that launched in 2018. There are plenty of ETFs that track coffee growers, roasters, brewers, and everyone up and down the value chain, but I could only find a single ETF for the coffee contract on US exchanges.

Currencies

The ETF choices for currencies were much slimmer than for commodities, with only a single ETF available for each.

Indices

The Turtles traded S&P 500 and Eurodollar futures. The SPY is one of the most commonly traded and liquid instruments in the world, so it’s an obvious choice here. Unfortunately, I didn’t find a Eurodollar ETF that’s currently available on US markets, so I dropped it from the list.

With the list of instruments, we can go get the data. Most of the futures data isn’t available for free, so I relied on EOD and Quandl data subscriptions for those.

Trading Commodities without Futures

With our data in hand, we can see how these instruments correlate with the futures used by the Turtles.

Running correlations against the baseline contracts, we see that many of these ETFs track their underlying futures very closely.
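
As a sketch of how such a correlation might be computed (the series names are placeholders for whatever data you’ve loaded; note that we correlate daily returns rather than raw prices to avoid spurious trend effects):

```python
import pandas as pd

def return_correlation(etf_close: pd.Series, futures_close: pd.Series) -> float:
    """Pearson correlation of daily returns over the overlapping dates."""
    rets = pd.DataFrame({
        "etf": etf_close.pct_change(),
        "futures": futures_close.pct_change(),
    }).dropna()
    return rets["etf"].corr(rets["futures"])

# e.g. return_correlation(gld_close, gold_futures_close)
```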

To build a modern, ETF-based Turtle portfolio, we can’t just take the most highly correlated gold ETF or long-term bond ETF and plug it in though. While these may be well-correlated with the underlying assets, it doesn’t follow that they maintain similar correlations with each other.

This can cause some confusion, so let’s look at a simple example. Your level of fitness and muscle mass are positively correlated. Additionally, many studies show a positive relationship between fitness and intelligence, so those two are correlated as well. But does that mean muscle mass and intelligence are positively correlated? Not necessarily – correlation isn’t transitive!

The same goes for our portfolio: just because GLD is highly correlated with our gold futures and BAL is highly correlated with cotton, we can’t assume the correlation between GLD and BAL is the same as between gold and cotton.

What we really want is a portfolio that has a similar overall correlation structure to the Turtle portfolio.

Selecting our Instruments

We have a few different portfolio selections we can make from the available instruments. If we want to match the correlations of the original turtles, we need to find out which instruments make the most sense to trade.

Let’s set up some parameters first, otherwise we could get into combinatorial explosion hell very quickly.

First, we need some metric for how closely our new portfolio matches our baseline portfolio’s correlation matrix. There are a number of distance metrics to choose from, but here we’ll go with cosine distance (one minus the cosine similarity). This measures the angular distance between the flattened correlation matrices: the smaller the value, the closer the match.
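
A minimal implementation might look like this (a sketch, assuming both matrices cover the same instruments in the same order):

```python
import numpy as np

def cosine_distance(corr_a: np.ndarray, corr_b: np.ndarray) -> float:
    """1 minus the cosine similarity of two flattened correlation matrices.

    0 means the matrices point in exactly the same direction;
    larger values mean the correlation structures are further apart.
    """
    a, b = corr_a.ravel(), corr_b.ravel()
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```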

From there, we need to provide some constraints on our selections. We could trade more instruments, or drop certain markets if we’re not interested in them. But for now, we’ll constrain ourselves to trade one of each kind of market that the turtles did. Although we have multiple gold ETFs available, we’ll limit ourselves to only choosing one, and that will be it for gold. Likewise we’ll only choose one oil ETF, one sugar ETF, and so forth.

This is going to yield 864 different portfolio combinations. We could get clever about this, but with fewer than a thousand possible portfolios, we can brute force our selection by trying every candidate portfolio and measuring its distance from our baseline.
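
A sketch of that brute-force search, reusing cosine_distance from above (candidates and corr_matrix are hypothetical stand-ins for your per-market ETF options and a function that builds a return-correlation matrix from a list of tickers):

```python
from itertools import product

# Hypothetical input, e.g.
# candidates = {"gold": ["GLD", "IAU"], "oil": ["USO", "DBO"], ...}
def best_portfolio(candidates, baseline_corr, corr_matrix):
    """Try every one-ETF-per-market combination; keep the closest to baseline."""
    best, best_dist = None, float("inf")
    for combo in product(*candidates.values()):
        dist = cosine_distance(corr_matrix(list(combo)), baseline_corr)
        if dist < best_dist:
            best, best_dist = list(combo), dist
    return best, best_dist
```

Running this over every combination, we get the following plot: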

We get a few portfolios that are pretty close, but a long tail of distant ones. If we take the closest one, we can look at its correlation matrix below:

What we’ve done so far is find modern ETFs whose correlations match the set of correlations in the futures market. We can take a cue here from Ray Dalio, who says the “Holy Grail of Investing” is diversification among uncorrelated assets.

That simple chart struck me with the same force I imagine Einstein must have felt when he discovered E=mc²: I saw that with fifteen to twenty good, uncorrelated return streams, I could dramatically reduce my risks without reducing my expected returns… I called it the “Holy Grail of Investing” because it showed the path to making a fortune.

Ray Dalio, Principles

That’s the same thing Dennis and Eckhardt were trying to do all those years ago and modern trend following traders are up to today: diversify to find uncorrelated returns.

The set of instruments we found above isn’t necessarily the most uncorrelated set of instruments to apply trend following to – just the set that’s closest in correlations to the Turtle portfolio. We can re-run our brute force optimizer, this time minimizing the distance between our instruments and a completely uncorrelated portfolio (which, sadly for us, doesn’t exist).
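
In code, that’s just a change of target: a perfectly uncorrelated portfolio is an identity correlation matrix (ones on the diagonal, zeros everywhere else). Reusing the hypothetical helpers from the sketches above:

```python
import numpy as np

n_markets = len(candidates)               # number of markets we're trading
uncorrelated_target = np.eye(n_markets)   # the (unattainable) ideal: zero cross-correlation
best, dist = best_portfolio(candidates, uncorrelated_target, corr_matrix)
```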

Swapping out these values, we get the following heat map for our least correlated set of instruments:

Interestingly, this is very close to what we had above! The only change was swapping BIL in for SHV.

We could probably do better if we expanded our investment universe beyond the markets the original Turtles traded in the 1980s. A plethora of new instruments and markets have been created in the intervening 40 years (ever heard of crypto?), which could greatly increase our diversification.

Testing the Turtle Portfolio

How well does this Turtle portfolio perform? To find out, we’d need to run a backtest using the Turtle rules and this new portfolio. Long-time Turtle traders like Jerry Parker say that the original trend following rules should be tweaked to favor longer trends due to changes in the markets over the years.

At Raposa, we’re doing the hard work of making it easy to develop and deploy an algorithmic trading strategy. You can check out our free demo here to learn more.

Why you need more data than you think in your Backtest

How many years does it take before you can be confident in a trading strategy?

Does one great year mean you have a tremendous strategy? Does one bad year mean you should pack it up and try something else? How soon can you tell that a system is flawed and needs changing?

These aren’t easy questions, but they’re incredibly important to any investor, whether you’re systematic or not!

While we can’t give hard and fast rules – because there aren’t any – we can outline a series of principles based on how you trade and explain why we try to provide as much high-quality data as possible in our platforms.

How significant are your results?

Although it is not without its flaws, the Sharpe Ratio remains the standard measure of risk-adjusted returns. A high Sharpe means you have good returns against your baseline with little volatility, something most traders crave. A Sharpe of 0 indicates that your strategy isn’t providing any real value, and a negative Sharpe means you’re destroying value.
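
As a quick refresher, the Sharpe ratio is just average excess return divided by the volatility of those returns. A sketch of an annualized version computed from daily returns (assuming the risk-free baseline has already been subtracted and 252 trading days per year):

```python
import numpy as np

def annualized_sharpe(daily_excess_returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean over standard deviation, scaled by sqrt(time)."""
    return np.sqrt(periods_per_year) * daily_excess_returns.mean() / daily_excess_returns.std()
```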

With that in mind, Rob Carver laid out an experiment asking how long he would have to run a daily strategy with a given Sharpe Ratio before he could tell whether it was genuinely profitable or just noise.

His results were surprising.

Results taken from Systematic Trading.

Lower Sharpe Ratio strategies took decades to distinguish from noise! Consider the implications.

Many retail traders don’t run proper backtests – even those trying to be systematic – so they jump into a strategy based on some vague idea of what might or might not work from a guru, a message board, a YouTube video, or who knows what. They start trading and maybe they’re doing well enough with a Sharpe Ratio of 0.4 after their first year. Does that mean they have an edge? Well, maybe! But they’re going to need to keep that up for 32 more years before they can be sure!

The other thing to note is that strategies with higher Sharpe Ratios are easier to distinguish from noise. They stand out more, so if you have a system that turns in some excellent years (e.g. Sharpe > 1) then it’s likely that you’ve got something great.
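
Carver’s exact figures come from his book, but a common rule of thumb lands in the same ballpark: the t-statistic of an annualized Sharpe ratio measured over T years is roughly SR × √T, so you need about (1.96 / SR)² years of data to distinguish it from zero at the usual 95% confidence level. A sketch (this is the rule of thumb, not Carver’s exact method):

```python
def years_to_significance(sharpe: float, z: float = 1.96) -> float:
    """Approximate years of daily trading needed before an annualized
    Sharpe ratio is statistically distinguishable from zero (rule of thumb)."""
    return (z / sharpe) ** 2

print(years_to_significance(0.4))  # ~24 years
print(years_to_significance(1.0))  # ~3.8 years
```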

If you’re trading daily signals, it seems that you’re going to want at least 20 years of history to properly test a strategy.

Test in multiple regimes

Never confuse genius with luck and a bull market

John Bogle

Markets cycle through bull and bear phases, moving up, down, and sideways, through periods of high volatility and low volatility. If you build a strategy and only test it in one type of regime, then you’re setting yourself up for some nasty surprises when the market inevitably turns!

Extending your backtest and getting more historical data is key to ensuring that your strategy can hold up in these different environments.

What if you are trading something that doesn’t have a long history?

In a case like this, you have a few options:

  1. Find a comparable proxy that does have a long history to see how your model performs. Are there correlated instruments such as equities in the same industry (e.g. energy companies) or commodities with similar drivers (e.g. gold and silver)?
  2. Generate simulated data to test your ideas (see the sketch after this list). This requires you to generalize from the statistics of the data you’re working with to simulate additional time series data and see how your strategy performs. An advantage of this approach is that you can tweak some of those statistics or create different trends and scenarios to build broader tests. Caution needs to be exercised here, because you may be fitting a system to something that has no link to reality.
  3. Don’t trade it.
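
For option 2, here’s a minimal sketch using geometric Brownian motion, where the drift and volatility would ideally be estimated from the short real history you’re trying to extend (the numbers below are placeholders):

```python
import numpy as np

def simulate_prices(s0, mu, sigma, days, seed=None):
    """Simulate a daily price path with geometric Brownian motion.

    mu and sigma are annualized drift and volatility -- estimate them
    from the real series, then tweak them to build broader scenarios.
    """
    rng = np.random.default_rng(seed)
    dt = 1 / 252
    log_steps = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt), days)
    return s0 * np.exp(np.cumsum(log_steps))

prices = simulate_prices(s0=100.0, mu=0.05, sigma=0.20, days=252 * 10, seed=42)
```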

Use High-Quality Data

For most of our tutorials, we rely on the yfinance Python API because it’s free, easy to use, and generally has reliable data. Except when it doesn’t.

In fact, a lot of free data sources are mostly reliable, but occasionally, you’ll run into big problems.

Take the first 5 years of this single-instrument backtest below:

The high-level statistics looked tremendous, too good to be true in fact. Plotting the equity curve shows why.

Early in the backtest, there’s a major jump in returns as a leveraged short position makes an absolute killing. Looking at the data, we see a (fictional) overnight drop of 94%.

Simply adjusting the starting point to begin after the anomaly shows that the strategy doesn’t add much above the baseline.
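
Anomalies like this are straightforward to screen for before trusting a backtest. A minimal check on a pandas Series of daily closes (the 50% threshold is an arbitrary choice, and flagged dates deserve a manual look rather than automatic deletion, since real crashes do happen):

```python
import pandas as pd

def flag_suspect_moves(close: pd.Series, threshold: float = 0.50) -> pd.Series:
    """Return the daily returns whose absolute value exceeds the threshold."""
    returns = close.pct_change()
    return returns[returns.abs() > threshold]
```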

Data quality makes a big difference, but even paid sources aren’t perfect – although they tend to be much better.

To increase your confidence in your data quality, you could use multiple, independent sources and check for differences. With three sources: if two agree on a given price and one differs, take the value the two agree on; if all three differ, average them; and if a value is missing from two of the three, fall back on the single source that has it.
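
A sketch of that voting logic for a single bar (treating prices within a small tolerance as “agreeing”; the tolerance is an assumption you’d tune):

```python
import numpy as np

def reconcile(prices, tol=1e-4):
    """Reconcile up to three quotes for the same bar.

    prices: list of three floats, with None marking a missing value.
    """
    known = [p for p in prices if p is not None]
    if len(known) <= 1:                    # zero or one source: take what exists
        return known[0] if known else None
    for i in range(len(known)):
        for j in range(i + 1, len(known)):
            if abs(known[i] - known[j]) <= tol * abs(known[i]):
                return known[i]            # two sources agree: trust them
    return float(np.mean(known))           # no agreement: average what we have
```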

It won’t ensure your data is flawless, but it will greatly reduce the odds of a data error slipping through.

Optimization

Best practice dictates that only some of your data be used for fitting your parameters (e.g. tweaking lookback periods, stop loss levels, etc.) and the remainder be used for testing. The first portion is called in-sample data while the latter is out-of-sample.

The idea is that the out-of-sample data gives you a chance to see what your system will do on new data it has not been calibrated to trade. The stats should be somewhat worse, but not dramatically so (unless you’ve over-fit your in-sample data). This is designed to give you a better estimate of future performance.

Unfortunately, this requires even more data to complete effectively.

Frequently you’ll see recommendations that 70-80% of your data be used in-sample, with the remaining 20-30% held out-of-sample.

Another way to deal with this is by using cross-validation techniques like walk forward optimization. This repeatedly optimizes your parameters on one window of data, then tests them on the window that immediately follows, giving you multiple out-of-sample evaluations instead of just one.
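
A sketch of how those walk-forward windows might be generated (the window lengths below are arbitrary placeholders):

```python
def walk_forward_windows(n_obs, train_len, test_len):
    """Yield (train, test) index slices that walk forward through the data."""
    start = 0
    while start + train_len + test_len <= n_obs:
        yield (slice(start, start + train_len),
               slice(start + train_len, start + train_len + test_len))
        start += test_len

# e.g. ten years of daily data: optimize on four years, test on the next one
for train, test in walk_forward_windows(2520, train_len=1008, test_len=252):
    pass  # fit parameters on data[train], evaluate on data[test]
```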

Long Data Bias

Can we ever have enough data?

On the extreme end, we have funds that go back to the 1880s to better understand their strategies, or Renaissance Technologies, which collected price data going back to the 1700s.

How much is enough is going to depend on your goals and whether you’re really getting value from adding 1959 to your time series that already goes back to 1960. There is a law of diminishing returns that will eventually kick in for most investors.

Regardless, data is our raw material and we frequently need more of this resource than we think.

Your Free Data is Costing You Money

Garbage in. Garbage out. This old adage holds for all areas of decision sciences, including backtesting your investment strategies.

Years of working with financial data — and directly in the data industry — have revealed deep issues with many data providers, especially the freely available sources. Errors are everywhere, and they can make a great strategy look terrible or, worse, a losing strategy look highly profitable.

The Financial Data Pipeline

If you open up your favorite trading platform or brokerage account, you’re confronted with a series of quotes composed of red or green numbers updating by the second. For individual stocks and securities, they represent the price at which the most recent transaction was settled. For indices (such as shown below), they are the aggregation of the most recent transactions of all of the securities that make up the index.

These transactions are recorded by the exchanges and sold to data providers, who in turn offer the data along with their APIs or software packages to traders, institutions, and others. There are a lot of free data providers out there as well, which is often where most algorithmic traders start.

Pitfalls of Free Data Sources

Free data is a great place to start — we use free data sources in our tutorials because it’s easy and accessible for people — but we would never trust our hard-earned cash to a strategy operating on an algorithm that relies on free data. There are a few reasons for this.

Many free data sources have limited histories. For a good algorithmic approach, we want as much data as possible, which means going back in time as far as possible so that we’re able to test our approaches against a wide range of markets. 5 or 10 years of data just doesn’t cut it.

Free data sources may become obsolete or move to a premium model. If this happens, your algorithm is suddenly going to be cut off which could lead to missed trades. Most professional sources are loath to change their systems because their customers depend on consistent and reliable data feeds to build their businesses (this can be seen when sampling professional data systems and finding a lot of UIs that were clearly built for Windows 95…but they still work!).

Data inaccuracies are frequent. It’s hard to keep up with thousands of companies and their corporate changes, so stock splits, dividend payments, and the like – which need to be propagated back into historical data – frequently get passed over. Additionally, rounding errors can compound the farther back in time you look.
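
To see why a missed split matters, here’s how a 2-for-1 split has to be propagated back through history (hypothetical prices and dates):

```python
import pandas as pd

# Hypothetical closes around a 2-for-1 split on 2020-06-01.
close = pd.Series(
    [100.0, 102.0, 51.5, 52.0],
    index=pd.to_datetime(["2020-05-28", "2020-05-29", "2020-06-01", "2020-06-02"]),
)
split_date, ratio = pd.Timestamp("2020-06-01"), 2.0

# Divide every pre-split price by the split ratio so returns stay continuous.
adjusted = close.copy()
adjusted[adjusted.index < split_date] /= ratio
# Unadjusted, the raw series shows a phantom ~50% one-day crash.
```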

Stock tickers often get re-purposed after de-listing and many free sources either don’t keep records of these de-listed stocks or only allow look-ups via the ticker. If this isn’t properly accounted for, then you could introduce survivorship bias into your backtests by only testing strategies against companies that have survived over the years. This has the effect of inflating your results and hiding risk.

Free-data stalwarts like Yahoo! Finance have gone through all of these issues, restricting data by changing business models; having APIs suddenly break with new updates; miscalculating dividends, splits, and the like; rounding payouts which leads to errors as data gets propagated into the past; and dropping de-listed stocks causing survivorship bias in backtests.

Professional Data Sources

This isn’t an ad for buying data from a vendor — triangulating multiple free data sources and making regular updates can help fix these issues, but that’s a lot of effort that may be better spent doing research and running tests. It’s better to start with good, high-quality data and build from there rather than spending heaps of time chasing down discrepancies, building scrapers, and patching APIs.

Let us handle that for you at Raposa Technologies where we’re building a platform to make quantitative investing easily available. We’ve done the hard work of vetting our data and vendors, giving you access to professional backtesting capabilities to build your own strategies that you can be confident in.