Retail traders almost always have small trading accounts. To get the returns they're after, they frequently take on leverage - oftentimes imprudent amounts, in FOREX accounts that can be levered 50-100x!

Retail equity brokers are a bit more conservative with maximum leverage ratios of 2-3x.

Leverage is a dangerous thing to play with - and we don't recommend inexperienced traders use it at all. It's simply too risky if a significant percentage of your net worth sits in your account.

Regardless, position sizing is crucial to get the returns you're after, and leverage can be very valuable if you know what you're doing. You just have to be careful - size too big and you're likely to blow up your account!

So how do traders get it right?

TL;DR

We walk through allocation with a levered long/short Kelly Criterion and benchmark an optimal Kelly strategy versus approximations and naive allocation methods using leverage.

Position Sizing and Your Profitability

Improper position sizing is one of the pitfalls for retail traders. How many people have you heard of who have put everything they own into a position, or haven't even thought about how much they've allocated to a trade?

Maybe you were even there once.

While simple rules can be very effective, some traders want to find the "optimal" amount of leverage to apply to a trade. There are a number of different ways to do this (eventually I'll get around to a huge overview of all the portfolio optimization methods out there), but the Kelly Criterion remains popular for its effectiveness.

This method was originally developed by John Kelly at Bell Labs in the 1950's to study information transmission over noisy telephone lines, but the math behind it applies directly to gambling and investing. The primary idea is that we have a probability distribution of price events and we need to determine what fraction of our portfolio to bet.
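In its simplest gambling form - a repeated binary bet with win probability p and net odds b - the criterion says to bet the fraction f* = (bp - (1 - p))/b of your bankroll. A quick sketch with made-up numbers:

# Classic binary-bet Kelly fraction: f* = (b*p - q) / b
p = 0.55          # probability of winning
b = 1.0           # net odds (an even-money bet)
q = 1 - p         # probability of losing

f_star = (b * p - q) / b
print(f"Bet {f_star:.0%} of bankroll")  # 10%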

In the case of stocks, we use the standard deviation (𝜎), mean return (𝜇), and the risk-free rate (r) to calculate the optimal fraction to put into an asset, with anything left over going into our risk-free asset.

$$f^* = \frac{\mu - r}{\sigma^2}$$

That pretty little * on the f indicates this is the optimal solution - you can't do any better under these assumptions and constraints. It's provably optimal, but only for a two-asset world, e.g. the S&P and cash.
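To get a feel for the numbers, here's the formula applied to some hypothetical annualized stats (these inputs are made up for illustration):

# Single risky asset vs. cash
mu = 0.08       # mean annual return
r = 0.02        # risk-free rate
sigma = 0.16    # annualized standard deviation

f_star = (mu - r) / sigma**2
print(f"f* = {f_star:.2f}")  # ~2.34, i.e. already levered 2.34x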

In the real world, we're more often concerned with how to allocate our capital across multiple assets simultaneously. That will require some new mathematical tools.

We want to show two ways to calculate this: a quick and dirty way that is commonly used, and a more robust method using quadratic programming that we can use to optimize our portfolio.

Unconstrained Kelly Optimization

If we have multiple, perfectly uncorrelated assets (i.e. correlation = 0), then we could just use the same formula above for each asset and be done with it. In reality, we don't really have anything that is completely uncorrelated with everything else, so we'll need to make a modification.

Instead of using the variance (𝜎²) of each asset, we need the covariance matrix. This is a matrix with the variance of each individual asset on the diagonal and the covariance between each pair of assets in the off-diagonal entries (hopefully you see that if all the covariance values are 0, this reduces to the formula above). We'll use a Σ to denote this matrix and write the solution in vector form like:

$$F^* = \Sigma^{-1}(\bf{\mu} - r)$$

Or, to make it more consistent with our optimization models, let's also write it in the slightly clunkier but equivalent summation notation, where $\Sigma^{-1}_{ij}$ is the $(i,j)$ entry of the inverse covariance matrix.

$$f^*_i = \sum_{j=1}^n \Sigma^{-1}_{ij} (\mu_j - r)$$

This gives us our unconstrained solution, meaning we don't have any kind of limits on our positions.

We can see how this plays out with a simple example where we have two inversely correlated prices that we simulate with a quarter cycle of two out-of-phase sine waves.

If you want to test it for yourself, just fire up Python and follow along!

import numpy as np
import pandas as pd

# Simulate two out-of-phase sine waves (a quarter cycle each)
t = np.linspace(0, np.pi/2, 100)
A = np.sin(t) + 1
B = np.sin(t - np.pi) + 1

df = pd.DataFrame({'A': A,
                   'B': B})

returns = df.pct_change()
mu = returns.mean()
var = returns.var()  # individual variances (diagonal of the covariance matrix)
cov = returns.cov()
varcov = cov.values.copy()

# Unconstrained Kelly: F* = inv(Sigma) @ (mu - r), with r = 0 here
F = np.dot(mu, np.linalg.inv(varcov))
F
array([617.03128196, -14.85627131])

Calculating the results according to our formula yields 617x leverage on the one increasing and a short position with 15x leverage on the other!

This is irresponsibly levered!

To bring this back down to something reasonable, we can apply a simple heuristic such as capping maximum leverage at, say, 3x on the whole portfolio, then scaling the values accordingly.

Mathematically, we'd write it as:

$$\hat{f}^*_i = \lambda_{max} \, \textrm{sgn}(f^*_i) \frac{\mid f^*_i \mid}{\sum_{j=1}^n \mid f^*_j \mid}$$ $$\textrm{where } \lambda_{max} \textrm{ is our maximum leverage ratio and } \textrm{sgn}(f^*_i) \textrm{ denotes the sign.}$$

Let's add this maximum leverage ratio to the values we calculated above to get our constrained results.

lam = 3

F_lam = lam * np.abs(F) / np.abs(F).sum() * np.sign(F)

print(f"Unconstrained:\t{F.round(2)}")
print(f"Constrained:\t{F_lam.round(2)}")
Unconstrained:	[617.03 -14.86]
Constrained:	[ 2.93 -0.07]

This heuristic brings our leverage down significantly: only a small fraction of our portfolio (7%) is short, while we take on nearly 3x leverage to go long.

This is the quick and dirty way to get your optimal leverage with the Kelly Criterion, but there's a better, more precise way to go about it. Instead of scaling down by an arbitrary post hoc factor, we can optimize under the leverage constraint directly. This optimizes the whole portfolio for us by solving a quadratic program.

Quadratic Programming for Portfolio Optimization

The Kelly Criterion, as applied to investments, was derived to maximize the growth rate of our portfolio (𝑔) given the assets we have and their statistics. One day, I'll go through the details of how to derive all of these equations - but that day is not today. Instead, I'm just going to throw the math at you up front and explain it later (also, this article does cover a more general intro to math programming and uses the Kelly Criterion in cases where we don't allow leverage or shorts).

$$\max_f \; g = \sum_{i=1}^n f_i ( \mu_i - r) - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n f_i f_j \Sigma_{ij} \quad (1)$$ $$\textrm{s.t.} \; \sum_{i=1}^n x_i \leq \lambda_{max} \quad (2)$$ $$x_i \geq f_i \quad (3)$$ $$x_i \geq - f_i \quad (4)$$ $$f_i, x_i \in \mathbb{R}$$

Here, we have four equations that define our model with two sets of variables, f and x. Equation 1 is our objective function: we're trying to find the values of f (the amount we allocate to each asset) that maximize 𝑔 (the unconstrained formula we used in the heuristic case above comes from setting the derivative of this objective to zero). We're doing this "subject to" (that's the s.t.) the constraints given in equations 2-4, which keep our total leverage below the lambda max threshold (Eq 2), with x as an auxiliary variable defined by equations 3 and 4.*

*You'll notice that in the heuristic above we had an absolute value in our equation so that our total allocation wouldn't exceed our lambda max, to avoid longs/shorts cancelling out. Here, we don't have the absolute value because we reformulated the constraints using the auxiliary variable x in equations 3 and 4. I'm too lazy to get into the full explanation of why this works and I'm betting you don't really care or want to read it. If I'm wrong about that second part, you can find a discussion on convex optimization here that should satisfy you. If that doesn't work, go read this book.
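That said, the one-line version of the trick (a standard linearization of the absolute value): constraints 3 and 4 together pin each x above the magnitude of the corresponding f,

$$x_i \geq f_i \;\textrm{ and }\; x_i \geq -f_i \iff x_i \geq \mid f_i \mid$$

so constraint 2 effectively caps $\sum_i \mid f_i \mid$ at $\lambda_{max}$.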

Let's turn to implementing this model in Python.

First, we need to import Pyomo (or install it if you haven't already) so we can build our model.

from pyomo.environ import *

Next, we'll build our model. This will be a function that will take our returns and covariance matrix as inputs, then build the algebraic formulation that we see above with Pyomo.

def buildKCOptModel(returns: np.ndarray, varcov: np.ndarray,
                    rfr: float = 0, lam_max: float = 3):

  m = ConcreteModel()

  # Indices
  m.i = RangeSet(0, returns.shape[0] - 1)

  # Decision variables
  m.f = Var(m.i, domain=Reals)
  m.x = Var(m.i, domain=Reals)

  # Parameters
  m.mu = Param(m.i,
               initialize={i: mu_i for i, mu_i in zip(m.i, returns)})
  m.sigma = Param(m.i, m.i, 
                  initialize={(i, j): varcov[i, j] 
                              for i in m.i 
                              for j in m.i})
  m.lam_max = lam_max

  # Constraints
  @m.Constraint()
  def maxLeverageConstraint(m):
    return sum(m.x[i] for i in m.i) <= m.lam_max

  @m.Constraint(m.i)
  def posFraction(m, i):
    return m.x[i] - m.f[i] >= 0

  @m.Constraint(m.i)
  def negFraction(m, i):
    return m.x[i] + m.f[i] >= 0

  # Objective
  @m.Objective(sense=maximize)
  def objective(m):
    return (rfr + sum(m.f[i] * (m.mu[i] - rfr) for i in m.i) - \
            sum(
                sum(m.f[i] * m.sigma[i, j] * m.f[j] for j in m.i)
            for i in m.i) / 2)

  return m

Let's build out our model by passing the data from our simple, sine wave example.

model = buildKCOptModel(mu, varcov)

Our model is ready to solve!

There's one problem though: this isn't a typical function where you run a method like model.fit() and out pop the results. This model requires a solver - a separate algorithm that searches for the values of f that maximize 𝑔 subject to the constraints above.

Thankfully, we have ready access to open-source solvers for exactly this. The one we'll be relying on is called IPOPT because it implements an interior point optimization method to find the optimal values (again, read this book for details). You can get installation instructions here.

If you're a Linux lover like me (or on Google Colab) you can install it with the following:

# Install IPOPT
!wget -N -q "https://ampl.com/dl/open/ipopt/ipopt-linux64.zip"
!unzip -o -q ipopt-linux64

Once that's complete, we can solve our model by running Pyomo's SolverFactory function and telling it to go get IPOPT and apply it to our model.

Let's do that and take a look at the value of the objective function and the values of our fractions f.

results = SolverFactory('ipopt').solve(model)
print(f"g = {model.objective.expr():.3f}")
print(f"Fractions = {[np.round(model.f[i].value, 3) for i in model.i]}")
g = 0.163
Fractions = [0.0, -3.0]

These fractions are much, much different from our quick and dirty Kelly Criterion heuristic!

In the other case, we were leveraged long with a smaller allocation to our short position. In this case, we have no long position and we've maxed out our leverage to go short.
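As a quick sanity check (a sketch using the solved Pyomo model from above), we can confirm the solution sits right at our leverage cap:

# Gross leverage of the optimal solution
total_leverage = sum(abs(model.f[i].value) for i in model.i)
print(f"Total leverage: {total_leverage:.2f}")  # should be <= 3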

Why the difference?

The heuristic method started by calculating the unconstrained Kelly Criterion allocations, then scaled those down and applied them to your portfolio. It's making the unrealistic assumption that you have unlimited leverage and all of the portfolio interactions are nice and linear so you'll still get the same results - just at a smaller scale.

We can check this by plugging the heuristic values back into our objective function and comparing the results. When we do, we get 2.7% vs. 16.3% for the solution of the quadratic programming Kelly Criterion model. This means we would anticipate a 2.7% growth rate if we follow the heuristic model's allocation suggestions, whereas we'd expect 16.3% by following the output of the QP model.
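If you want to reproduce that comparison yourself, a small helper that evaluates the objective for any allocation vector does the trick (a sketch using mu, varcov, and F_lam from the sine-wave example above):

def growth_rate(f, mu, varcov, rfr=0):
    # g = r + f'(mu - r) - 0.5 * f' Sigma f
    f = np.asarray(f, dtype=float)
    excess = np.asarray(mu, dtype=float) - rfr
    return rfr + f @ excess - 0.5 * f @ np.asarray(varcov) @ f

print(f"Heuristic g: {growth_rate(F_lam, mu, varcov):.3f}")
print(f"QP g: {growth_rate([model.f[i].value for i in model.i], mu, varcov):.3f}")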

That significant difference comes from the fact that we're taking our actual constraints (namely maximum leverage) into account up front and not as a post hoc adjustment factor.

It pays off to do things the right way.

What Works in Practice?

The math is great and all, but we want to know how to test and apply these methods. For that, we'll build a class called OptimalAllocation which takes a list of tickers, start and end dates, and a maximum leverage. It then calculates the optimal allocation using the methods discussed above. For a benchmark, we'll also implement a naive, equal allocation method which simply divides our levered capital into equal fractions and maintains that constant allocation across time.

We'll pull data from yfinance for ease of use.

Here's the class:

import yfinance as yf

class OptimalAllocation:

  def __init__(self, tickers: list, max_leverage: float=3, lookback: int=252,
               rfr: float=0, start: str="2000-01-01", end: str="2021-12-31",
               rebalance_freq: int=1):
    self.tickers = tickers
    self.max_leverage = max_leverage
    self.lookback = lookback
    self.start = start
    self.end = end
    self.rfr = rfr
    self.rebalance_freq = rebalance_freq

    self.data = self._getData()
    self._calcStats()

  def _getData(self):
    yfObj = yf.Tickers(self.tickers)
    data = yfObj.history(start=self.start, end=self.end)
    data.drop(["High", "Low", "Open", "Volume", "Stock Splits", "Dividends"],
              axis=1, inplace=True)
    data.columns = data.columns.swaplevel()
    data.dropna(inplace=True)
    return data

  def _calcStats(self):
    # Calculate returns as gross price relatives (P_t / P_t-1); we take
    # logs of these later when compounding annual returns
    returns = self.data.loc[:, (slice(None), "Close")] / \
      self.data.loc[:, (slice(None), "Close")].shift(1)
    returns = returns.rename(columns={"Close": "returns"})
    
    means = returns.rolling(self.lookback).mean().rename(
        columns={"returns": "mean"})

    # Calculate covariance matrices and transform to 3D array
    n = returns.shape[1]
    self.cov = returns.droplevel(1, axis=1).rolling(
        self.lookback).cov().values.reshape(-1, n, n)
    self.data = pd.concat([self.data, returns, means], axis=1)
    
  def calcKCUnconstrainedAllocation(self):
    '''
    Calculates the allocation fractions for the unconstrained 
    Kelly Criterion case.
    '''
    fracs = np.zeros((len(self.data), len(self.tickers)))
    fracs[:] = np.nan

    for i, (ts, row) in enumerate(self.data.iterrows()):
      if i < self.lookback:
        continue

      means = row.loc[(slice(None)), "mean"].values
    
      F = np.dot(means, np.linalg.inv(self.cov[i]))
      fracs[i] = F

    df_fracs = pd.DataFrame(fracs, index=self.data.index)
    midx = pd.MultiIndex.from_arrays(
        [self.tickers, len(self.tickers)*['unconstrained_fracs']])
    df_fracs.columns = midx
    return df_fracs

  def calcKCHeuristicAllocation(self, kelly_level: float=1):
    '''
    Calculates the allocation fractions using a simple max leverage heuristic
    for the Kelly Criterion.

    kelly_level: allows setting full Kelly (1), half-Kelly (0.5), or any
      other multiple. This scales the heuristic allocations down accordingly
      to reduce actual leverage.
    '''
    df_fracs = self.calcKCUnconstrainedAllocation()
    heur_fracs = df_fracs.apply(
        lambda x: kelly_level * self.max_leverage * np.abs(x) / \
          np.abs(x).sum() * np.sign(x), 
        axis=1)
    heur_fracs = heur_fracs.rename(
        columns={'unconstrained_fracs': 'heuristic_fracs'})
    return heur_fracs

  def calcKCQuadProgAllocation(self, kelly_level: float=1):
    '''
    Calculates optimal allocation fractions by solving a quadratic program
    according to the Kelly Criterion.

    kelly_level: allows setting full Kelly (1), half-Kelly (0.5), or any
      other multiple. This takes the solution from the QP and scales it
      down accordingly to reduce actual leverage.
    '''
    fracs = np.zeros((len(self.data), len(self.tickers)))
    fracs[:] = np.nan
    g = fracs[:, 0].copy()

    for i, (ts, row) in enumerate(self.data.iterrows()):
      if i < self.lookback:
        continue

      means = row.loc[(slice(None)), "mean"].values
      cov = self.cov[i]
      model = buildKCOptModel(means, cov, self.rfr, self.max_leverage)
      results = SolverFactory('ipopt').solve(model)
      fracs[i] = np.array([model.f[j].value * kelly_level
                           for j in model.f])
      g[i] = model.objective.expr()

    df_fracs = pd.DataFrame(fracs, index=self.data.index)
    midx = pd.MultiIndex.from_arrays(
        [self.tickers, len(self.tickers)*['qp_fracs']])
    df_fracs.columns = midx
    return df_fracs

  def calcEqualAllocation(self):
    '''
    Rebalance so that the portfolio maintains a constant, equal allocation
    among each of the assets.
    '''
    fracs = np.ones((len(self.data), len(self.tickers))) / len(self.tickers)
    fracs[:self.lookback] = np.nan
    df_fracs = pd.DataFrame(fracs, index=self.data.index)
    midx = pd.MultiIndex.from_arrays(
        [self.tickers, len(self.tickers)*['eq_fracs']])
    df_fracs.columns = midx
    return df_fracs

Time to run this.

Choose some tickers and plug them in. I'm going to run just a few different sectors using ETFs: broad equities, energy equities, gold, and bonds, but feel free to try your own combos.

# Initialize
opt = OptimalAllocation(['SPY', 'XLE', 'GLD', 'IEF'])

# Calculate optimal allocations
uc_fracs = opt.calcKCUnconstrainedAllocation()
heur_fracs = opt.calcKCHeuristicAllocation()
qp_fracs = opt.calcKCQuadProgAllocation()
eq_fracs = opt.calcEqualAllocation()

The QP is going to take the longest since it solves a small optimization problem for every trading day, but the rest will calculate quickly.

Now, we can plot the results (excluding equal allocation because no need to waste space on some straight lines).

import matplotlib.pyplot as plt

colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
labels = opt.tickers

fig, ax = plt.subplots(3, figsize=(12, 8), sharex=True)

ax[0].plot(uc_fracs * 100)
ax[0].set_title('Unconstrained Allocation')
ax[0].semilogy()

ax[1].plot(heur_fracs * 100)
ax[1].set_ylabel('Portfolio Allocation (%)')
ax[1].set_title('Heuristic Allocation')

ax[2].plot(qp_fracs * 100)
ax[2].set_xlabel('Date')
ax[2].set_title('Optimal Allocation')
ax[2].legend(labels=labels, ncol=len(labels),
             bbox_to_anchor=(0.68, -0.3))

plt.tight_layout()
plt.show()
[Figure: kc-optimization-with-leverage.png - allocation fractions over time for the unconstrained (log scale), heuristic, and optimal QP methods]

The leverage in the unconstrained case is just stupid. We're only showing it here for illustrative purposes because crazy leverage like this is both impossible (who's going to give you >100x leverage on an equity position? You're not the Fed) and insane. Don't do it.

The other two are more realistic. Many retail brokers will offer up to our 3x limit on liquid ETFs like these.

Let's see how they performed.

For this, we'll look at each 1-year (252 trading day) period in our historical sample and compute the annual returns from our leveraged portfolios. The particular starting date can have a large impact on overall performance, so rolling over every window like this gives us a reasonable feel for the expected performance.

# Daily log returns (recall our stored "returns" are gross price relatives)
log_rets = np.log(opt.data.loc[:, (slice(None), "returns")].values)

# Lag allocations one day to avoid look-ahead bias, then compound over
# rolling 252-day windows (scaling log returns by the allocation
# fractions approximates the levered portfolio return)
heur_rets = log_rets * heur_fracs.shift(1)
heur_log_rets_ann = heur_rets.sum(axis=1).rolling(252).sum().dropna()
heur_rets_ann = (np.exp(heur_log_rets_ann) - 1) * 100

eq_rets = log_rets * eq_fracs.shift(1) * opt.max_leverage
eq_log_rets_ann = eq_rets.sum(axis=1).rolling(252).sum().dropna()
eq_rets_ann = (np.exp(eq_log_rets_ann) - 1) * 100

qp_rets = log_rets * qp_fracs.shift(1)
qp_log_rets_ann = qp_rets.sum(axis=1).rolling(252).sum().dropna()
qp_rets_ann = (np.exp(qp_log_rets_ann) - 1) * 100

plt.figure(figsize=(12, 8))

plt.hist(qp_rets_ann, label="Optimal Kelly", alpha=0.3, bins=50)
plt.axvline(x=np.median(qp_rets_ann), label="Optimal Kelly Median",
            c=colors[0])

plt.hist(heur_rets_ann, label="Heuristic Kelly", alpha=0.3, bins=50)
plt.axvline(x=np.median(heur_rets_ann), label="Heuristic Kelly Median",
            c=colors[1])

plt.hist(eq_rets_ann, label="Equal Allocation", alpha=0.3, bins=50)
plt.axvline(x=np.median(eq_rets_ann), label="Equal Allocation Median",
            c=colors[2])

plt.xlabel('Returns (%)')
plt.ylabel('Frequency')
plt.title('Annual Returns for Each Allocation Method')

plt.legend()
plt.show()
[Figure: kc-optimization-with-leverage-histogram.png - histogram of rolling annual returns for each allocation method, with medians marked]

We can see here that the median case favors the optimal, QP method for calculating the Kelly Criterion. Perhaps surprisingly, the equal allocation model comes in a close second followed by our heuristic. Our heuristic does have the distinction of having the best worst-case outcome (highest 5th percentile).

Model              5th      50th     95th
Optimal Kelly      -34.1%   25.2%    85.4%
Heuristic Kelly    -5.2%    16.1%    36.2%
Equal Allocation   -35.8%   21.8%    84.4%
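If you want to verify these numbers, the percentiles come straight from the rolling annual return series we computed above:

# 5th/50th/95th percentiles of rolling annual returns
for name, rets in [("Optimal Kelly", qp_rets_ann),
                   ("Heuristic Kelly", heur_rets_ann),
                   ("Equal Allocation", eq_rets_ann)]:
    p5, p50, p95 = np.percentile(rets, [5, 50, 95])
    print(f"{name}: {p5:.1f}% / {p50:.1f}% / {p95:.1f}%")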

Like most indicators, these methods aren't predictive - they're backward-looking and assume that the correlations and volatilities they see today will hold going forward. In some cases that may be a solid assumption; in others it can cost you dearly.

Of course, any historical backtest is going to have limitations, so take these with a grain of salt. For one, these returns don't take into account margin costs which can be 8-10% or more for many retail accounts. So lop that off of your totals.
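As a rough, back-of-envelope adjustment - assuming an 8% annual borrow rate and that you're always borrowing the full amount above your own capital (which overstates the drag whenever gross leverage sits below the cap) - you could haircut the results like this:

# Hypothetical margin-cost haircut (assumed 8% annual rate)
margin_rate = 0.08
borrowed = opt.max_leverage - 1   # capital borrowed at the leverage cap
qp_rets_ann_net = qp_rets_ann - 100 * margin_rate * borrowed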

These results also assume daily re-balancing and don't take into account the commissions you'd incur if running this. While there are more and more commission-free brokerage accounts out there, many of these (looking at you, Robinhood) sell order flow, meaning you're getting a worse execution price on your trades than you otherwise would. So shave off a fraction of a percent every day.

On the other hand, we aren't taking any dividend payments into account, so that could boost your annual returns a bit.

Optimizing Your Trading Portfolio

The Kelly Criterion is just one way to optimize your portfolio. There are a variety of ways to do it depending on your assumptions and goals. While far from perfect, the Kelly Criterion has been tried and tested in the real world and has shown fantastic results for skilled traders. And we make this method (and many others) available to you in our no-code algorithmic trading platform.

You don't have to worry about installing and running non-linear solvers, cleaning your data, or implementing a proper, event-driven backtest solution. We do all of that for you.

With just a few clicks, you can choose your entry and exit signals, add position sizing and re-balancing models to optimize your allocations, and run multi-decade backtests to get the stats and decide if this is a strategy for you. When you're satisfied, just deploy your bot to receive live trading alerts and you're set!

Try our free demo today!