Position Sizing is often overlooked by the retail trader. Most wind up over betting and taking on too much risk and blowing up their account. On the other hand, investing in very small amounts in too many places can also ruin your gains (but hey, at least you will stay in the game!) In both cases, the size of your positions significantly shapes the risk profile of your portfolio.

So how do you determine the position size that optimizes your risk and returns?

Enter the Kelly Criterion.

Developed by John Kelly to denoise telephone lines while working for Bell Labs in the 1950's, the Kelly Criterion is a formula that has been applied to both gambling and investing. It's been used by traders on Wall Street as well as mob bosses running horse races and casinos. (For a great read on the fascinating history of this equation, check out Poundstone's Fortune's Formula.)

So what is this magic formula? If you do a quick Google search for how to apply the Kelly Criterion to stocks, you'll probably come across a formula like this one:

$$f^* = p - \frac{1-p}{b -1}$$

This technically is a formula for the Kelly Criterion, but it's only relevant if you're making a binary bet, i.e. a bet where you either win or lose. This formula doesn't work for continuous outcomes like stocks or crypto.

For those, you need to update your equation to take into account a continuous range of possible outcomes (like all the possible prices a stock can take). Skipping the derivation (section 7 here if you're interested), we wind up with:

$$f^* = \frac{\mu - r}{\sigma^2}$$

where 𝑓∗ is the Kelly Fraction - the optimal amount you invest in your risky asset while the rest sits in a hypothetical risk-free asset (e.g. cash or bonds).

We've actually covered the binary Kelly Criterion and the single-asset Kelly Criterion before - so check those articles out for some more background.

While this formula is great, it still only applies to a single risky asset. What if there are multiple assets that you would like to invest in? How can we extend the Kelly Criterion to optimize an entire portfolio?

Try Googling this and you'll find some dense academic papers dealing with quadratic programming - a special case of optimization that you find in graduate-level operations research courses.

So what's the average retail investor to do? Sometimes it really can feel like there's secret investment knowledge that's just not available to the average person. Is it time then to give up and finally enroll in your local university's math PhD program?

No! (well unless you really want to). We're going to break down the Kelly Criterion for multiple assets and show you how to apply it using free, open-source software and Python.

Linear Programming, Optimization, and Your Portfolio

Most people don't have a background in math programming and optimization, so we'll spend a bit of time tackling some of these issues so that the rest becomes clear. If you're one of the few who knows how to solve a MINLP (or even what that stands for), feel free to jump ahead. For the rest, let's take a quick crash course on the subject!

Optimization via math programming consists of solving a set of algebraic equations in order to minimize or maximize an objective function. For portfolio optimization, we may write something like this:

$$\max_{f} z \quad (1)$$ $$\textrm{s.t.} \sum_{i=1}^n f_i = 1 \quad (2)$$ $$f_i \in [0, 1] \quad (3)$$

This is the general model, so let's go through it line-by-line:

  1. This is the objective function and is what you're trying to optimize. In short, it says that we're maximizing (i.e. optimizing) z by changing our variable f (also called our decision variable). Say you're designing a portfolio and you want to maximize your returns or risk-adjusted returns, or minimize volatility, or minimize/maximize some other quantity, then this is where you'd define that. You'll notice that we haven't defined our function z yet. That's because right now we're dealing with the general form optimization problems take and not get to deep in the weeds. But we'll look at specifics later on.
  2. This is where we write out the constraints for the problem. This section is denoted by s.t. meaning "subject to" or "such that", after which we list out all of the constraints. Equation 2 is our first constraint which is saying that the sum of all of our 𝑓_𝑖 values have to equal 1.For our purposes 𝑓_𝑖 will be the fraction of our capital we put into each asset i. So this constraint means that if we added the fractions of our capital across all of the assets we've invested in, the fractions would sum to 1. In other words, we aren't letting the model hold any money back--it all has to be invested.
  3. Equation 3 defines our decision variables. Since each 𝑓_𝑖 is a fraction, it must take a value between 0 and 1.

This set of equations defines our entire general model. There are optimization models that are much larger and more complex than this, but they're built on the same basic three-part structure: objective function, constraints, and variables.

The next step would be to plug values into the equation and solve for the results, but first, we need to specify our objective function!

Optimizing a Portfolio with the Kelly Criterion

According to the Kelly Criterion, we want to find the allocation that maximizes the growth rate of our investment (𝑔). The different assets we have to choose from have different statistics, namely returns, volatilities, and correlations with each other. To optimize our growth rate we need to take all of these into account.

I'm going to skip the details of the derivation - you can find those here (one of those dense, academic papers I referred to earlier) - and jump straight to our final and complete model:

$$\max_{f} g = r + \sum_{i=1}^n f_i (\mu_i - r) - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n f_i f_j \hat{\Sigma}_{ij} \quad (1)$$ $$\textrm{s.t.} \sum_{i=1}^n f_i \leq 1 \quad (2)$$ $$f_i \in [0, 1] \quad (3)$$

The primary change here from our earlier model is the objective function, so let's take a deeper look at it.

$$\max_{f} g = r + \sum_{i=1}^n f_i (\mu_i - r) - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n f_i f_j \hat{\Sigma}_{ij}$$

This equation says we're trying to maximize 𝑔 - our growth rate - by deciding how to set 𝑓_𝑖 - our portfolio fractions (don't be confused that some are denoted by i and others by j, we're summing the same values, just indexing them differently so we can sum over all combinations of fractions). 𝑟 is our risk-free rate, and 𝜇 denotes our mean returns.

Σ̂ 𝑖𝑗 is our estimated variance-covariance matrix. We get this by calculating the covariance of all of the assets with respect to one another, then setting the diagonal of the matrix to the variance of the individual asset.

The right-hand side of the equation can be understood as follows: we take our risk-free rate, add together all of the fractions of our portfolio multiplied by their excess return, and subtract the covariance of each fraction, which is a measure of our risk with assets that may move together.

So that's our objective function. The other two equations are pretty much the same as in our general model. The only difference is that we've updated our f_i summation constraint (equation 2) to be less than or equal to 1, as opposed to strictly equal to 1. This means that if the fractions sum to less than 1, then any capital not invested in a risky asset will be invested at the risk-free rate.

Our portfolio fractions still can only be between 0 and 1 (equation 3), meaning we aren't allowing the model to take short positions or apply any leverage. Of course, we could relax this constraint to provide some additional flexibility to our model, but for now, we'll stick with this standard formulation.

There's something else to note about our objective function: it's non-linear.

To be more precise, it's quadratic which makes this a quadratic programming problem (we have some f-squared terms that will pop up). These are harder to solve than linear problems for a number of technical reasons (you can read all the details in Chapter 4 here). The good news is that we can solve this! It just might take some time for larger problems.

Now that we have our model, we'll show you how to use Python to optimize a portfolio according to the Kelly Criterion!

Kelly Criterion Optimization in Python

First let's get some sample data to work with.

$$\mu_i = [0.0476, 0.004]$$ $$\hat{\Sigma}_{ij} = \begin{bmatrix} 2.12 & 1.03 \\ 1.03 & 1.89 \end{bmatrix}$$

We're going to optimize this in two ways - one using the math programming method as shown above and the second using a brute force method to verify the results.

For the math programming model, we're going to use a package called Pyomo which you can install with pip.

pip install pyomo

The nice thing about math programming is that once you have your mathematical model written out (see equations 1, 2, and 3 above) then it's fairly straightforward to build the model and let Pyomo and your solver do the rest.

I'm assuming you have some standard packages like NumPy already, but if not install those and import them.

import numpy as np
from pyomo.environ import *

Next, let's assign our data to variables to fill in our model.

returns = np.array([0.0476, 0.004]) # mu
varcov = np.matrix([[2.12, 1.03], # sigma_ij
                    [1.03, 1.89]])
rfr = 0 # risk-free-rate

n = len(returns) # number of assets

Pyomo uses a ConcreteModel object to link the constraints, objective function, and variables together. So the first step is to instantiate it so we can add these features as attributes.

model = ConcreteModel()

From here, we define our indices, parameters, and variables using Pyomo's classes. Note that these are all assigned as attributes to your model.

The RangeSet() class is a bit odd in that it is indexed by 1 instead of 0, like everything else in Python. We can easily fix this by making it explicit. This will give us our index to iterate over, just like you see in the mathematical model above.

# Sets and indices
model.i = RangeSet(0, n-1)

Now we're going to define our parameters - the data that we feed into the model - and our decision variables.

The parameters are initialized with dictionaries and the values are keyed by the index to keep everything nice and consistent. Assigning these values in this manner isn't completely necessary, but I'm doing it just to do things "properly" - you could leave them as NumPy arrays and as long as the index matches, it will solve just fine. It can just be helpful if you want to have more descriptive indices (e.g. make them stock tickers instead of numbers) or have a more complex set up.

# Decision variables
model.f = Var(m.i, domain=UnitInterval)

# Parameters
model.mu = Param(m.i, 
               initialize={i: m for i, m in zip(m.i, returns)})
model.sigma = Param(m.i, m.i, 
                  initialize={(i, j): varcov[i, j] 
                              for i in m.i 
                              for j in m.i})

We can add our constraint using the @ decorator in Python. This just wraps the custom function in Pyomo's Constraint function and assigns it to the model. I do this because I find it easier to read.

@model.Constraint()
def fullyInvestedConstraint(model):
  return sum(model.f[i] for i in model.i) == 1

Note two things about this constraint. First, we use list comprehensions to operate over vectors. Second, I changed this constraint slightly from the original formulation to equal 1 instead of be less-than-or-equal to 1. Above is more general, but for this brief example, we're going to ignore the risk-free rate as a simplifying step (we'll go back to it later, and I hope I didn't just introduce confusion into the equation!).

Lastly, we have our objective function.

# Objective
@model.Objective(sense=maximize)
def objective(model):
  return (rfr + sum(model.f[i] * (model.mu[i] - rfr) for i in model.i) - \
          sum(
              sum(model.f[i] * model.sigma[i, j] * model.f[j] for j in model.i)
          for i in model.i) / 2)

Sorry that this is such a mess! Hopefully you can see that we have all of the elements as shown above, they're just broken down so we have a constant value plus a list comprehension plus a double list comprehension. We pass the maximize class to the Objective function to tell Pyomo which direction to move as it solves.

You can run each of these in turn to build your model, but I like to wrap the model construction in a function like this:

def buildKCOptModel(returns: np.array, varcov: np.matrix, 
                    rfr: float = 0):
  assert returns.shape[0] == varcov.shape[0]
  assert returns.shape[0] == varcov.shape[1]

  m = ConcreteModel()

  # Indices
  m.i = RangeSet(0, returns.shape[0] - 1)

  # Decision variables
  m.f = Var(m.i, domain=UnitInterval)

  # Parameters
  m.mu = Param(m.i, 
               initialize={i: m for i, m in zip(m.i, returns)})
  m.sigma = Param(m.i, m.i, 
                  initialize={(i, j): varcov[i, j] 
                              for i in m.i 
                              for j in m.i})

  # Constraints
  @m.Constraint()
  def fullyInvestedConstraint(m):
    return sum(m.f[i] for i in m.i) == 1

  # Objective
  @m.Objective(sense=maximize)
  def objective(m):
    return (rfr + sum(m.f[i] * (m.mu[i] - rfr) for i in m.i) - \
            sum(
                sum(m.f[i] * m.sigma[i, j] * m.f[j] for j in m.i)
            for i in m.i) / 2)
  
  return m

You can now build your model by running the following line:

model = buildKCOptModel(returns, varcov, rfr)

If you want to see the details of your model to make sure everything looks correct, you can use the "pretty print" method:

model.pprint()
1 Set Declarations
    sigma_index : Size=1, Index=None, Ordered=True
        Key  : Dimen : Domain : Size : Members
        None :     2 :    i*i :    4 : {(0, 0), (0, 1), (1, 0), (1, 1)}

1 RangeSet Declarations
    i : Dimen=1, Size=2, Bounds=(0, 1)
        Key  : Finite : Members
        None :   True :   [0:1]

2 Param Declarations
    mu : Size=2, Index=i, Domain=Any, Default=None, Mutable=False
        Key : Value
          0 : 0.0476
          1 :  0.004
    sigma : Size=4, Index=sigma_index, Domain=Any, Default=None, Mutable=False
        Key    : Value
        (0, 0) :  2.12
        (0, 1) :  1.03
        (1, 0) :  1.03
        (1, 1) :  1.89

1 Var Declarations
    f : Size=2, Index=i
        Key : Lower : Value : Upper : Fixed : Stale : Domain
          0 :     0 :  None :     1 : False :  True : UnitInterval
          1 :     0 :  None :     1 : False :  True : UnitInterval

1 Objective Declarations
    objective : Size=1, Index=None, Active=True
        Key  : Active : Sense    : Expression
        None :   True : maximize : 0.0476*f[0] + 0.004*f[1] - (2.12*f[0]*f[0] + 1.03*f[0]*f[1] + 1.03*f[1]*f[0] + 1.89*f[1]*f[1])/2

1 Constraint Declarations
    fullyInvestedConstraint : Size=1, Index=None, Active=True
        Key  : Lower : Body        : Upper : Active
        None :   1.0 : f[0] + f[1] :   1.0 :   True

7 Declarations: i f mu sigma_index sigma fullyInvestedConstraint objective

There's one more step we need to complete; we need a solver that can handle quadratic programs.

A lot of these solvers require expensive licenses (e.g. Gurobi, BARON, CPLEX) or are only available for academics. Thankfully, we can use the freely available IPOpt to solve our QP!

WARNING: Some solvers do not work on Windows!

IPOpt will run on Windows according to their installation instructions, but not all solvers do and it can be a bit involved to get these to work. You can install and run all of this in Google Colab if you're having trouble doing it locally with whatever machine you're using.

The following will install IPOpt on a Linux machine or Google Colab though.

# Install IPOPT
!wget -N -q "https://ampl.com/dl/open/ipopt/ipopt-linux64.zip"
!unzip -o -q ipopt-linux64

Finally, we can plug all of this into our solver and see what happens:

results = SolverFactory('ipopt', executable='/content/ipopt').solve(model)
print(f"g = {model.objective.expr():.3f}")
print(f"Fractions = {[np.round(model.f[i].value, 3) for i in model.i]}")
g = -0.732
Fractions = [0.463, 0.537]

It solved giving us optimal allocations of 46.3% and 53.7% into assets 1 and 2 respectively. The negative value of the objective function indicates we're losing money if these stats hold (that 0% risk-free rate is looking better now, right?) but I wouldn't worry about it because I just made some numbers up to see if the model works.

We can also verify that our formulation is correct with a brute-force search.

Brute Force Portfolio Optimization

This is quite simple given the small size of the model. If we limit ourselves to two risk assets, we can check the output of the objective function for every combination of fractions over a discretized interval (adding any other asset would increase the computational cost of this search method exponentially, hence why we're keeping it simple to start).

All we need is a function to evaluate our objective function.

# Brute force optimization
def optFraction(f, returns, varcov):
  return np.asarray(f.dot(returns) - f.dot(varcov).dot(f) / 2)

Here, we can use NumPy's vectorization to compute the dot products of our arrays, so the form is much more compact.

Now, let's discretize between 0 and 1 and loop over every combination.

risk_frac1 = np.linspace(0, 1, 1000)
risk_frac2 = 1 - risk_frac1
F = np.hstack([risk_frac1.reshape(-1, 1),
              risk_frac2.reshape(-1, 1)])

g = []
for f in F:
  g.append(optFraction(f, returns, varcov))

g = np.vstack(g)
print(f"Max fractions = {F[max_pt].round(3)}\ng = {g.max():.3f}")
Max fractions = [0.463 0.537]
g = -0.732

We got the same results! This should give us confidence in our implementation.

We can also plot our objective function and see how it looks given the values we have.

kelly-criterion-brute-force-optimization.png

Optimizing a Stock Portfolio with Yahoo Finance Data

We showed how to use this with just some fake data, now let's get some real data to test our functions.

For this, we'll use the yfinance package and pull some data from Yahoo Finance's API.

import yfinance as yf
import pandas as pd

I'm going to grab some random tickers for our data.

start = '2020-01-01'
end = '2021-12-31'
tickers = ['AAPL', 'F', 'GE', 'CVX']
yfObj = yf.Tickers(tickers)
data = yfObj.history(start=start, end=end)
data.drop(['High', 'Low', 'Open', 'Volume', 'Stock Splits', 'Dividends'],
        axis=1, inplace=True)
data.columns = data.columns.swaplevel()

We have two years of data for four stocks. If we use a 1-year lookback period to initialize covariance, means, and so forth, we can get one year of optimal portfolio allocation.

The first step is to calculate our daily returns, then turn these into means, variances, and covariances using the rolling method in Pandas. The covariance data frame is going to need some extra work because we're getting a matrix for every day beyond our lookback period.

There are multiple ways to handle this, but the easiest way I came up with is to convert it into a 3-D array where each row denotes that day's covariance matrix for easy iteration.

Once we have that, we initialize some vectors for our fractions and growth rate, then iterate over all of our data and optimize our portfolio allocation at each day using the Pyomo models we defined above. We stick all of that data together at the end and get a big Pandas data frame with growth, variance, optimal allocations, and all the rest for analysis.

Here's the full function:

def getKCOpt(data: pd.DataFrame, lookback=252, rfr=0):
  returns = data.loc[:, (slice(None), 'Close')] / \
    data.loc[:, (slice(None), 'Close')].shift(1)
  returns = returns.rename(columns={'Close': 'returns'})
  means = returns.rolling(lookback).mean().rename(
    columns={'returns': 'mean'})
  var = returns.rolling(lookback).var().rename(
    columns={'returns': 'var'})
  df = pd.concat([returns, means, var], axis=1)
  # Get covariance matrices and transform to 3D array
  n = returns.shape[1]
  cov = returns.droplevel(1, axis=1).rolling(lookback).cov().values.reshape(
      -1, n, n)
  
  fracs = np.zeros((df.shape[0], n))
  fracs[:] = np.nan
  g = np.zeros(df.shape[0])
  g[:] = np.nan

  for i, (ts, row) in enumerate(df.iterrows()):
    if i < lookback:
      continue
    means = row.loc[(slice(None), 'mean')].values
    var = row.loc[(slice(None), 'var')].values
    varcov = cov[i]
    np.fill_diagonal(varcov, var)
    model = buildKCOptModel(means, varcov, rfr)
    results = SolverFactory('ipopt').solve(model)
    fracs[i] = np.array([model.f[j].value for j in model.f])
    g[i] = model.objective.expr()

  df_fracs = pd.DataFrame(fracs, columns=returns.columns, 
                          index=returns.index).rename(
                              columns={'returns': 'fraction'})
  df_g = pd.DataFrame(g, index=returns.index)
  df_g.columns = pd.MultiIndex.from_arrays(
      [['g'], ['g']])

  return pd.concat([data, df, df_fracs, df_g], axis=1)

Now let's pass our data and plot the growth and fractions!

df = getKCOpt(data)

fig, ax = plt.subplots(2, figsize=(12, 8), sharex=True)

ax[0].plot(df.loc[:, (slice(None), 'fraction')] * 100)
ax[0].set_ylabel('Portfolio Allocation (%)')
ax[0].set_title('Kelly Optimal Portfolio Allocation')
labels = [i for i in list(df.columns.levels[0]) if i in tickers]
ax[0].legend(labels=labels)

ax[1].plot((df.loc[:, 'g'] - 1) * 100)

ax[1].set_xlabel('Date')
ax[1].set_ylabel('Daily Growth Rate (%)')
ax[1].set_title('Kelly Optimal Growth Rate')

plt.show()
kelly-criterion-portfolio-optimization.png

In the plots above, we see that the optimal portfolio evolves over time moving from all-in on Apple, to all-in on Ford, GE, and back to Ford again. Our model was usually allocated to a single stock, or was transitioning between two as the stats changed over time.

The Kelly Criterion will often put a lot of leverage into a position because it's trying to maximize the return. Here, we've optimized it with no leverage, so it moves back and forth with the capital it has to get the biggest bang for its buck.

This optimization model is not predictive though, so don't expect a long-only backtest like this to automatically do well. It's based on the assumption that the future will look like the recent past and is optimizing based on the past year. For this reason, most traders who use the Kelly Criterion trade with half or even quarter Kelly weightings, which means that if the Kelly Criterion tells them to put 100% of their capital into a trade, they'll scale that down to 50% or 25%. There are a few ways to update our model to take this into account.

We could put a hard cap on our allocation of 25% or 50% and constrain our individual position sizes. Theoretically, however, the Kelly Criterion could go much higher than 100% and be calling for 200%, 300%, 500% allocation (i.e. significant leverage) or more for a position. These higher levels aren't always prudent though so we could set a maximum portfolio leverage constraint and iteratively solve our optimization problem by scaling down our maximum and using that as a fixed value in our next iteration until we've allocated all of our portfolio's capital.

Trading with the Kelly Criterion

We showed how to optimize your portfolio with the Kelly Criterion, but to use it in a trading system is a bit different. When we come back to this topic we'll show you exactly how to do that while relaxing some of our constraints to allow you to use leverage and take short positions.

This math and optimization stuff can be a bit much, especially as we start to add complexity within a real trading system. If we weren't such nerds here at Raposa, we'd feel the same way.

The truth is, we actually like this stuff.

But we realize there are lots of bright people out there who don't have the time or patience to set up their own trading systems with all of this complex math, computer systems, data cleansing and the like. They just want to test ideas and get to trading!

That's why we're building an advanced, no-code trading platform. We want traders to have access to all of the best data and algorithms to design your own custom algorithm, test it on historical data, then take it live and let your bot trade your money for you.

Come check out our free demo and let us know what you think!