K-Means Portfolio for Value Investors

The Matlab code to easily create your own K-Means Portfolio is up!! 

Click here to see it on my Github

K-Means Clustering is one of the simplest clustering algorithms for discovering patterns and structure in data across many dimensions. Can it work for value investors? Let's run a simple test to see if it holds up out-of-sample.

Basic Overview

Without getting into much of the math (a simple Google search will suffice), K-Means clusters data through a simple iterative algorithm. It initializes a set of centroids, assigns each point to its nearest centroid, and then moves each centroid, repeating until the cost function (the sum of squared Euclidean distances between each point and the centroid of the cluster it belongs to) converges to a stable minimum. Each centroid update sets the position of the centroid to the arithmetic mean of the data points that belong to its cluster. This is an interesting result, since such a simple formula is enough to converge to a minimum (a local one, and not necessarily the global one). It works because the mean itself is a least-squares estimator.

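Let $c$ be some constant (let's say our new centroid location for a certain variable). We want to minimize the squared distance between our data $x_1, \dots, x_n$ and the centroid variable:

$$\min_{c} \sum_{i=1}^{n} (x_i - c)^2$$

Setting the derivative with respect to $c$ to zero gives $-2\sum_{i=1}^{n}(x_i - c) = 0$, so $c = \frac{1}{n}\sum_{i=1}^{n} x_i$, which is the arithmetic mean.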

Here's an example of a 3D (3 variable) k-means clustering

Clusters in 3 Dimensions
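
The iterative procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Matlab code from the repository; the function names, the random initialization from the data points, and the stopping rule are all my own assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Cost: sum of squared distances from each point to its centroid
    cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, cost
```

Because the result depends on the starting centroids, it is common to run the procedure several times with different seeds and keep the run with the lowest cost.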

The "Value" Philosophy: Tenet of Long-Term Investing

The idea behind value investing is to find currently underpriced stocks (Price < Intrinsic Value) by looking at certain metrics that gauge the financial health and risk level of the company. Under the efficient market hypothesis, this strategy shouldn't work, since all publicly available information is already factored into prices. However, academic research has found that, empirically, there are certain time-persistent "anomalies" within the market that can produce imperfectly predictable excess premia (see Jacobs and Levy (1988)). Many of these anomalies are what value investors ultimately look for, and the tools to dig them up are the available financial valuation metrics.

In this study, I have identified five popular metrics to screen on across the S&P 500, TSX Composite and NASDAQ Composite indices:

  • Book to Price
  • Return on Invested Capital
  • Current Ratio
  • EPS 1 Year Growth

I then scaled them so that each has a mean of 0 and a variance of 1. This prevents any one metric from dominating the distance calculation purely because of its scale when picking out clustered portfolios. Furthermore, notice that all the metrics above are constructed such that high values are preferable to low values within a portfolio. This attribute is also important in the portfolio selection stage.
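
As a rough illustration of the scaling step, here is a small Python sketch that turns one metric into z-scores. The input values are made up, and the use of the population standard deviation is an assumption on my part (the sample standard deviation is an equally reasonable choice).

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale one metric to mean 0 and variance 1 (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)  # assumption: population std; must be non-zero
    return [(x - mu) / sigma for x in column]

# Hypothetical Book to Price values for five stocks
book_to_price = [0.2, 0.5, 0.8, 1.1, 1.4]
z = standardize(book_to_price)
```

Each metric would be standardized separately before the columns are passed to the clustering step.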

Portfolio Selection

To determine the optimal number of clusters, a simple rule-of-thumb formula for K is often proposed. Taking this as a starting point, I run a simple elbow method by plotting a range of K values (around the one given by the formula) against their respective cost.

We want to pick the spot where the cost decreases the most

Looking at the plot, the error appears to decrease at a roughly quadratic pace. Most of the time, we would want to pick the spot where the marginal change in error is the greatest. However, I have a rule that my viable portfolios should generally hold around 20-50 stocks, and I chose K with that in mind. Most of the time, K needs to be tuned until the cluster sizes are subjectively optimal to the investor. One code trick I found helpful is to use the hist(x, nbins) function to visualize the size of each cluster, where x is the idx output from the kmeans() function and nbins = K.
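
A rough Python analogue of that cluster-size check might look like the following. It assumes idx holds one label per stock, numbered 1 to K as Matlab's kmeans() returns them; the function names and the 20-50 default bounds are illustrative choices taken from the rule of thumb above.

```python
from collections import Counter

def cluster_sizes(idx, k):
    """Number of stocks in each cluster, for labels 1..k."""
    counts = Counter(idx)
    return [counts.get(j, 0) for j in range(1, k + 1)]

def viable_clusters(idx, k, lo=20, hi=50):
    """Labels of the clusters whose size falls inside [lo, hi]."""
    sizes = cluster_sizes(idx, k)
    return [j for j, n in enumerate(sizes, start=1) if lo <= n <= hi]
```

If no cluster lands in the target range, that is a hint to try a different K.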


After screening out portfolios that fall outside my target size, I take an equally weighted average of all the variables in each remaining portfolio and select the portfolio with the maximum average. I then equally weight the stocks in this portfolio and see how they perform out-of-sample. I sampled these metrics from Bloomberg as of Jan 2, 2010 and tested the portfolio from Feb 1, 2010 to Feb 1, 2015.
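
The selection rule can be sketched as follows. This is only my reading of the step described above: average the standardized metrics of each stock, average those across the stocks in a cluster, and keep the cluster with the highest value. The function and argument names are hypothetical.

```python
def best_cluster(scores, idx, candidates):
    """scores: one list of standardized metrics per stock.
    idx: cluster label of each stock.
    candidates: cluster labels that passed the size screen.
    Returns the label with the highest equally weighted average score."""
    def cluster_average(label):
        rows = [row for row, lab in zip(scores, idx) if lab == label]
        per_stock = [sum(row) / len(row) for row in rows]
        return sum(per_stock) / len(per_stock)
    return max(candidates, key=cluster_average)
```

This only makes sense because every metric was built so that higher is better and all metrics share the same scale.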

Portfolio Benchmarking

Total Return and Sharpe Ratio (Assuming )

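  • Equal-Weight Portfolio: 140.1% total return, Sharpe ratio 28.09
  • S&P 500 Index: 78.86% total return, Sharpe ratio 21.12
  • NASDAQ Index: 104.3% total return, Sharpe ratio 23.92
  • Market-Weighted Portfolio: 323.06% total return, Sharpe ratio 60.13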
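
For readers who want to compute these two statistics themselves, here is a minimal Python sketch. The risk_free parameter is a placeholder, and the Sharpe ratio here is the plain per-period version with no annualization, so its scale will not necessarily match the figures above.

```python
from statistics import mean, stdev

def total_return(prices):
    """Return over the whole period, e.g. 0.21 for +21%."""
    return prices[-1] / prices[0] - 1.0

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of
    excess returns (per period, not annualized)."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)
```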

Looking at these results, it's surprising how well the Market-Weighted Portfolio performed. Is this pure luck, or the result of picking a good portfolio? It's mostly luck. Take a look at the weighting of each stock in the Market-Weighted Portfolio.


Biogen took up 40% of this portfolio due to its market cap. This is reasonable, since most of these value screens are more likely to find small-cap stocks than large-cap ones, so a single large company can dominate the weights. From 2010 to 2015, Biogen (BIIB) also made a whopping 627.88% return!!! If we had invested all of our money in it, we would be laughing all the way to the bank.
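
To see how one large company can dominate a market-weighted portfolio, here is a small Python sketch of the weighting arithmetic. The function names are my own, and any tickers or numbers fed into it would be inputs of your choosing rather than the data from this study.

```python
def cap_weights(market_caps):
    """Weight each stock by its share of the total market cap."""
    total = sum(market_caps.values())
    return {ticker: cap / total for ticker, cap in market_caps.items()}

def portfolio_return(weights, returns):
    """Weighted sum of the individual stock returns."""
    return sum(weights[ticker] * returns[ticker] for ticker in weights)
```

With equal weights, the same stock would contribute only 1/N of the portfolio return, which is why the two weighting schemes can diverge so much.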

The Fama-French Decomposition

Using data from Kenneth French's site, I decompose each portfolio's returns into 5 factors and analyze the attribution of risk-adjusted returns and the alpha of the market-weighted and equal-weighted portfolios. Fama and French (2014) add two variables to the popular three-factor model (Market, Size, Value) that attempt to capture the premia linked with profitability and investment. More specifically, the two differencing portfolios, RMW and CMA, gauge a firm's level of profitability (Robust Minus Weak operating profitability) and investment (Conservative Minus Aggressive investment), which can potentially explain more of the anomalous return.
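
For reference, the five-factor regression is usually written in the following form. The subscript notation is my own choice; $R_{p,t}$ is the portfolio return, $R_{f,t}$ the risk-free rate, and $R_{m,t}$ the market return in period $t$.

$$R_{p,t} - R_{f,t} = \alpha + \beta_{Mkt}\,(R_{m,t} - R_{f,t}) + \beta_{SMB}\,SMB_t + \beta_{HML}\,HML_t + \beta_{RMW}\,RMW_t + \beta_{CMA}\,CMA_t + \varepsilon_t$$

The intercept $\alpha$ is the part of the average excess return that the five factors leave unexplained.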

Regressing the two portfolios, we find that alpha for the market-weighted portfolio is positive while the equally weighted alpha is negative:

Market-Weighted Portfolio:

  • $\alpha$: 0.65%
  • $\beta_{Mkt}$: 1.17 - More volatile than the market
  • $\beta_{SMB}$: -0.017 - More returns generated from bigger-sized firms
  • $\beta_{HML}$: -0.8292 - More returns generated by low-B/P firms
  • $\beta_{RMW}$: 0.3480 - Firms tend to have robust profitability structures in place
  • $\beta_{CMA}$: -0.5043 - Firms tend to be aggressive in their capital investments
  • $R^2$ = 58.6%

Equal-Weighted Portfolio:

  • $\alpha$: -0.12%
  • $\beta_{Mkt}$: 1.09 - More volatile than the market but less than Mkt-Weight
  • $\beta_{SMB}$: 0.5716 - More returns generated from small-cap firms (since no Biogen dominance)
  • $\beta_{HML}$: -0.4151 - More returns generated by low-B/P firms
  • $\beta_{RMW}$: 0.004 - Almost no exposure to this factor, suggesting a mix of profitable and unprofitable firms
  • $\beta_{CMA}$: -0.5437 - Firms tend to be aggressive in investments, more aggressive than the Mkt-Weighted Portfolio
  • $R^2$ = 86.3%
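
Loadings like these are typically estimated by ordinary least squares, although the exact estimation routine is not stated here. The following is a minimal sketch that solves the normal equations with the standard library only; in practice a statistics package would be the usual choice, and the function name and argument layout are hypothetical.

```python
def ols(y, X):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk.
    y: list of excess returns; X: list of factor rows, one per period.
    Returns [a, b1, ..., bk], where a plays the role of alpha."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    m = len(rows[0])
    # Augmented normal equations [A'A | A'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]
```

Each row of X would hold that period's five factor values from Kenneth French's data, and y the portfolio return in excess of the risk-free rate.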

Given these factors, it seems that the equally weighted portfolio leans toward small-cap stocks with mixed profitability, aggressive investment, and low Book to Price. Unfortunately, the negative alpha shows that, given the risk factors, the equally weighted portfolio actually underperformed. The market-weighted portfolio (with Biogen as a big player), by contrast, tends to be more volatile than the EWP, more large-cap focused, more robustly profitable, and a bit less aggressive in investments. Its alpha is 0.65% on average, showing that the portfolio earned a sizable additional premium that the factors couldn't capture.
Thanks for reading!
