Lifetimes Part 2: Gamma Spend Model and Financial Valuation

See here for the full iPython Notebook code.  Some of the descriptions are outdated, but the code is almost the same. After getting to know what lifetimes can provide, I started applying it from a financial perspective.  I wanted to answer the most important question for Zakka Canada:  leveraging our customer analytics models, how can I estimate what Zakka Canada is worth as of today? The rest of this post is divided into two parts: 1) modelling the monetary value of our customer base and 2) estimating the price of Zakka Canada through a simple present-value cash flow valuation […]
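The present-value step in part 2 boils down to discounting a projected cash-flow stream. A minimal sketch, with entirely made-up numbers rather than Zakka Canada's actual figures:

```python
# Hedged sketch: discounting a projected yearly cash-flow stream to
# present value. Cash flows and discount rate are hypothetical.

def present_value(cash_flows, rate):
    """Discount a list of yearly cash flows (years 1, 2, ...) at `rate`."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

flows = [10_000.0, 11_000.0, 12_100.0]  # illustrative free cash flows
pv = present_value(flows, rate=0.10)
print(round(pv, 2))
```

In the actual post the cash flows come from the customer-base model rather than fixed numbers, but the discounting mechanics are the same.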

Continue reading →

Guns, Bombs and eSports: Applying Data and Portfolio Analytics to Counter-Strike Gambling

Since the publication of Bill James' seminal work, Baseball Abstract, and the rise to stardom of the Oakland A's, sports analytics - the application of statistics to competitive sports - has been (and still is) a prominent topic within the industry.  Thus, it is only reasonable for practitioners to apply this movement to the new, up-and-coming playing field of eSports, which has gained a large following over the years through online games such as League of Legends, Dota 2 and Counter-Strike: Global Offensive (CSGO).  I would argue that the data drawn from eSports is more abundant and easier to acquire, whereas real-life sporting data requires physical measurements, whether it's measured by a person […]

Continue reading →

Volatility Models and Backtests on Quantopian

In this blog post, I will present some backtest results on volatility models.  The list I present here is not exhaustive, and there is still a gargantuan set of papers focusing on this issue (a good place to start is vlab).  In the next section, I present some simple notation to define financial volatility, then define each model and show general backtest results with risk attributes.  The premise of the backtest is as follows: the financial volatility of an investment portfolio can be minimized globally by allocating the correct dollar amount to each asset within the portfolio. […]
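That premise has a tidy closed form in the two-asset case: the global minimum-variance weights depend only on the variances and covariance. A sketch with made-up covariance figures (the post's actual backtests use full multi-asset estimates):

```python
# Illustrative two-asset global minimum-variance portfolio.
# Variances/covariance below are invented for the example.

def min_variance_weights(var1, var2, cov12):
    """Weights (w1, w2) minimizing portfolio variance, with w1 + w2 = 1."""
    w1 = (var2 - cov12) / (var1 + var2 - 2 * cov12)
    return w1, 1 - w1

w1, w2 = min_variance_weights(var1=0.04, var2=0.09, cov12=0.006)
print(round(w1, 4), round(w2, 4))  # more dollars to the lower-variance asset
```

With many assets the same idea becomes a quadratic program over the covariance matrix, which is where the competing volatility models come in: each produces a different covariance estimate to plug into the allocation.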

Continue reading →

Consumer sentiment analysis used for Finance??

I recently finished a paper with my partner in my data mining course. We were assigned the topic of the Naive Bayes classifier and decided to apply it to sets of Amazon user reviews across a number of categories.  I've uploaded the paper on this blog in case you are interested in the details of the data mining. In summary, the main results and implications of our research were: it is possible to accurately classify consumer sentiment by analyzing and identifying key words in the review, and we can get more accurate predictions and meaningful data by examining reviews […]
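The keyword-based classification works roughly like this: count word frequencies per class, then score a new review by summed log-probabilities. A toy sketch with invented four-review "training data" (the paper used real Amazon reviews and a larger pipeline):

```python
from collections import Counter
from math import log

# Minimal Naive Bayes sentiment sketch with Laplace smoothing.
# The tiny "reviews" are made up for illustration.
train = [
    ("great product works well", "pos"),
    ("love it great value", "pos"),
    ("terrible broke fast", "neg"),
    ("bad quality terrible", "neg"),
]

vocab = {w for text, _ in train for w in text.split()}
counts = {"pos": Counter(), "neg": Counter()}
for text, label in train:
    counts[label].update(text.split())

def predict(text):
    scores = {}
    for label, cnt in counts.items():
        total = sum(cnt.values())
        score = log(0.5)  # log prior (classes are balanced here)
        for w in text.split():
            # add-one smoothed log likelihood of each word given the class
            score += log((cnt[w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great quality"))  # → pos
```

Key words like "great" dominate the score, which is exactly the effect the paper's results describe.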

Continue reading →

K-Means Portfolio for Value Investors

The Matlab code to easily create your own K-Means Portfolio is up!!  Click here to see it on my Github. K-Means clustering is the simplest clustering algorithm for discovering patterns and structure in data across many dimensions.  Can it work for value investors? Let's do a simple test to see if it holds up out-of-sample. Basic Overview Without getting into much of the math (a simple Google search will suffice), K-Means clusters data through a simple iterative algorithm that initializes centroids and moves each centroid toward an optimized mean, where the cost function is the Euclidean distance between each point and the […]
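The iterative loop described above (the post's own code is in Matlab; this is a Python sketch with hypothetical value-style features such as P/E and P/B):

```python
# Bare-bones K-Means: assign each point to its nearest centroid by
# Euclidean distance, then move each centroid to its cluster mean.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: nearest centroid (squared distance suffices)
            idx = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else cen
                     for cl, cen in zip(clusters, centroids)]
    return centroids

# hypothetical (P/E, P/B) pairs: two "cheap" and two "expensive" stocks
pts = [(5, 1), (6, 1.2), (30, 8), (28, 7.5)]
print(kmeans(pts, centroids=[(0, 0), (20, 5)]))
```

The two centroids settle on the cheap and expensive groups, which is the structure a value screen would hope to recover.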

Continue reading →

Autoregressive Exponential Moving Average Forecasting

I've recently been looking into an automated strategy to implement in my forex trading. School has been really busy and I haven't had much time to do technical analysis, so I think it'd be better to explore robot trading. In this post I'll discuss my strategy, its properties and an example of simple out-of-sample forecasting, benchmarked against an ARMA(2,2) and a random-walk naive prediction. In the next article, I'll write about implementing it in MetaTrader (MQL4).   An exponential moving average is an extension of a simple moving average where more of the weight is placed upon the […]
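The weighting scheme is easiest to see in the recursion itself: each new EMA value blends the latest price with the previous EMA, so recent observations decay geometrically. A sketch with invented FX-style prices (the final strategy runs in MQL4, not Python):

```python
# Exponential moving average via its standard recursion.
# Prices below are made-up illustrative quotes.

def ema(prices, span):
    alpha = 2 / (span + 1)          # common smoothing-factor convention
    out = [prices[0]]               # seed with the first observation
    for p in prices[1:]:
        # new EMA = alpha * latest price + (1 - alpha) * previous EMA
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

print([round(v, 4) for v in ema([1.30, 1.31, 1.29, 1.32], span=3)])
```

With span=3 the smoothing factor is 0.5, so each price carries half the weight of the running average, and older prices fade by halves, which is the "more weight on recent data" property the post describes.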

Continue reading →

Cointegration and Statistical arbitrage

Recently, I was introduced to the concept of cointegration analysis in time series.  I first read about it in an HFT blog at Alphaticks, and then the concept came up again when I was looking into spurious regressions and why they occur.  Lots of quants have blogged about this idea and how it can be applied to the premise of statistical arbitrage.  I will do the same and apply this to the not-so-recent Google stock split; however, I will also try to add some math into the mix, briefly touching on the error-correction mechanism and spurious regression.  Finally, I will also give a few criticisms against […]
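The stat-arb setup starts by regressing one price series on the other to get a hedge ratio, then trading the residual spread when it drifts from its mean. A sketch with synthetic series (not the actual GOOG/GOOGL data, and omitting the formal cointegration test, e.g. an ADF test on the spread, that the full analysis needs):

```python
# Hedge ratio by ordinary least squares, then the residual spread.
# The two "price" series are synthetic: y tracks roughly 2 * x.

def ols_slope(x, y):
    """OLS slope of y on x (with intercept), computed from deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

x = [100, 101, 102, 103, 104]
y = [200.5, 202.0, 204.2, 206.1, 208.0]
beta = ols_slope(x, y)                      # hedge ratio
spread = [b - beta * a for a, b in zip(x, y)]  # the series to mean-revert on
print(round(beta, 3))
```

If the pair is truly cointegrated the spread is stationary and a long/short position of 1 unit of y against beta units of x bets on it reverting; if not, the regression is exactly the spurious kind the post warns about.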

Continue reading →

U.S. Unemployment Time-Series Modelling (Part 1)

One of the many benefits of improving economic forecasts is being able to trade releases with better information through forex and stocks.  Certain sites such as Forexfactory provide a forecast parameter, and playing around with these, I found that some just use standard ARIMA models.  In Part 1, I will show how to estimate unemployment-rate log changes, and in Part 2, I will implement this through a modified BP neural network (if I can get it to work...).  I will be benchmarking my residuals against a standard ARIMA model along with an exogenous regressor (initial claims).  The data was obtained […]
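As a toy illustration of the target variable, here is a simple AR(1) fit by least squares on log changes of an invented rate series (not the BLS data; the post's actual models are full ARIMA specifications with diagnostics and the initial-claims regressor):

```python
import math

# Sketch: model log changes of an unemployment-rate series with AR(1).
# The rate series below is made up for illustration.
rates = [5.0, 5.1, 5.3, 5.2, 5.4, 5.5, 5.4]
log_chg = [math.log(b / a) for a, b in zip(rates, rates[1:])]

# regress each log change on its lag (OLS slope from deviations)
x, y = log_chg[:-1], log_chg[1:]
mx, my = sum(x) / len(x), sum(y) / len(y)
phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
       / sum((a - mx) ** 2 for a in x))

# one-step-ahead forecast of the next log change
forecast = my + phi * (log_chg[-1] - mx)
print(round(phi, 4), round(forecast, 4))
```

Even on this toy series the fitted coefficient is negative, i.e. the log changes mean-revert, which is the kind of structure an ARIMA model on this variable exploits.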

Continue reading →

Bootstrapping Portfolio Risk

Bootstrapping, originally proposed by Bradley Efron, is a statistical technique for approximating the sampling distribution of a parameter.  The term bootstrap was coined from the phrase "to pull oneself up by one's own bootstraps" - something seemingly impossible for a person, just like the bootstrap technique's feat of obtaining more information from the sample.  The bootstrap rose to prominence as computing power became faster and cheaper.  In certain usages, the bootstrap often outperforms other mathematical measures because it makes fewer assumptions, such as about the population distribution or the relevant parameters.  Furthermore, the bootstrap can approximate most measures, whereas analytically deriving […]
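The mechanics are just resampling with replacement and recomputing the statistic each time. A sketch on a made-up return series (the post applies the same idea to portfolio risk measures):

```python
import random

# Bootstrap the sampling distribution of a statistic by resampling
# the data with replacement. The daily returns below are synthetic.
random.seed(42)  # reproducibility for the example

returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01, 0.02, -0.025]

def bootstrap_stat(data, stat, n_boot=1000):
    """Approximate the sampling distribution of `stat` via resampling."""
    draws = []
    for _ in range(n_boot):
        sample = [random.choice(data) for _ in data]  # with replacement
        draws.append(stat(sample))
    return draws

mean_draws = bootstrap_stat(returns, lambda s: sum(s) / len(s))

# spread of the bootstrap draws = estimated standard error of the mean,
# obtained without assuming any population distribution
mu = sum(mean_draws) / len(mean_draws)
se = (sum((d - mu) ** 2 for d in mean_draws) / len(mean_draws)) ** 0.5
print(round(se, 4))
```

The same loop works for any statistic - swap the lambda for a volatility, VaR, or Sharpe calculation - which is exactly the flexibility the post highlights over analytical derivations.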

Continue reading →

A look into the '08 Crisis with Google Correlation

Discovering Google Correlate was a small silver nugget for me; the reason I say silver is that there are several drawbacks to it. I was going over some research papers to see how I could improve my simple models for unemployment claims, non-farm payrolls, etc. One paper that piqued my interest, written by Hal R. Varian, head honcho economist at Google, proposed improving the fit of a forecast using Google Trends data.  He showed this through forecasting motor parts sales, unemployment, consumer sentiment, etc.  The models had an overall better fit and out-of-sample performance when incorporating Google Trends searches. […]

Continue reading →