TFI Restaurant Revenue Kaggle Competition

Read my report HERE; the R code is provided HERE (warning: messy but all-inclusive code). I recently entered my first Kaggle competition with a friend while taking a graduate-level Data Mining class (I'm an undergrad).  At first I thought this course was going to be super difficult, and while the content is certainly overwhelming, I felt that most people were at pretty much the same level of understanding as I was.  The class was super interesting, and although I could not get to know all the details of each chapter, they certainly opened my eyes to how amazing and vast […]


Consumer sentiment analysis used for Finance??

I recently finished a paper with my partner in my data mining course. We were assigned the topic of Naive Bayes classifiers, and we decided to apply them to sets of Amazon user reviews across a number of categories.  I've uploaded the paper to this blog in case you are interested in reading the details of the data mining. In summary, the main results and implications of our research were: It is possible to accurately classify consumer sentiment by analyzing and identifying key words in the review. We can get more accurate predictions and meaningful data by examining reviews […]
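
As a rough illustration of the idea (this is not the code or data from the paper), a word-count Naive Bayes classifier with add-one smoothing can be sketched in a few lines of Python. The function names `train_naive_bayes` and `classify`, and the four toy reviews, are made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns (log priors, per-label word counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    total = sum(label_counts.values())
    log_priors = {label: math.log(n / total) for label, n in label_counts.items()}
    return log_priors, word_counts, vocab

def classify(text, log_priors, word_counts, vocab):
    """Pick the label with the highest log posterior under the naive independence assumption."""
    scores = {}
    for label, log_prior in log_priors.items():
        n_words = sum(word_counts[label].values())
        score = log_prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for words unseen under this label
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, made-up reviews purely for illustration (not real Amazon data)
toy_reviews = [
    ("great product works perfectly", "pos"),
    ("love it great value", "pos"),
    ("terrible quality broke quickly", "neg"),
    ("waste of money terrible", "neg"),
]
model = train_naive_bayes(toy_reviews)
prediction = classify("great value", *model)
```

With real reviews you would want proper tokenization and a held-out test set; the sketch only shows how key words drive the classification.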


K-Means Portfolio for Value Investors

The MATLAB code to easily create your own K-Means Portfolio is up!!  Click here to see it on my GitHub.

K-Means clustering is one of the simplest clustering algorithms for discovering patterns and structure in data across many dimensions.  Can it work for value investors? Let's do a simple test to see if it holds up in out-of-sample testing.

Basic Overview

Without getting into much of the math (a simple Google search will suffice), K-Means clusters data through a simple iterative algorithm that initializes centroids and moves each centroid toward an optimized mean where the cost function: Euclidean distance between each point and the […]
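
For readers who want to see the iteration spelled out, here is a minimal sketch of the standard K-Means loop (Lloyd's algorithm). It is Python rather than the MATLAB code linked above, and the function name `kmeans`, the toy points, and the starting centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Made-up two-feature observations forming two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
```

The starting centroids are passed in explicitly so the run is repeatable; in practice they are usually chosen at random, and the result can depend on that choice.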


Mean-Variance Net Neutral Portfolios

I haven't posted much since the start of school.  I'm still working on Portfolio Management, but much of what I have learned isn't worth blogging about since it's nothing new or different.  The only thing I have on my blog relates to my work on The Fund.  I also haven't updated the Docs on MPT because I haven't read much of that book recently.  I'm currently focusing on getting a better overview of the available Black-Litterman literature.  There are also other methods I need to get to learning, such as Portfolio Sorts by Almgren, Entropy Pooling by Meucci, […]


A Forex News Trader called "Newspaper"

School is continuing as usual and I have been doing a lot of reading in portfolio optimization/management as part of my position in The Fund.   Recently, I've been coding on the side for an automated forex trader that trades off of economic indicator releases.  The other algorithmic pursuits I have been working on, such as AREMA, haven't been working well.  I spent a lot of time debugging it, but when it comes to backtesting, it is very difficult to turn a profit.  I haven't bothered with the machine learning aspect of it either but it might be […]


Autoregressive Exponential Moving Average Forecasting

I've recently been looking into an automated strategy for my forex trading. School has been really busy and I haven't had much time to do technical analysis, so I think it'd be better to explore robot trading. In this post I'll be discussing my strategy, its properties, and an example with a simple out-of-sample forecast benchmarked against an ARMA(2,2) and a naive random-walk prediction. In the next article, I'll write about implementing it in MetaTrader (MQL4).   An exponential moving average is an extension of a simple moving average where more of the weight is being placed upon the […]
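
To make the difference between the two averages concrete, here is a small sketch using the textbook recursive form of the exponential moving average. This is only the standard EMA, not the forecasting strategy discussed in the post; the smoothing factor `alpha = 2 / (span + 1)` is a common convention assumed here, and the function names and prices are made up.

```python
def simple_moving_average(values, window):
    """Equal-weight mean of the last `window` observations."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

def exponential_moving_average(values, span):
    """Recursive EMA: ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}."""
    alpha = 2 / (span + 1)  # assumed convention for the smoothing factor
    ema = [values[0]]       # seed the recursion with the first observation
    for x in values[1:]:
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema

prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up series
sma = simple_moving_average(prices, 3)
ema = exponential_moving_average(prices, 3)
```

Unrolling the recursion shows that an observation k steps in the past carries weight alpha * (1 - alpha)^k, so the weights decay geometrically instead of being equal across the window.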


Wrong question, Right answer

A bit off topic, but I thought this was worth a post.  It's a story of how my roommate and I solved a problem that was asked wrong.  It all started when my friend, John, asked us on Skype to help him with a semi-challenging math problem: "Let the area of a right triangle be 25 cm squared. Express the perimeter as a function of the hypotenuse h(p)" - John. The actual question was asking to express the hypotenuse as a function of the perimeter, since his final answer was different from ours. It took Steve and me about 2-3 hours to figure out with […]
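
For reference, one way to work the problem in both directions is sketched below. The leg names a and b are introduced here for the derivation, lengths are in cm, and this is not necessarily the route taken in the full post.

```latex
\begin{align*}
\tfrac{1}{2}ab &= 25 \;\Rightarrow\; ab = 50, \qquad a^2 + b^2 = h^2, \qquad p = a + b + h \\
(p - h)^2 &= (a + b)^2 = a^2 + b^2 + 2ab = h^2 + 100 \\
p(h) &= h + \sqrt{h^2 + 100} \\
p^2 - 2ph &= 100 \;\Rightarrow\; h(p) = \frac{p^2 - 100}{2p}
\end{align*}
```

As a sanity check, the isosceles case a = b = sqrt(50) has h = 10 and p = 10 + 10*sqrt(2), and plugging that p into the last line gives back h = 10.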


Cointegration and Statistical arbitrage

Recently, I was introduced to the concept of cointegration analysis in time series.  I first read about this on an HFT blog at Alphaticks, and then the concept came up again when I was looking into spurious regressions and why they occur.  Lots of quants have blogged about this idea and how it can be applied to statistical arbitrage.  I will do the same and apply this to the not-so-recent Google stock split; however, I will also try to add some math into the mix and briefly touch on the error-correction mechanism and spurious regression.  Finally, I will also give a few criticisms against […]


U.S. Unemployment Time-Series Modelling (Part 1)

One of the many benefits of improving economic forecasts is being able to trade releases with better information through forex and stocks.  Certain sites such as Forexfactory provide a forecast parameter, and I was able to play around and figure out that some just use standard ARIMA models.  In Part 1, I will show how to estimate unemployment rate log changes, and in Part 2, I will implement this through a modified BP neural network (if I can get it to work...).  I will be benchmarking my residuals against a standard ARIMA model along with an exogenous regressor (initial claims).  The data was obtained […]


Bootstrapping Portfolio Risk

Bootstrapping, originally proposed by Bradley Efron, is a statistical technique to approximate the sampling distribution of a parameter.  The term bootstrap was coined from the phrase "to pull oneself up by one's own bootstraps": something seemingly impossible for a person, just like the bootstrap technique of obtaining more information from the sample.  The bootstrap rose to prominence as computing power became faster as well as cheaper.  The bootstrap (in certain usages) often outperforms other mathematical measures because it makes fewer assumptions, such as about the population distribution, relevant parameters, etc.  Furthermore, the bootstrap can approximate most measures whereas analytically deriving […]
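
The core resampling step can be sketched in a few lines of Python. This is a minimal illustration of the basic nonparametric bootstrap, not the portfolio-risk code from the full post; the function name `bootstrap_distribution` and the return series are made up for the example.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, n_resamples=1000, seed=0):
    """Approximate the sampling distribution of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Made-up periodic returns, purely for illustration
returns = [0.012, -0.008, 0.005, 0.021, -0.015, 0.003, 0.009, -0.002]
boot_means = bootstrap_distribution(returns, statistics.mean)

# The spread of the resampled means estimates the standard error of the mean return
std_error = statistics.stdev(boot_means)
```

Swapping `statistics.mean` for another function (a quantile, a volatility estimate, and so on) gives the bootstrap distribution of that measure without deriving it analytically.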
