TFI Restaurant Revenue Kaggle Competition

Read my report HERE; the R code is provided HERE (warning: the code is messy but all-inclusive).

I recently entered my first Kaggle competition with a friend as part of a graduate-level Data Mining class (I'm an undergrad).  At first I thought the course was going to be super difficult, and while the content was certainly overwhelming, I felt that most people were at pretty much the same level of understanding as I was.  The class was really interesting, and although I could not learn every detail of each chapter, it certainly opened my eyes to how amazing and vast data science can be.

For our final project, we formed groups and chose between a boring UCI repository project and a fun Kaggle competition with potentially real data (ironically, the data for the topic we chose was generated).  Rene and I decided on the TFI Restaurant Revenue Prediction challenge, where the goal is to best predict annual revenue for a cross-sectional sample of Turkish restaurants.   You can read the full project report here, with the accompanying R code hosted on my Github here (Warning: the R code is really messy but includes EVERYTHING).

In summary, we started off working on models first and features second.  Looking back, this was a huge mistake and it should have been the other way around.  The models we employed (in order) were: linear regression, Random Forest, SVM, and an ensemble (various things like GBM, lasso, etc.).  Ensembling was done last, when we were out of options.  When I got to implementing Random Forest, Rene was at approximately 455th place with a simple RF.  The data was pretty weird in terms of train/test set features; our report details all the issues as "problems" in section 3.  After a few clever hacks with kNN and K-Means, I fed the resulting features into both SVM and RF and got 58th place!  However, our Top 100 spot was short-lived as other competitors caught up within a few days...  I had to focus on other exams, so I handed everything off to Rene.  After a few more days of tinkering, we couldn't seem to get better results and left it at that.
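To make the ensembling idea above a bit more concrete, here is a minimal sketch of blending two models' predictions with a weighted average and comparing errors. It is written in plain Python rather than the R used in the project, and it is not the code from our report: the function names `rmse` and `blend`, the equal weights, and all of the numbers are made up for illustration. It also assumes root mean squared error as the yardstick.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two models' predictions (hypothetical helper)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Made-up revenues (in millions) and made-up predictions from two models.
actual = [4.0, 6.0, 5.0]
rf_pred = [5.0, 5.0, 6.0]   # stand-in for Random Forest output
svm_pred = [3.0, 8.0, 5.0]  # stand-in for SVM output

blended = blend(rf_pred, svm_pred)

print("RF RMSE:     ", rmse(actual, rf_pred))
print("SVM RMSE:    ", rmse(actual, svm_pred))
print("Blended RMSE:", rmse(actual, blended))
```

In this toy example the two models err in partly opposite directions, so the average lands closer to the truth than either one alone. That cancellation is the whole appeal of blending, but it is not guaranteed on real data.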

3 thoughts on “TFI Restaurant Revenue Kaggle Competition”

  1. Hello Kevin,

    I am sorry to introduce myself here and in this way.

    My name is Barry, and I am a first-year Ph.D. student at the City University of Hong Kong. I used the dataset you uploaded in Kaggle on Prosper and Lending Club, for some preliminary research on peer-to-peer lending. As the results are great, my advisor and I are considering continuing with the research project. Therefore, it would be great if you can tell me the data source so that I can do a double-check with the data.

    I cannot find your contact information online. I will highly appreciate it if you can drop me an email at [zonghao.y@my.cityu.edu.hk]

    Thank you! Looking forward to your reply!

    I look forward to discussing this with you. Thanks!

    1. Great that you are taking an interest in the data set. I downloaded it directly from the Prosper and Lending Club websites. Since this was quite a few years ago, I cannot tell you the exact methodology that I used to create the final dataset.
