Discovering Google Correlate was a small silver nugget for me. I say silver because it has several drawbacks. I was going over some research papers to see how I could improve my simple models for unemployment claims, non-farm payrolls, etc. One paper that piqued my interest was written by Hal R. Varian, chief economist at Google, and proposed improving the fit of a forecast using Google Trends data. He demonstrated the idea by forecasting motor parts sales, unemployment, consumer sentiment, etc. The models had a better overall fit and better out-of-sample performance when they incorporated Google Trends searches. Google Trends can be extremely valuable in improving a model, as long as you have the right search terms.
Finding the right search terms can be difficult, especially for aggregate macro variables such as CPI and core retail sales. Furthermore, there may be highly correlated search terms that are vital to the model and that we do not even know about!
Google Correlate was my first stop for finding relevant terms. Searching for "Unemployment" there, I found related terms that tracked it closely, such as "ui benefits" and "maximum unemployment benefits".
Next, I used the feature that lets me enter my own time series, after which Google finds search terms that display a similar pattern. I pulled the Unemployment Claims data (seasonally adjusted) from Quandl and fed it into Google. One problem I noticed immediately was that many of the most highly correlated results (correlation coefficients in the high 0.9s) were random, ranging from Nokia phones to porn websites. I thought about this problem for a bit and believe I have an answer. The correlation coefficient behaves like a weighted average in which most of the weight falls on large movements and shocks, because it is built from products of deviations from the mean. Claims spiked in 2008 and then descended at a slower pace. Viral content that blew up on the internet around that time follows the same type of movement! Search interest in "Nyan Cat" is an example of what I mean.
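To make the shock argument concrete, here is a minimal sketch on made-up numbers. It assumes the plain Pearson coefficient, and both series are synthetic stand-ins (a claims-like spike with a slow decline and an unrelated viral burst), not real claims or search data. The helper names `pearson` and `shock` and all of the parameters are hypothetical choices for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def shock(n, start, peak, decay):
    """Zero before `start`, then a jump to `peak` that decays geometrically."""
    return [0.0 if t < start else peak * decay ** (t - start) for t in range(n)]

N, START = 200, 80  # hypothetical: 200 weekly points, shock at week 80

# Synthetic stand-ins, NOT real claims or search data.
claims = [300.0 + s for s in shock(N, START, 350.0, 0.97)]  # spike, slow decline
viral = shock(N, START, 100.0, 0.95)                        # unrelated viral burst

r = pearson(claims, viral)

# Share of the covariance sum contributed by the 26 weeks right after the shock.
mc, mv = sum(claims) / N, sum(viral) / N
terms = [(c - mc) * (v - mv) for c, v in zip(claims, viral)]
spike_share = sum(terms[START:START + 26]) / sum(terms)

print(f"correlation: {r:.2f}")
print(f"share of covariance from 26 of {N} weeks: {spike_share:.0%}")
```

With these toy parameters the two unrelated series correlate above 0.9, and the half year after the shock accounts for most of the covariance even though it is only 13% of the sample. That is the sense in which the coefficient puts most of its weight on the shock.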
To work around this problem, I fed in the non-seasonally adjusted data to put more emphasis on the recent cyclical movements, and that partially mitigated the weighted average issue. There was a staggering difference in the correlated terms between the seasonally adjusted and non-adjusted data, and a consistent pattern of similar search terms started to appear: "loan mortgage modifications".
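A toy comparison suggests why leaving the seasonality in can reshuffle the rankings. This is only a sketch under stated assumptions: every series below is synthetic, the 52-week cosine is a stand-in for the seasonal pattern in claims, and `related` is a hypothetical query that shares both the shock and the seasonal cycle. None of it reproduces what Google Correlate actually returned.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

N, START, PERIOD = 208, 80, 52  # hypothetical: 4 years of weekly data

# Building blocks: two decaying shocks and a yearly cycle (all synthetic).
shock_slow = [0.0 if t < START else 0.97 ** (t - START) for t in range(N)]
shock_fast = [0.0 if t < START else 0.95 ** (t - START) for t in range(N)]
season = [math.cos(2 * math.pi * t / PERIOD) for t in range(N)]

claims_sa = [300 + 350 * s for s in shock_slow]                 # adjusted
claims_nsa = [c + 120 * z for c, z in zip(claims_sa, season)]   # not adjusted
viral = [100 * s for s in shock_fast]                           # one-off burst
related = [30 + 17.5 * s + 20 * z for s, z in zip(shock_slow, season)]

for label, target in (("seasonally adjusted", claims_sa),
                      ("non-adjusted", claims_nsa)):
    print(f"{label}: viral={pearson(target, viral):.2f} "
          f"related={pearson(target, related):.2f}")
```

In this setup the one-off burst is the better match for the adjusted series, while the term that shares the seasonal cycle overtakes it once the seasonality is left in. The recurring pattern adds variance that a single viral spike cannot explain, so the shock no longer dominates the coefficient.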
Similar terms all pointed towards mortgage modifications. You can see from the above graph that the correlation was strongest from 2008 to the beginning of 2010. It died down soon after, as the term's pattern did not replicate the seasonal factors that the claims exhibited. (I also narrowed the search to data after 2011, and loan modifications were nowhere near the top 100 correlations; the top results were instead specific unemployment searches such as "NY Unemployment".) What makes this discovery interesting is that during the crisis, Americans looking for ways to pay off their loans resorted to mortgage modifications. Google Trends was able to sort through parts of the Big Data to help us pinpoint a small source of the big problem. Knowing this, it would certainly be interesting to apply the technique to future events, to help analyze the problem and the responses to it.