Big Data. We hear about it a lot these days. But what does it actually mean? And what does it actually do? One way we are utilising the power of ‘big data’ (which effectively just means ‘data’, but lots and lots of it) at Microsoft is through our Bing Predicts engine. It’s an initiative that uses machine-learning models to analyse and detect patterns across a range of (big) data sources, such as the web and social activity, in order to make accurate predictions about the outcome of events. We’ve correctly predicted the winners of 15 of the 16 World Cup knockout matches in Brazil, as well as the winners of reality TV contests in the US. This time we wanted to see if our models could work their magic on a ‘once in a lifetime’ event like the Scottish Independence Referendum.
Well, we did it. Our model correctly predicted the outcome of the Scottish Referendum: that Scotland would vote “No” and stay in the United Kingdom.
Our prediction (as of 18/09/14): 48.7% YES, 51.3% NO
Final announced voting results (as of 19/09/14): 45% YES, 55% NO
YouGov source (excluding ‘don’t know’ responses)
September 19, 2014
Interestingly, while some polls swung between “yes” and “no” over the past months, and others put the “no” vote in the 70-80% range, our prediction has consistently shown a narrow “no” (between 51% and 58% “no”; see graph) since we began analysing our data sources. The final vote goes to show just how difficult it is to predict such a unique event. To see our final prediction against the final results, simply search for “Scottish Referendum” on Bing.com.
As we said in our earlier blog post, because the Scottish Referendum is a ‘once-in-a-lifetime’ event with no historical precedent, it brings a number of challenges for data analysis. For the prediction, we started with the trends and sentiment determined from web and social data, then adjusted for biases to understand the true opinion of the population that most closely represents the actual voters.
Algorithmically, we detected pro-independence terms and compared their aggregate sentiment against that of pro-union phrases to predict whether the referendum would return a “yes” or a “no” vote. Our sentiment detector also identified neutral keywords, which potentially capture the segment of undecided voters. We ingested new information continually and updated the prediction regularly to capture the latest balance between “yes” and “no”. Towards the end, this mainly meant determining which way the undecided voters were breaking at the last minute.
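To make the idea of comparing pro-independence and pro-union signals a little more concrete, here is a deliberately simplified Python sketch of lexicon-based vote-share estimation. It is not the Bing Predicts model: the term lists, the `classify` and `predict_yes_share` helpers and the `undecided_yes_fraction` parameter are all hypothetical, and the sketch does none of the bias adjustment described above.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical lexicons for illustration only; the real term lists are not published.
PRO_YES = {"yes scotland", "voteyes", "independence"}
PRO_NO = {"better together", "voteno", "no thanks"}
NEUTRAL = {"undecided", "not sure", "referendum"}


def classify(post: str) -> Optional[str]:
    """Label a post 'yes', 'no' or 'neutral' by counting lexicon hits.

    A toy stand-in for a real sentiment model. Returns None when the
    post does not look referendum-related at all.
    """
    text = post.lower()
    yes_hits = sum(term in text for term in PRO_YES)
    no_hits = sum(term in text for term in PRO_NO)
    if yes_hits > no_hits:
        return "yes"
    if no_hits > yes_hits:
        return "no"
    # Tied lean, or only neutral keywords: treat as a possible undecided voter.
    if yes_hits or any(term in text for term in NEUTRAL):
        return "neutral"
    return None


def predict_yes_share(posts: Iterable[str], undecided_yes_fraction: float = 0.5) -> float:
    """Estimate the 'yes' share, splitting neutral posts by an assumed fraction."""
    if not 0.0 <= undecided_yes_fraction <= 1.0:
        raise ValueError("undecided_yes_fraction must be between 0 and 1")
    counts = Counter(classify(p) for p in posts)
    total = counts["yes"] + counts["no"] + counts["neutral"]
    if total == 0:
        raise ValueError("no referendum-related posts to score")
    return (counts["yes"] + undecided_yes_fraction * counts["neutral"]) / total
```

In this toy, raising `undecided_yes_fraction` moves the undecided group towards “yes”, so following the last-minute swing of undecided voters is reduced to re-estimating a single number as new posts arrive.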
We have had great success with previous projections. It’s good news that this test on a unique event has shown our machine-learning model can make good predictions from data drawn from a wide variety of sources, even in such special circumstances. Shows like The X Factor, events like the World Cup and other elections offer much more historical and broader data to work with, which gives us an even firmer basis for our predictions. So watch this space…