By Rob Knies, Managing Editor, Microsoft Research
Originally published in the Inside Microsoft Research Blog
When the world starts watching, it’s time for David Rothschild to shift into overdrive.
Readers of this blog need little introduction to the work on prediction models from Rothschild, a Microsoft researcher and economist. Past posts have examined his efforts to produce accurate forecasts for events such as the 2012 U.S. presidential election, the 2013 and 2014 Academy Awards, the 2014 NCAA men’s basketball tournament, and, recently, India’s general election.
So, what’s the next big thing to which he can apply his prognosticative powers?
The World Cup, of course.
The World Cup, the premier international event for the sport known as “soccer” in the United States and “football” elsewhere, has become one of the most viewed sporting events on Earth. This year’s host is Brazil, the nation with the most championships (five, including two of the last five) and the only team to play in every tournament since the event’s 1930 debut. The 64-game event will be contested by 32 men’s national teams across 12 Brazilian cities, beginning with a Brazil-versus-Croatia match on June 12 and culminating with the final on July 13 at Estádio do Maracanã in Rio de Janeiro.
Rothschild will be watching intently, though his interest might differ a bit from the nationalistic fervor of most of the game’s other adherents. For him, this is simply the latest step in an ongoing attempt to fine-tune increasingly accurate prediction models that can be applied to domains far afield from the pitch.
Take a look at some of his predictions!
“Sports are extremely predictable,” he explains, “but the World Cup is much more idiosyncratic. It’s more like politics in that way. We know a lot about how a generic Brazil team would do against a generic Croatia team, similar to the way I know how a generic Republican candidate will do against a generic Democratic candidate. Yet it is a whole lot less certain than how the New York Yankees are going to do against the Seattle Mariners with 60 baseball games of data already in the books.
“That being said, over time, we learn more and more ways to get the data we need to answer the idiosyncratic events in a purely data-driven way.”
Right now, Rothschild’s model bears a striking resemblance to that of Betfair, a U.K. betting exchange that supplies some of the data that fuel Rothschild’s models. Both currently give Brazil the highest likelihood of winning the tournament, a reasonable assessment given the host country’s history of success.
That, though, could begin to change once the action begins.
“I have created a full model,” Rothschild explains, “but I rely heavily on the prediction-market data. The reason is simple: The problem with pure fundamental models is that even the best fundamental models are lacking because the World Cup is an event held just once every four years, without any regular season. There is a lot of idiosyncrasy in the event that is hard to capture in historical data sets.
“Both the fundamental data and the prediction-market data will update as the World Cup progresses. The predictions will update every few minutes, and I will also show the pregame predictions for all games.”
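For readers curious how fundamental data and prediction-market data might be combined, here is a minimal Python sketch. It is not Rothschild’s model: the function names, the 0.8 market weight, and every number below are assumptions made up for illustration. It converts decimal betting odds into implied probabilities, then takes a weighted average with a fundamentals-based estimate.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities that sum to 1.

    1/odds gives a raw probability; dividing by the total removes
    the bookmaker's margin (a simple, common normalization).
    """
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


def blend(fundamental, market, market_weight=0.8):
    """Weighted average of two forecasts over the same set of teams.

    market_weight is a hypothetical tuning knob: the closer to 1,
    the more the blend leans on prediction-market data.
    """
    combined = {
        team: market_weight * market[team]
        + (1.0 - market_weight) * fundamental[team]
        for team in market
    }
    total = sum(combined.values())
    return {team: p / total for team, p in combined.items()}


if __name__ == "__main__":
    # Made-up odds and fundamentals, for illustration only.
    odds = {"Brazil": 4.0, "Croatia": 150.0, "Rest of field": 1.4}
    fundamentals = {"Brazil": 0.20, "Croatia": 0.01, "Rest of field": 0.79}

    market = implied_probabilities(odds)
    forecast = blend(fundamentals, market)
    for team, p in forecast.items():
        print(f"{team}: {p:.1%}")
```

Rerunning the blend whenever the odds move, or whenever a match result changes the fundamentals, is one simple way to get a forecast that updates as the tournament progresses.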
The sporting peculiarities of the World Cup only help to bolster and extend Rothschild’s models.
“Normally,” he explains, “sports playoffs do not actually update the predictions that much. There is a long regular season, and the way a team plays in any given game of the playoffs is not providing too much new, meaningful information. That is not the case in the World Cup, which lacks a regular season, so each match tells me a lot, and the long duration of the event means I am making serious updates after every match, not to mention during the match.
“The international flavor is fun, but it does not change the method: providing accurate, quantifiable, and updated statistics to the right question.”
Sports can provide diversion for millions and deliver improbable drama, as in 1990, when the unheralded Indomitable Lions of Cameroon stunned the world with an opening-round victory over defending champion Argentina. For Rothschild, though, the ultimate result is simply another step toward his goal of using finely grained data to forecast individual and aggregated outcomes, regardless of domain. Still, such competitions are extremely useful.
“In an effort to build more generic infrastructure for data collection and analysis, sports is a huge positive,” he says. “As you consider how different a regular-season baseball game and a World Cup game are, you can appreciate how sports provides the kinds of examples we need to create domain-independent technology.
“This technology is helping us answer weightier questions than sports outcomes, covering a range of topics and data types.”