REDMOND, Wash. — Feb. 4, 2010 — Scientists today are faced with an enormous amount of data from a variety of sources.
This great reservoir of information has changed the way research is done and led to exciting new discoveries, but it has also highlighted the limitations of desktop computers when it comes to crunching big numbers.
Experts say that with too much data for PCs to handle efficiently, researchers may have to wait months for computations to run, or may be reluctant or unable to pose more difficult questions. This barrier has become a real limitation for many of the world’s scholars.
Dan Reed, corporate vice president of Microsoft’s Technology Policy and Strategy and eXtreme Computing Group, says there is an entire population of scientists today who are caught between the limitations of desktop computing and the expense and complexity of managing their own infrastructure to scale up their computational and data analysis capabilities.
For them, he says, the answer may lie in “client plus cloud” computing. The emerging cloud computing model promises to provide the computing power scientists need without forcing them to invest in and maintain their own systems.
“There is a large community of researchers (social scientists, life scientists, physicists) running many computations on massive amounts of data,” Reed says. “To use an example many people can understand: how can we enable researchers to run an Excel spreadsheet that involves billions of rows and columns, and takes thousands of hours to compute, but still give them the answer in 10 minutes, and maintain the desktop experience? Client plus cloud computing offers that kind of sweet spot.”
Economies of Scale
Computational research may conjure up images of huge supercomputers, the technology world’s most rarefied asset. But according to David Lifka, director of the Cornell University Center for Advanced Computing, only a limited number of scientists have applications that need to run on thousands of cores at the same time or analyze petabytes of data.
“In the life sciences, however, you often have a group of researchers with a multitude of small jobs that are critical to their work, but those jobs do not require the massive scalability of the world’s biggest computers,” says Lifka. “That kind of resource is overkill for them.”
As a former director of the U.S. Office of Cyberinfrastructure at the National Science Foundation (NSF), Daniel Atkins has been closely involved in the NSF’s collaboration with industry and academia. Atkins, now Kellogg professor of Community Information at the University of Michigan, says departments in need of computing power often resort to what are disparagingly called “huggable” clusters: commercially available machines lashed together into parallel systems and placed in offices, closets or ad hoc machine rooms.
“Many universities have literally hundreds of small machine rooms, and increasingly that’s being looked at as a bad idea,” says Atkins. “It’s not very environmentally friendly. Management is very ad hoc, and people lose data.”
Such clusters mean a lot of work, expense and trouble, especially since most researchers need that much computing power only once in a while. Most scientists would prefer to spend their hard-earned grant dollars elsewhere.
David Patterson, a professor of computer science at the University of California, Berkeley, is another authority on parallel processing and cloud computing. Patterson says that because even a “huggable cluster” capable of running applications in parallel is costly and difficult to build, many scientists don’t bother trying. Instead, they run a set of desktop computers 24 hours a day, seven days a week, slowly wringing out answers over weeks and months: the proverbial dial-up modem of computational research.
All three experts say cloud computing may be an ideal way to serve the broad category of scientists who need very powerful computing, but only some of the time.
“With cloud computing there’s no reason to wait,” Patterson says. “It doesn’t cost us any more to use a thousand computers for an hour than it does to use one computer for a thousand hours. So we get the answers tomorrow at no extra cost rather than waiting for the answers for six months.”
Lifka and Atkins point out that giving a research community shared access to aggregated datacenters offers enormous economies of scale, one of the prime motivations for cloud computing.
The approach essentially allows scholars to rent a community resource that scales as necessary, yet works directly through the desktop they already use. Applications draw only the processing power they need, on demand, and the rest of the time the research group doesn’t have to worry about maintenance, power, cooling or anything else that comes with owning a cluster.
Patterson adds that the economies of scale possible with the cloud are just as much about performance as cost. The most exciting part of cloud computing, he says, is the ability to “scale up” the processing power dedicated to a task in an instant.
“This is the first time in history that there is scalability without additional overhead,” he says. “Usually if you want something a thousand times as big, you have to pay more than a thousand times as much, but not in cloud computing.”
Democratizing Research
Patterson’s point is compelling for most fields of inquiry. In almost any discipline today, computational research means a lot of computation, a lot of data, or both. Cloud computing offers the scalability to handle both at prices academics can afford.
It’s a combination that Atkins says the scientific community has been searching for: “The cloud offers research computing for everyone, which is very much needed. It’s targeted at people who have large amounts of data that they want to process and extract knowledge from, but who do not need or cannot afford their own datacenters.”
This “democratization” of access to research computing is further fueled by the cloud’s ability to combine computing power and data in a single location. That arrangement can support security and control, as well as collaboration across great distances, which Patterson says has been difficult because of the sheer size of today’s datasets and the corresponding expense of moving them from one network to another.
“When you have massive amounts of data, it starts becoming impractical to move it, so it can be an advantage to have it stored centrally,” Patterson says. “But it’s an even greater win to be able to cooperate with people across many institutions that have a common dataset in the cloud.”
For now, Atkins stresses that much work remains as the cloud ecosystem matures to support academic research more completely. Resolving support and licensing issues for popular research software is one important piece; another is creating the right tools to extend applications such as MATLAB and Excel so they can exploit cloud computing resources.
“The NSF is working with companies across the technology industry, and we are all expecting to get much deeper knowledge about the issues surrounding cloud computing and research over the coming months,” Atkins says. “The cloud is already starting to be embraced in the enterprise environment, and the industry is working with academia to gain deeper insight into what role the cloud can play in research computing.”
Despite the uncertainties inherent in any new technology, however, Berkeley’s Patterson believes cloud computing is here to stay.
“This is a transformative technology like the invention of the microprocessor,” Patterson says. “I believe, for the rest of this decade, we’re going to watch this wave of technology transform our industry, as well as create opportunities for scientists and educators. Once it’s done, the IT world is going to be a different place.”