New Tools for Discovery on Display at eScience Workshop

INDIANAPOLIS — Dec. 12, 2008 — Each year Microsoft External Research hosts a conference showcasing the emerging field of “eScience,” the use of technology to facilitate scientific endeavors. The eScience Workshop attracts hundreds of scientists, researchers and academics with dozens of presentations and projects in areas as diverse as astronomy, medicine, ecology, social science and database technology.



Carole Goble (center), winner of the inaugural Jim Gray eScience Award, poses with Tony Hey (right) and Daron Green, senior director of eScience at Microsoft Research. Indianapolis, Dec. 9, 2008.

The eScience conference was especially timely this year, as significant advances in two divergent technologies are changing scientific discovery: parallel processing and cloud computing.

Breakthroughs in microchip technology have fueled both commercial and academic use of parallel processing, providing researchers with unprecedented access to statistics and analytical tools. Meanwhile, new cloud computing and distributed architectures are allowing scientists and even amateurs around the world to collaborate on today’s toughest problems, using the best available data, with access to high-end systems formerly available only to the most well-funded laboratories.

Together, say many in the field, these changes represent a watershed for the scientific world.

The workshop was held this week in conjunction with Indiana University and the IEEE’s International Conference on e-Science at Indiana University–Purdue University Indianapolis. In a demonstration of how far the field has come, the conference also featured a new award to acknowledge great work in eScience, the Jim Gray eScience Award, named after the legendary Microsoft researcher who made significant contributions to the field. The inaugural award was given to University of Manchester computer science professor Carole Goble.

In a keynote address at the conference, Tony Hey, corporate vice president of Microsoft External Research, discussed Goble’s work and how technology is solving a range of challenges, allowing researchers to ask formerly impossible research questions.

“As researchers we find ourselves in this ‘Fourth Paradigm’ of data-intensive science where we will be able to tackle previously intractable problems,” Hey said. “Data is being continuously collected; repositories are being made publicly available; the ability to collaborate online and compute in the cloud is becoming an increasingly accepted model. All of these are having a tremendous impact on science.”

Managing the Data Deluge



Geoffrey Fox presents his keynote address on distributed and parallel programming environments at the close of the 2008 Microsoft eScience Workshop. Indianapolis, Dec. 9, 2008.

According to Hey, new technologies are creating unprecedented volumes of data that would be impossible for human beings to sift through. Sensor networks, automated laboratory instruments and observation devices, and even tools such as social networking services are changing the way information is collected.

Meanwhile, new data mining, analysis and processing technologies are giving scientists the tools to dissect and analyze data like never before. And cloud-based technologies are globalizing and “democratizing” the entire process, making information available to a much wider range of experts.

In a separate keynote address Tuesday, Indiana University professor of informatics Geoffrey Fox discussed the impact of these technologies.

Said Fox: “Today we have parallel computing for the masses. A year and a half ago the word ‘cloud’ hardly appeared, but now it dominates headlines in the field. Together these tools allow not only for more detailed analysis, but also the involvement of a more diverse set of people. Between them they are certainly facilitating time to discovery.”

Paul Watson of the U.K.’s Newcastle University, who gave the workshop’s opening keynote address, described how cloud computing can dramatically shorten “time to discovery” in a number of practical ways.



Paul Watson delivers his opening day keynote address on how cloud computing will enable scientists to share, integrate and analyze data. Indianapolis, Dec. 8, 2008.

“You may have an idea for a new algorithm, but need a new server or terabytes of storage to explore it,” Watson said. “You write a proposal and six months later you finally learn whether you’re going to have the funding to even begin building the system.

“Cloud computing offers the promise of resources on demand, with great scalability. So scientists can have that bright idea, grab the resources they need from the cloud almost immediately, do the analysis, and get the results.”

The Era of eScience

According to Kristin M. Tolle, Ph.D., chair of the 2008 Microsoft eScience Workshop and senior research program manager at Microsoft External Research, that promise has come much closer to reality in the past year, as advances in the cloud and other areas have moved the field of eScience to the cusp of having a significant impact in a number of disciplines.

“Now that we have all the pieces of the puzzle, it is time to unlock the next great scientific breakthroughs,” Tolle says.



Kristin M. Tolle, Ph.D., welcomes 250 scientists, researchers and academics to the 2008 Microsoft eScience Workshop. Indianapolis, Dec. 8, 2008.

Tolle says that as a result of this community’s efforts, new eScience tools are available for biology, genomics, astronomy, physics, medicine and other fields, and many were described in talks during the event. Further, due to broader installation of environmental and medical sensors and real-time monitoring, scientists are practically drowning in high-quality, high-bandwidth, multimodal data.

“It is this combination of events that will fire the future of scientific discovery,” Tolle says. “The workshop was structured so that professionals in those disciplines could mingle with computer scientists. Collaboration agreements were formed, connections made, and science was advanced in an environment of camaraderie. It was a highly successful meeting.”

Throughout the week, examples of eScience breakthroughs were pervasive. In his keynote address, Hey pointed to several, including the “Cosmic Genome” project, the Sloan Digital Sky Survey. The survey is run by the Astrophysical Research Consortium (ARC), which includes members from several prominent universities, and it featured significant contributions from Gray.

The survey’s mission was to record the northern night sky. The project has obtained many terabytes of images covering 25 percent of the northern heavens, including pictures of 300 million celestial objects.

That work has now been archived in the cloud on the publicly available SkyServer, which has formed the foundation for other projects that bring astronomy to the public, such as the Galaxy Zoo and the WorldWide Telescope launched last spring.

According to Hey, the SkyServer is an example of how the traditional scientific publishing model is being turned on its head by the new distributed world of science, bringing the information to vastly more people than was possible before Web 2.0 and cloud computing began to emerge.

SkyServer has seen 380 million hits over six years, and its heavenly map has been viewed by 930,000 distinct users. Its data has been cited in more than 1,600 scientific papers and used as the basis for an estimated 50,000 hours of lectures in high schools alone.

“We think there are about 10,000 professional astronomers worldwide,” says Hey. “So clearly the reach of SkyServer goes well beyond the academic community.”

Watson points out that such cloud-based technologies have far-reaching ramifications across professional disciplines as well.

“How about allowing a neurosurgeon in Boston to advise as a surgery is being performed in Newcastle, England?” Watson says. “With the near-real-time access to complex data that cloud computing potentially enables, these kinds of scenarios become possible.”

According to Fox, this shift reaches beyond individual disciplines and professions and amounts to a fundamental democratization of science. He points to the rise of institutional repositories, where scientific papers, journals and other information are being made available by major academic and research institutions over the Web for free.

“This is going to change everything,” Fox says. “Journals have to change. Publication has to change, and that will change the way scientists are judged, making the entire process much more dynamic. You no longer have to be co-located with great people to work with them. You can work with them across the Internet, which means the smaller universities that historically have a huge disadvantage can now be more competitive in science.”

The Jim Gray eScience Award

Behind the cloud and its emerging potential lie years of research into database technology, Web services and processing power. With that in mind, the Jim Gray eScience Award was established this year to honor the innovators whose work truly makes science easier for scientists.

The inaugural Jim Gray eScience Award was given to Carole Goble for her work to develop Taverna, considered by many to be the gold standard among scientific workflow systems, which connect various technologies to analyze data. Tolle says that eScience tools such as Taverna reduce scientific overhead, allowing scientists to get to the point of analysis and discovery much more quickly.

“Workflow systems allow data to flow through multiple steps of the analysis process, so you spend less time shuffling around data and more time generating valuable information and interpreting the results,” Tolle says.

Goble was also honored for her commitment to building a community of researchers through the “myExperiment” Web site, a Facebook-style social network for researchers to share workflows and collaborate on projects.

“If you share these workflows, you’re leveraging the intelligence of the community,” Goble said after accepting her award. “Science is really a field of information — discovery, sharing and processing. And it’s now being conducted on a global scale. The world’s greatest expert could be sitting in that small research lab somewhere, and he or she now has all of this data and processing power available. So to me eScience is fundamental. This is how science is done.”

It’s a vision carried on annually in the eScience Workshop, where infrastructure that supports scientists’ work is showcased for attendees, many of whom are engaged in research themselves.

“During a panel on ‘What do scientists really need to facilitate time to discovery?’ we gave the floor to reference disciplines, so they can tell their side of the story in an interactive way,” Tolle says. “This workshop truly tells the end-to-end story of scientific discovery.”