Microsoft Research Delivers Tools to Help Accelerate Scientific Discovery

REDMOND, Wash. — July 13, 2009 — Addressing an audience of prominent academic researchers today at the 10th annual Microsoft Research Faculty Summit, Microsoft External Research Corporate Vice President Tony Hey announced that Microsoft Corp. has developed new software tools with the potential to transform how much of today’s scientific research is done. Project Trident: A Scientific Workflow Workbench lets scientists work easily with large volumes of data, and the specialized new programs Dryad and DryadLINQ make high-performance computing easier to use.

Created as part of the company’s ongoing efforts to advance the state of the art in science and help address world-scale challenges, the new tools are designed to make it easier for scientists to ingest and make sense of data, get answers to questions faster than was previously possible, and ultimately accelerate critical breakthrough discoveries. Scientists in data-intensive fields such as oceanography, astronomy, environmental science and medical research can now use these tools to manage, integrate and visualize large volumes of information. The tools are available as no-cost downloads to academic researchers and scientists at http://research.microsoft.com/en-us/collaboration/tools.

“Today, scientists can collect more data than ever before from the Internet, satellites, sensors and other resources,” Hey said. “That deluge of information brings amazing research opportunities, but at the same time, our ability to process that data and make it meaningful has not kept pace. These tools help simplify the data-intensive end of research, so scientists can focus on analyzing results and making new discoveries.”

Transforming a Discipline

Project Trident is helping oceanographers plan how to manage the massive amounts of scientific data that will stream in from sensors, instruments, moorings, robots and cameras attached to fiber-optic cables on the ocean floor. Researchers will use the data to better understand sediment flows, changes in temperature and salinity, earthquakes, undersea volcanoes and the extreme life forms associated with seafloor hydrothermal vents, and to improve tsunami predictions.

Microsoft Research’s Project Trident is helping scientists to manage data-intensive projects such as the Ocean Observatories Initiative, which is creating cabled observatories off the U.S. coast. Photo courtesy of Center for Environmental Visualization, University of Washington.

Project Trident is currently being used by oceanographers at the University of Washington to support the Ocean Observatories Initiative (OOI), a seafloor-based research infrastructure, soon to be constructed with sponsorship from the National Science Foundation, that will place thousands of sensors in the oceans of the Western Hemisphere.

The data arriving from these sensors will amount to roughly the equivalent of two high-definition TV broadcasts running simultaneously around the clock.
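To put that comparison in rough numbers, the back-of-the-envelope sketch below converts it into a daily data volume. It assumes each HD broadcast runs at about 19.39 Mbit/s, the payload rate of an ATSC digital TV channel; that figure is our assumption, not one stated in the announcement, and actual OOI data rates may differ.

```python
# Back-of-the-envelope estimate of OOI sensor data volume.
# Assumption (not from the announcement): one HD broadcast carries about
# 19.39 Mbit/s, the ATSC digital TV payload rate.
HD_BROADCAST_MBIT_PER_S = 19.39
STREAMS = 2
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(mbit_per_s: float, streams: int) -> float:
    """Return decimal gigabytes produced per day by `streams` constant-rate feeds."""
    bits_per_day = mbit_per_s * 1e6 * streams * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e9  # bits -> bytes -> gigabytes


if __name__ == "__main__":
    per_day = daily_volume_gb(HD_BROADCAST_MBIT_PER_S, STREAMS)
    per_year_tb = per_day * 365 / 1000
    print(f"~{per_day:.0f} GB per day, ~{per_year_tb:.0f} TB per year")
```

Under that assumed bit rate the script prints roughly 419 GB per day, or about 153 TB per year, from a continuous feed of this size.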

Project Trident is also being used by oceanographers at the Monterey Bay Aquarium Research Institute to support a data portal for a program funded by the Office of Naval Research designed to better understand typhoon intensification.

“In the ocean sciences we routinely work with complex multidisciplinary data sets, and the investigator often spends more time on the mechanics of finding and manipulating data than on the process of understanding what the data means,” said James G. Bellingham, chief technologist, Monterey Bay Aquarium Research Institute. “Trident’s workflow framework provides a graphical environment that hides much of the complexity from the user, letting scientists focus their intellectual energy on the data rather than the software.”

In addition, astronomers at Johns Hopkins University are using Project Trident to support the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) project, which helps detect objects in the solar system that might pose a threat to Earth. The Pan-STARRS project uses an array of very powerful digital cameras to observe the entire night sky several times each month. Each of the cameras captures 1.4 gigapixels — 200 times the resolution of a 7-megapixel consumer camera.
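The "200 times" comparison is simple arithmetic, and extending it gives a feel for the raw data behind each exposure. The sketch below assumes 2 bytes (16 bits) per pixel, single channel, with no compression; that storage format is an illustrative assumption, not a published Pan-STARRS specification.

```python
# Check the "200 times" comparison and estimate raw bytes per exposure.
CAMERA_PIXELS = 1_400_000_000  # 1.4 gigapixels per camera (from the text)
CONSUMER_PIXELS = 7_000_000    # 7-megapixel consumer camera (from the text)
BYTES_PER_PIXEL = 2            # assumption: 16-bit, single-channel, uncompressed


def resolution_ratio(big: int, small: int) -> float:
    """How many times more pixels `big` has than `small`."""
    return big / small


def raw_exposure_gb(pixels: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Uncompressed size of one exposure in decimal gigabytes."""
    return pixels * bytes_per_pixel / 1e9


if __name__ == "__main__":
    print(resolution_ratio(CAMERA_PIXELS, CONSUMER_PIXELS))  # 200.0
    print(raw_exposure_gb(CAMERA_PIXELS))                     # 2.8
```

Under the 16-bit assumption, a single exposure from one camera is about 2.8 GB before any processing, and the survey repeats such exposures across the whole night sky several times each month.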

“This is an amount of raw data so large it’s difficult to comprehend, much less work with,” said Alex Szalay, Alumni Centennial Professor at Johns Hopkins University. “With Project Trident, we can essentially digest that tremendous data source directly into our supercomputers customized for data-intensive science, process it interactively and create complex statistical analyses to help us better understand what’s going on in the universe.”

Astronomers at Johns Hopkins University are using Project Trident to support the Pan-STARRS project. Photo courtesy of Pan-STARRS project.

Harnessing Technology for Science

Project Trident was developed by Microsoft Research’s External Research Division specifically to support the scientific community. It is implemented on top of Microsoft’s Windows Workflow Foundation, reusing the existing functionality of that commercial workflow engine, and builds on Microsoft SQL Server and Windows HPC Server cluster technologies. DryadLINQ combines Dryad, an infrastructure for running data-parallel programs developed in the Microsoft Research Silicon Valley lab, with the Language-Integrated Query (LINQ) extensions to the C# programming language. Dryad was designed to simplify the task of implementing distributed applications on clusters of Windows-based computers; DryadLINQ is an abstraction layer that simplifies the process of implementing Dryad-based applications.

The DryadLINQ system automatically and transparently translates the LINQ queries in a program into distributed execution plans and runs them on large compute clusters using the Dryad execution engine. A DryadLINQ program can be written and debugged with standard .NET development tools, which puts distributed computing on large clusters within reach of most programmers.
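DryadLINQ itself is a .NET system programmed in C#, so what follows is only a conceptual sketch, in Python, of the pattern described above: a declarative query (filter, group by key, count) becomes a small graph of stages, where per-partition work runs in parallel and partial results are hash-partitioned to reducer stages before a final merge. Thread-pool workers stand in for cluster machines, and `run_group_count` and its helpers are hypothetical names, not Dryad or DryadLINQ APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Sequence


def partition(records: Sequence, n: int) -> List[list]:
    """Split the input round-robin into n partitions (stand-in for data spread over nodes)."""
    return [list(records[i::n]) for i in range(n)]


def local_stage(part, predicate, key) -> Counter:
    """Stage 1 vertex: filter, project each record to its key, pre-aggregate counts."""
    return Counter(key(r) for r in part if predicate(r))


def shuffle(partials: List[Counter], n: int) -> List[list]:
    """Hash-partition partial counts so every key lands in exactly one reducer bucket."""
    buckets = [[] for _ in range(n)]
    for counts in partials:
        for k, v in counts.items():
            buckets[hash(k) % n].append((k, v))
    return buckets


def reduce_stage(bucket) -> Counter:
    """Stage 2 vertex: sum the partial counts for the keys routed to this bucket."""
    total = Counter()
    for k, v in bucket:
        total[k] += v
    return total


def run_group_count(records, predicate, key, workers=4) -> Dict[Hashable, int]:
    """Roughly the C# query records.Where(predicate).GroupBy(key).Select(g => (g.Key, g.Count()))."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: local_stage(p, predicate, key),
                                 partition(records, workers)))
        reduced = list(pool.map(reduce_stage, shuffle(partials, workers)))
    result: Dict[Hashable, int] = {}
    for counts in reduced:  # buckets hold disjoint keys, so a plain merge is safe
        result.update(counts)
    return result


if __name__ == "__main__":
    readings = [("temperature", 12.1), ("salinity", 34.5), ("temperature", 9.8),
                ("pressure", 101.2), ("temperature", 15.0)]
    # Count readings above 10 per sensor type.
    print(run_group_count(readings, predicate=lambda r: r[1] > 10, key=lambda r: r[0]))
```

Pre-aggregating inside each partition before the shuffle means only one (key, count) pair per key per partition has to move between stages, which is the usual motivation for partial aggregation in distributed query engines. A real Dryad job would run each stage as vertices on separate machines connected by data channels rather than as threads in one process.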

Reducing Research Overhead

Project Trident combines gaming graphics with workflow technologies to create a powerful visualization tool that makes large-scale, complex scientific data not only easy to review and analyze, but also easy to manage, reproduce and share. It enables researchers to build experiments that formerly required heavy involvement from computer scientists. To give the solution enough “horsepower” to process very large data sets, Dryad and DryadLINQ allow Project Trident to be run on distributed systems or large compute clusters.

“With the addition of DryadLINQ, our ability to interpret data has finally caught up with our ability to collect it,” said Roger Barga, a Microsoft researcher and principal architect for the new tools. “While it is not necessary to couple Project Trident with Dryad, the combination provides a powerful system for processing very large volumes of data.”

The marriage of visualization and workflow technologies allows data analysis experiments to be developed visually as “workflows,” similar to process workflows used in the business world. Whereas building such a system has traditionally required custom coding and weeks or months of development time, with Project Trident, senior researchers can do much of that upfront programming themselves in just hours or days.
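Trident builds on Windows Workflow Foundation, and its actual workflow model is not reproduced here. As a hedged illustration of the idea described above, the Python sketch below treats an analysis as a graph of named activities, runs them in dependency order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and records which upstream steps fed each activity, the kind of provenance log that makes a run easier to reproduce and share. The `run_workflow` helper and the activity names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(activities, deps, inputs):
    """Run each activity after its upstream steps; return (results, provenance).

    activities: step name -> callable taking upstream outputs as keyword arguments
    deps:       step name -> set of upstream step names
    inputs:     initial data keyed by step name (these steps are not executed)
    """
    results = dict(inputs)
    provenance = []  # (step, upstream steps) in execution order
    for name in TopologicalSorter(deps).static_order():
        if name in results:  # supplied as an input, nothing to run
            continue
        upstream = sorted(deps.get(name, ()))
        results[name] = activities[name](**{d: results[d] for d in upstream})
        provenance.append((name, upstream))
    return results, provenance


if __name__ == "__main__":
    activities = {
        "clean": lambda raw: [x for x in raw if x is not None],  # drop missing samples
        "mean": lambda clean: sum(clean) / len(clean),
        "count": lambda clean: len(clean),
        "report": lambda mean, count: f"{count} readings, mean {mean:.2f}",
    }
    deps = {"clean": {"raw"}, "mean": {"clean"}, "count": {"clean"},
            "report": {"mean", "count"}}
    results, provenance = run_workflow(activities, deps,
                                       {"raw": [3.1, None, 2.9, 3.4, None]})
    print(results["report"])  # 3 readings, mean 3.13
    for step, upstream in provenance:
        print(step, "<-", upstream)
```

Keeping the graph as data (the `deps` mapping) rather than hard-coding the call order is what lets the same experiment be rerun, inspected or handed to a colleague, which is the property the workflow approach is meant to provide.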

About Microsoft External Research

The External Research Division of Microsoft Research builds relationships between academia, industry and government to help advance research in fields that rely heavily upon advanced computing. Microsoft Research provides the tools, technologies, resources and interoperability needed to accelerate research and advance human potential and the well-being of the planet.

About Microsoft Research

Founded in 1991, Microsoft Research is dedicated to conducting both basic and applied research in computer science and software engineering. Its goals are to enhance the user experience on computing devices, reduce the cost of writing and maintaining software, and invent novel computing technologies. Researchers focus on more than 55 areas of computing and collaborate with leading academic, government and industry researchers to advance the state of the art in such areas as graphics, speech recognition, user-interface research, natural language processing, programming tools and methodologies, operating systems and networking, and the mathematical sciences. Microsoft Research currently employs more than 850 people in six labs located in Redmond, Wash.; Cambridge, Mass.; Silicon Valley, Calif.; Cambridge, England; Beijing, China; and Bangalore, India. Microsoft Research collaborates openly with colleges and universities worldwide to enhance the teaching and learning experience, inspire technological innovation, and broadly advance the field of computer science. More information can be found at http://research.microsoft.com.

About Microsoft

Founded in 1975, Microsoft (Nasdaq “MSFT”) is the worldwide leader in software, services and solutions that help people and businesses realize their full potential.

Note to editors: If you are interested in viewing additional information on Microsoft, please visit the Microsoft Web page at http://www.microsoft.com/presspass on Microsoft’s corporate information pages. Web links, telephone numbers and titles were correct at time of publication, but may since have changed. For additional assistance, journalists and analysts may contact Microsoft’s Rapid Response Team or other appropriate contacts listed at http://www.microsoft.com/presspass/contactpr.mspx.
