Cloud-Based Computing System Helps Scientists Study the Breathing of the Biosphere

BERKELEY, Calif. — Oct. 12, 2010 — Studying the environment would be simple if it weren’t for one thing: Even an isolated ecosystem is unbelievably complicated. Factors to study include water systems, plant life cycles, carbon dioxide fluctuations, resource use by humans, and far more — and each can be studied at the scale of a plant or of the planet, and measured in an instant or over decades.

MODISAzure lets computations scale from days and locales up to decades and continents. Here, one image shows a year of U.S. evapotranspiration data. The system gives scientists, politicians and others a tool to quantify how the land's ecological services offset carbon emissions from fossil fuels.

“In trying to study the environment we’re essentially trying to paint a picture by numbers, and that picture is a movie. Every day and every second it flips to another scene,” says Dennis Baldocchi, professor of biometeorology at the University of California, Berkeley. Today, he and fellow scientists are announcing a research tool that simplifies that data analysis, enabling researchers to focus on what Baldocchi calls “the breathing of the biosphere.”

At the seventh annual Microsoft Research eScience Workshop at UC Berkeley, Microsoft Research showcased an environmental project that offers researchers data resources for detailed climate science study. The system combines state-of-the-art biophysical modeling with a rich cloud-based dataset of satellite imagery and ground-based sensor data to support carbon-climate science synthesis analysis on a global scale. This approach enables scientists from different disciplines to share data and algorithms, helping them better understand and visualize how ecosystems behave as climate change occurs.

The system was developed by Baldocchi; Youngryel Ryu, post-doctoral researcher in biometeorology at Harvard University; and Catharine van Ingen, partner architect, Microsoft Research. Over the course of the project, Baldocchi, Ryu and van Ingen have been working with researchers at UC Berkeley, Lawrence Berkeley National Laboratory, the University of Virginia and Indiana University.

Data is collected from satellite imagery and from more than 500 FLUXNET towers, which form a global network of field-based sensor arrays that measure fluctuations (or “fluxes”) of carbon dioxide and water vapor. The data can be analyzed in fine detail — down to a single kilometer — or on a global scale. Data can also be scaled by time, from the immediate picture measured over a satellite’s five-minute sweep of a defined area to the complex changes tracked over a decade or more.

“You see more different things when you can look big and look small,” van Ingen says. “The ability to have that kind of living, breathing dataset ready for science is exciting. You learn more and different things at each scale.”

For example, in the Sacramento River delta in California, the largest estuary on the U.S. Pacific coast, decisions about water use must take into account a vast number of environmental factors. How much water must be kept flowing into San Francisco Bay to keep salt water out of the estuary? How much can be transported to the San Joaquin Valley to provide irrigation for agriculture? How have the water supply and systems changed over the past decade of urban development? More accurate and detailed information can help environmental managers and engineers make better decisions.

“Our goal is to provide high-resolution spatial and temporal information and be able to give people granular information so they can diagnose what’s going on and do a better job of predicting,” Baldocchi says.

The system is based on MODISAzure, a pipeline for downloading, processing and reducing diverse satellite imagery. It uses the Windows Azure platform to deliver the results of massive cloud computational power to researchers' desktops, without requiring them to invest in and maintain a supercomputing system. The Berkeley researchers have used 250 CPU hours to date, a minimal amount by the standards of supercomputing but a significant amount of time for individual researchers with only laptop or desktop systems.

“It’s computing for the rest of us. Not everybody can get access to a supercomputer. Scientists don’t want to fool around with the machines; they want to do science,” van Ingen says.

With the cloud-based system, van Ingen explains, researchers are able to focus on scientific questions without having to worry about the cost or logistics of managing computers. As the cost of computing power matters less, the most important resources to manage are the time and expertise of the people doing the research.

“How would it be if computers were free and people were your only cost?” van Ingen asks. “You would do a number of things — some subtle, some huge — differently.”

With the availability of the FLUXNET and satellite data and the ability to perform complex analysis using the MODISAzure-based system, the possibilities for research are immense.

“We really are just at the beginning of understanding how the results can be used, and I think that’s very exciting,” Baldocchi says. “It’s enabling things we had never thought of before.”

Baldocchi notes that other researchers are studying issues including global photosynthesis, vineyard water management and boreal forest cycles.

“I really hope we’ll get collaborators who take this in places we don’t understand and come back and tell us about it,” he says. “That’s the fun part of science. It takes you where you don’t expect, and that continues to motivate us.”

Baldocchi will describe the system in a presentation titled “Scaling Information on ‘Biosphere Breathing’ from Chloroplast to the Globe” today at the Microsoft Research eScience Workshop. The event is a cross-disciplinary workshop that brings together scientists from diverse research disciplines to share their research and discuss how computing is transforming their work.