Q&A: Microsoft Researchers Use Machine Learning Techniques to Help Advance HIV Vaccine Research

REDMOND, Wash., Feb. 23, 2005 -- Every 10 seconds, someone dies of AIDS. According to the World Health Organization (WHO), the worldwide death toll is approaching 30 million and shows few signs of abating. Nearly 5 million people are infected each year with human immunodeficiency virus (HIV), the virus that causes AIDS, and the WHO estimates that 40 million people live with the virus today, a number equal to the entire population of Spain.

Microsoft Research is working with leading doctors and scientists to use advanced computer science techniques in the fight to slow or stop the HIV/AIDS pandemic. Microsoft researchers are applying software algorithms similar to those used on computing challenges such as managing computer databases, compressing digital files or blocking spam e-mail to overcome roadblocks in the hunt for an HIV vaccine.

Laboratory tests began this month on vaccine models developed using these Microsoft Research-aided approaches. The tests are the first step in what could be years of additional research and trials to determine the effectiveness of these models and determine if they could be used to develop vaccines for hepatitis C and other mutating viruses.



From left: David Heckerman, Nebojsa Jojic, Simon Mallal, and Dr. James Mullins.

David Heckerman, lead researcher and manager of the Microsoft Research Machine Learning and Applied Statistics Group, and another researcher in this group, Nebojsa Jojic, will discuss the initial findings of this research today at the 12th annual Conference on Retroviruses and Opportunistic Infections (CROI) in Boston.

PressPass talked to the Microsoft researchers to learn more about their work and why Microsoft Research has dedicated a portion of their time to medical research, along with that of Microsoft researchers Chris Meek and Carl Kadie and Jojic’s brother (and former intern), Vladimir. The two main collaborators on the HIV vaccine project also took time to provide insight and background on the joint research effort: Simon Mallal, professor and executive director of the Centre for Clinical Immunology and Biomedical Statistics at Royal Perth Hospital and Murdoch University in Perth, Australia, and Dr. James Mullins, professor and chair of the University of Washington Department of Microbiology in Seattle, Wash.

PressPass: It’s more than 20 years since the discovery of HIV. How close is medical science to a cure?

David Heckerman: Antiviral drugs have helped suppress the virus and education has helped slow the spread of HIV in some parts of the world. But the number of people contracting HIV continues to grow. The World Health Organization estimates 14,000 additional people contract HIV every day.

Many researchers around the world are now pinning their hopes on the potential of cellular vaccines to protect against the HIV infection. Cellular vaccines train the immune system to recognize short fragments of foreign proteins, called epitopes, that are found on the surface of infected cells. Once it recognizes these cells, the immune system kills them.

PressPass: At what stage of development are these cellular vaccines?

Simon Mallal: A few vaccines are in clinical trials, but none so far have been able to overcome the primary challenge of HIV — the enormous diversity of the virus. Our tests on 473 HIV patients in Perth found 473 different strains of the virus.

As long as the virus keeps spreading, the number of strains will keep growing. Imagine if every patient with smallpox had a different version of the virus; the smallpox vaccine made from just one version may not have protected against many of the others.

PressPass: What makes the vaccine models you are developing different from others?

Mallal: We believe the key to fighting HIV is to find patterns in the way it mutates to create versions of the virus that can escape recognition by the carrier’s immune system. So far, this is easier said than done. Our research suggests that much of the enormous diversity of HIV is driven by the genetic diversity of one of the building blocks of the immune system: the tiny molecules, called human leukocyte antigens (HLAs), that our bodies rely on to detect invading cells. Though there are similarities within ethnic groups and populations, there’s an enormous number of distinct HLA immune types.

By uncovering patterns in the way HIV mutates in different patients, we believe we will be able to more accurately predict the HIV epitopes needed to train the immune system to recognize and fight the virus. This approach should allow us to tailor vaccines to different people and populations, based on their immune type and the circulating HIV variants.

Mullins: We are taking a slightly different approach. We are computing the genetic ancestors of the virus that embody common features present in all present-day strains. By including these “ancestral” epitopes in a vaccine, we hope to provide the immune system with the basic set of epitopes necessary to recognize different strains of the virus.

PressPass: How has Microsoft Research helped advance your research in Australia, Simon?

Mallal: Before we began collaborating with Microsoft Research, we had cataloged the complete genetic sequence of the virus and immune type for each of 250 patients with whom we’ve worked. What we needed was an efficient way to cross-reference and analyze these samples to relate the viral mutations to the immune system type of each patient. We had amassed an enormous amount of data, the largest set of HIV samples mapped to specific immune types ever collected. Traditional trial-and-error and peer research methods would have taken an enormous amount of time, and the technology tools we were using were taking weeks to complete each analysis. We needed a smarter, faster solution.

Heckerman: The machine-learning and data-mining algorithms that my team uses to solve computer science problems are ideal for the challenge Simon faced. These algorithms allow Microsoft database software to uncover hidden patterns within large computer databases. We’ve employed similar algorithms in Microsoft’s e-mail services to help differentiate spam from legitimate e-mail.

The algorithms we used to analyze Simon’s data combed through the hundreds of genetic sequences that he and his research team had collected, and tested millions of different possible combinations of epitopes and immune types.
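To give a rough sense of what testing many combinations of sequence positions and immune types can look like, here is a minimal Python sketch. It is not the researchers' actual method: the patient records, the `HLA-X` and `HLA-Y` labels, the reference sequence and the choice of a one-sided Fisher exact test are all assumptions made purely for illustration.

```python
from itertools import product
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return total / comb(n, row1)


def scan_associations(patients, reference):
    """Test every (HLA type, sequence position) pair; return results sorted by p-value."""
    hla_types = sorted({h for p in patients for h in p["hla"]})
    results = []
    for hla, pos in product(hla_types, range(len(reference))):
        a = b = c = d = 0
        for p in patients:
            has_hla = hla in p["hla"]
            mutated = p["seq"][pos] != reference[pos]
            if has_hla and mutated:
                a += 1  # carries the HLA type, virus mutated at this position
            elif has_hla:
                b += 1  # carries the HLA type, no mutation
            elif mutated:
                c += 1  # lacks the HLA type, virus mutated
            else:
                d += 1  # lacks the HLA type, no mutation
        results.append((fisher_one_sided(a, b, c, d), hla, pos))
    return sorted(results)


# Hypothetical toy data: not real patient records or real HLA types.
reference = "KRWI"
patients = (
    [{"hla": {"HLA-X"}, "seq": "KRTI"} for _ in range(4)]
    + [{"hla": {"HLA-Y"}, "seq": "KRWI"} for _ in range(4)]
)

ranked = scan_associations(patients, reference)
best_p, best_hla, best_pos = ranked[0]
print(f"strongest association: {best_hla} at position {best_pos} (p = {best_p:.4f})")
```

In this toy data the mutation at position 2 appears only in patients carrying the made-up type `HLA-X`, so that pair ranks first. A real analysis would test far more pairs and would need to correct for the number of comparisons being made.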

Mallal: Regardless of how successful these approaches are with HIV, they may help with the design of vaccines for other mutating viruses, such as hepatitis C. We plan to broadly share our research with colleagues who are working on treatments for these other viruses.

PressPass: How about you, Jim? How has Microsoft Research helped you and your colleagues at the University of Washington advance your research?

Mullins: We suspect the common features represented by ancestral epitopes will help trigger an immune response in many more HIV patients than a vaccine derived from any circulating strain. However, the diversity of the different strains is so great that we may need to supplement our primary approach by adding a variety of additional epitopes. Until we began working with Microsoft Research, this would have required a much larger expansion of the size of the vaccine. The larger a vaccine is, the harder it is to administer and the more costly it is to make.

Jojic: My research has focused on finding ways to condense digital images, video and audio to make them easier to segment, search and store. The miniature versions of the data — what I call epitomes — contain many or all of the important pieces of the data, but reduce the overall size by overlapping the common components of these pieces. We have used similar representations for video indexing and editing applications.

These algorithmic models allow us to condense the HIV epitopes by overlapping the shared components within the immunogen. The vaccine models we’ve developed that use this approach are less than half the length of ones in which the epitopes don’t overlap. The epitomes take into account both the similarities and variability of the data, so they can factor in the cellular binding properties of various molecules important for immunity.
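The idea of shortening a sequence by overlapping shared pieces can be illustrated with a small Python sketch. This is a simple greedy merge of overlapping strings, swapped in for illustration only; it is not the epitome model described here, and the three fragment strings are invented rather than real epitopes.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    unique = set(fragments)
    # Drop fragments that are already contained inside another fragment.
    frags = sorted(f for f in unique if not any(f != g and f in g for g in unique))
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0] if frags else ""


# Invented toy fragments standing in for epitopes (not real sequences).
epitopes = ["GAKLMNPQR", "LMNPQRSTV", "PQRSTVWYH"]
immunogen = greedy_superstring(epitopes)
concatenated_length = sum(len(e) for e in epitopes)

print(immunogen, len(immunogen), concatenated_length)
```

Every fragment still appears intact inside the merged string, which is 15 characters long compared with 27 when the three fragments are simply placed end to end.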

PressPass: What stage are you at in the development of an actual vaccine?

Mallal: Jim’s group began laboratory tests at the UW on samples of HIV-infected cells this month. We are about to start testing using another approach. These tests will help us determine if the algorithmic analysis identified the correct epitopes and if the epitome approach works on live vaccines such as these. We should have initial results later this year.

Heckerman: If the tests are deemed successful, my team will do additional algorithmic analysis to avoid what’s called immunodominance — a phenomenon that may reduce the effectiveness of a vaccine containing many epitopes. We have ideas about how to do this and plan to test them in the lab.

Mullins: Even if all goes well again, we would still require a few years of vaccine development and additional testing. Anyone who tells you a “cure” is right around the corner is not basing their claim on data. We are very excited about this approach, but we don’t know if it is going to work. History tells us it has a low possibility of working. Nothing has worked thus far.

PressPass: How did these collaborations begin?

Jojic: I was interested in applying my previous experience in statistical modeling of other types of natural signals to biology. When I heard about the research that Jim and his colleagues were doing at the University of Washington, I suspected we might be able to help them out.

Mallal: We heard about David’s work in machine learning and data mining from Jim, with whom we were already collaborating. Like many researchers, we typically don’t share our raw data with many other organizations. But once we talked to David and Nebojsa, we realized it was a moral imperative to begin working with them. The synergy between our areas of research was obvious from our first conversation.

PressPass: Why is Microsoft Research dedicating time and other resources to medical research?

Heckerman: While a majority of our research eventually ends up in Microsoft products, Microsoft didn’t create its research labs to focus solely on product development. We were hired to help advance the state of the art in areas where software already plays a role — or will play a role in the coming years. The sciences are one of those areas. Virtually every field of science is drowning in data. Researchers have collected this data for decades, hoping it would lead to breakthroughs or solve persistent problems. But, until recently, they haven’t had the technology and tools they need to find the answers hidden within this data.

This HIV vaccine project is of great personal interest to me because of my interest in medical science. Before I came to Microsoft Research, I studied to become a physicist and graduated from medical school at Stanford University.

PressPass: In what other areas of science, apart from medical science, has Microsoft Research done projects?

Jojic: Two great examples are in geology and astronomy. With TerraServer, we created one of the world’s largest online databases to help the U.S. Geological Survey provide free, public access to its vast store of maps and aerial photographs of the United States. Microsoft Research also teamed up with Sloan Digital Sky Survey to create SkyServer, a Web site that offers professional and amateur astronomers the ability to access and study pictures of more than 80 million stars and galaxies.

In addition to the HIV vaccine work, we are collaborating with biologists to unravel some of the gene splicing mechanisms in higher organisms. We’ve also helped create an improved model of evolution and analyzed associations between diseases and genetic variations in humans.

Microsoft researchers also are helping biologists to develop languages that will help us better describe biological systems. It’s a very exciting time to be a computer scientist.

PressPass: David, could any of the research you are doing on HIV vaccines advance the state of software technology?

Heckerman: You never know. It’s extremely likely that software and other technologies developed to advance other areas of science will someday be useful in homes and businesses. It’s hard to anticipate what those technologies may be, but good things are sure to come from this work.

One thing we’ve already learned is that the algorithms we use for computer science problems appear to work even more accurately when used to predict patterns within natural, biological systems.

PressPass: Jim, were you surprised the researchers at Microsoft Research were taking an interest in HIV research?

Mullins: Not really. We’ve been more surprised by the ability of the Microsoft folks to assimilate the biology and ask really, really pertinent questions. One doesn’t normally get that from people who are outside of our field — or from many people within our field. It has been a pleasure to work with them.