REDMOND, Wash., June 13, 2007 — The source code for a set of software tools developed by Microsoft Research to advance AIDS vaccine research and development is available for download starting today from Microsoft’s CodePlex Web site. By sharing the code openly and at no charge with the worldwide AIDS research community, Microsoft hopes to spur other scientists and researchers to take up the tools and even build on them, thereby speeding the way toward a vaccine.
The code for four software tools is available now at no charge via CodePlex, an online portal created in 2006 to foster collaborative software development projects and host shared source code as part of Microsoft’s Shared Source Initiative. The tools and source code are an initial piece of Microsoft’s technical computing effort, a company-wide initiative to collaborate with the worldwide scientific community and reduce the time to new scientific insights and breakthroughs by furthering the state of information technology in scientific research.
The software tools are designed to help AIDS researchers around the globe harness the power of computing to more quickly identify the crucial elements of an effective cellular vaccine.
“We apply technology to some of the world’s toughest technical and societal challenges,” says David Heckerman, who devotes himself to vaccine work as the lead researcher of the Machine Learning and Applied Statistics Group at Microsoft Research. “And with 10,000 people per day dying of AIDS, this world health crisis is certainly one of those challenges.”
Heckerman points out that good drug treatments are available today to combat the AIDS pandemic, allowing people who contract the disease to live a fairly normal life. But existing anti-viral drugs are expensive, and if a person misses a few doses, the virus that causes AIDS can become unstoppable. That fact keeps drugs from being an effective solution for developing countries where access and cost are barriers, Heckerman explains, and it’s what’s driving keen interest in a vaccine.
Microsoft researchers began their pioneering work in AIDS research in February 2005 after discovering that their machine-learning technology, including methods for spam filtering, had compelling applications for vaccine research. The vaccine work combines technologies such as graphical models and other machine-learning techniques to comb through thousands of strains of human immunodeficiency virus (HIV) — the virus that causes AIDS — to find the genetic patterns necessary to train a patient’s immune system to fight the virus.
The two-plus years of AIDS research since then have involved approximately a dozen Microsoft researchers, post-doctoral candidates and interns, all working together in Microsoft’s labs. The effort has also encompassed extensive collaboration with doctors, scientists and other HIV researchers around the world.
One such researcher is Dr. Bruce Walker, director of the Partners AIDS Research Center at Massachusetts General Hospital in Boston and a Howard Hughes Medical Institute investigator. He and his research team worked with the Machine Learning and Applied Statistics Group at Microsoft Research to define how the virus changes as the immune system attacks it. This information is critical to protecting people against HIV, because the virus continues to evolve rapidly to escape detection by the immune system.
Dr. James Mullins, a professor in the University of Washington’s Department of Microbiology in Seattle, Wash., is another researcher who understands the value of collaboration.
“The medical research tools developed by Microsoft prove that we can make more progress in the battle against HIV when experts in various fields pool their resources and work together,” Mullins says. “Our work with Microsoft Research to combine biological and computer sciences has already been very productive in moving our vaccine design efforts forward. I am quite certain that the tools that have and are being developed at Microsoft have far more exciting potential for closing in on the designs that will most likely bring success.”
Two Options for Scientists: Source Code and Web Tools
The work pursued by Microsoft Research and its collaborators has led to the development of a number of software tools designed to help further HIV vaccine research.
“Four tools derived from this effort have matured to the point where their code is ready to be shared with the overall AIDS research community,” says Carl Kadie, research software design engineer and lead programmer for the project at Microsoft Research. “And we have other tools in the works, such as those that will help AIDS researchers track the evolution of the HIV virus in individuals over time.”
HIV/AIDS researchers who access the four tools at CodePlex have two choices. They can download pre-compiled programs and run those programs on their own computers, an option that gives them complete control, lets them use all of their own computing resources, and gives them access to the full functionality of the programs. Or, they can download the source code and compile the applications themselves, an option that allows scientists to modify and build on the code so they can further optimize the tools for their own needs in vaccine work.
The tools enable researchers to sift through extremely complex immune and viral genetic information from a population of infected people to pinpoint the key strings of amino acids, called epitopes, that must be present in a vaccine for it to be effective. For the benefit of HIV/AIDS researchers who prefer to make direct, immediate use of these tools (rather than download the programs or source code), Microsoft is also posting the actual tools at its microsoft.com Web site. The Web tools can be used at no charge there, separately or in conjunction. Kadie explains that Microsoft Research modified the functionality of the Web versions of the applications to help ensure that they run fast on a shared resource (the Web server).
The Web versions enable scientists to go online at will, input the data they have collected and submit it for analysis and findings. Many researchers in the AIDS community have already taken advantage of the tools and used the results to advance their research.
“We have examined HIV sequences and genetic information from thousands of patients around the world with Microsoft Research to discover new epitopes and work out which ones are most important to include in a vaccine,” explains Professor Simon Mallal, director of the Institute of Immunology and Infectious Diseases at Murdoch University in Perth, Australia.
Mallal and his colleagues began working with Microsoft Research after publishing a research article in Science magazine describing the way that HIV genetic sequences mutated according to the genetic type of HIV-infected patients in Western Australia. He notes that Microsoft Research is making a tremendous contribution to HIV vaccine design and the broader research community.
“Microsoft has a culture of taking on the most important challenges, no matter how difficult they may seem at the outset,” Mallal says. “For this reason, they have attracted many of the best people in the world and accumulated unprecedented know-how and tools, much of which has yet to be applied to other domains. We were confident that they would be able to help, but quite frankly I have been astounded at what has been achieved in such a short time and by the generosity of the organization and everyone involved.”
Tools Demonstrate Progress in the Fight Against AIDS
Microsoft Research hopes that the four software tools released today will help the worldwide scientific community take new strides toward an AIDS vaccine. The tools reflect an approach that differs from the traditional vaccine work prompted by the discovery of HIV, which has focused on the humoral arm of the immune system. The humoral arm makes antibodies, which can eliminate a virus before it infects a cell. However, that approach has not yet proved successful in fighting AIDS, and more recently scientists have turned to the cellular immune system as a way to stop HIV.
“How effective our cellular immune system is against HIV has been a fairly controversial issue,” Heckerman says. “These tools will help answer that question.”
One of the four software tools, named PhyloD for its ability to incorporate a phylogenetic or evolutionary tree into its analysis of patterns, goes about answering that question by looking for correlations between a person’s human leukocyte antigen (HLA) system — a fundamental part of the body’s cellular immune system — and the virus that infected that person. In this case, the virus is HIV, but Microsoft researchers recognize that the PhyloD tool also has implications for the study of vaccine design related to other diseases caused by rapidly mutating viruses, such as Hepatitis C.
Jonathan Carlson, a graduate intern in Heckerman’s lab at Microsoft Research, explains the premise behind the tool this way: Pathogens live and reproduce inside a human body, whose immune system continually tries to rid itself of these pathogens. This leads to a tug-of-war wherein the pathogen tries to adapt so as to “escape” the immune system, while the immune system learns to recognize and eliminate new foreign pathogens. The key players for the immune system are the HLA proteins, each of which can recognize epitopes presented on infected cells, then alert the immune system to their presence.
For rapidly evolving pathogens such as HIV, a key defense mechanism is to evolve mutations that prevent the HLA proteins from recognizing the viral epitopes. This evolution takes place anew in each patient, because each patient has a different set of HLA proteins that recognize different epitopes.
“PhyloD is a statistical tool designed to identify HIV mutations that defeat the function of the HLA proteins in certain patients, thereby allowing the virus to escape elimination by the immune system,” Carlson says. “By applying this tool to large studies of infected patients, researchers can now start decoding the complex rules that govern the HIV mutations, with the hope of one day creating a vaccine to which the virus can’t develop resistance.”
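The kind of question Carlson describes can be illustrated with a much simpler test than PhyloD itself. The sketch below counts how often a given residue appears at one sequence position in patients who do or do not carry a given HLA type, then applies a one-sided Fisher exact test. It is only an assumption-laden illustration: it ignores the phylogenetic tree that PhyloD incorporates, and the patient records, sequences, function names and HLA labels are all made up for the example.

```python
from math import comb  # requires Python 3.8+


def association_table(patients, hla, position, residue):
    """Build 2x2 counts (a, b, c, d) for one HLA type and one mutation.

    a: has HLA, has residue    b: has HLA, lacks residue
    c: lacks HLA, has residue  d: lacks HLA, lacks residue
    """
    a = b = c = d = 0
    for hla_types, sequence in patients:
        has_hla = hla in hla_types
        has_residue = sequence[position] == residue
        if has_hla and has_residue:
            a += 1
        elif has_hla:
            b += 1
        elif has_residue:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value: P(X >= a) with all margins fixed."""
    with_hla = a + b
    with_residue = a + c
    n = a + b + c + d
    denom = comb(n, with_residue)
    p = 0.0
    for x in range(a, min(with_hla, with_residue) + 1):
        p += comb(with_hla, x) * comb(n - with_hla, with_residue - x) / denom
    return p


# Hypothetical toy cohort: (set of HLA types, short viral sequence fragment).
patients = [
    ({"B57"}, "TSNLQ"),
    ({"B57"}, "TSNLQ"),
    ({"B57", "A02"}, "TSNLQ"),
    ({"A02"}, "TSTLQ"),
    ({"A03"}, "TSTLQ"),
    ({"A02"}, "TSTLQ"),
]

table = association_table(patients, hla="B57", position=2, residue="N")
p_value = fisher_exact_greater(*table)
print(table, p_value)
```

In this toy cohort every B57 carrier has the N residue and no one else does, so the test returns a p-value of one in twenty. A real analysis would need far more patients, a correction for testing many HLA and position pairs, and, as PhyloD does, an adjustment for the shared ancestry of the viral sequences.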
The implications for AIDS research are considered groundbreaking enough that a paper including results generated by the PhyloD tool was published in the March 16, 2007, issue of Science, a prestigious weekly journal read by the international scientific community.
Microsoft researchers have also developed and released the source code for an Epitope Prediction tool, which uses a machine-learning method related to Microsoft’s spam-filtering technology to scan proteins for likely epitopes in people with any HLA type.
A third software tool developed by Microsoft Research, called an HLA Assignment tool, aims at finding epitopes more accurately. Whereas the Epitope Prediction tool takes a pure machine-learning approach to identifying epitopes, the HLA Assignment tool also takes external biological evidence into account.
“You can perform lab studies as a way of finding epitopes, but it can become a complex problem if you see a reaction in a patient and you don’t know which of the patient’s HLA proteins are responsible for the reaction,” Heckerman explains. “Our HLA Assignment tool takes lab data from a series of patients and, in effect, solves a jigsaw puzzle to figure out which HLA proteins are responsible for the reaction.”
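The jigsaw puzzle Heckerman mentions can be pictured with a deliberately simplified, noise-free version of the problem. The sketch below assumes each patient either reacts or does not react to a single peptide, rules out any HLA type carried by a non-reacting patient, and then searches for the smallest set of remaining HLA types that accounts for every reaction. This is not the HLA Assignment tool's algorithm, which the source does not describe; the function name and patient data are hypothetical.

```python
from itertools import combinations


def explain_reactions(patients):
    """Find a smallest set of HLA types consistent with the lab results.

    patients: list of (set_of_hla_types, reacted) pairs for one peptide.
    Returns a set of HLA types, or None if no consistent set exists.
    """
    reactors = [hlas for hlas, reacted in patients if reacted]
    non_reactors = [hlas for hlas, reacted in patients if not reacted]

    # In this noise-free toy model, an HLA type carried by any
    # non-reacting patient cannot be responsible for the reaction.
    excluded = set().union(*non_reactors)
    candidates = sorted(set().union(*reactors) - excluded)

    # Try the smallest explanations first (brute force; fine for toy data).
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(hlas & chosen for hlas in reactors):
                return chosen
    return None


# Hypothetical lab data: HLA types per patient and whether they reacted.
patients = [
    ({"A02", "B57"}, True),
    ({"A03", "B57"}, True),
    ({"A02", "B08"}, False),
    ({"A03", "B27"}, True),
    ({"A01", "B08"}, False),
]

responsible = explain_reactions(patients)
print(sorted(responsible))
```

Real assay data are noisy, so a practical method would weigh the evidence statistically instead of excluding candidates outright.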
“The algorithms developed at Microsoft Research have provided us with incredibly valuable tools which, especially in combination, have allowed for an in-depth analysis of our data sets in much more detail than what we would have hoped for,” adds Dr. Christian Brander, of Partners AIDS Research Center at Massachusetts General Hospital. “These tools are also particularly helpful when analyzing data sets such as the ones we have generated in patient cohorts in South Africa, Peru and other international sites, where the human population genetics and its impact on HIV evolution and vaccine design are less well understood.”
The fourth software tool released today — HLA Completion — is designed to help scientists get more research out of the same dollar by addressing the hierarchy of the immune system’s HLA types. For example, within HLA A02, a two-digit type, are more specific four-digit types — HLA A0201, A0202, A0203 and so on — that describe a person’s genetic information in more granular detail. But determining a person’s HLA type to four-digit resolution in a lab is very expensive, and most HIV researchers can’t afford it, even though that information can be vital to vaccine design research.
To date, HIV researchers have often used lower-resolution HLA typing, Heckerman adds. But the HLA Completion tool developed by Microsoft Research helps extract more information out of this data. The tool works to complete lower-resolution HLA types to the likely four-digit expansions, giving researchers more information for roughly the same price.
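As a rough illustration of completing a two-digit type, the sketch below ranks the possible four-digit expansions by how often each one appears in a reference population. The counts are invented for the example, and the function name is hypothetical; the HLA Completion tool's actual statistical model is not described here and is likely to draw on more information than simple frequencies.

```python
def complete_hla(two_digit, reference_counts):
    """Rank four-digit expansions of a two-digit HLA type.

    reference_counts maps four-digit types (e.g. "A0201") to how many
    times each was observed in a reference population. Returns a list of
    (four_digit_type, probability) pairs, most probable first.
    """
    matches = {
        hla: count
        for hla, count in reference_counts.items()
        if hla.startswith(two_digit)
    }
    total = sum(matches.values())
    if total == 0:
        return []  # nothing known about this two-digit type
    ranked = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    return [(hla, count / total) for hla, count in ranked]


# Made-up reference counts, for illustration only.
reference_counts = {"A0201": 24, "A0202": 3, "A0203": 3, "A0301": 12}

expansions = complete_hla("A02", reference_counts)
for hla, probability in expansions:
    print(hla, round(probability, 2))
```

With these invented counts, A0201 is the most likely completion of A02, and the remaining probability is split between A0202 and A0203.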
Using Microsoft Research Tools to Advance Vaccine Research
Heckerman says the statistical analyses generated by these tools will better equip AIDS researchers to formulate hypotheses about how HIV is evolving, how vulnerable it is to the immune system, and where the epitopes in HIV might be. Taken together, this information will help researchers construct an actual HIV vaccine.
He notes that a cellular type of HIV vaccine is especially promising because it has the potential to help people already infected with the virus, as well as being an effective way to prevent others from being infected with HIV (and perhaps eventually contracting AIDS). However, Heckerman says, the vaccine that ultimately does the trick against HIV may well be some combination of an antibody vaccine and cellular vaccine.
“Although much work remains to be done, the software tools we’ve created will help move us down that road and make headway in the fight against AIDS,” Heckerman says. “As just one of several approaches to HIV vaccine design that we pursue at Microsoft Research, we hope our effort to apply a more rigorous statistical approach to this work will expedite scientific insight and bring us a step closer to developing a vaccine.”