Everything you are — the color of your eyes, your height, the tone of your skin, whether your hair is thick or thin — is determined, or strongly influenced, by a single molecule: the famous double helix called DNA. All of the glorious diversity that is the human race is derived, ultimately, from those two strands of material tightly wound around one another.
But sometimes things go wrong. Many horrific diseases, like breast cancer, Huntington’s disease and leukemia, are caused by genetic defects – “mistakes” somewhere in the 6 billion pairs of DNA chemical compounds. But where? In this enormous, unique “database” we all possess, what is “normal” and what is a disease-causing mutation?
To answer these questions, we need two things: a huge sample set of human DNA in a form that we can “sequence” (that is, enumerate each and every one of the 6 billion “base pairs”), and massive amounts of computing power and storage to analyze it.
A quintessential big-data challenge
In 2015, the White House announced the Precision Medicine Initiative (PMI), part of which involves collecting DNA from a statistically representative sample of one million Americans. This effort is well underway.
A fully sequenced genome (the entire DNA molecule) takes up about 100 gigabytes. A million of these requires roughly 100 petabytes of storage, approaching the exabyte range (an exabyte is 1 followed by 18 zeros, in bytes)! Put a different way, with the proliferation of sequencing machines from biotech giants like Illumina and others, hospitals and research institutions are now generating some three petabytes (a petabyte is 1 followed by 15 zeros) in a single month – about the same amount of data as Netflix’s entire video library.
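The storage estimate above is simple arithmetic, and a short sketch makes the scale easier to check. It uses only the figures quoted in this column (100 gigabytes per genome, one million participants, three petabytes generated per month) and assumes decimal units; the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the column.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes.
GB = 10**9
PB = 10**15
EB = 10**18

genome_bytes = 100 * GB      # about 100 GB per fully sequenced genome
cohort_size = 1_000_000      # one million participants

total_bytes = genome_bytes * cohort_size
total_pb = total_bytes / PB  # total expressed in petabytes
total_eb = total_bytes / EB  # total expressed in exabytes

# How many months of today's quoted output (3 PB per month) that represents.
monthly_output_pb = 3
months_to_match = total_pb / monthly_output_pb

print(f"Cohort total: {total_pb:,.0f} PB ({total_eb:.1f} EB)")
print(f"Equivalent months at {monthly_output_pb} PB/month: {months_to_match:.1f}")
```

On these assumptions the cohort comes to about a tenth of an exabyte, or close to three years of output at the quoted monthly rate.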
Discovering which genomic variations are responsible for which conditions is an arduous, time-consuming task. It is a quintessential big data problem, which is why researchers are turning to the cloud for storage and computing power.
With all the technical capability in the cloud, scientists and engineers are now racing to make sequencing and other analyses faster. In 2000, when the first human genome was sequenced, the price tag was around $100 million and it took nine months to complete the operation. Today, the cost for the full genome is around $1,000 and it can be done in about a day.
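To put those numbers side by side, here is a rough sketch of the improvement factors they imply. It takes the column's figures at face value and assumes 30-day months for the nine-month figure; it is an illustration, not a precise accounting of sequencing costs.

```python
# Rough improvement factors implied by the figures quoted in the column.
cost_then_usd = 100_000_000  # around $100 million for the first genome
cost_now_usd = 1_000         # around $1,000 today

days_then = 9 * 30           # nine months, assuming 30-day months
days_now = 1                 # about a day today

cost_fold = cost_then_usd / cost_now_usd
time_fold = days_then / days_now

print(f"Cost is lower by a factor of about {cost_fold:,.0f}")
print(f"Turnaround is faster by a factor of about {time_fold:,.0f}")
```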
In another example, Microsoft and the world-renowned Broad Institute in Cambridge, Massachusetts, have shown that it is possible to reduce sequencing time by as much as 85 percent, to just a few hours. As Microsoft scientist David Heckerman says, this can make all the difference to a premature newborn with a life-threatening birth defect, whose life may depend on the results of the sequencing.
And there is an even greater promise on the horizon, one that will be enabled by cloud genomics. Today, most drugs are formulated to treat the majority of patients with a particular disease. But what if doctors and drug companies could create a treatment based on your specific DNA? That would not only provide more effective treatments, but could dramatically reduce health care costs because patients wouldn’t have to find the right medications through trial and error, but could get the right treatment the first time around.
AI will play a key role
Key to the development of personalized treatments is the burgeoning field of artificial intelligence, and we’re already seeing results. Microsoft researcher Antonio Criminisi, working with Dr. Rajesh Jena, a neuro-oncologist at the University of Cambridge Cancer Centre in the United Kingdom, has developed an AI-based tool called “InnerEye” that provides astounding 3D visualizations of organs – and tumors – from standard magnetic resonance imaging (MRI) scans.
And AI is finding many other applications in health care. The French company RM ingénierie uses the Microsoft Kinect device (originally developed for gaming, it tracks human movements with advanced algorithms) to aid patients in physical therapy. Carolinas Healthcare analyzed more than 600 variables and applied 125,000 interventions to reduce readmissions. And in Brazil, Epimed Solutions used similar techniques to reduce hospital infections by 20 percent.
Dr. Leroy Hood, a distinguished biologist at the University of Washington and the inventor of the automated DNA sequencer in 1986, advocates “P4” medicine: predictive, preventive, personalized and participatory. With cloud-based, AI genetic analysis, we have taken early steps toward the first goal, predicting serious diseases, and in some cases have even taken actions to delay or prevent them.
Larry Smarr, physicist, computer scientist and founder of Calit2 at the University of California San Diego, has for years meticulously tracked the measurements of his own body: regular blood tests to track over 60 biochemical markers, detailed records of his diet, even having his own DNA sequenced, in a project he calls the “quantified self.” (He once famously took the results of a CAT scan, reformatted the data and printed a 3D model of his colon!) As a result, he predicted the onset of a bowel disease before symptoms appeared.
The promise of better health through innovation
Cancer has been called the “emperor of all maladies,” and every one of us knows someone who has been affected by it. Many of us have seen patients try one therapy, only to find that what works for others doesn’t work for them. But one day soon, new cloud and AI technologies promise to ameliorate, or even eliminate, some forms of cancer, and it’s just one of many previously intractable diseases that scientists will tackle with technology.
Imagine a future where your doctor takes a swab of saliva, sends it out for analysis and a few days later you have a drug that will work for you. That day is not here yet. But perhaps, with the formidable array of technology that includes the cloud, powerful artificial intelligence capabilities, and the masses of data that we are collecting, it will come.
A 40-year veteran of the software industry, Barry Briggs previously served as CTO for Microsoft’s own IT organization, where he helped lead the company’s transition to the cloud. This column was informed by the great work being conducted by Microsoft Genomics.