Data and the cloud: By unlocking our DNA we’ve found two new prehistoric ancestors
Microsoft Azure helps scientists unravel the surprisingly diverse origins of modern people and uncover some medical answers for today
Who were our ancestors? What species of humans came before us? How much of their jumbled genetic legacy lies embedded in our DNA? And what does that mean for the health and wellbeing of people today?
Since before the time of Charles Darwin, generations of scientists have been trying to unravel the origins of humankind by digging up and poring over fossils. Now, a new generation of researchers is peering into the mists of time with more clarity and speed than ever before, but not with picks and shovels.
Instead, big data and cloud computing have led a team of international scientists to a startling discovery: Among our forebears were two previously unknown groups of prehistoric hominins.
These archaic humans, known collectively as Denisovans, disappeared perhaps 30,000 years ago, but not before outlasting their better-known rivals, the Neanderthals, by many millennia. And, just like the Neanderthals, they co-existed and interbred with us, leaving a genetic inheritance that has been passed down in our DNA to this day.
These findings are based not on dusty bones but on the blood of around 300 people living today on islands that stretch across Indonesia, New Guinea, and into the Southwest Pacific – a slice of the planet where little genetic research has been conducted before.
Samples were collected on a voluntary basis and subjected to genomic sequencing that produced mountains of data that were put through two or so years of modeling and analysis in the cloud.
The research was exhaustive, and the results were unexpected: The samples were found to contain archaic Denisovan genetic material never before identified – and in significant amounts, particularly in people from New Guinea.
Previously, scientists had known of only one type of Denisovan – from bone fragments found in a mountain cave in Siberia in 2010. But this new Denisovan diversity, found thousands of kilometers away among people in Southeast Asia, was distinctly separate.
“We identified two new groups. So now we know of three types of Denisovans,” says Prof. Murray Cox of Massey University in New Zealand, who authored the findings published last year in the scientific journal Cell.
“They are all very different from Neanderthals – and very different from each other. What we found means that the origins of modern people are far more diverse and complex than any of us had imagined before.”
As well as adding to our understanding of how we evolved, the team’s work demonstrates how quickly scientific processes are transforming with the adoption of new digital technologies.
“I am a computational biologist. I develop the programs, the code, the algorithms, and the statistics to go into a big data set and pull the information we need to answer questions,” says Cox.
“Biology has changed. It used to be about small amounts of data in the lab that gave up information slowly. But that has changed radically within the last 10 or 15 years.
“We now have high throughput sequencing technologies that give us very quick information about DNA. And we need powerful computing to handle all that data. A decade or so ago, we would have spent 90 percent of our time in the lab moving little liquids around in tubes. But with automated (genome) sequencing, we now spend one or two percent of our time in the lab working with samples and almost everything else is done sitting in front of a computer.”
These Denisovan discoveries were based on statistical models created by Cox and his colleagues and run on Microsoft Azure, which proved a key factor in their project’s success. Massey, unlike many other universities, does not have its own on-premises computer facility to carry out big data-based research.
“These are big, costly computers, but their capacity is limited. Lots of people want to use them, and that means that it is very hard to get the compute time you need when you need it. You have to wait in line,” Cox explains.
In this case, the team went with a Microsoft cloud option. “Azure works well for us. It has scalability and flexibility. It gives us the freedom to work at the pace that we need to get answers.”
Science moves fast
Cox estimates that if the team had instead used an in-house IT facility, their work might have taken an extra six months, a year, or more to complete.
“It’s hard to say. But actually, it might have been never. That’s because the amount of computing time we needed would have meant that someone else probably would have got to the answers before we did, and they would have published before us. Science moves fast, and the questions would have been addressed by others if we hadn’t got there at the speed we needed.”
Businesses and bureaucracies routinely use the power of the cloud to sift through, analyze, and harness mountains of big data in fast, secure, and flexible ways. Cox says cloud computing brings the same benefits to the laboratory. “Processing so much data can be boring and laborious,” he says. “Azure frees us to do other things to develop our research.”
New technologies and solutions that operate in the cloud and leverage artificial intelligence and machine learning promise to accelerate the pace of research. And the ability to process lots of data quickly can also open up new directions for scientific inquiry as it did for Cox and his colleagues.
Originally, they had set out to study gene variants related to diseases found in the islands of Southeast Asia, and so help in the development of better-targeted treatments for millions of people living there. The search for archaic hominins started only after incoming data pointed the team in that direction.
Now with the Denisovan diversity findings published, the original medical research aim goes on, says Dr. Pradiptajati Kusuma, Lead Researcher at the Eijkman Institute for Molecular Biology in Jakarta, Indonesia.
“The value of the data generated has opened our eyes about how little, to date, we have understood about our diverse populations,” he says. “We have published interesting results on the mysterious Denisovan introgression throughout the archipelago, but that was just a start.”
Kusuma, who is better known as Pai, says the team is now focused on digging deeper into the data to find out how genetic attributes can affect the impact of therapeutic drugs, parasite resistance, and the incidence of non-communicable diseases.
Meanwhile, they are continuing to collect blood samples on a voluntary basis, often in remote communities that have little access to regular medical services.
“Every time we do our sampling activities, we also offer health checks,” Pai says. They test for a comprehensive list of indicators and levels, including blood pressure, body fat, blood glucose, total cholesterol, malaria parasites, and much more.
Cox says the Denisovan diversity findings are already paying off, for example, in the cases of some Pacific islander patients who suffer from auto-immune disorders.
“We twigged that the gene that causes these disorders has come from Denisovans. And since we realized that, the medical people have been developing new treatments because they know what the genetic variants are,” he says.
“So when we talk about Neanderthals and Denisovans, it might be easy to think, well, that is all in the past, that is all just history. But actually, the genetic variants that people carry today from these archaic hominins are directly affecting us.
“It is very much living history. Living in our cells.”
TOP IMAGE: A portrait of a woman from Papua New Guinea. Researchers say new knowledge about the DNA types of people in the region will help in the development of targeted medical treatments. Image by Gerrit Bril from Pixabay.