A doctor, a researcher and an activist: Perspectives on how sharing data is advancing health care

As Dr. Lance Baldo watched Covid-19 engulf U.S. hospitals in March last year, he knew his team could help.  

Compiling and combing through data on the immune system was already at the heart of the work undertaken at the Seattle-based health startup Adaptive Biotechnologies, where he works. Applying these capabilities to the pandemic could offer vital insights on the long-term consequences of the virus, as well as support the development of new vaccines. 

But to do this, the team members needed open access to data from as many Covid-19 patients as possible. They put out a call to collaborators to share blood samples, hoping to collect thousands of samples from around the world. They also stood up a clinical trial in weeks with the goal of enrolling 1,000 participants. Close to 7,000 samples were submitted from around the world, each one analyzed by a team of computational biologists and machine learning experts.  

“Data science never mattered so much as it does now,” Baldo says. “It’s pretty amazing the way that we’ve seen the confluence of biology, biochemistry and data science kind of all coming together, really, arguably, with equal importance in terms of how we’re going to innovate in the future.” 

The result was a diagnostic test that determines previous exposure to the virus and that has now been authorized by the U.S. Food and Drug Administration under the Emergency Use Authorization program. Its rapid development underscores how combining biological knowledge, data analysis and machine learning is paving the way for the future of medicine. 

Image: Lab work at Adaptive Biotechnologies. Nearly 7,000 blood samples from around the world were submitted to support Covid-19 research at the company.

Adaptive’s breakthrough is just one demonstration of the value of opening, sharing and collaborating around data. Unlocking the power of data is also helping Ed Rapp, 64, make sense of the rare neurological disease that may cut short his life and giving fresh insights to Dr. Jinghui Zhang, who studies childhood cancers at St. Jude Children’s Research Hospital in Memphis. 

Over the last 15 months, the world has benefited from the sharing of data in the fight against Covid-19. These three perspectives are from people who advocate for and benefit from open data in their daily work.

Data science helping people through research

When Ed Rapp sought medical help after stumbling on his daily run, the last thing he expected was a terminal diagnosis.  

Then 58 and an executive at a construction equipment manufacturer, he was busy and active. So he was stunned when he was told that he had amyotrophic lateral sclerosis (ALS), the degenerative motor neuron disease often associated with renowned physicist Stephen Hawking.

Rapp was told he probably had between two and five years to live. That was in 2015, and his condition has now progressed to the point where he relies on crutches. After his diagnosis, he started looking for answers beyond common medical knowledge.

His quest led him to Answer ALS, a research program operated and coordinated by the Robert Packard Center for ALS Research at Johns Hopkins in Baltimore that uses data insights to learn more about the condition and to develop new treatments. 

Rapp contributed his own data to the project and later joined Answer ALS as a board member. He is convinced that having multiple perspectives on ALS is more likely to lead to new interventions:

“In my professional career,” he says, “I always found that great innovation typically comes from great collaboration.”

In January 2021, on what would have been Hawking’s 79th birthday, Answer ALS launched The Data is Here – a new data portal offering scientists unprecedented access to the clinical, genetic, molecular and biochemical data of more than 1,000 patients. It has already released 2.5 trillion data points.  

The hope is that combining tremendous computational power and shared data with the brain power of many different scientists and medical specialists will lead to interventions that could one day stop ALS in its tracks.

“This is at the forefront of medical research,” Emily Baxi, Answer ALS Program Director, says. “We want everybody and anyone who thinks that they may have the right skill set to really tackle this problem.” 

Jennifer Yokoyama, Microsoft’s Chief IP Counsel, agrees with this approach. “If you have the same people looking at the same sets of data, they’re coming at it with the same point of view,” she points out. “The more eyes there are on it, I think there’s just more possibilities.” 

Yokoyama leads Microsoft’s Open Data Campaign – launched in April 2020 to facilitate greater access to data for better decision-making and to tackle some of the world’s most pressing problems. Finding new ways to share existing data is a key part of the campaign.  

Collecting data to benefit children’s health

The benefits of sharing data are also evident at St. Jude Children’s Research Hospital in Memphis, Tennessee, an AI for Health grantee. It was there, in 2018, that Dr. Jinghui Zhang’s team launched the St. Jude Cloud. This platform shares genomic data from thousands of young cancer patients in a way that enables biologists without significant data science experience to analyze their own data alongside it, using digital visualization tools.

“It will really lower the barriers for people who do not have computational knowledge,” Dr. Zhang, who chairs St. Jude’s Department of Computational Biology, says. “They can become a direct consumer of the data without having to write scripts or write code or run code to do this.” 

Image: St. Jude Cloud. Using data visualization tools, St. Jude Cloud allows researchers to access genomic data from thousands of young cancer patients.

Accessing collected data is particularly helpful for unusual conditions. Childhood cancer represents less than 5% of all cancers, according to the World Health Organization.

“It is a very rare disease, and if you do not have access to what’s already known, you really cannot interpret whether what you find is significant or not significant,” Dr. Zhang says.

The platform now attracts 10,000 unique users a month. Dr. Zhang calls it a “treasure trove” for cancer specialists around the world looking for patterns that could help further understanding and advance treatments. 

“If you don’t share the data, you can never get a full picture, because one individual, laboratory or institution just cannot have all the resources to generate sufficient information to make discoveries,” she says.

Crossing borders through technology

The St. Jude Cloud has already enabled researchers from the U.S., Germany and France to classify 135 subtypes of childhood cancer based on gene expression. It has also helped in the study of the mutation rates of 35 subtypes, as documented in a paper for Cancer Discovery.  

More recently, working with colleagues in Shanghai, Dr. Zhang’s team used visualization tools to discover a recurring pattern in patients who relapsed. In a study published in Nature Cancer, the team identified a particular drug that contributed to active mutation in those patients.  

Dr. Zhang also has deeply personal experience of how her work reaches far beyond the world of pediatric cancer. Her relative in China, she explains, had acute myeloid leukemia, a rare type of blood cancer.

“We sent her DNA and RNA samples for sequencing to a company in China,” Dr. Zhang says. “I found out the company actually is using our tools to interpret the variant that they found in her tumor sample.” 

The gene fusion that this sequencing unearthed meant that Dr. Zhang’s relative was identified as being in a high-risk group, and her treatment plan was subsequently altered. Her survival is in part due to the analysis conducted using the St. Jude tools. 

It’s an extraordinary testament to the power of global data access. But privacy and sovereignty concerns mean data is often not shared between countries or between organizations.

There is often reluctance to make data wholly accessible, says Microsoft’s Yokoyama. “That’s why we talk about making data as open as possible, recognizing that there are some data sets that will not be amenable, nor should they be, to being open or shared. But there are others that absolutely can be more open than they are now.”  

As the Open Data Campaign moves into its second year, Microsoft will focus on advancing scalable tools and governance frameworks to help make data sharing easier, as well as supporting data sharing in low-income regions.  

Protecting privacy and fostering trust

While data sharing enables scientific discoveries that will ultimately help save lives, it also raises questions about best practice, trust, stewardship and access. 

Without patient data, initiatives like Answer ALS would never get off the ground. But given the sensitive nature of that data, guarding patient privacy and building trust are critical to open data’s long-term success.

In the U.S., the use of all medical data is subject to guidelines set by HIPAA, or the Health Insurance Portability and Accountability Act. Patients have the right to limit the distribution of any personal health care information. The use of any European data is covered by the General Data Protection Regulation (GDPR), which also gives individuals the right to restrict processing of their data.   

On Answer ALS’s site The Data is Here, the released data is de-identified. As an additional safeguard, researchers accessing the data are required to sign a data use agreement stating that they will not attempt to re-identify any of the patients whose samples have been shared.    

Working with international collaborators, the Adaptive Biotechnologies team was also careful to ensure that, despite the accessibility of the Covid-19 immune code database, privacy and personal health information were protected. 

“You can’t get down to the level of an individual patient,” Baldo says. “And even if you could get down to the level of an individual patient, you couldn’t identify them anyway, because it’s all de-identified in this massive database.”  

A tipping point for more open data?

The possibilities of data have been understood for some time. But in the past year, data sharing has united the scientific community in ways that are enabling progress based on flexibility, trust and a shared ambition for improving health outcomes.

Baldo sees Covid-19 as a tipping point when it comes to a more open exchange of data.

“We’ve seen companies come together and collaborate unlike they’ve ever done before,” he says. “It’s been a real bright light in an otherwise dark time for our society and for our world.”

Whether it’s fully open or an agreement between a group of partners, all data sharing is valuable. Baldo believes that the future of data collaboration in health care will likely land somewhere in the middle ground, with a balance between giving organizations what they need to be successful and ensuring what’s shared also benefits scientific innovation and society. 

Dr. Zhang also envisions a team science model, in which everyone contributing plays to their own strengths. And she urges people not to hold data back because of competition fears.  

“The pie is big enough for everyone to get a slice,” she says. 

The future is promising for sharing

The work that St. Jude, Adaptive Biotechnologies and Answer ALS are doing is paving the way for the future of medicine. It’s a future in which patterns and trends will be accessed and analyzed as a matter of course, allowing for more targeted health care interventions.  

The multidisciplinary way that these organizations are using data has already shown itself to be incredibly valuable. And the aggregation and dissemination techniques they’re applying also hold promise in helping to solve other societal challenges, like climate change.

Finding new opportunities to share data in a secure way across organizational boundaries to realize even greater outcomes is the next stretch on the open data road.  

When it comes to global health and advancing research, says Yokoyama, “I think the sky’s the limit for data sharing, frankly. I think the real key there is doing it responsibly and privately and securely.” 

Like Baldo, she believes that Covid-19 has laid the groundwork for secure data collaboration on a scale that allows scientists to identify and respond to global patterns. 

“The only way to do that is to study the data,” she says. “That’s where the facts lie.” 

“When I think about our journey, I think about us being on a set of dominoes,” Rapp observes. 

The first of those dominoes to fall was the exponential gain in computing power and data storage capabilities. Another was the cost of genome sequencing dropping from $1 million to $1,000.

“My hope,” he says, “is we just continue to knock these dominoes down.” 

Photo credits: Rich Riggins for Answer ALS, Adaptive Biotechnologies, St. Jude Children’s Research Hospital