Biomedical researchers are embracing artificial intelligence to accelerate the implementation of cancer treatments that target patients’ specific genomic profiles, a type of precision medicine that in some cases is more effective than traditional chemotherapy and has fewer side effects.
The potential for this new era of cancer treatment stems from advances in genome sequencing technology that enable researchers to more efficiently discover the specific genomic mutations that drive cancer, and from an explosion of research on the development of new drugs that target those mutations.
To harness this potential, researchers at The Jackson Laboratory, an independent, nonprofit biomedical research institution also known as JAX and headquartered in Bar Harbor, Maine, developed a tool to help the global medical and scientific communities stay on top of the continuously growing volume of data generated by advances in genomic research.
The tool, called the Clinical Knowledgebase, or CKB, is a searchable database where subject matter experts store, sort and interpret complex genomic data to improve patient outcomes and share information about clinical trials and treatment options.
The challenge is to find the most relevant cancer-related information from the 4,000 or so biomedical research papers published each day, according to Susan Mockus, the associate director of clinical genomic market development with JAX’s genomic medicine institute in Farmington, Connecticut.
“Because there is so much data and so many complexities, without embracing and incorporating artificial intelligence and machine learning to help in the interpretation of the data, progress will be slow,” she said.
That’s why Mockus and her colleagues at JAX are collaborating with computer scientists working on Microsoft’s Project Hanover who are developing AI technology that enables machines to read complex medical and research documents and highlight the important information they contain.
While this machine reading technology is in the early stages of development, researchers have found they can make progress by narrowing the focus to specific areas such as clinical oncology, explained Peter Lee, corporate vice president of Microsoft Healthcare in Redmond, Washington.
“For something that really matters like cancer treatment where there are thousands of new research papers being published every day, we actually have a shot at having the machine read them all and help a board of cancer specialists answer questions about the latest research,” he said.
Mockus and her colleagues are using Microsoft’s machine reading technology to curate CKB, which stores structured information about genomic mutations that drive cancer, drugs that target cancer genes and the response of patients to those drugs.
One application of this knowledgebase allows oncologists to discover what, if any, matches exist between a patient’s known cancer-related genomic mutations and drugs that target them as they explore and weigh options for treatment, including enrollment in clinical trials for drugs in development.
This information is also useful to translational and clinical researchers, Mockus noted.
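The matching step described above, from a patient's known mutations to drugs that target them, can be pictured as a simple lookup. The following is a minimal sketch in Python. The table `TARGETED_THERAPIES` and the function `match_therapies` are hypothetical names, and the handful of entries are hand-written for illustration. They are not CKB's data, schema or interface, and they are not treatment guidance.

```python
from typing import Dict, List, Tuple

Mutation = Tuple[str, str]  # (gene, variant)

# Hypothetical, hand-written entries for illustration only.
# This is NOT CKB content and not a complete or clinical list.
TARGETED_THERAPIES: Dict[Mutation, List[str]] = {
    ("BRAF", "V600E"): ["vemurafenib", "dabrafenib"],
    ("EGFR", "L858R"): ["erlotinib", "gefitinib"],
}


def match_therapies(patient_mutations: List[Mutation]) -> Dict[Mutation, List[str]]:
    """Return the drugs listed for each patient mutation that has an entry."""
    matches: Dict[Mutation, List[str]] = {}
    for mutation in patient_mutations:
        drugs = TARGETED_THERAPIES.get(mutation)
        if drugs:  # skip mutations with no known match
            matches[mutation] = list(drugs)
    return matches


if __name__ == "__main__":
    patient = [("BRAF", "V600E"), ("TP53", "R175H")]
    for (gene, variant), drugs in match_therapies(patient).items():
        print(f"{gene} {variant}: {', '.join(drugs)}")
```

A mutation with no entry simply produces no match, which mirrors the "what, if any" wording above: the lookup reports only what the knowledgebase already contains.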
The bottleneck is filtering through the more than 4,000 papers published every day in biomedical journals to find the subset of about 200 related to cancer, then reading them and updating CKB with the relevant information on mutation, drug and patient response.
“What you want is some degree of intelligence incorporated into the system that can go out and not just be efficient, but also be effective and relevant in terms of how it can filter information. That is what Hanover has done,” said Auro Nair, executive vice president of JAX.
The core of Microsoft’s Project Hanover is the capability to comb through the thousands of documents published each day in the biomedical literature and flag and rank all that are potentially relevant to cancer researchers, highlighting, for example, information on gene, mutation, drug and patient response.
Human curators working on CKB are then free to focus on the flagged research papers, validating the accuracy of the highlighted information.
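To make the flag-and-rank idea concrete, here is a deliberately simple sketch in Python. It is not how Project Hanover works: the article describes learned machine reading models, while this stand-in only counts which of the four information types (gene, mutation, drug, response) appear in a paper's text. The term lists, the `min_score` threshold and the function names are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical term lists; a real system would use trained models, not keywords.
ENTITY_TERMS = {
    "gene": {"braf", "egfr", "kras"},
    "mutation": {"v600e", "l858r", "g12c"},
    "drug": {"vemurafenib", "erlotinib", "sotorasib"},
    "response": {"response", "resistance", "sensitivity"},
}


def score(text: str) -> int:
    """Count how many of the four information types are mentioned in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(1 for terms in ENTITY_TERMS.values() if tokens & terms)


def flag_and_rank(papers: Dict[str, str], min_score: int = 2) -> List[str]:
    """Return IDs of papers scoring at least min_score, highest score first."""
    scored = [(score(text), paper_id) for paper_id, text in papers.items()]
    flagged = [(s, paper_id) for s, paper_id in scored if s >= min_score]
    flagged.sort(key=lambda item: (-item[0], item[1]))  # ties broken by ID
    return [paper_id for _, paper_id in flagged]


if __name__ == "__main__":
    papers = {
        "p1": "BRAF V600E melanoma shows response to vemurafenib",
        "p2": "A survey of hospital staffing",
        "p3": "EGFR resistance mechanisms",
    }
    print(flag_and_rank(papers))
```

Papers below the threshold never reach the curator, so the threshold is where the trade-off between missing relevant work and wasting curator time would be tuned.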
“Our goal is to make the human curators superpowered,” said Hoifung Poon, director of precision health natural language processing with Microsoft’s research organization in Redmond and the lead researcher on Project Hanover.
“With the machine reader, we are able to suggest that this might be a case where a paper is talking about a drug-gene mutation relation that you care about,” Poon explained. “The curator can look at this in context and, in a couple of minutes, say, ‘This is exactly what I want,’ or ‘This is incorrect.’”
To be successful, Poon and his team need to train machine learning models in such a way that they catch all the potentially relevant information – ensure there are no gaps in content – and, at the same time, weed out irrelevant information sufficiently to make the curation process more efficient.
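The two goals in that sentence correspond to the standard measures of recall (how much of the relevant material was caught) and precision (how much of what was flagged is actually relevant). A minimal sketch of how they are computed follows. The function name and the sample paper IDs are hypothetical, and the convention of returning 1.0 for an empty set is a choice made here, not something taken from the project.

```python
from typing import Iterable, Tuple


def recall_precision(flagged: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Compute (recall, precision) for a set of flagged items.

    recall    = relevant items that were flagged / all relevant items
    precision = relevant items that were flagged / all flagged items
    """
    flagged_set, relevant_set = set(flagged), set(relevant)
    true_positives = len(flagged_set & relevant_set)
    # Assumed convention: with nothing to find (or nothing flagged), report 1.0.
    recall = true_positives / len(relevant_set) if relevant_set else 1.0
    precision = true_positives / len(flagged_set) if flagged_set else 1.0
    return recall, precision


if __name__ == "__main__":
    # The reader flags four papers; two of them are truly relevant,
    # and no relevant paper was missed.
    recall, precision = recall_precision(
        flagged=["a", "b", "c", "d"], relevant=["a", "b"]
    )
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this example nothing relevant is missed, but half of the curator's reading time goes to irrelevant papers, which is the inefficiency the second goal is meant to reduce.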
In traditional machine reading tasks such as finding information about celebrities in news stories, researchers tend to focus on relationships contained within a single sentence, such as a celebrity name and a new movie.
Since this type of information is widespread across news stories, researchers can skip instances that are more challenging, such as when the names of the celebrity and the movie are mentioned in separate paragraphs, or when the relationship involves more than two pieces of information.
“In biomedicine, you can’t do that because your latest finding may only appear in this single paper and if you skip it, it could be life or death for this patient,” explained Poon. “In this case, you have to tackle some of the hard linguistic challenges head on.”
Poon and his team are taking what they call a self-supervision approach to machine learning in which the model automatically annotates training examples from unlabeled text by leveraging prior knowledge in existing databases and ontologies.
For example, a National Cancer Institute initiative manually compiled information from the biomedical literature on how genes regulate each other but was unable to sustain the effort beyond two years. Poon’s team used the compiled knowledge to automatically label documents and train a machine reader to find new instances of gene regulation.
They took the same approach with public datasets on approved cancer drugs and drugs in clinical trials, among other sources.
This connect-the-dots approach creates a machine-learned model that “rarely misses anything” and is precise enough “where we can potentially improve the curation efficiency by a lot,” said Poon.
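The self-supervision idea described above, in which facts already recorded in a database are used to label raw text automatically, can be sketched in a few lines of Python. This is a simplified illustration, not the team's implementation: it works one sentence at a time, whereas the article notes that the real task has to handle relations spanning paragraphs. The names `KNOWN_REGULATIONS`, `GENE_NAMES` and `auto_label` are hypothetical, and the two listed gene pairs stand in for an existing curated resource.

```python
import re
from itertools import combinations
from typing import List, Tuple

# Hypothetical stand-ins for an existing curated database and gene vocabulary.
KNOWN_REGULATIONS = {("TP53", "MDM2"), ("MYC", "CDKN1A")}
GENE_NAMES = {"TP53", "MDM2", "MYC", "CDKN1A", "BRCA1"}

Example = Tuple[str, str, str, int]  # (sentence, gene_a, gene_b, label)


def auto_label(sentences: List[str]) -> List[Example]:
    """Create labeled training examples from unlabeled sentences.

    A sentence mentioning two genes gets label 1 if the pair is already
    recorded in KNOWN_REGULATIONS (in either order), otherwise label 0.
    """
    examples: List[Example] = []
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        genes = sorted(tokens & GENE_NAMES)
        for gene_a, gene_b in combinations(genes, 2):
            known = (gene_a, gene_b) in KNOWN_REGULATIONS or (
                gene_b,
                gene_a,
            ) in KNOWN_REGULATIONS
            examples.append((sentence, gene_a, gene_b, int(known)))
    return examples


if __name__ == "__main__":
    text = [
        "TP53 activates transcription of MDM2 in stressed cells.",
        "BRCA1 and MYC were both measured.",
    ]
    for example in auto_label(text):
        print(example)
```

The automatically labeled examples would then serve as training data for a classifier. Labels produced this way are noisy, since a sentence can mention a known pair without describing the relation, so the curators' validation step remains important.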
Collaboration with JAX
The collaboration with JAX allows Poon and his team to validate the effectiveness of Microsoft’s machine reading technology while increasing the efficiency of Mockus and her team as they curate CKB.
“Leveraging the machine reader, we can say here is what we are interested in and it will help to triage and actually rank papers for us that have high clinical significance,” Mockus said. “And then a human goes in to really tease apart the data.”
Over time, feedback from the curators will be used to help train the machine reading technology, making the models more precise and, in turn, making the curators more efficient and allowing the scope of CKB to expand.
“We feel really, really good about this relationship,” said Nair. “Particularly from the standpoint of the impact it can have in providing a very powerful tool to clinicians.”
- Learn more about the Clinical Knowledgebase and The Jackson Laboratory
- Learn more about Project Hanover
- Read: How Microsoft computer scientists and researchers are working to ‘solve’ cancer
- Read: Microsoft announces general availability of cloud-based tools for genomics research
John Roach writes about Microsoft research and innovation.