Natural language processing can help simplify protocols of the past, as evidenced by a recent Novartis initiative.
One recently emerged approach to developing better protocol designs for clinical trials begins by looking backwards. Many pharmaceutical companies hold more than 20 years of clinical trial protocols, and these can be re-examined for lessons that inform the present and future of drug development. The challenge until now has been evaluating these historical protocol records, which in many cases are stored across fragmented information systems in mixed-format documents that would require significant resources to evaluate.
Yesterday’s clinical trial protocols capture a landscape of historical study design that can still be relevant to today’s clinical challenges. Key trial design features in legacy protocols include inclusion and exclusion criteria, schedule of assessments tables, lab tests, biomarkers and more, for particular diseases or patient cohorts.
This type of information could be immensely helpful to pharma companies planning new trials, but it has taken advances in artificial intelligence-based (AI) technologies such as natural language processing (NLP) to make it digestible and usable. NLP automates text evaluation, effectively enabling computers to understand context, recognize entities and normalize information to transform unstructured data to structured data that can be analyzed by machine learning algorithms.
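To make the unstructured-to-structured step concrete, here is a minimal sketch in Python. It is a toy, rule-based stand-in for what a real NLP pipeline does, not the method used by any vendor or sponsor named in this article. The protocol excerpt, the synonym table and the function name `extract_criteria` are all assumptions invented for illustration.

```python
import re

# Hypothetical protocol excerpt, invented for illustration only.
PROTOCOL_TEXT = """\
Inclusion Criteria:
1. Age 18 to 65 years
2. HbA1c between 7.0% and 10.0%
Exclusion Criteria:
1. Pregnant or breastfeeding
2. eGFR below 30 mL/min
"""

# Assumed synonym table used to normalize lab-test mentions to one label.
LAB_SYNONYMS = {
    "hba1c": "hemoglobin A1c",
    "egfr": "estimated glomerular filtration rate",
}

SECTION_RE = re.compile(r"^(inclusion|exclusion) criteria:\s*$", re.IGNORECASE)
ITEM_RE = re.compile(r"^\d+\.\s*(.+)$")


def extract_criteria(text):
    """Turn free-text criteria sections into a list of structured records."""
    records = []
    section = None  # the criteria section currently being read, if any
    for line in text.splitlines():
        line = line.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1).lower()
            continue
        item = ITEM_RE.match(line)
        if item and section:
            criterion = item.group(1)
            # Entity recognition (toy version): look up known lab-test terms.
            words = re.findall(r"[A-Za-z0-9]+", criterion.lower())
            labs = [LAB_SYNONYMS[w] for w in words if w in LAB_SYNONYMS]
            records.append(
                {"type": section, "text": criterion, "lab_tests": labs}
            )
    return records


records = extract_criteria(PROTOCOL_TEXT)
for record in records:
    print(record)
```

Each record carries the criterion type, the original wording and any normalized lab-test names, which is the kind of tabular shape downstream analysis can work with. A production system would replace the regular expressions and lookup table with trained language models and curated ontologies.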
NLP can present the results to researchers as digitized documents that are easy to search, and the extracted content can be compiled into specific data sources for more detailed analysis. Trial sponsors can then leverage this historical information to inform today’s operations and answer a wide range of trial design questions.
In 2018, Novartis launched its “medical moonshot,” an ambitious initiative that amounted to the massive digitization of 20-plus years of historical protocols and two million patient-years of data. The project, named data42, was designed to employ AI technologies such as NLP to sift through these mountains of data to surface previously unknown correlations between drugs and diseases.
The project required Novartis to virtually combine all its historical clinical trial protocol records and data sets to enable scientists to ask questions and perform specific inquiries into disease areas that they previously lacked the capability to execute. Novartis scientists established a key objective for clinical trial protocol digitalization, within the broader data42 project: To release key information locked in study protocols to aid secondary analysis and new hypothesis generation. In order to do this, they used OCR and NLP technologies to establish a shared set of digitized, structured files that researchers could rapidly search with meta information and tags.
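As a rough illustration of why tagged, structured files are quick to search, the sketch below builds a small inverted index from tags to protocol identifiers. It is a hedged example only: the records, tag names and the helpers `build_tag_index` and `search` are hypothetical and do not describe the actual data42 implementation.

```python
from collections import defaultdict

# Hypothetical metadata records; identifiers and tags are invented.
PROTOCOLS = [
    {"id": "P-001", "year": 2004, "tags": {"oncology", "phase-3", "biomarker"}},
    {"id": "P-002", "year": 2011, "tags": {"cardiology", "phase-2"}},
    {"id": "P-003", "year": 2016, "tags": {"oncology", "phase-2", "biomarker"}},
]


def build_tag_index(protocols):
    """Map each tag to the set of protocol IDs that carry it."""
    index = defaultdict(set)
    for protocol in protocols:
        for tag in protocol["tags"]:
            index[tag].add(protocol["id"])
    return index


def search(index, *tags):
    """Return the sorted IDs of protocols that carry every requested tag."""
    if not tags:
        return []
    # .get avoids creating empty entries for tags that were never indexed.
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches))


index = build_tag_index(PROTOCOLS)
print(search(index, "oncology", "biomarker"))
```

Because the index is built once, each query reduces to a few set intersections rather than a re-read of every document, which is the practical benefit of attaching metadata and tags during digitization.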
As a result of the initiative, Novartis has created a connected data environment in which decades of historical data are centralized, harmonized and accessible. Researchers can interrogate the data in novel ways that yield fresh drug-development insights, and can analyze historical protocol data to design more targeted, efficient and ultimately successful trials.
The next step for companies like Novartis—and the pharmaceutical industry, in general—is to wade further into digitization, which may eventually result in the ability to completely design and discover drugs based purely on data. This future, in which drug development occurs on a computer rather than in a lab, is likely to hold tremendous cost savings and efficiency gains for clinical trial sponsors. To drive the success of such initiatives, organizations need AI-based technologies like NLP to unlock the power of their existing data.
Jane Z. Reed, Director, Life Sciences, Linguamatics, an IQVIA company
*Note, other useful resources for Novartis digitalization initiatives: