Explore how natural language processing and social graph techniques help to tackle the challenge of patient and investigator recruitment and raise the success rate of clinical trials.
Only 14% of all clinical trials are successful, according to the authors of an article1 published in Biostatistics in 2019. According to comparable studies2, the chance of success for a compound entering trials is even lower, closer to 10%. Whether the true figure is 10% or 14%, the risk-reward calculus for clinical trials is precarious, and it is hard to name another major industry that operates with such a high failure rate.
A crucial challenge that many companies in this area face regularly is finding enough patients who are willing to participate in clinical trials. A study3 by Grant D. Huang, published on ScienceDirect, concludes that a staggering 86% of clinical trials do not reach their enrollment targets within the planned time frame. When a trial falls short of enrollment, researchers cannot establish whether their drug is safe and effective. At the same time, qualified patients may not be identified within the narrow eligibility window and, as a result, miss the opportunity to try cutting-edge medications.
Contract research organizations (CROs) face a number of challenges during the doctor and patient recruitment stage.
Structuring the available data, making it easily searchable, and finding connections between principal investigators, patients, and clinical organizations is another bottleneck in clinical trials, especially when a study is conducted across several geographical locations. Structured data enables clinical organizations to be proactive, anticipate their needs, and improve business efficiency. For example, a multi-site investigator might not be able to see every subject going through the different stages of the enrollment process; nevertheless, they need to know when each patient was enrolled and what observations were recorded, in case the research team requires information about a specific patient. With the help of data science and NLP, investigators and CROs can structure this information and see everything about patients and their conditions in one place.
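To make the multi-site scenario concrete, here is a minimal Python sketch, assuming each site exports simple event records that share a patient identifier. The `SiteRecord` type, the event names, and the sample data are hypothetical. The point is only that once records are structured, a single index keyed by patient can answer "when was this patient enrolled, and what was observed?" regardless of which site produced each entry.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SiteRecord:
    """One hypothetical record exported by a trial site."""
    patient_id: str
    site: str
    event_date: date
    event: str      # hypothetical event types: "screened", "enrolled", "observation"
    note: str = ""


def build_patient_index(records):
    """Group records from all sites by patient and sort each history by date."""
    index = defaultdict(list)
    for rec in records:
        index[rec.patient_id].append(rec)
    for history in index.values():
        history.sort(key=lambda r: r.event_date)
    return dict(index)


def enrollment_date(index, patient_id):
    """Return the date of the first 'enrolled' event for a patient, or None."""
    for rec in index.get(patient_id, []):
        if rec.event == "enrolled":
            return rec.event_date
    return None


records = [
    SiteRecord("P-001", "Site A", date(2023, 3, 2), "observation", "mild headache"),
    SiteRecord("P-001", "Site B", date(2023, 1, 15), "screened"),
    SiteRecord("P-001", "Site B", date(2023, 2, 1), "enrolled"),
    SiteRecord("P-002", "Site A", date(2023, 2, 10), "screened"),
]

index = build_patient_index(records)
print(enrollment_date(index, "P-001"))
for rec in index["P-001"]:
    print(rec.event_date, rec.site, rec.event, rec.note)
```

In a real deployment the index would live in a database rather than in memory, but the design choice is the same: one patient-centric view assembled from site-level data.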
To quickly invite influential doctors who have written about a given topic to participate in a corresponding clinical trial, CROs can use NLP techniques to identify the connections between document authors and determine who is the most influential author on that topic. Data science and NLP methods also help to evaluate which patients meet the specified inclusion criteria.
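The patient-matching step can be sketched as a rule check, assuming the diagnoses and lab values have already been extracted from clinical notes and lab systems (for example with NLP). The `Patient` type, the `check_eligibility` function, and the criteria (age 18-75, a type 2 diabetes diagnosis, HbA1c between 7.0% and 10.5%, no chronic kidney disease) are invented for illustration and do not come from any real protocol.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Patient:
    patient_id: str
    age: int
    diagnoses: Set[str] = field(default_factory=set)  # e.g. extracted from notes by NLP
    hba1c: Optional[float] = None                      # most recent HbA1c (%), if recorded


def check_eligibility(p: Patient) -> Tuple[bool, List[str]]:
    """Apply hypothetical inclusion/exclusion criteria; return (eligible, reasons)."""
    reasons = []
    # Inclusion criteria
    if not 18 <= p.age <= 75:
        reasons.append("age outside 18-75")
    if "type 2 diabetes" not in p.diagnoses:
        reasons.append("no type 2 diabetes diagnosis")
    if p.hba1c is None:
        reasons.append("HbA1c not recorded")
    elif not 7.0 <= p.hba1c <= 10.5:
        reasons.append("HbA1c outside 7.0-10.5%")
    # Exclusion criteria
    if "chronic kidney disease" in p.diagnoses:
        reasons.append("exclusion: chronic kidney disease")
    return (not reasons, reasons)


candidates = [
    Patient("P-001", 54, {"type 2 diabetes"}, 8.2),
    Patient("P-002", 67, {"type 2 diabetes", "chronic kidney disease"}, 7.9),
    Patient("P-003", 45, {"hypertension"}),
]
for p in candidates:
    ok, why = check_eligibility(p)
    print(p.patient_id, "eligible" if ok else "not eligible", why)
```

Returning the list of failed criteria, rather than a bare yes/no, lets a coordinator see why a patient was screened out and correct any extraction errors.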
But what is NLP actually?
NLP is a field of artificial intelligence and computational linguistics. It uses computing power to analyze natural language, mostly in the form of text, in order to identify patterns, names, and other entities. NLP can process speech, cluster texts by topic, extract relationships between objects, classify documents, and much more. With the aid of NLP, data from disparate sources can be unified, labeled, and structured. Data scientists can integrate data from electronic medical records (EMR), laboratory information systems (LIS), and lab tests into one database and process it with NLP so that it delivers more value to clinical organizations.
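As a minimal sketch of integrating EMR and LIS data into one database, the following uses Python's built-in `sqlite3` module with an in-memory database. The table layouts, column names, and records are hypothetical. The second query also shows why plain keyword search is not enough: a naive `LIKE` match flags a note that explicitly says "no history of diabetes", which is the kind of negation that NLP techniques are used to handle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schemas: free-text EMR notes and structured LIS results.
cur.executescript("""
CREATE TABLE emr (patient_id TEXT PRIMARY KEY, age INTEGER, clinical_note TEXT);
CREATE TABLE lis (patient_id TEXT, test_name TEXT, value REAL, unit TEXT);
""")
cur.executemany("INSERT INTO emr VALUES (?, ?, ?)", [
    ("P-001", 54, "Type 2 diabetes, well controlled on metformin."),
    ("P-002", 67, "Hypertension; no history of diabetes."),
])
cur.executemany("INSERT INTO lis VALUES (?, ?, ?, ?)", [
    ("P-001", "HbA1c", 8.2, "%"),
    ("P-001", "eGFR", 85.0, "mL/min/1.73m2"),
    ("P-002", "HbA1c", 5.6, "%"),
])

# Join both sources into one patient-level view.
rows = cur.execute("""
SELECT e.patient_id, e.age, l.value, e.clinical_note
FROM emr AS e
JOIN lis AS l ON l.patient_id = e.patient_id
WHERE l.test_name = 'HbA1c'
ORDER BY e.patient_id
""").fetchall()
for row in rows:
    print(row)

# Naive keyword search ignores negation, so both patients match.
naive = cur.execute(
    "SELECT patient_id FROM emr WHERE clinical_note LIKE '%diabetes%' ORDER BY patient_id"
).fetchall()
print(naive)
```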
Six NLP techniques are frequently used when working with unstructured medical data.
Many leading life sciences companies have already implemented NLP techniques to get more out of their everyday work and thereby gain a competitive advantage.
“In the clinical domain, researchers have used NLP systems to identify clinical syndromes and common biomedical concepts from radiology reports, discharge summaries, problem lists, nursing documentation, and medical education documents. Different NLP systems have been developed and utilized to extract events and clinical concepts from text (…). Success stories in applying these tools have been reported widely”, says an article4 published in the Journal of Biomedical Informatics that evaluates applications of clinical information extraction.
A large amount of medical data is still stored in non-editable formats, such as typewritten medical notes, text embedded in images, or printed documents. Extracting this information is often an important part of clinical trials, as the data can provide valuable evidence about the drug being tested.
However, going through and reviewing individual records can be time-consuming and laborious for a doctor. Optical character recognition (OCR) techniques help to digitize printed text, such as scanned documents and image-only PDFs, so that it becomes electronically editable, searchable, and usable for further analysis.
After OCR, text-mining NLP techniques can be used to extract specific features or entities from the scanned copies. Thanks to advances over the years, automated text mining today is not only less laborious but also more consistent and reliable than manual review: according to research5 published in BMJ Open in May 2020, which assessed text-mining-assisted extraction of pathology features from scanned clinical records, it detected 3%–14% additional feature instances compared with manual checks.
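A simple form of post-OCR text mining is rule-based extraction with regular expressions. The sketch below assumes the OCR step has already produced plain text (the OCR engine itself is not shown), and the report text, feature names, and patterns are hypothetical. It is not the method used in the BMJ Open study, only an illustration of turning scanned free text into structured fields.

```python
import re

# Hypothetical text as it might come out of an OCR engine for a scanned pathology report.
ocr_text = """
PATHOLOGY REPORT  Patient: P-001
Tumour size: 2.3 cm. Histological grade: 2
ER status: Positive   HER2: negative
Lymph nodes: 1/12 involved
"""

# Hypothetical extraction rules: one regex per pathology feature.
PATTERNS = {
    "tumour_size_cm": re.compile(r"tumou?r\s+size:\s*(\d+(?:\.\d+)?)\s*cm", re.I),
    "grade": re.compile(r"grade:\s*([1-3])", re.I),
    "er_status": re.compile(r"\bER\s+status:\s*(positive|negative)", re.I),
    "her2_status": re.compile(r"\bHER2:\s*(positive|negative)", re.I),
    "nodes_positive": re.compile(r"lymph\s+nodes:\s*(\d+)\s*/\s*\d+", re.I),
}


def extract_features(text):
    """Return the first match for each pattern (lower-cased), or None if absent."""
    features = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        features[name] = m.group(1).lower() if m else None
    return features


print(extract_features(ocr_text))
```

Hand-written rules like these are transparent and easy to audit, but brittle; systems used in practice typically combine them with dictionaries or trained models to cope with OCR noise and varied phrasing.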
Overall, OCR and text mining enable fast and accurate data abstraction, which can speed up clinical research processes.
To shorten the time needed to find suitable investigators and patients for a specific clinical trial, CROs can borrow an approach that brands of all kinds use when marketing their products: the social graph technique. The term refers to a method of data analysis derived from social networks, where it is used to find influencers: people who engage with the largest and most relevant audience on social media.
The best-known social graph is the one created by Facebook, which connects its 2.7 billion monthly active users.6 In the pharmaceutical industry, a social graph can be built to show the connections between doctors who conduct research on a specific topic. On such a graph, CROs and sponsors can easily see which investigators they have already invited to participate in a clinical trial and which have not yet been invited.
This approach is very helpful because influential principal investigators, or Key Opinion Leaders (KOLs), play a vital role in pharmaceutical research, development, and the marketing of new products. Identifying the right KOLs on the basis of evidence and engaging with them can influence the quality of partnerships, the achievement of business objectives, and a medication’s overall life cycle.
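A minimal sketch of such a social graph: build a co-authorship network from publication records, score each doctor's influence, and flag who has not yet been invited. The article does not say which influence measure is used, so this sketch swaps in PageRank, a standard centrality algorithm, implemented here with the standard library only. The author names, papers, and `invited` set are hypothetical.

```python
from itertools import combinations

# Hypothetical publication records on one topic: each entry lists a paper's authors.
papers = [
    ["Dr. Adams", "Dr. Baker", "Dr. Chen"],
    ["Dr. Adams", "Dr. Chen"],
    ["Dr. Adams", "Dr. Diaz"],
    ["Dr. Evans", "Dr. Baker"],
    ["Dr. Adams", "Dr. Evans", "Dr. Fox"],
]
invited = {"Dr. Chen"}  # investigators already contacted for the trial


def build_coauthor_graph(papers):
    """Undirected co-authorship graph as a dict mapping author -> set of co-authors."""
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; each undirected edge counts as two directed links."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for u in graph[v]:
                    new_rank[u] += share
            else:
                # Author with no co-authors: spread their rank evenly over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank


graph = build_coauthor_graph(papers)
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
uninvited = [doc for doc in ranking if doc not in invited]

for doc in ranking:
    status = "invited" if doc in invited else "not yet invited"
    print(f"{doc:10s} {scores[doc]:.3f}  {status}")
```

A production graph would likely weight edges by the number of shared papers, restrict papers to the trial's therapeutic area, and add citation counts; the unweighted version keeps the idea visible.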
One of the most important aspects of a clinical trial is selecting high-performing investigator sites, because they can dramatically affect product approval, study costs, and timelines. Too often, however, the process for identifying sites is immature. As a consequence, a site is often deemed suitable simply because it has the infrastructure and know-how needed to carry out the activities specified in the clinical study protocol. That is why only one-third of all sites manage to enroll enough patients, with many falling considerably short or failing to enroll a single participant.
To improve the process of finding sites that perform well, it makes sense to include criteria such as an investigator’s expert status (e.g., how many articles they have published and how often those articles are cited) or prior experience in clinical trials of similar treatments. Other critical factors could include the site’s location and its past success in enrolling subjects, the proximity of comparable studies, or the epidemiological data of the specific patient population. Information like this can be gathered by combining data from targeted databases, electronic health records, insurance databases, prescriptions, and other sources, and then using NLP techniques to make sense of the semantic relationships in the data.
For instance, a semantic query could ask the system for sites at which an advanced type of brain surgery is performed. The system then gathers all relevant sites, and a site-scoring algorithm automatically ranks them by parameters such as how often this particular operation is performed, the expert status of the responsible doctor, the site’s overall experience with the procedure, and past enrollment rates. The value of this approach lies in a more accurate prediction of how well each site matches the trial and in large time savings for researchers, who no longer have to do this work manually.
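One simple way to implement the ranking step is a weighted score over normalized criteria. The sketch below assumes min-max normalization and a weighted sum; the `Site` fields mirror the parameters named in the text, but the use of an h-index as a proxy for expert status, the weights, and the sample sites are all hypothetical. In practice the weights would have to be tuned against historical enrollment data.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    procedures_per_year: int     # frequency of the specific operation
    investigator_h_index: int    # hypothetical proxy for the doctor's expert status
    years_with_procedure: int    # overall site experience with the procedure
    past_enrollment_rate: float  # fraction of past enrollment targets met (0-1)


# Hypothetical weights; they sum to 1 so scores fall between 0 and 1.
WEIGHTS = {
    "procedures_per_year": 0.35,
    "investigator_h_index": 0.20,
    "years_with_procedure": 0.15,
    "past_enrollment_rate": 0.30,
}


def min_max(values):
    """Rescale values to [0, 1]; identical values all map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_sites(sites):
    """Normalize each criterion across sites, then rank by the weighted sum."""
    normalized = {attr: min_max([getattr(s, attr) for s in sites]) for attr in WEIGHTS}
    scores = {
        site.name: sum(w * normalized[attr][i] for attr, w in WEIGHTS.items())
        for i, site in enumerate(sites)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sites = [
    Site("Site A", 120, 35, 10, 0.90),
    Site("Site B", 40, 50, 4, 0.40),
    Site("Site C", 80, 20, 7, 0.75),
]
for name, score in score_sites(sites):
    print(f"{name}: {score:.2f}")
```

Normalizing first keeps a criterion with large raw numbers (procedures per year) from drowning out one measured as a fraction (enrollment rate).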
Today, far too much data that could improve clinical trials remains untouched in medical records, health documents, questionnaires, publications, articles, and other documents. Once sorted, labeled, cleaned, and analyzed, however, this data can yield trailblazing insights.
NLP techniques that can be applied to unstructured medical data include named-entity recognition and topic clustering. These techniques help to identify the required entities and automatically group texts by topic or sort them into predefined categories, which can save researchers a tremendous amount of time.
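The sketch below illustrates both ideas in their simplest form, assuming a small hand-built lexicon: dictionary-based named-entity recognition, and keyword scoring that sorts a note into one of several predefined categories. The lexicon, labels, and categories are hypothetical. Note that keyword scoring is a stand-in for classification; genuine topic clustering is unsupervised (for example k-means over TF-IDF vectors, or LDA) and is not shown here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real systems rely on curated vocabularies or trained models.
LEXICON = {
    "type 2 diabetes": "CONDITION",
    "hypertension": "CONDITION",
    "metformin": "DRUG",
    "insulin": "DRUG",
    "hba1c": "LAB_TEST",
}

# Hypothetical predefined categories and the keywords that signal them.
CATEGORY_KEYWORDS = {
    "endocrinology": {"diabetes", "insulin", "metformin", "hba1c"},
    "cardiology": {"hypertension", "arrhythmia", "statin"},
}


def find_entities(text):
    """Dictionary-based NER: return (surface text, label, offset) sorted by offset.

    Assumes ASCII input, so offsets in the lower-cased copy match the original text.
    """
    lowered = text.lower()
    entities = []
    for term, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            entities.append((text[m.start():m.end()], label, m.start()))
    return sorted(entities, key=lambda e: e[2])


def categorize(text):
    """Assign the category whose keywords occur most often, or 'uncategorized'."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {cat: sum(tokens[w] for w in words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


note = "Patient with type 2 diabetes started on metformin; HbA1c 8.2%. History of hypertension."
for surface, label, offset in find_entities(note):
    print(offset, label, surface)
print(categorize(note))
```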
The combination of NLP and social graphs helps to raise the success rate of clinical trials by addressing the fundamental challenge of investigator and patient recruitment. NLP techniques harness unstructured data to quickly match CROs with suitable doctors and eligible patients. By exploiting the relationships between data items, investigators who have researched a specific topic can be identified quickly, and a similar technique can be used to identify top-tier trial sites. To take full advantage of NLP, social graph, and impact factor algorithms, clinical organizations can enlist the help of experienced product development outsourcers.
Igor Kruglyak is a Senior Advisor at the global IT service provider Avenga. Michael DePalma is the Founder and President of Pensare, LLC, and Co-Founder of Hu-manity.co.