The potential of next-generation platforms in transforming patient recruitment.
Clinical research is fundamental for advancing medical knowledge, but patient recruitment remains a significant bottleneck. A staggering 80% of clinical trials fail to meet recruitment targets on time, with 15% to 20% of trials never recruiting enough patients to complete the study.1 This recruitment shortfall causes significant delays, with an average Phase III trial experiencing a delay of four to six months. Each day of delay costs sponsors between $600,000 and $8 million in lost revenue.1
Such setbacks extend timelines for bringing new therapies to market, potentially delaying critical treatments for patients.
One major cause of these delays is the reliance on structured data—lab results, coded diagnoses, and other standardized information—which accounts for only 50% to 70% of the relevant clinical trial data.3 Approximately 80% of healthcare data, however, is unstructured, residing in clinical notes, imaging reports, and physician narratives. Traditional recruitment methods, which primarily focus on structured data, overlook valuable insights from unstructured data sources, leading to missed opportunities in identifying eligible patients.4
This untapped unstructured data holds rich insights into patient histories, symptoms, and clinical contexts, which could significantly improve recruitment speed and accuracy if used effectively. Federated electronic health record (EHR) systems offered by companies such as TriNetX and Flatiron have already shown the promise of real-world data (RWD) drawn from structured sources. Extending these workflows to incorporate unstructured data offers considerable potential to enhance recruitment efforts.5
Recent advancements in multimodal AI and natural language processing (NLP) enable the integration of unstructured healthcare data into the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), transforming fragmented information into structured, standardized formats. NLP plays a key role by extracting insights from clinical notes and other unstructured sources, enhancing patient profiles and improving recruitment efficiency. An NLP-enhanced OMOP CDM forms the foundation of a data space, where diverse healthcare data is harmonized for cross-border research. This data space, fully aligned with the European Health Data Space (EHDS), ensures interoperability and compliance and supports large-scale, secure, and effective clinical research across Europe.
The integration of multimodal generative AI, particularly NLP and automated terminology mapping, is transforming the use of unstructured clinical data in healthcare. Approximately 80% of healthcare data remains unstructured, residing in formats such as physician notes, clinical narratives, and imaging reports.6 Traditional methods, which rely primarily on structured data, fail to capture the full range and depth of this information. Multimodal generative AI addresses this gap by processing diverse data types, including text and images, to produce actionable insights, improving patient recruitment and clinical trial outcomes.
NLP plays a critical role in enabling artificial intelligence (AI) to interpret unstructured clinical text, such as detailed physician notes and lab reports, and convert it into structured data. This process is essential for creating more complete patient profiles. By automating the extraction of medical details such as diagnoses and treatments, NLP ensures that previously overlooked patient information is included in recruitment efforts for clinical trials. This capability allows AI systems to identify potential trial participants who are documented only in unstructured formats and might be missed by conventional methods.
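As a rough illustration of the idea, the sketch below pulls a diagnosis mention and a count of prior lines of therapy out of a free-text note using simple pattern matching. It is a toy example and not the method of any platform discussed in this article: production clinical NLP relies on trained language models and must handle negation, abbreviations, and context. The note text, the `extract_findings` function, and both patterns are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    diagnosis: Optional[str]
    prior_lines_of_therapy: Optional[int]


# Hypothetical patterns; a real system would use a trained clinical NLP model.
DIAGNOSIS_PATTERN = re.compile(r"\b(multiple myeloma|thyroid cancer)\b", re.IGNORECASE)
PRIOR_LINES_PATTERN = re.compile(
    r"\b(\d+)\s+prior\s+lines?\s+of\s+therapy\b", re.IGNORECASE
)


def extract_findings(note_text: str) -> Findings:
    """Return the structured fields found in a free-text clinical note."""
    diagnosis_match = DIAGNOSIS_PATTERN.search(note_text)
    lines_match = PRIOR_LINES_PATTERN.search(note_text)
    return Findings(
        diagnosis=diagnosis_match.group(1).lower() if diagnosis_match else None,
        prior_lines_of_therapy=int(lines_match.group(1)) if lines_match else None,
    )


# Invented note text, for illustration only.
note = "Patient with Multiple Myeloma, relapsed after 2 prior lines of therapy."
findings = extract_findings(note)
print(findings)
```

Fields that the patterns do not find are returned as `None` rather than guessed, which keeps missing information distinguishable from a negative finding.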
The OMOP CDM is a widely adopted standard that enables the integration and interoperability of healthcare data for research.7 Mapping unstructured clinical data into the OMOP CDM allows researchers to harmonize disparate data sources into a unified format, enabling efficient analysis in large-scale clinical studies. This standardized framework is essential for ensuring consistency across various institutions and healthcare systems, allowing for seamless comparison and analysis of data from multiple sources.
The process of converting unstructured data into a structured format that adheres to the OMOP CDM enhances the quality and accuracy of clinical data, particularly for patient recruitment in clinical trials. By integrating data from diverse sources, researchers can better identify eligible patients, streamline recruitment processes, and improve overall trial timelines (see Figure 1 below). Standardizing data through the OMOP CDM also improves the reliability of research by ensuring that the data is uniform and comparable across different studies, which is crucial for multinational and cross-institutional research.
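A minimal sketch of the mapping step might look like the following. It assumes that terms have already been extracted from a note and writes them as rows shaped like a subset of the OMOP CDM NOTE_NLP table (`note_id`, `lexical_variant`, `note_nlp_concept_id`, `nlp_system`, `nlp_date`, `term_exists`). The lookup table, the `map_terms` function, the system name, and the concept IDs are placeholders invented for illustration; real concept IDs come from the OMOP standardized vocabularies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Placeholder concept IDs for illustration only; real standard concept IDs
# must be looked up in the OMOP vocabulary tables.
TERM_TO_CONCEPT_ID: Dict[str, int] = {
    "multiple myeloma": 900001,
    "mm": 900001,  # local abbreviation mapped to the same placeholder concept
    "thyroid cancer": 900002,
}


@dataclass
class NoteNlpRow:
    """Subset of NOTE_NLP columns used in this sketch."""
    note_id: int
    lexical_variant: str
    note_nlp_concept_id: int
    nlp_system: str
    nlp_date: date
    term_exists: str


def map_terms(note_id: int, terms: List[str], nlp_date: date) -> List[NoteNlpRow]:
    """Map extracted terms to concept IDs; unmapped terms get concept ID 0."""
    rows = []
    for term in terms:
        concept_id = TERM_TO_CONCEPT_ID.get(term.strip().lower(), 0)
        rows.append(
            NoteNlpRow(note_id, term, concept_id, "toy-mapper", nlp_date, "Y")
        )
    return rows


rows = map_terms(42, ["Multiple Myeloma", "MM", "osteopenia"], date(2024, 1, 15))
for row in rows:
    print(row)
```

Because different spellings of the same term resolve to one concept ID, records from different institutions become directly comparable. Terms that cannot be mapped are kept with concept ID 0 instead of being dropped, so they can be reviewed later.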
The ability to incorporate unstructured data into standardized models such as the OMOP CDM is particularly beneficial in complex therapeutic areas and rare diseases, where comprehensive patient data is needed to support informed decision-making and optimize clinical outcomes. The use of multimodal AI systems, combined with the OMOP CDM, represents a critical advancement in clinical research, enabling more accurate, efficient, and scalable approaches to data analysis and patient recruitment.
A federated network utilizing the OMOP CDM across multiple hospitals significantly accelerates the capacity for multicentric research.
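One way to picture such a federated query is sketched below: each hospital evaluates the same eligibility rule against its own records, and only aggregate counts are returned to the coordinator. This is an assumption about how a federated feasibility count could work and does not describe any specific network. The hospital names, the records, the concept ID, and the function names are all invented.

```python
from typing import Callable, Dict, List, Tuple

PatientRecord = Dict[str, int]

MYELOMA_CONCEPT_ID = 900001  # placeholder, not a real OMOP concept ID


def is_eligible(record: PatientRecord) -> bool:
    """Shared rule: target diagnosis and at least one prior line of therapy."""
    return (
        record["condition_concept_id"] == MYELOMA_CONCEPT_ID
        and record["prior_lines"] >= 1
    )


def count_eligible(
    records: List[PatientRecord], rule: Callable[[PatientRecord], bool]
) -> int:
    """Runs inside one hospital; only the count leaves the site."""
    return sum(1 for record in records if rule(record))


def federated_count(
    sites: Dict[str, List[PatientRecord]], rule: Callable[[PatientRecord], bool]
) -> Tuple[Dict[str, int], int]:
    """Coordinator view: per-site counts plus the network total."""
    per_site = {name: count_eligible(records, rule) for name, records in sites.items()}
    return per_site, sum(per_site.values())


# Invented records, for illustration only.
sites = {
    "hospital_a": [
        {"condition_concept_id": 900001, "prior_lines": 2},
        {"condition_concept_id": 900001, "prior_lines": 0},
        {"condition_concept_id": 900002, "prior_lines": 1},
    ],
    "hospital_b": [
        {"condition_concept_id": 900001, "prior_lines": 1},
        {"condition_concept_id": 900001, "prior_lines": 3},
    ],
}

per_site, total = federated_count(sites, is_eligible)
print(per_site, total)
```

The same rule can run unchanged at every site only because each hospital stores its data in the same common model, which is the practical benefit of standardizing on the OMOP CDM.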
The rise of multimodal AI platforms offers exciting new opportunities in clinical research, particularly for improving the use of RWD. By addressing the limitations of traditional systems that primarily rely on structured data, these platforms enable real-time processing and integration of both structured and unstructured healthcare data, such as clinical notes and imaging reports. This capability enhances the accuracy of clinical insights, making it easier to extract comprehensive patient profiles (see Figure 2).
One of the key features of these platforms is on-demand data activation, focusing on specific therapeutic areas. This targeted approach ensures that data relevant to ongoing clinical trials is processed and made available in real time, allowing research teams to access necessary information precisely when needed. This real-time data processing supports faster decision-making, which is crucial for accelerating clinical trial timelines.
Additionally, multimodal AI platforms foster close collaboration between hospitals and research teams. By ensuring accurate extraction and integration of data, they enhance data quality and enable researchers to capture vital clinical nuances that might otherwise be missed. This collaboration, combined with AI-driven data activation, provides a holistic view of patient data, which improves patient recruitment for clinical trials thanks to the increased amount of high-quality data. These platforms also offer complementary advantages in comparison to traditional federated EHR platforms, which rely primarily on structured data. By incorporating unstructured data, multimodal AI systems substantially improve the speed and accuracy of patient recruitment, especially in complex therapeutic areas like oncology and rare diseases.
Background: In a recent clinical trial focused on hematologic malignancies, specifically multiple myeloma, a major challenge was identifying patients who met the precise inclusion criteria, such as having received at least one prior line of therapy. Standard methods for patient recruitment, relying mostly on structured data, such as diagnosis codes and lab results, were insufficient for identifying all eligible patients. This often led to delays in recruitment, forcing sponsors to consider opening new trial sites to meet enrollment targets.
To address these challenges, IOMED’s data space platform (DSP), previously implemented in that clinical center, was proposed as an alternative. Leveraging multimodal AI and NLP, the platform enabled the inclusion of unstructured data, such as clinical notes, in the recruitment process. This allowed for a deeper analysis of patient records, beyond the limitations of structured data alone, boosting the recruitment process for the trial sponsor.
The integration of the AI-driven platform with the OMOP CDM enhanced the precision of patient identification. NLP technology was able to extract critical insights from clinical notes, capturing nuanced details about patients’ medical histories that were missed by conventional recruitment methods. This OMOP framework allowed for the identification of more than 40 additional patients who met the trial’s inclusion criteria but had previously gone unnoticed.
The ability to incorporate unstructured data sources, such as physician narratives and treatment histories, into a standardized OMOP format resulted in a more comprehensive and accurate picture of potential participants. By improving patient identification efficiency, the platform reduced recruitment timelines and eliminated the need to expand trial sites, ultimately leading to significant cost savings and a more streamlined clinical trial process.
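The effect described in this case can be sketched with a small comparison of two screening passes, one using structured data alone and one that also uses values extracted from clinical notes. The patients, field names, and functions below are invented for illustration and do not reproduce the trial's data or the platform's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Patient:
    patient_id: int
    has_myeloma_code: bool                 # from structured diagnosis codes
    structured_prior_lines: Optional[int]  # None if absent from structured data
    nlp_prior_lines: Optional[int]         # None if not found in clinical notes


def eligible_structured(patients: List[Patient]) -> Set[int]:
    """Screen using structured fields only."""
    return {
        p.patient_id
        for p in patients
        if p.has_myeloma_code and (p.structured_prior_lines or 0) >= 1
    }


def eligible_with_nlp(patients: List[Patient]) -> Set[int]:
    """Screen using the best available evidence from either source."""
    eligible = set()
    for p in patients:
        known = [v for v in (p.structured_prior_lines, p.nlp_prior_lines) if v is not None]
        prior_lines = max(known) if known else 0
        if p.has_myeloma_code and prior_lines >= 1:
            eligible.add(p.patient_id)
    return eligible


# Invented patients, for illustration only.
patients = [
    Patient(1, True, 2, 2),         # visible in both sources
    Patient(2, True, None, 1),      # prior therapy documented only in notes
    Patient(3, True, None, None),   # no evidence of prior therapy
    Patient(4, False, None, 3),     # no qualifying diagnosis
]

structured_only = eligible_structured(patients)
combined = eligible_with_nlp(patients)
additional = combined - structured_only
print(sorted(additional))
```

In this toy data, the second patient is found only when note-derived information is considered, which mirrors the kind of previously unnoticed candidate the case describes.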
Background: In an observational study on thyroid cancer, conducted across multiple hospitals in Spain, the primary goal was to collect and analyze data to better understand disease management in clinical practice. The study included over 5,000 patients diagnosed between January 2015 and mid-2022. This data provided a comprehensive overview of the real-world management of thyroid cancer, serving as a rich foundation for designing future clinical studies.
The collected data included both structured and unstructured elements. Notably, IOMED’s platform used NLP to extract information from clinical notes that was not available in structured form, such as genetic mutations or procedures. This approach enabled a deeper understanding of patient characteristics, treatment pathways, and genetic mutation profiles, which would otherwise be difficult to capture using traditional data extraction methods.
Leveraging observational data for protocol design and feasibility assessment: The detailed observational data obtained from the thyroid cancer study played a crucial role in designing the protocol and assessing the feasibility of a future clinical trial. By leveraging structured variables and unstructured insights derived from NLP, the study provided a comprehensive characterization of each patient. This allowed for an in-depth analysis of treatment patterns, including individualized patient responses and the prevalence of key genetic mutations. These insights informed the trial’s feasibility assessment. The observational data allowed researchers to identify relevant patient subgroups and understand the disease management practices across the participating hospitals. This real-world information helped establish criteria for patient selection, refine inclusion and exclusion criteria, and ultimately optimize the clinical trial protocol to align with actual patient populations and their management.
The EHDS is a major initiative launched by the European Union to create a unified digital infrastructure that facilitates the exchange and use of health data across EU member states. The EHDS was proposed in 2022 to address the fragmentation of healthcare data across Europe and to ensure that health data can be securely and efficiently used for research, policymaking, and patient care. Its primary goal is to unlock the potential of RWD by making it available for secondary use, which includes clinical research, public health surveillance, and evidence-based policy development.
The EHDS brings together three key stakeholders: data holders, data users, and data mediators.
The role of mediators, such as IOMED, is to facilitate the secure exchange of data between data holders and data users, making certain that healthcare data is standardized, compliant, and ready for secondary use under EHDS guidelines.
One of the key contributions of data mediators is improving data quality through integration and normalization across multiple hospital systems. Healthcare data often exists in a variety of formats, ranging from structured data, such as lab results, to unstructured formats, such as physician notes and clinical reports. By using NLP and automated text mining (ATM), IOMED converts unstructured data into standardized formats such as the OMOP CDM. This ensures that data from different sources can be harmonized, creating a cohesive and comprehensive dataset that is easily analyzable and comparable across different institutions.
By integrating and normalizing data, data mediators enhance the interoperability of health data across European hospitals, which is vital for multicentric research projects. This improved data quality not only facilitates data accessibility for research purposes but also increases the accuracy and completeness of patient datasets. Consequently, researchers can derive more meaningful insights, contributing to the generation of robust real-world evidence (RWE).
For policymakers, standardized and high-quality data support evidence-based decision-making, ultimately leading to more effective healthcare policies and better health outcomes across Europe.
The potential of multimodal AI platforms extends far beyond patient recruitment, with opportunities for expansion into various areas of clinical research, such as RWE generation, disease progression monitoring, and treatment outcome analysis. By leveraging advanced AI technologies, these platforms can process structured and unstructured data to unlock insights that traditional systems often miss. In the future, multimodal AI platforms could significantly enhance personalized medicine by enabling more accurate patient stratification and tailored treatment approaches. Moreover, as initiatives such as the EHDS continue to evolve, there is enormous potential for the broader adoption of multimodal AI across Europe. Integration with other healthcare and research networks could further amplify the platform’s impact on healthcare systems and clinical research.
The adoption of multimodal AI technologies could transform the pharmaceutical industry and the broader healthcare sector. By improving patient recruitment efficiency, reducing recruitment timelines, and enhancing data quality, multimodal AI platforms are positioned to streamline the clinical trial process. This translates into significant cost savings and more effective trials, allowing pharmaceutical companies to bring therapies to market faster. Additionally, the ability to create comprehensive patient profiles will support the rise of precision healthcare, where treatments can be tailored more effectively to individual patients based on their unique medical histories and genetic information. The integration of these platforms into the clinical research workflow is expected to improve the scalability and efficiency of personalized medicine initiatives.
AI-powered platforms are reshaping the landscape of clinical research, addressing long-standing challenges in data quality and patient recruitment. By integrating unstructured data from diverse sources—such as clinical notes, lab reports, and imaging records—these platforms enable more accurate and efficient clinical trials. The seamless extraction and structuring of unstructured data ensure that more comprehensive patient profiles are created, improving both recruitment speed and trial outcomes. Advanced technologies such as NLP and ATM play a vital role in transforming fragmented data into standardized and interoperable formats, which can be easily used across research and healthcare institutions.
Data mediators stand out for their contributions to initiatives such as the EHDS, where their technology supports the interoperability and standardization of healthcare data across European countries. This enhances RWE studies and evidence-based policymaking, improving both the efficiency of research and the development of personalized medicine. By enabling deeper collaboration between hospitals and research institutions, these platforms foster a more cohesive healthcare ecosystem where high-quality data drives better outcomes.
As healthcare systems increasingly adopt these AI solutions, their impact on clinical research, precision healthcare, and data-driven policymaking will be profound. Data mediators using AI platforms can bridge gaps in data utilization, streamline clinical trials, and support the development of targeted treatments, ultimately delivering better patient outcomes and more efficient research practices.
Mats Sundgren, PhD, is Senior Industry Science Director, i-HD; Rohit Mistry is CEO, IOMED; and Gabriel Maeztu is Chief Technology Officer and Founder, IOMED.
References
1. Smith, Z.P.; DiMasi, J.A.; Getz, K.A. New Estimates on the Cost of a Delay Day in Drug Development. TIRS. 2024. 58 (5), 855-862. https://link.springer.com/article/10.1007/s43441-024-00667-w
2. Huang, G.D.; Bull, J.; Johnston McKee, K.; Mahon, E.; Harper, B.; Roberts, J.N. Clinical Trials Recruitment Planning: A Proposed Framework from the Clinical Trials Transformation Initiative. Contemp Clin Trials. 2018. 66 (3), 74-79. https://pubmed.ncbi.nlm.nih.gov/29330082/
3. Sundgren, M. What Impact has Data Science and the Technologies Associated with it had on Pharmaceutical R&D? RxDataNews. 2019. 1 (4), 16-18. https://www.researchgate.net/publication/334895479_Monthly_Deep_Focus_What_impact_has_data_science_and_the_technologies_associated_with_it_had_on_pharmaceutical_research_development_-_The_case_of_Federated_EHR_researc_platforms_implicactions_to_new_dr
4. Sedlakova, J.; Daniore, P.; Horn Wintsch, A.; et al. Challenges and Best Practices for Digital Unstructured Data Enrichment in Health Research: A Systematic Narrative Review. PLOS Digital Health. 2023. 2 (10). https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000347
5. Zhang, D.; Yin, C.; Zeng, J.; Yuan, X.; Zhang, P. Combining Structured and Unstructured Data for Predictive Models: A Deep Learning Approach. BMC Med Inform Decis Mak. 2020. 20 (280). https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-01297-6
6. Martin-Sanchez, F.; Verspoor, K. Big Data in Medicine is Driving Big Changes. Yearb Med Inform. 2014. 23 (01), 14–20. https://pubmed.ncbi.nlm.nih.gov/25123716/
7. Wang, L.; Wen, A.; Fu, S.; et al. Adoption of the OMOP CDM for Cancer Research using Real-world Data: Current Status and Opportunities. medRxiv. August 23, 2024. https://www.medrxiv.org/content/10.1101/2024.08.23.24311950v1