The emergence of AI-powered simulants in improving study efficiency.
In the ever-evolving landscape of artificial intelligence (AI), industry leaders across life sciences are using AI to develop innovative solutions to critical problems in healthcare and to push beyond the status quo. Across healthcare, AI is being used to analyze medical images and diagnostics, predict patient outcomes, and personalize treatment plans. AI has the potential to improve and accelerate every aspect of clinical trial development and execution.
Clinical researchers face numerous challenges across site design, data management, patient recruitment, and retention. These challenges can drastically influence the success of developing new therapies. A critical challenge is recruitment and enrollment of diverse patient populations for clinical trials. Despite the growing recognition of the importance of diversity in clinical trials, researchers struggle to include participants from varied racial, ethnic, and socioeconomic backgrounds. This lack of diversity leads to gaps in understanding how different populations respond to treatments, ultimately affecting the generalizability and efficacy of treatments. Logistical barriers, mistrust in the medical system, and limited access to trial sites further exacerbate the problem, making it challenging to achieve representative and inclusive clinical research.
Another key challenge the industry faces is access to historical trial data. Historical clinical trials can provide insights into what has already worked and what has not, playing a major role in accelerating research. However, as with other sensitive data containing patient-identifiable information or intellectual property, access to clinical trial data is limited by regulatory requirements, technical protection protocols, proprietary data restrictions, and strict privacy requirements set by sponsors, which stem from issues of patient consent and the preservation of patient trust. These barriers are significant enough that even with appropriate clinical data sharing commitments, policies, and protocols, the sharing of de-identified patient-level clinical trial data remains limited.
One exciting development in clinical trials is AI-powered synthetic data technology, known as Simulants. Simulants is an AI-powered data solution that generates synthetic data from cross-sponsor historical clinical trial data, complete with the covariates, endpoints, and clinical and patient characteristics captured through clinical trial protocols and electronic case report forms (eCRFs), ensuring a high-fidelity dataset. Simulants leverages established analysis data model standards, providing both traceability and familiarity for users. Simulants tackles key challenges in the clinical trial space: addressing bias within current trial data through the creation of balanced, representative synthetic clinical trial data, all while protecting patient privacy.
Simulants allows companies that don’t have access to data from a particular type of patient or disease to access that data and use it in their studies to build safer and more effective trials. This new approach is transforming how clinical trials are designed and executed, offering significant benefits in terms of risk reduction and efficacy enhancement.
Developed with world-leading researchers from Cornell University and the University of Illinois Urbana-Champaign, simulants is built on an AI model that takes clinical trial data as input and produces a high-fidelity synthetic version of that dataset in which the risk of patient reidentification is minimized. Unlike traditional anonymization, simulants uses a set of AI algorithms to break each patient record into pieces, combine those pieces with similar records, and produce a new dataset that is an amalgamation of the original data. This offers a secure way to access sensitive data by preserving the characteristics of an underlying dataset while safeguarding privacy.
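To make the idea of breaking records into pieces and recombining them with similar records more concrete, here is a toy sketch in Python. It is not the simulants algorithm, whose details are not described here. The function name `synthesize`, the field names, the grouping of fields into pieces, and the unscaled squared distance used to find similar patients are all assumptions made for illustration.

```python
import random

def synthesize(records, pieces, k=3, seed=0):
    """Toy record mixing (illustrative only, not the production method).

    Each synthetic patient takes every 'piece' (a group of fields) from a
    randomly chosen member of a small neighborhood of similar real records.
    """
    rng = random.Random(seed)
    # Fields used to judge similarity: the numeric ones (assumption).
    numeric = [f for f, v in records[0].items() if isinstance(v, (int, float))]

    def distance(a, b):
        # Unscaled squared distance; a real system would normalize fields.
        return sum((a[f] - b[f]) ** 2 for f in numeric)

    synthetic = []
    for rec in records:
        # The record itself plus its k-1 most similar records.
        neighbors = sorted(records, key=lambda other: distance(rec, other))[:k]
        new_patient = {}
        for piece in pieces:
            donor = rng.choice(neighbors)  # one donor per piece
            for field in piece:
                new_patient[field] = donor[field]
        synthetic.append(new_patient)
    return synthetic

# Hypothetical patient records and piece definitions.
patients = [
    {"age": 54, "weight_kg": 70.0, "sex": "F", "response": "CR"},
    {"age": 57, "weight_kg": 82.5, "sex": "M", "response": "PR"},
    {"age": 61, "weight_kg": 76.0, "sex": "F", "response": "PR"},
    {"age": 35, "weight_kg": 64.0, "sex": "M", "response": "CR"},
    {"age": 38, "weight_kg": 90.0, "sex": "F", "response": "SD"},
]
PIECES = [("age", "sex"), ("weight_kg",), ("response",)]

synthetic_patients = synthesize(patients, PIECES, k=3)
```

Every value in a synthetic record comes from a real record, but the pieces of any one synthetic patient may come from different donors. A sketch this simple offers no privacy guarantee on its own, which is why a comparison against the original data is still needed before use.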
Creating synthetic data means transforming patient data from eCRFs from raw input to synthesized output through several meticulous steps. First, data from relevant studies must be located within eCRF databases. Filtering criteria such as indication of interest, mechanism of action, cohort group and size, and variables of interest for a new study are applied to identify the appropriate data. Once filtered, the data undergoes a standardization process. This includes de-identification to protect patient privacy, aggregation of small groups to maintain anonymity, and the combination of studies from multiple sponsors to ensure sponsor confidentiality. The standardized dataset is then processed through the simulants algorithm to generate a dataset of synthetic patients.
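The preparation steps described above (filtering, de-identification, aggregation of small groups, and pooling across sponsors) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names, the `prepare` function, the choice of `country` as the field to aggregate, and the group-size threshold are all hypothetical and are not taken from the actual pipeline.

```python
from collections import Counter

# Hypothetical fields that identify a patient, a site, or a sponsor.
DIRECT_IDENTIFIERS = {"patient_id", "site_id", "sponsor"}

def prepare(records, indication, min_group_size=2):
    """Filter, de-identify, and aggregate small groups (illustrative only)."""
    # 1. Filter: keep only records for the indication of interest.
    selected = [r for r in records if r["indication"] == indication]

    # 2. De-identify and pool sponsors: drop identifying fields, including
    #    the sponsor, so the remaining rows form a single pooled dataset.
    cleaned = [
        {k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
        for r in selected
    ]

    # 3. Aggregate small groups: values shared by too few patients are
    #    replaced with a generic label.
    counts = Counter(r["country"] for r in cleaned)
    for r in cleaned:
        if counts[r["country"]] < min_group_size:
            r["country"] = "OTHER"
    return cleaned

# Hypothetical raw records drawn from three sponsors.
raw = [
    {"patient_id": "A-001", "sponsor": "S1", "site_id": 11, "indication": "NHL", "country": "US", "age": 61},
    {"patient_id": "A-002", "sponsor": "S1", "site_id": 11, "indication": "NHL", "country": "US", "age": 48},
    {"patient_id": "B-001", "sponsor": "S2", "site_id": 27, "indication": "NHL", "country": "DE", "age": 70},
    {"patient_id": "B-002", "sponsor": "S2", "site_id": 31, "indication": "ALL", "country": "US", "age": 35},
    {"patient_id": "C-001", "sponsor": "S3", "site_id": 40, "indication": "NHL", "country": "FR", "age": 57},
]

prepared = prepare(raw, "NHL")
```

In this example the single patients from Germany and France are folded into one "OTHER" group, and the output carries no patient, site, or sponsor identifiers. The prepared rows would then be the input to the synthesis step.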
The synthetic data is rigorously compared with the original data to assess fidelity, utility, and privacy. Only when the data meets the required levels of patient and sponsor privacy, and its fidelity and utility are judged acceptable, is the synthetic data considered ready for analytics. The rules that govern the simulants AI algorithm are simple to describe, but the bounds of what it’s capable of are not, making it a perfect example of AI’s potential impact on the clinical trials community (see Figure 1).
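A comparison of this kind can be pictured as a simple gate that a synthetic dataset must pass. The sketch below is an assumption-laden illustration, not the validation actually used for simulants: the two metrics (relative difference in a field's mean as a fidelity check, and the share of synthetic records that exactly copy a real record as a privacy check), the function names, and the thresholds are all hypothetical.

```python
from statistics import mean

def fidelity_gap(original, synthetic, field):
    """Relative difference in the mean of one numeric field."""
    orig_mean = mean(r[field] for r in original)
    synth_mean = mean(r[field] for r in synthetic)
    return abs(synth_mean - orig_mean) / abs(orig_mean)

def exact_match_rate(original, synthetic):
    """Share of synthetic records that copy a real record field for field."""
    real = {tuple(sorted(r.items())) for r in original}
    copies = sum(tuple(sorted(s.items())) in real for s in synthetic)
    return copies / len(synthetic)

def ready_for_analytics(original, synthetic, field,
                        max_gap=0.05, max_copy_rate=0.0):
    """Release gate: both the fidelity and the privacy check must pass."""
    return (fidelity_gap(original, synthetic, field) <= max_gap
            and exact_match_rate(original, synthetic) <= max_copy_rate)

# Hypothetical original and synthetic datasets.
original = [{"age": 50, "grade": 1}, {"age": 60, "grade": 2}, {"age": 70, "grade": 3}]
synthetic = [{"age": 50, "grade": 2}, {"age": 60, "grade": 3}, {"age": 70, "grade": 1}]

passes = ready_for_analytics(original, synthetic, "age")
leaky = ready_for_analytics(original, original, "age")  # a verbatim copy
```

Here the recombined dataset preserves the mean age and copies no real patient, so it passes, while a verbatim copy of the original fails the privacy check despite perfect fidelity. Real assessments would cover many more statistics, utility on downstream analyses, and stronger reidentification tests.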
Simulants data is high fidelity in the sense that it reliably reproduces the results of an analysis as if that analysis were performed on the original data. This makes simulants extremely useful for advancing the scientific understanding of a disease process or the way a drug works, with a variety of applications. Additionally, simulants allows researchers to explore more possibilities using historical data before testing them in a clinical setting.
Synthetic data generated using simulants can be used to predict which patients are likely to experience severe, life-threatening side effects of certain medications, and identify patient subgroups most likely to respond to treatments, ensuring trial outcomes are safer and more effective for patients. AI models can also be developed to help understand systemic differences in how trials run in different settings, making it easier to understand the relative benefit of medicines developed in different geographies and predicting their outcome at a global scale.
Several pharmaceutical and biotech companies have successfully leveraged simulants data to optimize their clinical trials in partnership with Medidata, a Dassault Systèmes company. For instance, a leading biotech company used synthetic data generated using simulants to design and execute early-phase CAR-T programs. The insights gained helped them refine their trial protocols and better understand adverse events.
Researchers across the industry have seen first-hand how simulants is paving the way for safer and more efficient clinical trials. Synthetic data generation offers a practical and accessible solution for researchers, allowing them to work with high-quality data without the need for extensive data-sharing agreements or complex data handling procedures. This technology enhances the efficiency of the drug development process, enabling more effective trial design, better-targeted patient cohorts with unmet medical needs, and improved identification of novel endpoints.
In a recent study from a leading EU biotech company, study teams were able to leverage simulants data to confirm safety hypotheses from their clinical leads. The company was pursuing the development of CAR-T programs but faced challenges due to treatment-emergent adverse events (AEs), the complexity of pre-conditioning regimens, dosing, and patient selection, among other factors. Using data from more than 3,000 non-Hodgkin lymphoma, acute lymphocytic leukemia, and solid tumor patients treated with CD19 auto CAR-Ts and bispecifics, a high-fidelity synthetic dataset was created for CAR-T trials.
Having access to this cohort of synthetic patients enabled the biotech company to investigate generalizability across hematology-oncology indications, as well as analyze treatment-emergent AEs. This comprehensive analysis resulted in updates to the number of measurements taken and allowed principal investigators to be proactive, rather than reactive, to specific events. It also supported special care for patients receiving concomitant medications, whether they had already received a drug or the drug was part of the trial, because concomitant use affected both the frequency and severity of safety events. This approach using simulants not only confirmed safety hypotheses but also led to more accurate and effective trial designs, ultimately enhancing patient care and advancing the company’s position in the competitive CAR-T therapy space.
By leveraging AI technology, simulants can provide high-fidelity synthetic data—reducing trial risks, enhancing trial design, and improving outcomes. This innovative approach addresses critical challenges in clinical research, such as patient recruitment and retention, patient safety, and the sharing of sensitive historical trial data. By creating representative synthetic datasets, simulants ensures that clinical trials are reflective of target patient populations, ultimately leading to more generalizable and effective treatments.
The benefits of simulants extend beyond just improving trial efficiency. By preserving patient privacy and protecting sponsor intellectual property, simulants fosters a more ethical and trustworthy research environment.
As the healthcare industry continues to embrace AI, the integration of simulants promises to bring about significant advancements in clinical research and patient care. This technology can accelerate the drug development process, bringing life-altering treatments to market faster. With the increasing adoption of AI technologies such as simulants, the future of clinical development looks brighter, supported by more effective and inclusive trials with improved health outcomes and safety for patients worldwide.
Mandis Beigi, Senior Director; Afrah Shafquat, Senior Staff Data Scientist; Jia Chen, PhD, Senior Director, Medidata AI; Jacob Aptekar, Vice President; and Yahav Itzkovich, Senior Engagement Manager; all with Medidata Solutions