Clinical Trial Failure: A Data-Driven Perspective on Risk Mitigation


An increase in protocol complexity is paving the way for machine learning models to optimize trial design.


Image Credit: © photon_photo - stock.adobe.com

The clinical trial failure dilemma

What constitutes clinical trial failure is a matter of perspective. For sponsors, the loss of time and resources when a drug they have invested in fails to reach the market may make the trial feel like a failure. However, a well-conducted trial that proves a drug ineffective, inferior, or unsafe is a success because it has fulfilled its purpose: determining whether the drug should be used in patients.1

Premature termination represents true clinical trial failure. A trial that is initiated but never completed wastes resources and contributes no valuable knowledge. Estimates of premature termination (referred to as “trial failure” hereafter) range anywhere from 3% to a staggering 46%, depending on the therapeutic area and definitions used.2 Understanding the factors driving trial failure is important for its prevention.

Trial failure risk factors

Clinical trial registries such as ClinicalTrials.gov contain data on hundreds of thousands of trials. Scientists have applied frequentist statistics (such as multivariable logistic regression) to these datasets to identify factors independently associated with failure. Results vary across studies due to differences in therapeutic areas and timeframes analyzed, but some findings remain consistent.2
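
To make that approach concrete, the minimal sketch below fits such a multivariable logistic regression to a hypothetical registry extract; the file and variable names are illustrative placeholders rather than fields from any of the cited analyses.

```python
# A minimal, hypothetical sketch of a multivariable logistic regression on a
# registry extract. The file and column names ("terminated", "sample_size",
# "industry_funded", "us_based", "phase") are placeholders, not fields from
# any cited study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("registry_trials.csv")  # hypothetical ClinicalTrials.gov extract

fit = smf.logit(
    "terminated ~ sample_size + industry_funded + us_based + C(phase)",
    data=trials,
).fit()

print(fit.summary())                 # coefficients, standard errors, p-values
print(np.exp(fit.params).round(2))   # odds ratios for each covariate
```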

Recruitment failure is the most commonly cited reason for trial failure, while small sample size is the most consistent risk factor. Numerous strategies have been proposed to overcome this challenge: performing open-label trials, including telephone reminders for non-responders, conducting additional researcher training, and delegating the enrollment efforts to nurses instead of physicians.3,4 However, recruitment failure may not be the root cause, but rather a consequence of other issues such as overly restrictive eligibility criteria or complex visit schedules. This calls into question whether suggested strategies are optimally targeted, or whether simpler design-phase interventions would be more cost-effective.

Numerous studies have found industry-funded trials more likely to terminate compared to government- or academic-funded trials, although this association isn’t as strikingly consistent as for sample size. Systematic differences in trial design and failure reasons between industry- and non-industry-funded trials may explain these results. For example, a study of phase III cancer trials found that industry-sponsored trials almost exclusively tested systemic drug interventions, while 45% of non-industry-funded trials investigated other interventions.5 The same study showed industry-sponsored trials were more likely to terminate due to futility or toxicity. These reasons for termination reflect ethical obligations to halt unsafe or ineffective treatments and should not be considered true "failure."

Geographic disparities in trial outcomes have also been reported. Analyses relying on ClinicalTrials.gov suggest US-based trials terminate more often than non-US trials, potentially due to higher costs and stricter regulations. However, this finding might also reflect reporting bias, as most US trials are registered on ClinicalTrials.gov, while trials performed in other countries may use one of 16 other major registries (or none at all).

Other factors, such as trial phase, number of participating centers, blinding, randomization, and the presence of a data monitoring committee, have been explored to varying degrees, yielding inconsistent or inconclusive results.

As these examples illustrate, it remains unclear whether these factors are true risk factors or simply markers of insufficiently investigated underlying causes. Frequentist methods have several limitations: they are unable to analyze unstructured protocol design elements, they cannot handle the sheer number of variables potentially influencing failure, and they produce generalized risk assessments that ignore trial-specific design differences. This is where machine learning offers new opportunities.

Comprehensive risk detection using machine learning

The power of machine learning lies in its ability to integrate diverse, high-dimensional data sources, allowing companies like Wemedoo to develop algorithms capable of analyzing variables of virtually any type and number.6

These algorithms can analyze unstructured data derived through natural language processing (e.g., the complexity of eligibility criteria in protocol text) in addition to the structured data (sample size, funding source, etc.) traditionally analyzed with frequentist methods. Machine learning models can then handle large numbers of these extracted features (variables) without being hindered by collinearity or non-linearity, which challenge frequentist statistical models. They can also identify hidden patterns and associations that would remain invisible under frequentist assumptions.
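
As a rough illustration of how such a pipeline can be assembled with open-source tooling (and not a description of Wemedoo's proprietary system), the sketch below combines TF-IDF features from eligibility-criteria text with structured protocol fields in a single gradient-boosted model; all dataset and column names are hypothetical.

```python
# Illustrative sketch only: eligibility-criteria text is converted to features
# with TF-IDF, combined with structured protocol fields, and fed to a
# gradient-boosted classifier that copes with many correlated, non-linear
# predictors. All file and column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

trials = pd.read_csv("trial_protocols.csv")
X, y = trials.drop(columns="terminated"), trials["terminated"]

features = ColumnTransformer(
    [
        ("criteria_text", TfidfVectorizer(max_features=1000), "eligibility_criteria"),
        ("structured", "passthrough", ["sample_size", "n_sites", "industry_funded"]),
    ],
    sparse_threshold=0.0,  # return a dense matrix for the gradient booster
)

model = Pipeline([("features", features), ("classifier", HistGradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model.fit(X_train, y_train)
print("Held-out accuracy:", round(model.score(X_test, y_test), 3))
```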

Once the model has been trained, it can predict whether a trial will fail based on its protocol. Interpretability tools, such as Shapley additive explanations (SHAP), can be used to visualize the main features driving the model’s predictions. For each analyzed trial, SHAP assigns a distinct value to each feature depending on the trial’s unique combination of features. The value can be either negative (contributing to failure) or positive (contributing to success). This method avoids the one-size-fits-all simplifications of frequentist methods (e.g., declaring “industry-sponsored trials are twice as likely to fail”) by accounting for the nuanced interactions specific to each protocol.
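
Continuing the hypothetical pipeline above, the openly available shap library can attribute an individual trial's prediction to its features, in the spirit of the SHAP analysis described here; the sketch is illustrative only.

```python
# Sketch of per-trial interpretation with the open-source shap library,
# reusing the hypothetical pipeline above. SHAP explainers operate on the
# model itself, so the text/structured transformer is applied first.
import shap

booster = model.named_steps["classifier"]
X_features = model.named_steps["features"].transform(X_test)

explainer = shap.Explainer(booster)      # dispatches to a tree explainer here
explanation = explainer(X_features)      # SHAP values for every trial in X_test

# For a single trial, each feature's SHAP value shows how far it pushed the
# prediction above or below the model's baseline (expected) output.
shap.plots.waterfall(explanation[0])
```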

Finally, the true potential of these tools emerges in trial design optimization. By iteratively simulating adjustments to protocol parameters flagged by SHAP, sponsors can test how changes affect the predicted risk of failure. Domain expertise remains critical to validate whether adjustments align with scientific objectives and real-world feasibility, and stakeholders ultimately decide whether and which modifications to implement.
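
In code, such a what-if exercise can be as simple as copying a draft protocol's feature row, changing one modifiable parameter at a time, and re-scoring it with the trained model, as in the hypothetical sketch below.

```python
# Hypothetical "what-if" loop: copy one draft protocol's feature row, adjust a
# modifiable design parameter, and re-score it. Names carry over from the
# sketches above; this is an illustration, not a validated optimization procedure.
draft = X_test.iloc[[0]].copy()                   # one protocol as a one-row DataFrame
baseline_risk = model.predict_proba(draft)[0, 1]  # assumes class 1 encodes termination

adjustments = {
    "n_sites": draft["n_sites"].iloc[0] + 5,            # add recruiting sites
    "sample_size": draft["sample_size"].iloc[0] * 1.2,  # revised enrolment target
}

for column, new_value in adjustments.items():
    scenario = draft.copy()
    scenario[column] = new_value
    risk = model.predict_proba(scenario)[0, 1]
    print(f"{column}: predicted failure risk {baseline_risk:.2f} -> {risk:.2f}")
```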

An example of a trial accurately predicted to fail is shown in Figure 1. Of the 2,000 features included in the model, most have only a minimal impact on the predicted outcome. The most influential “negative” factors driving the model’s prediction are the trial’s investigation of an FDA-regulated product and its requirement for regulatory oversight. Since these factors are non-modifiable, stakeholders could consider altering other influential factors (e.g., replacing placebo arms with active comparators, increasing the number of trial sites, or excluding elderly participants). As mentioned, domain expertise is crucial at this step to determine which adjustments are scientifically and ethically justifiable.

Figure 1. SHAP plot for a failed trial accurately predicted to fail as the final value (-0.145) is lower than the baseline value (0.213).

Source: Wemedoo AG

Remaining challenges

While machine learning could signify a breakthrough in clinical trial management, challenges remain before its full power can be harnessed. Current models are still not highly accurate, though improvements are expected as innovative approaches in modeling are developed. A more significant challenge lies in the nature of failure factors: many identified through interpretability methods are not modifiable, limiting the models’ applicability. Finally, inconsistent or incomplete registry data undermines prediction quality, as machine learning models depend on robust training data.

Machine learning enables a paradigm shift in clinical trial risk mitigation by synthesizing heterogeneous data to predict protocol-specific failure risks. The future of clinical trials will demand even greater multidisciplinary collaboration, integrating engineers into the protocol design phase.

Aleksa Jovanovic, MD, PhD, scientific engagement and innovation specialist, Wemedoo AG

References

  1. Kolstoe SE, Davies H, Messer J. A clinical trial is a success, not a failure, if it does not demonstrate efficacy or does identify safety concerns. Contemp Clin Trials Commun. 2018 Oct 22;12:198. https://pmc.ncbi.nlm.nih.gov/articles/PMC6258826/
  2. Jovanovic A, Gavric S, Dennstädt F, Cihoric N. Methodological Approaches in Analyzing Predictors of Clinical Trial Failure: A Scoping Review and Meta-epidemiological Study. Manuscript in preparation.
  3. Treweek S, Pitkethly M, Cook J, Fraser C, Mitchell E, Sullivan F, et al. Strategies to improve recruitment to randomised trials. Cochrane Database Syst Rev. 2018 Feb 22;2(2):MR000013. https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.MR000013.pub6/full
  4. Donovan JL, Peters TJ, Noble S, Powell P, Gillatt D, Oliver SE, et al; ProtecT Study Group. Who can best recruit to randomized trials? Randomized trial comparing surgeons and nurses recruiting patients to a trial of treatments for localized prostate cancer (the ProtecT study). J Clin Epidemiol. 2003 Jul;56(7):605-9. https://pubmed.ncbi.nlm.nih.gov/12921927
  5. Buergy D, Riedel J, Sarria GR, Ehmann M, Scafa D, Grilli M, et al. Unfinished business: Terminated cancer trials and the relevance of treatment intent, sponsors and intervention types. Int J Cancer. 2020;1-9. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ijc.33342
  6. Cihoric N, Gavric S, Dennstädt F, Jovanovic A. A Multi-Factor Machine Learning Model for Predicting and Preventing Clinical Trial Failures, 04 March 2025, PREPRINT (Version 1) available at Research Square. https://doi.org/10.21203/rs.3.rs-5932404/v1
