The Transformative Power of Data Analytics in Clinical Trials

Feature Article
Applied Clinical Trials, 02-01-2025
Volume 34
Issue 1

Leveraging and benchmarking insights to boost efficiency and optimization.

Melissa Hutchens, Vice President, Research & Benchmarking, WCG

As clinical trial complexity, duration, and cost continue to rise, life sciences companies are turning to data to help them make better decisions and extract efficiency. Managing extensive amounts of data, both internal and external, is a challenge across the industry for sponsors and contract research organizations (CROs) alike.

There is also pressure to leverage artificial intelligence (AI) and machine learning (ML), which requires a complex data integration and standardization plan, while being cautious of outputs and interpretation.

Biopharmaceutical benchmarking specialists now strive to create clear performance indicators that give clients a roadmap: a way to prioritize what to measure, when, how, and why.

Although dashboards offer comprehensive access to all metrics and levels, it is essential to identify and monitor key performance indicators that critically reflect departmental performance. This approach enables leadership to effectively track performance without being overwhelmed by an excess of metrics.

Beyond leadership-defined metrics, the different levels within an organization result in a complex web of metric tracking. The organizational structure within companies can significantly impact the flow of data and access to information within the company. Among large pharma companies, some have a centralized benchmarking and analytics team, whereas others operate with a more decentralized structure. A significant advantage of a centralized approach is that communication regarding data and metrics access flows more efficiently throughout the organization. Often, a benchmarking team can educate a group or department on reports and metrics that are already tracked within the company but are not visible to them due to organizational barriers.

Analytics levels

R&D

The Pharmaceutical Benchmarking Forum, a consortium of major pharma companies committed to understanding industry performance metrics, places a strong emphasis on developing reliable benchmarks through rigorous data assessments, supported by high-quality data standards and defined methodologies. Since its 1997 inception, this forum has consistently advanced R&D performance, evolving its scope in response to industry changes. Recent initiatives and topics within the group include new modalities, complex development paradigms, geographic expansion, and regulatory impacts.

Companies are keen to understand their position at the portfolio level, where the metrics assessed differ drastically from those at the clinical operations level. Although numerous metrics and data analyses are available, four key measures merit thorough monitoring and in-depth examination: success, cycle time, productivity (spending per approval), and value creation (sales per investment). These variables are evaluated both independently and concurrently, as sponsors may strategically make decisions that sacrifice one of these measures for a gain in another.

Success rates are among the most difficult R&D metrics to obtain from external data, and almost impossible to source reliably from public sources. Defining success and failure is just the start; defining projects and ensuring comparability across sources is even more challenging, as there should be precise definitions of what constitutes a project. Key elements include the intent to file, regional considerations, timing of line extensions, and deviations from the lead indication.

The most recent publicly released Pharmaceutical Benchmarking Forum new molecular entity (NME) development success rate is 6.1% (2019-2023), which continues a decline from previous years (see Figure 1 below). Both small and large molecules have shown a decrease in success rates. However, recent developments in modalities such as cell and gene therapy and the use of antibody-drug conjugates and multi-specific antibody molecules have resulted in improvements in certain areas, though not enough to boost the overall rate.
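To illustrate how phase-level attrition compounds into an overall NME success rate, the sketch below multiplies per-phase transition probabilities. The phase probabilities are hypothetical placeholders for illustration, not Pharmaceutical Benchmarking Forum figures.

```python
# Hypothetical phase transition probabilities (not Forum data).
phase_success = {
    "Phase I": 0.55,
    "Phase II": 0.30,
    "Phase III": 0.60,
    "Submission": 0.90,
}

def composite_success_rate(transitions):
    """Probability a molecule survives every phase transition."""
    rate = 1.0
    for p in transitions.values():
        rate *= p
    return rate

print(f"Overall NME success rate: {composite_success_rate(phase_success):.1%}")
```

Even moderately healthy-looking per-phase rates compound to a single-digit overall rate, which is why small per-phase improvements matter so much.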

Modeling success rates to understand the variation and trends is a complex process. Statistical models such as logistic regressions can effectively explain the factors contributing to success rates with excellent interpretability. However, the actual factors and their functional relationships with success are more complex than these model assumptions allow. To make more accurate predictions regarding success rates, advanced modeling techniques through deep learning models can be introduced. With a robust, structured dataset, deep learning models can utilize granular data to identify intricate patterns that consistently lead to success. However, these models employ sophisticated logic that is not easily summarized or interpreted when it comes to the impact of each factor, requiring a bridge between business stakeholders and technical experts to gain insights from the results.
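A minimal sketch of the interpretable end of this spectrum: a one-factor logistic regression fit by gradient descent on synthetic trial outcomes. The factor, its effect size, and all data are invented for illustration; production modeling would use real covariates and a statistical library.

```python
import math
import random

random.seed(0)
# Synthetic outcomes: a single binary factor (hypothetical, e.g., a
# biomarker-driven design) that raises the probability of success.
X = [1.0 if random.random() < 0.5 else 0.0 for _ in range(500)]
y = [1 if random.random() < (0.3 if x else 0.1) else 0 for x in X]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(w * xi + b)))  # predicted success probability
        gw += (p - yi) * xi
        gb += p - yi
    w -= lr * gw / len(X)  # gradient step on the factor's log-odds weight
    b -= lr * gb / len(X)

# A positive weight means the factor raises the odds of success.
print(f"estimated log-odds effect: {w:.2f}")
```

The appeal here is exactly the interpretability the article describes: the fitted weight is a log-odds effect a business stakeholder can read directly, whereas a deep model's learned interactions would need a technical translator.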

Sponsors are continuously pressured to accelerate the product development lifecycle. R&D cycle times have been progressively lengthening, reaching a high of more than 15 years from start of discovery to approval (see Figure 2 below). The interval between phases (white space) has received considerable attention, as companies aim to shorten the time between the end of one phase (database lock) and the start of the next phase (first participant enrolled). Companies are seeking ways to understand and improve their efficiency for this white space transition, which represents a considerable portion of development durations. Process improvement efforts traditionally look at clinical aspects such as faster site selection, activation and recruitment, and starting some of these activities at risk.
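To see why white space attracts so much attention, a toy calculation with hypothetical phase durations and between-phase gaps (all figures invented, not benchmark data):

```python
# Hypothetical phase durations and between-phase "white space" gaps, in months.
phases = {"Phase I": 18, "Phase II": 30, "Phase III": 42}
white_space = {"I to II": 8, "II to III": 10}

total = sum(phases.values()) + sum(white_space.values())
gap_share = sum(white_space.values()) / total
print(f"total duration: {total} months; white space share: {gap_share:.0%}")
```

Even in this simple sketch, the transitions between phases account for a sizable slice of the clock, time spent without enrolling a single participant.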

Value creation refers to a company’s ability to generate output value (sales) based on the input (investment). There are various methods to measure this, and not all sales are of equal significance. A key factor is the sales of new products approved compared to those approaching the end of their patent life, which may face generic or biosimilar competition. The Pharmaceutical Benchmarking Forum defines new products as those approved in the last five years. This helps to understand a company’s “freshness index” of value creation. A low index indicates a potentially critical position for the company.
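A minimal sketch of a freshness index, assuming it is computed as the share of total sales coming from products approved within the five-year window; the portfolio figures below are hypothetical.

```python
def freshness_index(sales_by_approval_year, current_year, window=5):
    """Share of total sales from products approved in the last `window` years."""
    new = sum(s for yr, s in sales_by_approval_year.items()
              if current_year - yr < window)
    total = sum(sales_by_approval_year.values())
    return new / total

# Hypothetical portfolio: approval year -> annual sales ($B).
portfolio = {2010: 4.0, 2016: 2.5, 2021: 1.2, 2023: 0.8}
print(f"Freshness index: {freshness_index(portfolio, 2025):.0%}")
```

A portfolio dominated by older assets scores low on this measure, flagging exposure to upcoming generic or biosimilar competition.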

Clinical development

As we advance to a deeper level within the organization, our attention shifts toward clinical trials. The timeline for studies commences with the drafting of the protocol and concludes with the final clinical trial report. This total trial duration has increased by 47% in Phase III over the last 20 years, and by 6% in the last three years alone. Some of the factors that contribute to longer trial cycle times include complex trial designs (see Figure 3 below), regulatory requirements, niche patient populations (e.g., rare diseases, biomarker specificity), and geographic footprint.

For clinical programs, the focus is on five specific cycle times within the trial space to assess a company’s standing with respect to their peers: study start-up, enrollment, data capture, statistical analysis, and reporting. Utilizing statistical modeling and ML, the key variables influencing these outcomes are identified, which may include factors such as portfolio complexity, study size, geographic coverage, and design methodologies.

Study start-up

Selecting the metrics of focus beyond the high-level cycle time is an educational journey. While many companies celebrate the first participant enrolled milestone, optimizing enrollment rates requires a well-coordinated series of processes. A top-down approach would include the first participant enrolled globally, first participant enrolled in each country (with specific attention to the last country), and the first participant enrolled at each site. With this information, companies can evaluate the time required to initiate all countries and the time needed to initiate all sites (see Figure 4 below). The site initiation rate is not mentioned as often as a participant enrollment rate but is a critical driver of trial duration. In fact, in many predictive models, the driving force is the cadence of site initiation.
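A toy projection illustrating why site-initiation cadence drives enrollment: the two scenarios below open the same 30 sites at different speeds, with a hypothetical per-site enrollment rate (all numbers invented).

```python
def enrollment_timeline(sites_opened_per_month, rate, months):
    """Cumulative enrollment given a site-initiation cadence and a
    per-site enrollment rate (participants/site/month)."""
    active, cumulative, curve = 0, 0.0, []
    for m in range(months):
        if m < len(sites_opened_per_month):
            active += sites_opened_per_month[m]
        cumulative += active * rate  # each active site contributes every month
        curve.append(cumulative)
    return curve

fast = enrollment_timeline([10, 10, 10], rate=0.5, months=6)
slow = enrollment_timeline([5, 5, 5, 5, 5, 5], rate=0.5, months=6)
print(fast[-1], slow[-1])  # same 30 sites, very different totals
```

Both scenarios end with identical site counts and identical per-site rates; only the cadence differs, yet the faster ramp enrolls far more participants in the same window, which is why predictive models treat initiation pace as a driving force.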

Site performance

A centralized and clear definition of site performance is valuable to sponsors. One challenge in measuring site performance effectively is the standardization of site and investigator names, both within an organization and across the industry globally.

The performance metrics tied to this data are only available through proprietary datasets; public data does not offer the performance indicators critical to site selection, such as contracting times, activation times, subjects achieved, enrollment rates, non-enroller instances, and dropout rates. ML models are useful to determine site scores for a given study based on constrained optimization methods designed using study criteria. Transparency in scoring methods helps users feel confident in decisions, and access to underlying metrics enables them to verify the outputs.
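The constrained-optimization scoring described above is proprietary; as a transparent stand-in, the weighted-score sketch below shows how underlying metrics can roll up into a site score a user can verify by hand. The metric names, weights, and the assumption of pre-normalized inputs are all illustrative.

```python
# Hypothetical weights: positive metrics help the score, negative ones hurt it.
WEIGHTS = {"enrollment_rate": 0.4, "activation_days": -0.3, "dropout_rate": -0.3}

def site_score(metrics, weights=WEIGHTS):
    """Higher is better; metric values assumed pre-normalized to [0, 1]."""
    return sum(weights[k] * metrics[k] for k in weights)

site_a = {"enrollment_rate": 0.9, "activation_days": 0.2, "dropout_rate": 0.1}
site_b = {"enrollment_rate": 0.5, "activation_days": 0.8, "dropout_rate": 0.4}
print(site_score(site_a) > site_score(site_b))
```

Because every input metric and weight is visible, a user who doubts a ranking can trace the score back to the underlying data, which is the transparency the article argues builds confidence in model outputs.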

Industry regulatory changes

Even with established frameworks and metrics dashboards, companies must react continuously to industry and regulatory changes. This includes frequent assessments of new data fields to track, adjustments to metric calculations, and even the sunsetting of previously evaluated metrics, something that is extremely difficult to achieve. For example, as the EU moves to a central EU clinical trial regulation, looking at individual country clinical trial applications in this region will not be helpful. Preliminary analysis of this new process suggests that there will be no immediate improvement in cycle time, though time will tell if there will be gains in the future. Companies are now debating strategies to prevent longer cycle times, such as carefully evaluating which country to start with first. Other mandates, such as single institution review boards, and the release of ICH E6 R3 should also be evaluated with respect to data tracking and expectations.

The FDA’s draft guidance for diversity action plans has prompted a swift response from the industry. Companies are compiling data through various sources such as census, electronic health records, claims, historical enrollment, and disease incidence to fuel their efforts to create the best plans possible. The highest quality historical enrollment and demographic data sits within each sponsor and CRO company. There is a collaboration across the industry to create a central database that is blinded and anonymized and can help each company leverage insights to optimize clinical trial execution and diversity data plans.

Non-negotiable requirements

Compiling data effectively to generate a meaningful set of metrics across various levels is a significant endeavor. There are several non-negotiable points that are essential to their success:

Data quality. It is essential to have high confidence in your data sources. Internal data should also be scrutinized and cleaned. Data submitted by clients often arrives in raw form, requiring extensive error checks and queries. This process includes filling gaps and extracting key variables with close collaboration between a vendor and client.

Data framework. Define a clear data framework with intuitive naming conventions following domain-driven design. It is critical to engage the business's subject matter experts, who can guide data definitions and sources. This framework allows companies to speak the same language and create comparable metric assessments.

Integrating data sources. This is one of the most challenging aspects of working with a variety of data sources and vendors. There are numerous fields that must be standardized and mastered according to a defined structure, such as disease, phase, countries, site, and principal investigator names (see Figure 5 below). Although each company possesses its own internal data source for data analytics, the significance of integrating multiple data sources cannot be overstated. Industry collaboration is crucial not only to increase the volume of data for analysis but also to align on key metrics methodology.

Communication. Clear communication between business and technology teams is crucial for success. These teams must collaborate to create a data story that aligns with the organization’s strategic objectives and extracts insights effectively.
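Returning to the integration point above, standardizing free-text site names against a master list can be sketched with stdlib fuzzy matching. Real master-data management is far more involved, and the site names here are invented.

```python
import difflib

# Hypothetical master list of standardized site names.
MASTER = ["University Hospital Basel", "Mayo Clinic Rochester"]

def master_match(raw, master=MASTER, cutoff=0.6):
    """Return the closest master-list name, or None if nothing is similar."""
    hits = difflib.get_close_matches(raw, master, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(master_match("Univ. Hospital Basel"))
```

A matcher like this catches abbreviation and punctuation variants automatically, while unmatched names fall through for manual stewardship, a common split in mastering workflows.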

AI & ML

AI and ML are reshaping the clinical development landscape, offering transformative potential to improve operational efficiency, refine clinical trial processes, and drive innovation. From drug discovery to development, these technologies are making strides in areas such as success rate modeling, protocol optimization, site selection, and diversity, equity, and inclusion analytics. However, fully leveraging AI and ML requires addressing challenges in data quality, organizational collaboration, and strategic alignment.

High-quality, structured, and accessible data is essential for reliable ML models. Companies must address issues such as data governance, literacy, and security while fostering collaboration between business and technology teams. The expertise of business professionals is particularly critical to guide model design, select input variables, and interpret results, ensuring AI applications meet real-world needs. As organizations strive to leverage the full capabilities of ML, a commitment to high-quality data, collaboration, and continuous improvement will be essential. The future of ML holds immense promise, and through strategic implementation, the technology is poised to drive significant advancements for this industry.

Melissa Hutchens is Vice President, Research & Benchmarking, at WCG.



© 2025 MJH Life Sciences

All rights reserved.