Applied Clinical Trials
A two-phase statistical analysis identifies the key performance drivers in clinical trial startup.
Study startup underperformance and inefficiency have been a problem for executives who manage clinical research trials for decades. Study startup is the most challenging and important stage of any clinical trial. At the same time, startup has the lowest performance scores and the greatest variation in performance of any stage of a clinical trial.1 For clinical study managers, the key to high performing studies is appropriate governance.2,3 That is, they should be able to track performance as the study progresses in order to appropriately manage performance.4 But of the hundreds of activities involved in starting a clinical trial, what are the key indicators that they should watch to know if the study startup is going well? This issue is important because busy clinical study managers worry about their "blind spots" or the innocuous issues that come back to undermine the trial. When a trial is outsourced, governance issues are magnified as clinical trial managers try to assess performance across organizational boundaries.
Contract research organizations (CROs) focus on study startup as well. They want to deliver high levels of performance on their contract, but may be unsure of what to emphasize in the flurry of startup activity. This decision is also made more complex by a myriad of clinical, financial, logistical, and commercial concerns. CROs need objective evidence of the key drivers of study startup in order to meet sponsor expectations and make rational service quality investments.
The central thesis of this article is that, of all the variables involved in a study startup, it is possible to identify the key drivers that have a disproportionate impact on study startup performance. The goal is to improve the ability to enhance the effectiveness and efficiency of trial monitoring and governance, and give clinical study managers a tool by which they can assess and manage clinical trial performance.
In order to identify the key drivers of study startup, we took a two-phase research approach. In the first phase, we sought to identify all of the important drivers of study startup performance by asking a broad variety of experienced clinical trial managers to identify the performance drivers. This approach reduces subjectivity bias because the identified drivers will not be our opinion, but that of a broad group of experts. In the second phase, we compared all of these drivers in a statistical model to see which of the drivers from phase one had the most substantial and significant impact on study startup performance. This research is part of the Clinical Trials Outsourcing Performance (C-TOP) study, an ongoing collaboration between Drexel University and CRO Analytics examining all of the phases of clinical trial performance. The research was conducted under Drexel University institutional review board (IRB) approval.
For the first research phase, we interviewed three dozen pharmaceutical and CRO executives between October 2011 and July 2012 who had substantial experience in managing clinical trials. The goal of phase one was to identify a broad perspective on all of the potentially important drivers of study startup. We wanted a variety of views in order to reduce potential bias from tapping into a single perspective of clinical trials measurement. We continued to conduct interviews until we felt we were not gaining any new insights. Both authors were present for each interview and each took notes to record the responses. Each respondent was first asked to describe what they considered to be the important drivers of startup performance. We encouraged respondents to list as many drivers as they thought affected performance to obtain the most comprehensive list possible. In order to associate each factor with performance, we asked them why each factor was important and what actions by the study team were associated with high performance in that factor.
The raw data that resulted from the interviews was a two-column list for each interview—the drivers on the left and the actions that led to successful performance of that factor on the right. In the interviews, we found that respondents began to spontaneously create associations (either a correlation or time-dependency) between the factors. For data management, we continued this process of synthesizing and re-ordering the data by associating factors with each other and then linking them to their performance activities. The ultimate result of this process was a list of performance drivers along with the activities associated with each driver.
Adhering to timelines was the performance driver that most of our respondents mentioned early in the interview. The timelines fell into two distinct areas: recruiting and operational timelines. Adhering to recruiting timelines refers to the ability to bring investigators into the clinical trial in the planned amount of time. (Note: There was a mix of opinions, but most of the respondents felt that recruiting investigators was a function of study startup and recruiting patients belonged with the study conduct stage of clinical trials. We accept that division and explore the relationship of investigator and patient recruitment as it relates to performance separately).
The operational timelines assess the ability of the clinical study team to meet the planned milestones in a timely fashion. Examples of operational timelines in study startup include the creation of the project and operations plans, study-specific convention (i.e., the definitions for terms in the study, such as what "high cholesterol" means), and forms and documents (e.g., case report forms, informed consent forms, or patient enrollment forms). Study startup operational timelines also include activities such as IT system setup and submitting the appropriate documents to regulatory agencies (IRB forms, FDA forms, or http://clinicaltrials.gov/ registration). We recognize that the recruiting timelines are a subset of the operational timelines, but our subjects thought that the recruiting timelines were important enough to justify being assessed separately.
The respondents also felt that the ability to identify qualified investigators was related to recruiting timelines, but was important enough to be considered separately. Since so much of the success of a clinical trial depends on high-performing investigators, our respondents thought it was important to be able to recruit and then engage investigators who would not only be clinically qualified, but also have a stable of eligible patients and the organizational skills to conduct a clinical trial.
Many of the respondents noted that assessing performance on timelines is very contextual. It is important, in their view, to capture expectations for a trial within the assessment. Simply measuring the number of days, for example, that it takes to complete an activity does not capture the context. Consider a trial in which it takes 75 days to recruit subjects. Is this good performance? The answer, of course, is that it depends on the study. Respondents felt that performance assessment metrics should be constructed in a way that accounts for the expectations associated with a particular trial.
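The point about context can be made concrete with a small sketch. It is only an illustration, not a metric used in this study: the function name `schedule_variance` and the planned durations of 60 and 100 days are hypothetical, chosen to show how the same 75-day result reads differently once the expectation for the trial is part of the measure.

```python
def schedule_variance(planned_days: float, actual_days: float) -> float:
    """Fraction by which an activity beat (+) or missed (-) its planned duration."""
    if planned_days <= 0:
        raise ValueError("planned_days must be positive")
    return (planned_days - actual_days) / planned_days


# The same 75-day result in two hypothetical trials with different expectations.
fast_plan = schedule_variance(planned_days=60, actual_days=75)   # plan was missed
slow_plan = schedule_variance(planned_days=100, actual_days=75)  # plan was beaten

print(f"{fast_plan:+.0%} vs. plan, {slow_plan:+.0%} vs. plan")
```

Expressed this way, 75 days is 25% behind one plan and 25% ahead of the other, which is the distinction a raw day count cannot show.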
Maintaining consistent study team personnel was an important factor in the respondents' view. Within the context of study startup, this meant that the clinical study team members who began the trial were the same as those proposed in the planning stages of the study. Having different people show up at the start of the trial often meant that there would be skills gaps and instability in the team.
Investigator meetings were felt by our respondents to be an important milestone in study startup. These meetings were an opportunity to properly orient the clinicians and their staff to the trial and were a key factor in making sure the study was conducted according to the specifications.
Finally, those interviewed identified two key people that drove study startup performance: the project managers and clinical research associates (CRAs). For project managers, it is important that they be (1) knowledgeable about how to conduct clinical trials, the specific details of the client's trial, and good clinical practice (GCP) and regulatory issues; and (2) interpersonally skilled, in the sense that they provide timely and effective communication, collaborate well, are proactive problem solvers, and are able to recommend effective solutions.
CRAs work closely with the clinical sites to assure that they adhere to the study protocol, help administrative functions flow smoothly, and to assure the validity of the trial—although aspects of CRA performance vary by the type of study. In addition, there is a paradoxical aspect to CRA performance. High-functioning CRAs are adept at integrating themselves into the clinical environment and avoiding disruption. As a result, we used a global assessment of CRA performance. Figure 1 illustrates the relationships between the drivers of study startup performance that resulted from phase one.
At this point, we organized our insights into a series of questions to assess how the clinical study team performed. Where possible, we tried to organize the questions in the order in which the corresponding activities are performed over the course of study startup. We were careful to construct the items in a way that was consistent with our respondents' emphasis and depth of detail. The questions were organized into a survey instrument and edited for clarity. We then asked 10 executives to review the instrument for coverage of important drivers, the level of detail of the questions, and clarity. We then posted the survey online and again reviewed the items for clarity and ease of online presentation.
The purpose of the second phase was to compare all of the drivers in a statistical model to identify which had the most significant and substantial effects on study startup performance. We solicited subjects via email in a purposive sampling using an online survey. Table 1 shows how each of the constructs was measured. We received 65 completed instruments from respondents with an average of 16.5 years of experience in the pharmaceutical industry and 9.5 years of experience overseeing CRO contracts. The trials that the subjects evaluated were from a variety of phases (Phase II, 22%; Phase III, 35%; Phase IV, 43%), specialties (neuroscience, respiratory, oncology, cardiovascular, diabetes mellitus, and infectious disease), and regions (North America, 80%; Europe, 57%; Asia, 26%; Russia, 28%; and South/Central America, 30%). The overall average of study startup performance was 5.6 (on a scale of 1–10) and was lower compared to sales and contracting (6.2), study conduct (6.0), and study closeout stages (6.6).
The statistical approach that was used to test the model in Figure 1 is known as partial least squares (PLS) path analysis. PLS provides regression coefficients to estimate the relationships between each variable and study startup performance at the average of all the other factors. The coefficient estimate, for example, for adhering to operational timelines is .45. This is interpreted as "a 1-unit (or 10%) increase in adhering to operational timelines gives a .45 unit (or about 4.5%) increase in study startup performance at average levels of project manager performance, investigator recruiting, etc."
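The arithmetic behind that interpretation can be sketched in a few lines. This is a minimal illustration rather than part of the PLS analysis itself: the helper names are hypothetical, and treating one scale point as 10% follows the reading given above for a 10-point scale.

```python
SCALE_POINTS = 10  # assumption: items rated on a 10-point scale, so 1 unit = 10%


def expected_change(coefficient: float, driver_change_units: float = 1.0) -> float:
    """Expected change in startup performance (in scale units), other drivers held at average."""
    return coefficient * driver_change_units


def as_percent_of_scale(units: float) -> float:
    """Express a change in scale units as a percentage of the scale."""
    return units / SCALE_POINTS * 100


# Coefficient reported for adhering to operational timelines.
change_units = expected_change(0.45, driver_change_units=1.0)
change_percent = as_percent_of_scale(change_units)

print(f"{change_units:.2f} units, or about {change_percent:.1f}% of the scale")
```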
Several statistical indicators suggest that the model performed well. A substantial number of the paths were significant and the model explained high levels of variance in the important variables (project manager R2 = 81%; operational timelines R2 = 71%; study startup performance R2 = 73%). The two project-manager variables (i.e., knowledge and interpersonal skills) requiring detailed indicators demonstrated discriminant and convergent validity. It was originally thought that the investigator-related variables (ability to identify qualified investigators, timeline for recruiting investigators, and investigator meetings) would constitute a single "investigator" latent variable. This measurement structure didn't work well, so the variables were broken out as individual contributors to study startup performance.
The means, standard deviations, and correlations with study startup performance are shown in Table 2. Three findings are particularly telling. First, the performance indicators are fairly low. The average performance across all of the measures was 6.1, lower than what we expected given all of the attention and effort given to assessing clinical trial performance in recent years. Second, there was wide variation in performance. The average standard deviation was 2.28, meaning that about 2/3 of the performance scores fell between 3.8 and 8.4 and the other 1/3 of the scores fell outside these bounds. The third finding was that all of the drivers had a positive correlation with study startup performance. This is not surprising, given that all of the expert respondents in the interviews described these drivers as important to performance, but the positive correlations take on more significance in light of the results of the statistical model.
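The 3.8 to 8.4 range is simply the mean plus or minus one standard deviation. The short sketch below reproduces it and checks the "about 2/3" figure. It assumes the scores are roughly normally distributed, which is our simplification for illustration and is not stated in the survey results.

```python
from statistics import NormalDist

mean_score = 6.1   # average performance across all measures (from the text)
sd_score = 2.28    # average standard deviation (from the text)

low = mean_score - sd_score
high = mean_score + sd_score

# Assumption: scores are approximately normal, so about 68% fall within one SD.
scores = NormalDist(mu=mean_score, sigma=sd_score)
share_within = scores.cdf(high) - scores.cdf(low)

print(f"{low:.1f} to {high:.1f}; share within one SD = {share_within:.2f}")
```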
Contrary to the results of the interviews and correlations, not all of the drivers had a positive and significant impact on startup performance. Of the seven drivers tested, two were substantial and significant (operational timelines and project manager performance), two were significant but less substantive (personnel aligned with proposal and adhering to timelines for recruiting investigators), two were significant and negative (ability to identify qualified investigators and investigator meetings), and one was insignificant (CRA performance). These drivers are illustrated in Figure 2.
The explanation for the differences between the correlations and the statistical model is that they are different analytical techniques. A correlation considers each driver in isolation (the relationship of that one variable with performance) without accounting for the presence of the other variables. The statistical model estimates the relationship between each of the predictors and study startup performance at the average levels of all the other variables. Adhering to operational timelines, for example, has an estimated coefficient of .45. This means that a 1-unit (10%) increase in meeting operational timelines yields a 4.5% increase in study startup performance at the average of project-manager performance, recruiting investigators, etc. As a result, the statistical model better describes the environment in which managers must make decisions about clinical trials.
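A small numerical sketch shows how a driver can correlate positively with performance and still carry a negative coefficient once a second, related driver is in the model. It uses ordinary two-predictor standardized regression as a simplified stand-in for PLS path analysis, and the three correlations are hypothetical values chosen for illustration, not figures from Table 2.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Standardized least-squares coefficients for two correlated predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors must not be perfectly correlated")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations (NOT from the study).
r_y1 = 0.30  # driver A with startup performance
r_y2 = 0.80  # driver B with startup performance
r_12 = 0.60  # driver A with driver B

beta_a, beta_b = standardized_betas(r_y1, r_y2, r_12)

print(f"Driver A: correlation {r_y1:+.2f}, model coefficient {beta_a:+.2f}")
print(f"Driver B: correlation {r_y2:+.2f}, model coefficient {beta_b:+.2f}")
```

In this made-up case driver A has a positive correlation of .30 but a negative coefficient, because most of its apparent benefit is shared with driver B.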
Adhering to operational timelines had the greatest impact on study startup performance. The timelines that drove this effect were the project and ops plans, study-specific convention, forms and documents, and the regulatory submissions. Adhering to the timelines for the feasibility assessment and the IT system setup did not have a significant impact on performance. The factors that are important in adhering to operational timelines are illustrated in Figure 3.
Project manager performance was another substantial and significant predictor of study startup performance. Both the knowledge of the project manager and their interpersonal skills drove this positive impact on performance. Project manager knowledge was assessed by asking respondents their perceptions about the project manager's general knowledge of conducting clinical trials (coeff= 10.9, sig= 2.07, p= .04), specific knowledge about the details of this trial (coeff= 15.1, sig= 2.14, p= .04), and their knowledge of GCP and regulations (coeff= 10.8, sig= 1.96, p= .05). Project manager interpersonal skills were assessed by their ability to provide timely and effective communication (coeff= 16.9, sig= 2.53, p= .02), collaboration skills (coeff= 19.0, sig= 2.69, p= .01), proactive problem solving (coeff= 21.3, sig= 2.77, p= .01), and ability to recommend effective solutions for the trial (coeff= 20.5, sig= 2.66, p= .01).
Aligning personnel to what was promised in the proposal (coeff= .24, sig= 2.05, p= .04) and adhering to the timeline for recruiting investigators (coeff= .22, sig= 2.59, p= .01) were significant and positive predictors of study startup performance. We originally modeled all of the investigator variables (identifying, recruiting, and meetings) as a single (formative) latent variable. Instead, we found that each of these variables exerted independent effects on startup performance that were best modeled independently.
Surprisingly, the other investigator variables had negative influences on study startup performance. The ability to identify qualified investigators (coeff= -.18, sig= 2.35, p= .02) had negative and significant effects on study startup performance. This means that for every 1-unit (10%) increase in the CRO's ability to identify qualified investigators, study startup performance actually declined by about 2%. Our model does not identify a reason for this unexpected finding, but we speculate that the reason might be that an overdeveloped capacity for identifying investigators might inhibit a more collaborative search utilizing the resources of the sponsor and vendor.
The negative effect of investigator meetings (coeff= -.16, sig= 2.13, p= .04) is perhaps more obvious. For clinically active investigators, the meeting is likely to be perceived as intrusive and comes with additional administrative burdens. Negative perceptions increase as more time and effort are devoted to these investigator meetings. Finally, CRA performance had insignificant effects on startup performance.
While everyone involved in a clinical trial wants a positive start to a study, the startup phase has the worst performance compared to any of the other stages of a trial. In this report, we have found that it is possible to isolate those startup drivers (personnel aligned with proposal, operational timelines, adhering to timelines for recruiting investigators, and project manager performance) that are positively associated with study startup performance from those that have no effect (CRA performance) or negative effects on performance (ability to identify qualified investigators and investigator meetings).
We are not arguing that clinical study teams should not try to identify qualified investigators or hold investigator meetings. Those activities are necessary, and the experience of our manager interviews and the positive correlations confirm that. When it comes to improving startup performance, however, the four positive variables outlined previously will change startup performance. This finding is important, because executives in the real world must make decisions at the margin. In a world with limited time and resources available to monitor clinical trials, companies must decide which variables they should track in order to know how their clinical study team is performing or how a vendor is performing on a contract.
Is there really a need for additional metrics like those described in this study, when there are already dozens of operational metrics commonly in use? We believe that the metrics described in this paper are a necessary complement to operational metrics, which often lack validity. Imagine that a clinical study manager finds that it took 42 days to recruit investigators. Does that mean that the study team is performing well or performing poorly? The answer is that "it depends"—on the particular type of study or the panel of available investigators, the presence of other large trials in that area, etc. In isolation, it is hard to know what a number like "42" means. Even if there are benchmarks in place, they will only tell you about average performance in similar trials and not whether those were high performing trials. It is critical, therefore, to have performance metrics like those described in this study in order to understand the meaning of operational metrics.
For CROs, who must decide where to spend the next dollar to improve performance, the results of this report allow them to prioritize their investments to where they will have the maximal effects and avoid investments that will have insignificant or negative effects. Based on the results of this study, these performance investments can even be fine-tuned to maximize the return. For example, improving operational timelines (coeff= .45) by 10% will have twice the effect on performance compared to improving the timelines for recruiting investigators (coeff= .22).
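The comparison above can be extended into a simple ranking. The sketch below uses only the path coefficients quoted in this article; project manager performance and CRA performance are left out because no single path coefficient is quoted for them here. The dictionary name and layout are ours, for illustration only.

```python
# Path coefficients as reported in the text.
path_coefficients = {
    "operational timelines": 0.45,
    "personnel aligned with proposal": 0.24,
    "timelines for recruiting investigators": 0.22,
    "investigator meetings": -0.16,
    "ability to identify qualified investigators": -0.18,
}

# Rank drivers by the expected change in startup performance per 1-unit improvement.
ranked = sorted(path_coefficients.items(), key=lambda item: item[1], reverse=True)

for driver, coefficient in ranked:
    print(f"{driver:45s} {coefficient:+.2f}")

# Relative payoff of the two timeline drivers compared in the text.
ratio = (
    path_coefficients["operational timelines"]
    / path_coefficients["timelines for recruiting investigators"]
)
print(f"Operational vs. recruiting timelines: {ratio:.2f}x")
```

The ratio comes out slightly above 2, consistent with the "twice the effect" statement.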
In the future, regulators will want to know that sponsors have some rational monitoring function in place.5 The results of this research provide a scientific and validated approach to monitoring study startup.
Michael J Howley, PA-C, PhD, is Associate Clinical Professor, LeBow College of Business, Drexel University. Peter Malamis, MBA, is CEO, CRO Analytics, LLC.
1. M. Howley and P. Malamis, "The Key Drivers of Study Startup Performance," Clinical Business Expo 2012 Conference, September 19, 2012.
2. "Guidance for Industry Oversight of Clinical Investigations—A Risk-Based Approach to Monitoring, Draft Guidance," downloaded on May 5, 2013 at http://www.fda.gov/downloads/Drugs/.../Guidances/UCM269919.pdf. Released on August 2011.
3. J. Toth-Allen, "Building Quality Into Clinical Trials—An FDA Perspective," downloaded from on May 13, 2013 at http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/SmallBusinessAssistance/UCM303954.pdf. Presented on May 14, 2012.
4. K. Getz, "Ominous Clouds Over Outsourcing," Applied Clinical Trials, 19(9), 28-30, (2010).
5. J. Wechsler, "Clinical Trials Face New Enforcement Models," Applied Clinical Trials, 19(6), 22-23, (2010).