Site selection is one of the most important, and at the same time most challenging, problems in clinical trial planning. Poor site selection may cause enrollment delays, waste resources on sites with low or zero enrollment, and potentially even compromise trial results.
Site selection is a complex process that includes site identification, site assessment, site validation, and selection of a set of sites aligned with study goals and limited resources.
Many factors affect the site selection process. Some of them are generic; others are trial specific (Figure 1). Some factors are qualitative and subjective, such as “Experience and qualifications of Principal Investigator” or “Staff turnover,” and can be scored on a scale (e.g., 10 for excellent, 1 for the worst). Others are quantitative, such as “Planned patients’ enrollment” or “Projected enrollment rate.” The factors can be extracted from historical databases, questionnaires, etc. Some companies use data mining techniques for site identification. Examples of clinical trial-related factors are “Trial budget,” “Enrollment target,” etc. A variety of business rules also apply to the entire trial, e.g., “If a country is selected, then a minimum of two sites per country should be selected,” or “Priority site” (forced site prioritization).
Traditionally, site selection efforts are focused on a selection of “best” clinical sites based on:
Ranking sites according to a criterion has limitations. For example, site selection based on sorting predicted enrollment rates does not take into account parameters like cost per patient, site capacity, or “soft” attributes such as PI experience, facility quality, etc. The value of site selection based on extrapolation of historical performance may be limited due to high site turnover [Getz, 2018] or limited or missing historical data for new sites. Site selection based on sorting by a ranking index also does not take into account budget and other constraints, study goals, or business rules.
Often, the site selection process is mistakenly equated with site identification and feasibility assessment alone. After the “best” sites are identified and evaluated, it is assumed that the final sites will somehow be selected from this feasible set of “best” sites.
Often, a feasible set of sites is identified based on an informal process which may include, but not be limited to:
More advanced techniques (AI, data mining) can be applied to large site databases, but they are not transferable to all sites.
Usually, all feasible sites are divided into three major tiers. Their advantages and disadvantages are presented in Table 1.
Sites can “migrate” from tier to tier depending on their most recent performance assessments.
A high-level overview of the traditional site selection process is presented in Figure 2 and described below.
After site evaluation, all feasible sites are ranked according to their value[2] or other criteria.
Let’s consider an example illustrating the site selection approach based on sites’ rankings.
Case study: Contingency planning for Phase III CNS global clinical trial
For a Phase III trial, an initial set of sites had already been selected. However, some sites canceled their participation for various reasons. Therefore, in order to meet study goals (enrollment rate, number of patients, and limited budget), additional sites needed to be selected. Twenty-five new feasible sites were identified.
Site selection needed to meet the following goals:
The technique included several steps:
Ranking algorithm for site selection
Site parameters, such as the number of randomized patients, projected enrollment rate, and patient-related costs, are accumulated in rank order until their cumulative values meet or exceed the study goals. For example, the cumulative number of patients, cumulative enrollment rate, and cumulative site costs are obtained by adding site #2 data to site #6 data (Table 3): cumulative number of patients = 10 (site #6) + 10 (site #2) = 20, and cumulative projected enrollment rate = 0.52 (site #6) + 0.24 (site #2) = 0.76, etc.
Unfortunately, study goals often cannot be met simultaneously. For example, the site-related budget was reached after adding site #4, but the enrollment target was not yet met. Therefore, according to the ranking algorithm, sites #10 and #11 had to be added to meet the enrollment target, which means that the budget was exceeded by $147,500, or 12%.
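The accumulation step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `Site` fields, the function name `select_by_ranking`, the two targets, the figures for site #4, and all costs are hypothetical. Only the patient counts and enrollment rates for sites #6 and #2 are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Site:
    site_id: int
    patients: int           # planned number of randomized patients
    enrollment_rate: float  # projected enrollment rate
    cost: float             # patient-related cost for the site (hypothetical)


def select_by_ranking(ranked_sites, target_patients, target_rate):
    """Add sites in rank order until both cumulative targets are met."""
    selected = []
    cum_patients, cum_rate, cum_cost = 0, 0.0, 0.0
    for site in ranked_sites:
        selected.append(site.site_id)
        cum_patients += site.patients
        cum_rate += site.enrollment_rate
        cum_cost += site.cost
        if cum_patients >= target_patients and cum_rate >= target_rate:
            break
    return selected, cum_patients, cum_rate, cum_cost


# Sites are listed in rank order. Patients and rates for #6 and #2 follow
# the text; every cost and the whole site #4 row are made-up placeholders.
ranked = [
    Site(6, 10, 0.52, 80_000.0),
    Site(2, 10, 0.24, 95_000.0),
    Site(4, 5, 0.30, 40_000.0),
]

selected, patients, rate, cost = select_by_ranking(
    ranked, target_patients=20, target_rate=0.75
)
print(selected, patients, round(rate, 2), cost)
```

Note that the budget is not a stopping condition in this sketch. The cumulative cost is only reported, so it can be compared with the budget afterwards, which is how a budget overrun can arise with a ranking approach.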
If sites are ranked according to their enrollment rate, budget has to be increased by 8%.[3]
This process is time-consuming and labor-intensive and does not guarantee optimal site selection. Adding more sites does not solve the problem, because it increases the cost of the clinical trial beyond the budget and brings riskier sites into the pool of feasible sites.
Is there a better site selection solution aligned with study goals and within the budget?
This paper presents a portfolio approach to site selection similar to selection of financial portfolios or portfolio of projects. For site selection, this approach was formulated in [1].
Portfolio approach to site selection
Portfolio approach to site selection means that instead of selecting individual sites, clinical trial planners select a portfolio of sites based on advanced analytical models, where the goal is to maximize the overall value of the portfolio and to align it with clinical trial goals and limited resources. As shown in [2], the most effective approach to portfolio selection is based on a mathematical optimization model. The model replaces the loop that includes steps B (except site evaluation), C, D, and E with advanced modeling algorithms, automating site selection aligned with study goals and resources.
Optimal site selection
In the context of decision-making, optimization means determining the most favorable solution, outcome, or course of action from a set of alternatives that satisfies all constraints and dependencies based on the mathematical optimization model.
In order to optimize site selection, an optimization model was developed. The model was formulated as a mixed integer programming (MIP) model. MIP models are linear programming (LP) [3] models in which some variables are additionally restricted to integer values; here those variables are binary (0 or 1). LP models find the globally optimal value (e.g., total value of selected sites) of a linear function of a number of variables, subject to a set of linear constraints on these variables (equalities or inequalities).
The model for optimal site selection includes four components:
A. Decision variables
Xi = 0 or 1 for each site i. The values are determined automatically by the model. If Xi = 0, the i-th site is not selected; if Xi = 1, the i-th site is selected.
B. Parameters
Estimates for each site, such as cost/patient, projected enrollment rate, and others.
C. Constraints
D. Criteria
A single criterion (e.g., maximum value of the site portfolio) or multiple criteria (maximum enrollment target, minimum budget, etc.) can be used.
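To make the four components concrete, one possible single-criterion formulation is written out below. It is a sketch under stated assumptions, not the authors' exact model: the symbols v_i (site value), c_i (total site cost), p_i (planned patients), r_i (projected enrollment rate), B (budget), P (patient target), and R (enrollment rate target) are introduced here for illustration, and business rules such as a minimum number of sites per country are omitted.

```latex
\begin{aligned}
\max_{X} \quad & \sum_{i=1}^{N} v_i X_i \\
\text{s.t.} \quad & \sum_{i=1}^{N} c_i X_i \le B, \\
& \sum_{i=1}^{N} p_i X_i \ge P, \\
& \sum_{i=1}^{N} r_i X_i \ge R, \\
& X_i \in \{0, 1\}, \quad i = 1, \dots, N.
\end{aligned}
```

The objective and every constraint are linear in the decision variables X_i, which is what makes the problem a mixed integer program.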
Modeling experiments
The model (Site Selection Optimizer) uses the same data as presented in Tables 2 and 3. The optimization algorithm found a better solution than the ranking algorithm: it automatically selects a portfolio of sites aligned with study goals and budget (Table 4). By contrast, to reach the study goals, the ranking algorithm requires a ~12% larger budget and more sites (17 for ranking vs. 16 for optimization).
Selected portfolios of sites using ranking vs. optimization (baseline scenario) are presented in Table 5.
Optimization results may not look intuitive. For example, site #2, despite its high score/value, was not selected due to its high cost per patient, high number of patients, and high total site cost.[4] Site #2 was also not selected because of a “knapsack effect”: it is harder “to pack” one large item (site capacity = 10 patients) than several small “items” (most sites have a capacity of 5 patients). Sites #9 and #16 were selected by the optimization model despite their relatively low scores, because their cost per patient is low and their enrollment is high enough.
Model validation is presented in Table 6.
Could the solution presented in Table 6 be obtained without the model? Potentially, yes. However, more than two million portfolios would have to be analyzed, and the probability of picking the optimal portfolio of sites by chance, ~1/(2 × 10^6), is similar to that of winning the lottery. Therefore, in a reasonable timeframe, only suboptimal portfolios could be generated. By contrast, for the case study, the model generated an optimal portfolio of sites in two seconds.
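The knapsack effect can be reproduced on a toy example. The sketch below is illustrative only: the three sites, their value scores, their costs, and the budget are invented, and exhaustive enumeration stands in for a real MIP solver. It compares picking sites greedily by value with searching all portfolios that fit the budget.

```python
from itertools import combinations

# Hypothetical sites: name -> (value score, total site cost in $ thousands)
sites = {"large": (10, 600), "small_a": (7, 500), "small_b": (7, 500)}
budget = 1000


def greedy_by_value(sites, budget):
    """Take sites in descending value order while they still fit the budget."""
    chosen, spent = [], 0
    for name, (value, cost) in sorted(sites.items(), key=lambda kv: -kv[1][0]):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return sorted(chosen)


def best_portfolio(sites, budget):
    """Enumerate every non-empty portfolio and keep the most valuable feasible one."""
    best, best_value = (), 0
    names = sorted(sites)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            cost = sum(sites[s][1] for s in combo)
            value = sum(sites[s][0] for s in combo)
            if cost <= budget and value > best_value:
                best, best_value = combo, value
    return list(best), best_value


greedy = greedy_by_value(sites, budget)
optimal, optimal_value = best_portfolio(sites, budget)
print(greedy, optimal, optimal_value)
```

In this toy case the highest-value site uses up so much of the budget that neither small site fits afterwards, while the two small sites together deliver more total value within the same budget.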
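The article does not state how the figure of two million was derived. One reading that is consistent with it, assumed here, is the number of ways to choose 16 sites out of the 25 feasible ones. The short check below also prints the number of all possible subsets of 25 sites for comparison.

```python
import math

n_sites = 25         # feasible sites in the case study
portfolio_size = 16  # sites in the optimized portfolio

# Assumption: portfolios are counted as 16-site subsets of the 25 candidates
n_portfolios = math.comb(n_sites, portfolio_size)

# Chance of hitting one specific portfolio by picking uniformly at random
p_random_hit = 1 / n_portfolios

# All subsets of 25 sites, if the portfolio size is not fixed in advance
n_all_subsets = 2 ** n_sites

print(n_portfolios, p_random_hit, n_all_subsets)
```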
Model advantages:
The model allows the calculation of multiple metrics describing how parameters are allocated across countries, such as clinical trial costs, clinical sites, patients, and enrollment rate, as presented in Figure 4.
‘What–if’ scenarios: Exploring the capabilities of this model
Scenario #1. Forced selection of site #2.
In order to meet study goals, forced selection of site #2 (high value, but low enrollment rate; see Table 6) modifies the baseline solution. For example, sites #6, #9, #14, and #25 are no longer selected, while sites #10, #19, and #21 are added. This scenario also requires a higher budget ($1.34 million vs. $1.26 million in the baseline scenario) and a larger number of sites (17 in scenario #1 vs. 16 in the baseline scenario).
Scenario #2. Forced removal of sites #6 and #15.
At the last minute, sites #6 and #15 decided not to participate in the clinical trial. The model recalculated the site portfolio. In this case, the number of sites increased by two, from 16 (baseline scenario) to 18 (scenario #2), and the budget increased from $1.26 million to $1.30 million.
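Both scenarios amount to fixing some decision variables in advance: Xi = 1 for a forced site and Xi = 0 for a removed one. The sketch below illustrates the idea on invented data. As a simplification, it minimizes cost subject to a patient target (the article's model maximizes portfolio value), so that the effect on the budget is directly visible. The site data, the target, and the function name `cheapest_portfolio` are hypothetical, and enumeration replaces a MIP solver.

```python
from itertools import combinations

# Hypothetical sites: id -> (patients, total site cost in $ thousands)
SITES = {1: (5, 50), 2: (10, 140), 3: (5, 60), 4: (5, 55), 5: (5, 70)}
TARGET_PATIENTS = 15


def cheapest_portfolio(forced_in=frozenset(), forced_out=frozenset()):
    """Find the cheapest portfolio that meets the patient target.

    forced_in plays the role of fixing X_i = 1, forced_out of fixing X_i = 0.
    """
    free = [s for s in SITES if s not in forced_in and s not in forced_out]
    best = None
    for k in range(len(free) + 1):
        for combo in combinations(free, k):
            chosen = sorted(set(combo) | set(forced_in))
            patients = sum(SITES[s][0] for s in chosen)
            cost = sum(SITES[s][1] for s in chosen)
            if patients >= TARGET_PATIENTS and (best is None or cost < best[1]):
                best = (chosen, cost)
    return best


baseline = cheapest_portfolio()
force_site_2 = cheapest_portfolio(forced_in={2})
drop_site_1 = cheapest_portfolio(forced_out={1})
print(baseline, force_site_2, drop_site_1)
```

In this toy data, both interventions raise the cost relative to the unconstrained baseline, which mirrors the direction of the budget changes reported for the two scenarios.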
Model enhancements
Very often, it is hard to meet all requirements with a single-criterion optimization model. When there are multiple conflicting goals, multi-criteria optimization can be more effective. This modification of the model requires introducing multiple criteria (in our case, “Maximum Value,” “Minimum Budget,” and “Maximum Enrollment”). Each criterion has a weight expressed as a percentage, and the weights sum to 100%.
Five scenarios were compared against the baseline optimal site selection scenario (Table 9). The first three scenarios (S1, S2, and S3) are each equivalent to single-criterion optimization (weight of one criterion = 100%). Scenario S4 places the highest weight (50%) on criterion #2, “Minimum Budget,” with a 30% weight on criterion #1, “Maximum Value,” and a 20% weight on criterion #3, “Maximum Enrollment.” Scenario S5 gives equal weight (33.33%) to all three criteria. The model generates different portfolios of sites for each scenario, as presented in Table 10.
Some sites are selected in all scenarios (e.g., sites #5, #7, #11), and some sites are not selected in any scenario (e.g., sites #17, #18).
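One common way to combine weighted criteria is a weighted sum of normalized terms, sketched below. The article does not specify how its model combines the criteria, so this scalarization is an assumption, as are the site data, the normalization by totals over all candidates, and the fixed portfolio size of two, which stands in for the model's constraints.

```python
from itertools import combinations

# Hypothetical sites: id -> (value score, cost in $ thousands, enrollment rate)
SITES = {1: (8, 90, 0.30), 2: (6, 40, 0.20), 3: (9, 120, 0.50), 4: (5, 30, 0.25)}

# Totals over all candidates, used to put the three criteria on a 0..1 scale
TOTAL_VALUE = sum(v for v, _, _ in SITES.values())
TOTAL_COST = sum(c for _, c, _ in SITES.values())
TOTAL_RATE = sum(r for _, _, r in SITES.values())


def weighted_score(portfolio, w_value, w_budget, w_enroll):
    """Weighted sum of normalized value, cost, and enrollment for a portfolio."""
    value = sum(SITES[s][0] for s in portfolio) / TOTAL_VALUE
    cost = sum(SITES[s][1] for s in portfolio) / TOTAL_COST
    rate = sum(SITES[s][2] for s in portfolio) / TOTAL_RATE
    # Cost enters with a minus sign because the budget criterion is minimized
    return w_value * value - w_budget * cost + w_enroll * rate


def best_portfolio(weights, size=2):
    """Pick the best portfolio of a fixed size for the given criterion weights."""
    candidates = combinations(sorted(SITES), size)
    return max(candidates, key=lambda p: weighted_score(p, *weights))


s1_value_only = best_portfolio((1.0, 0.0, 0.0))
s2_budget_only = best_portfolio((0.0, 1.0, 0.0))
s4_budget_heavy = best_portfolio((0.30, 0.50, 0.20))
s5_equal = best_portfolio((1 / 3, 1 / 3, 1 / 3))
print(s1_value_only, s2_budget_only, s4_budget_heavy, s5_equal)
```

Even on four invented sites, changing the weights changes the chosen portfolio, which is the behavior the scenario comparison is meant to expose.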
One of the challenging aspects in site selection is uncertainty in enrollment predictions. At the same time, deterministic site selection has to be made.
In order to address this issue, the model was modified. Three enrollment scenarios (favorable, realistic, and conservative) and corresponding subjective probabilities for each site were considered instead of the deterministic enrollment rate in the baseline model (Figure 5).
The stochastic model may generate different solutions than the deterministic one. For example, site #5 in Table 11 was selected in the deterministic model, and not selected in the stochastic one. Inclusion of a site into a portfolio depends on other parameters involved in the optimization.
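The article does not describe how the three enrollment scenarios enter the model. One simple possibility, assumed here purely for illustration, is to replace each site's deterministic enrollment rate with its probability-weighted expected value. The rates and probabilities below are invented.

```python
def expected_rate(scenarios):
    """Expected enrollment rate for one site.

    scenarios: list of (enrollment_rate, probability) pairs.
    """
    total_prob = sum(p for _, p in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(rate * p for rate, p in scenarios)


# Hypothetical site: favorable, realistic, and conservative scenarios
site_scenarios = [(0.60, 0.2), (0.40, 0.5), (0.20, 0.3)]
rate = expected_rate(site_scenarios)
print(round(rate, 2))
```

Because the expected rate generally differs from the single deterministic estimate, a site near the selection margin can move in or out of the optimal portfolio.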
Key points
Vadim Paluy, MD, is Clinical Research Medical Director, Novartis; Vladimir Shnaydman, PhD, is President, ORBee Consulting
References
[1] More advanced techniques could be used if information is available.
[2] Some companies rank sites based on their forecasted enrollment rate. Enrollment rate is very important, but this approach does not take into account other aspects of site selection, like cost per patient, budget, etc.
[3] Please contact Vladimir Shnaydman (vladimir.shnaydman@orbeeconsulting.com) for details.
[4] Data was modified in order to highlight model capabilities.
[5] Especially for Phase III studies where a large number of sites has to be selected.