Amidst difficulty in attracting underrepresented populations, industry is searching for ways to include more diverse datasets.
The clinical trials world, despite strategy after plan after project, is barely making headway in attracting diverse, underrepresented patient populations.1 Overall, African Americans, Asians, Latinx, and the elderly are still saying, ‘not me.’
Trial sponsors and investigators know well why they hear that ‘not me’: Long-embedded distrust of the medical system; refusal to be randomized to a control arm; no convenient way to get to the trial site; and for some groups, even fears of technology and needles.2 On top of these, growing trial size and complexity are aggravating clinical trial costs and trial enrollment times.
Some sponsors are trying a new method to ensure they have underrepresented populations in their trials. Instead of warm bodies, they are pursuing digitized bodies, those patients whose health information lives in prior clinical trials, EHRs, claims data, prescriptions, urgent care locations, and so on. They are turning to data companies to create their control arms by finding digitized patients to match real ones in the treatment arm.
There is another method for creating a control arm, dubbed the digital twin concept: Statisticians take one patient’s age and another’s zip code and somebody else’s diagnosis, all because someone in the treatment arm has that age, zip, and diagnosis. Henriette Coetzer, MD, chief medical officer, recruitment and real-world evidence, CVS Health Clinical Trial Services, said this concept needs to be validated before it can be used in a regulatory-standard trial. Others interviewed for this article said they do not use twins either.
Real-world data, on the other hand, come from records of real patients and are not adjusted or created, said C.K. Wang, MD, chief medical officer, COTA.
But how to refer to these control arms? In this relatively new field, terminology is not universal—to the point that for this article, different experts used different terminology to describe the same thing. The word that seems to be gumming up the lexical works is synthetic.
CVS, in correspondence, defined a control arm using synthetic information as real data collected outside of the clinical trial system to match participants in the treatment arm. Medidata describes its synthetic control arms as formed by carefully selecting patients from historical clinical trials to match the demographic and disease characteristics of the patients treated with the new investigational product.3
Thorlund et al4 describe synthetic data like this: Synthetic controls are defined as cohorts of patients from external data and adjusted using any of a variety of statistical methodologies.
“The term synthetic data now refers to an entirely new group of data...data that is made up based on algorithms and do not correspond to an actual patient,” said Wang, in correspondence.
For the purposes of this article, an external control arm contains the data of real, de-identified patients.
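The matching idea behind these definitions, pairing each treated patient with a similar patient drawn from outside the trial, can be illustrated with a minimal sketch. It is not any vendor's method: the `Patient` fields, the distance formula, and the `max_distance` cutoff are all assumptions chosen for illustration, and real external control arms rely on far more covariates and on formal statistical adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    sex: str
    stage: int  # hypothetical disease-stage coding, 1 to 4


def distance(a: Patient, b: Patient) -> float:
    """Illustrative similarity score: smaller is closer; sex must match exactly."""
    if a.sex != b.sex:
        return float("inf")
    return abs(a.age - b.age) / 10 + abs(a.stage - b.stage)


def match_controls(treated, pool, max_distance=1.0):
    """Greedy 1:1 matching without replacement (an assumed, simplified scheme)."""
    available = list(pool)
    matches = {}
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: distance(t, c))
        # Only accept a match that is close enough; otherwise leave unmatched.
        if distance(t, best) <= max_distance:
            matches[t.pid] = best.pid
            available.remove(best)
    return matches


treated = [Patient("T1", 64, "F", 2), Patient("T2", 51, "M", 3)]
pool = [
    Patient("C1", 66, "F", 2),
    Patient("C2", 50, "M", 3),
    Patient("C3", 30, "M", 1),
]
matches = match_controls(treated, pool)
print(matches)
```

A treated patient with no sufficiently similar record in the pool simply goes unmatched here, which is one way the apples-to-apples problem shows up in practice.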
While the gains to sponsors are obvious (time and money likely saved), so are the possible, maybe probable, drawbacks. Apples-to-apples comparisons, McIntosh to McIntosh, might turn out to be more like McIntosh to Cortland.
Furthermore, historical medical data are not representative of all populations, and can be incomplete. The human expertise required to adjust data that have been rooted in systemic, structural, and cultural bias might not be around, or exist, at the moment of adjustment.
The main concern, said Alex John London, PhD, Clara L. West Professor of Ethics and Philosophy and director of the Center for Ethics and Policy at Carnegie Mellon University, is that “the problems that stakeholders are often trying to overcome through the use of AI or other computational models are the result of problems that operate at a larger social level.”
Alexa Berk King, PhD, chief scientific officer, real-world evidence, CVS Health Clinical Trial Services, said FDA’s opinion, overall, has been clear. While synthetic control arms “are appealing and compelling, there is a lot of methodological hesitancy [regarding] what happens on the front and back end” of real-world data collection and analysis.
Users and creators of synthetic data have different business models. Some must buy the data; others, like CVS Health and Optum Insights, do not. CVS de-identifies its own digital diamond mine: retail pharmacy, insurance information, lab results, minute-clinic visits, “and all of the data elements that go alongside that,” said Coetzer. Optum Insights issues licenses to its users. CVS provides analytic services for its customers.
Medidata AI creates its arms from a pool of 30,000 clinical trials representing nine million people. COTA accesses healthcare systems and providers for clients’ strictly oncology-focused external control arms.
Wang said a trial’s diversity bar has always been low. Historically, clinical trial sponsors have listed few denominators: sex, age, ethnicity, and race.
Consider research from the African American Study of Kidney Disease and Hypertension, conducted 27 years ago, that showed African Americans with diagnosed hypertensive renal disease, a common condition among African Americans,5 were poorer and more often unemployed than their peers in the general population.6 Later work showed that genetic variants, like those in the apolipoprotein L1 (APOL1) gene, can exist among different populations.7
Recently, IQVIA detailed how Black representation in trials has decreased: in 2013, Black participation was 12.3%. In 2021, it was 6.5%. However, Hispanic representation rose 2.5%, to 9.9% in 2017.8
CDER detailed who was missing in its Drug Trial Snapshots Summary Report 2021.9 Of the 50 approved novel therapies, 11 were for heart, blood, kidney, or endocrine diseases; of those, four listed N/A under headings for various populations. In the trial of a medication for risk reduction of kidney and heart complications in chronic kidney disease associated with type 2 diabetes, 5% of participants were African American.
FDA, required by the 21st Century Cures Act to come up with a diversification plan, recently issued draft guidance on the use of external control arms in a clinical trial.10 The guidance discussed the McIntosh vs Cortland issues, like differences in data collection times and possible changes in standards of care between the control arm’s original trial and the experimental treatment arm.
“(The) FDA is monitoring how the research community is exploring the use of synthetic data and will stand ready to provide regulatory clarity as needed,” said an agency spokesperson.
The agency already has approved at least one medication based on trial results that included an external control arm; so has EMA. The National Institute for Health and Care Excellence (NICE), in a review, found that of 489 applications, 22 used external data. Of these, 13 used published RCT data, and six used observational data. More than half of the applications came in the last two years.4
Pressure is coming from elsewhere; BioEthics International plans to score pharma companies on how diverse their trials are.11
Overall, the number of synthetic data creators is growing. In 2021, 67 synthetic data vendors, of all types, were in business. As of October 2022, there were 100 of them, according to medium.com.12
It seems like new companies are being formed every day, said Wang, adding there is an immense need for data in the healthcare ecosystem.
Examples of the pool sizes include the following:
The creation of synthetic data, said London in email correspondence, can be a complicated and delicate process, and so some may do a better job than others. But these efforts will only be as good as the knowledge that stakeholders bring to the table.
Berk said the CVS data are scrutinized for missingness and outliers, like out-of-range lab values. If those lab data are outside existing parameters, “It is not our place to go in and make interpretations.” If a missing piece cannot be statistically fixed, it is tossed. Berk estimated that at least 10% of standard data are not analyzable. She noted that her group does not sell, loan, or license the data itself. “We use our internal teams to generate insights and evidence from our data, and those insights are what we deliver. We are not selling it out the back door.”
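The kind of screening Berk describes can be sketched in a few lines. This is an illustration only, not CVS's pipeline: the field name `creatinine_mg_dl`, the plausibility range, and the record layout are hypothetical, and the sketch skips the statistical-repair step entirely by setting aside any record with a missing or out-of-range value.

```python
# Hypothetical plausibility range for a single lab test (illustrative values).
PLAUSIBLE_RANGES = {"creatinine_mg_dl": (0.1, 20.0)}


def screen(records, ranges=PLAUSIBLE_RANGES):
    """Split records into usable ones and excluded ones with the reasons."""
    usable, excluded = [], []
    for rec in records:
        problems = []
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is None:
                problems.append(f"{field}: missing")
            elif not low <= value <= high:
                problems.append(f"{field}: out of range")
        if problems:
            # Flag and set aside; the value is never reinterpreted or altered.
            excluded.append((rec["id"], problems))
        else:
            usable.append(rec)
    return usable, excluded


records = [
    {"id": "A", "creatinine_mg_dl": 1.1},
    {"id": "B", "creatinine_mg_dl": None},
    {"id": "C", "creatinine_mg_dl": 250.0},
]
usable, excluded = screen(records)
unusable_share = len(excluded) / len(records)
print(usable, excluded, unusable_share)
```

Recording the reason for each exclusion, rather than silently dropping rows, keeps the share of unanalyzable data visible to whoever reviews the resulting arm.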
Reaching the diversity goals for a synthetic control arm takes some doing, considering the lack of the underrepresented in so many health care data streams. Wang said that depending on a clinical trial’s inclusion and exclusion criteria, it is not uncommon to start with thousands of records and end up with only hundreds that are usable. Synthesizing is tricky, he said: the cohort of real-world or synthetic patients needs to “very closely to nearly match the relevant and critical inclusion-exclusion criteria of a trial.”
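The attrition Wang describes, a large pool shrinking as each trial criterion is applied, can be made concrete with a small sketch. The criteria and record fields below are invented for illustration and do not come from any real protocol; the point is only that each filter is applied in turn and the surviving count is logged.

```python
def apply_criteria(records, criteria):
    """Apply each named criterion in order and log how many records survive."""
    remaining = list(records)
    attrition = [("starting pool", len(remaining))]
    for name, keep in criteria:
        remaining = [r for r in remaining if keep(r)]
        attrition.append((name, len(remaining)))
    return remaining, attrition


# Hypothetical inclusion and exclusion criteria, expressed as keep-functions.
criteria = [
    ("age 18 or older", lambda r: r["age"] >= 18),
    ("confirmed diagnosis", lambda r: r["diagnosis_confirmed"]),
    ("no prior systemic therapy", lambda r: not r["prior_therapy"]),
]

records = [
    {"age": 70, "diagnosis_confirmed": True, "prior_therapy": False},
    {"age": 16, "diagnosis_confirmed": True, "prior_therapy": False},
    {"age": 55, "diagnosis_confirmed": False, "prior_therapy": False},
    {"age": 62, "diagnosis_confirmed": True, "prior_therapy": True},
]
eligible, attrition = apply_criteria(records, criteria)
for step, count in attrition:
    print(f"{step}: {count}")
```

An attrition table like this also shows which criterion removes the most records, which matters when the pool already contains few patients from underrepresented groups.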
COTA uses real-world data and oversees the entire process from abstraction and processing of raw data to final delivery of a curated dataset. Clients, which include pharma and payers, can trace the data to its origins and observe any data transformations, Wang said.
Coetzer argues that overlapping the myriad digital layers of patient information will unearth those patients who can be included in a synthetic arm. Knowing the physician, the diagnosis, the codes, the procedures, medications, tests and the results, “these clusters give us a lot of information around the general specifics of the individual.” The data analysis results, she continued, will tell the team what is missing, and how that missing information should be filled. Coetzer said CVS data has been included in at least 100 publications.
Furthermore, Coetzer said CVS retail stores are so widely distributed that 85% of the US population lives within a 10-minute drive of one. That distribution, she said, gives CVS’s data universe a provider-agnostic view of its patient population and allows it to include real-world observations for validation purposes. CVS can match its synthetic control patients to the CDC’s social vulnerability index, finding patients who have lacked access to health care.
As Wang said, synthesis can get tricky.
At the front end, said Berk, besides the exclusion and inclusion criteria, investigators must make sure that head-to-head comparisons control for any time bias. Once the trial is going, are the real-world observations from the synthetic control arm being compared to those in the treatment arm? “On the back end, we have to account for [these differences] when possible,” she said, and that is done by drawing inference from the control’s real-world data. In the real world, she said, tumor response isn’t assessed every couple of weeks and blood isn’t drawn weekly.
As for the ethics of creating these arms, those interviewed disagreed. The issue, said Wang, is not whether the use of real-world or synthetic data is ethical; it is the cost incurred by patients and society when delayed clinical trial accrual delays treatment approval. In his experience, at least 50% of his cancer patients approached to participate in a clinical trial have refused if they could not be guaranteed a place in the treatment arm. “Delaying or withholding potentially effective therapy, especially in a life-threatening situation, is a huge ethical dilemma.”
Christine Bahls is a freelance writer for medical, clinical trials, and pharma information.