Applied Clinical Trials
The need for biopharma companies to equip data managers with the training and resources necessary to capitalize on new digital health tools.
When data sources are stored and managed separately, it is difficult to see patterns and reconcile differences. Each source creates and stores data in a different format, with a different schema. Mapping data across sources is labor intensive and introduces traceability challenges. Data managers use data science to ensure data from diverse sources are gathered, harmonized, and formatted for the research scientists to analyze. Organizations that want to analyze big data should equip data managers with the technology and training necessary to ensure the new data is usable.
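To make the mapping problem concrete, here is a minimal sketch of harmonizing two sources that describe the same measurement under different schemas. The source names, field names, date formats, and sample records are all hypothetical and chosen only for illustration; a real study would map into an agreed standard rather than this ad hoc layout.

```python
from datetime import datetime

# Hypothetical field maps: source column name -> common column name.
FIELD_MAPS = {
    "edc": {"SUBJID": "subject_id", "VISITDT": "date", "HR": "heart_rate_bpm"},
    "wearable": {"participant": "subject_id", "timestamp": "date", "pulse": "heart_rate_bpm"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"edc": "%d/%m/%Y", "wearable": "%Y-%m-%dT%H:%M:%S"}


def harmonize(record, source):
    """Rename fields, normalize types, and tag the record with its origin."""
    out = {common: record[raw] for raw, common in FIELD_MAPS[source].items()}
    # Normalize every date to ISO 8601 (YYYY-MM-DD).
    out["date"] = datetime.strptime(out["date"], DATE_FORMATS[source]).date().isoformat()
    # Normalize the measurement to an integer.
    out["heart_rate_bpm"] = int(out["heart_rate_bpm"])
    # Keep the source name so each value stays traceable to where it came from.
    out["source"] = source
    return out


edc_row = {"SUBJID": "1001", "VISITDT": "05/03/2019", "HR": "72"}
wearable_row = {"participant": "1001", "timestamp": "2019-03-05T08:30:00", "pulse": 75}

harmonized = [harmonize(edc_row, "edc"), harmonize(wearable_row, "wearable")]
```

Carrying the `source` field through the transformation is one simple way to preserve the traceability that manual mapping tends to lose.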
Quality issues that slip through data management impact downstream users such as medical monitors, potentially the most expensive resource in our organizations. These are precious resources, and the financial impact of their decisions is substantial. Poor data quality increases the likelihood of inaccurate decisions, and both false positives and false negatives can cost a pharmaceutical company millions of dollars.1
Data management is largely manual today, but that is beginning to change. At the 2019 Society for Clinical Data Management (SCDM) Annual Conference, leading organizations discussed technologies to improve efficiency, such as machine learning (ML) to help automate mapping into the study data tabulation model (SDTM) format required by FDA and robotic process automation to help code incoming data. Equipping data managers with tools that assist in cleaning and harmonizing data is the most cost-effective means to supply downstream stakeholders with critical information faster.
Currently, data managers spend a significant portion of their day on relatively menial and manual tasks, with a heavy focus on managing electronic data capture (EDC) data. Technologies can streamline, automate, and even obviate many of those tasks and help data managers provide the organization with greater visibility and insights into the data.
Lotus Clinical Research, a CRO and research site network specializing in pain management, recently invested in training for one of its data managers and quickly reaped the benefit. “I highly recommend investing in a reporting power-user. We have a new EDC platform and our reporting guru created reports and dashboards that automated hours of work previously spent on manual tracking and reporting,” said Jeanne Strain, vice president of data services, Lotus Clinical Research. “Sharing those reports with colleagues on other trials has saved time across the organization and gives sponsors faster visibility into their data.”
There are numerous examples of how real-time visibility into patient data can help an organization run more effective and efficient clinical trials.
Clean, trustworthy data is a prerequisite to extracting those insights. Getting clean data to downstream decision-makers faster results in higher-quality outcomes.
Artificial Intelligence (AI), ML, sensors, and wearables each hold significant promise to help design more effective trials, improve patient safety, and identify which patients benefit from a specific investigational product.
However, these technologies have thus far failed to live up to their hype. A major impediment to the success of any AI or ML project is heterogeneity and discord in the underlying data.
When Vas Narasimhan, CEO of Novartis, was asked about AI and ML, he spoke first about the challenge of getting clean data to work with. “The first thing we’ve learned is the importance of having outstanding data to actually base your machine learning on,” said Narasimhan. “In our own shop, we’ve been working on a few big projects, and we’ve had to spend most of the time just cleaning the data sets before you can even run the algorithm. That’s taken us years just to clean the datasets. I think people underestimate how little clean data there is out there, and how hard it is to clean and link the data.”2
The promise of AI will remain out of reach until organizations find a sustainable and repeatable means to manage and harmonize data from disparate sources. A critical component will be data managers applying the practice of data science to master the idiosyncrasies of biologic and real-world data. Clean data allows greater confidence in observed patterns and a better ability to predict outcomes. Once clean data is in place, organizations can affordably and productively apply AI to produce the much-anticipated benefits.
Case report forms (CRFs) in EDC systems, which for years have been the primary source and store of clinical data, can no longer be the center of a data manager’s universe. Industry leaders at the SCDM 2018 Leadership Forum estimated that less than 30% of the data volume comes from the EDC. In most organizations, the EDC is the sole system designed to manage clinical data. According to a 2017 report by Tufts Center for the Study of Drug Development, most organizations (77%) reported difficulty loading external data sources into their EDC and 66% attributed those difficulties to EDC system limitations and integration issues.
Organizations ready to invest in their clinical data infrastructure should consider three main technologies:
1) A clinical data platform. Data managers will work more efficiently with a purpose-built platform to aggregate, clean, and normalize data. While a dedicated platform for managing all data sources was historically a luxury, it can now be considered core. Two of the most important steps for data management are selecting the appropriate data aggregation platform and learning to use the query and reporting capabilities it provides. Capabilities that automate or eliminate manual data cleaning tasks will free the data manager’s time, while advanced reporting equips them to deliver high-value insights.
2) Scripting and visualization. Visualization tools help you look at data and extract meaning from what the data is showing. Usability is key to enabling data managers to move quickly and explore different aspects of a problem. Visualizations help you explore different ideas and formulate the right questions to ask.
3) Augmented intelligence. Augmented intelligence applications such as ML, natural language processing, and robotic process automation are valuable tools for a data scientist and should be considered in partnership with the data management team, not as a replacement of the team. Data scientists play an important role in training the learning algorithms and are the natural partners enabling these systems to deliver value.
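As a small illustration of the kind of manual cleaning task that scripting can automate, the sketch below flags missing and out-of-range values in a set of records. The field name, the plausibility limits (30 to 220 bpm), and the sample rows are assumptions made for the example, not clinical guidance or the behavior of any particular platform.

```python
def find_issues(rows, field, low, high):
    """Return (row index, problem) pairs for missing or implausible values."""
    issues = []
    for i, row in enumerate(rows):
        value = row.get(field)
        if value is None:
            issues.append((i, "missing"))
        elif not low <= value <= high:
            issues.append((i, "out of range"))
    return issues


# Hypothetical records; the third value looks like a data-entry error.
rows = [
    {"subject_id": "1001", "heart_rate_bpm": 72},
    {"subject_id": "1002", "heart_rate_bpm": None},
    {"subject_id": "1003", "heart_rate_bpm": 720},
]

# Illustrative limits only; real edit checks come from the study's data validation plan.
issues = find_issues(rows, "heart_rate_bpm", low=30, high=220)
```

Here `issues` lists the second row as missing and the third as out of range, which a data manager could then route into the query process instead of finding them by eye.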
Pragmatically, what does this mean for data managers today? The following are four specific recommendations for how data managers can educate and position themselves to help their organization leverage today’s data and to contribute as trusted advisors.
Challenges with data quality were the third most-cited barrier to completing clinical trials in 2018,3 and data management is becoming even more difficult with each new source. And yet, innovation in the tools and training to support data managers has been stagnant. To truly capitalize on AI and real-world data (RWD), biopharmaceutical companies must invest in the data science skills and resources of their data management teams.
Richard Young, Vice President of Strategy, Veeva Vault CDMS; email: richard.young@veeva.com