Foundational issues must be addressed to advance best practices.
For over a decade, we’ve discussed the potential of machine learning (ML) in clinical research to objectively gather and analyze data, optimize trial design, and accelerate drug development. While the opportunities of these technologies get a lot of buzz, there is still a long way to go when it comes to proving they can deliver on their promise and ensuring their development is sustainable over the long term. We now find ourselves at a crossroads: we must improve confidence in ML among pharmaceutical sponsors and clinicians while finding alternative ways to keep pace with the data-hungry nature of these algorithms.
Three key trends will direct the future of ML: regulatory guidance, an emphasis on model traceability as a means to build trust, and new data aggregation and analysis approaches that may help make ML innovation more practical and cost-effective.
Until recently, regulatory oversight of ML’s development has been limited, with developers defining best practices based on their own experience. While leaving the development of scientifically sound models to the discretion of data scientists has helped spur innovation, it has also led to faulty or biased models, which not only taint the reputation of ML overall but can also have serious consequences for a patient’s care. To address the challenges that come with this autonomy, FDA, Health Canada, and the UK’s MHRA have jointly identified ten guiding principles to inform the development of Good Machine Learning Practice (GMLP). From data-specific guidance, such as ensuring data sets are representative of the intended patient population, to identifying opportunities for improved cross-industry collaboration, these principles aim to promote safe, effective, and high-quality ML while accounting for its complex and iterative nature.
GMLP is an important first step toward encouraging the adoption of proven, quality practices, and its evolution will be important to watch. Because these are suggestions rather than required standards, ML companies still hold the reins and can decide to what extent the guidelines influence their solutions. Ultimately, abiding by these principles and baking them into all the behind-the-scenes work of building a model should be nonnegotiable, especially as trust in these solutions continues to waver.
Clinicians and pharmaceutical sponsors can be wary of ML’s “magical” element, in which a model spits out a conclusion without evidence to support it, especially when a patient’s care or the future direction of a trial is on the line. Because sponsors and clinicians operate in a highly regulated environment that requires vigilant documentation and robust proof of a drug’s efficacy, increasing the transparency and traceability of ML can help drive trust and give users peace of mind.
Some developers fear that making ML traceable would mean giving up proprietary information about an algorithm’s code. That is not the case. The aim instead is visibility into the controls around the algorithm: the system that determines how data is collected, how the model is trained and tested, and how a specific output is generated, all while keeping intellectual property secure. One day, this could take the form of an “audit report” or pedigree of a model that shows the workflow of a system and confirms it was developed using best practices and for its intended patient population. By lifting the veil on ML, we can hold developers more accountable and ensure no corners are cut, while also giving end users the boost of confidence they need.
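To make the idea concrete, a model’s pedigree could be captured as structured metadata that travels with the trained artifact, documenting its provenance and checks without exposing a line of proprietary code. The sketch below is purely illustrative: no standard schema for such a record exists yet, and every field name here is an assumption.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import hashlib
import json

@dataclass
class ModelPedigree:
    """Hypothetical audit record for a trained model; all fields are illustrative."""
    model_name: str
    version: str
    intended_population: str    # who the model was validated for
    data_provenance: str        # where the training data came from, not the data itself
    validation_metrics: dict    # e.g., {"sensitivity": 0.91, "specificity": 0.88}
    gmlp_checks: list = field(default_factory=list)  # best-practice checks performed
    released: date = field(default_factory=date.today)

    def fingerprint(self) -> str:
        # Tamper-evident hash so an auditor can confirm the record is unaltered.
        payload = json.dumps(asdict(self), default=str, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

pedigree = ModelPedigree(
    model_name="dose-adherence-classifier",
    version="2.3.0",
    intended_population="adults 18-75 enrolled in decentralized trials",
    data_provenance="de-identified dosing videos from three sponsors, 2019-2022",
    validation_metrics={"sensitivity": 0.91, "specificity": 0.88},
    gmlp_checks=["representative-dataset review", "held-out test set"],
)
print(pedigree.fingerprint())
```

The point is the shape of the record, not its contents: it describes the controls around the model, and can be verified later, while the algorithm itself stays private.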
Big data analytics have dominated ML development for decades and will continue to be an important foundation, feeding algorithms large volumes of high-quality, diverse data. But as the industry pursues precision medicine and advances research into rare diseases, there is a need for data approaches that allow more targeted analysis. The quantity of data will still be a big factor in teaching ML the basics, but there is a growing emphasis on extracting actionable insights from smaller, more contextual datasets.
While big data is valuable for surfacing big-picture trends and correlations, Gartner recently predicted that 70% of organizations will shift their focus from big to “small and wide data” by 2025 to make ML less data-hungry. Wide data draws together disparate data from a variety of sources to produce meaningful analysis, while small data focuses on using small, individual datasets to draw specific, more personalized insights. Together, small and wide data allow ML developers to extract more value from the data available to them and target those insights at a specific problem. In healthcare, this approach is particularly helpful when sufficiently large training sets of patient data are difficult, and in some instances impossible, to assemble. Big data is not going away, but combined with small and wide data, it opens the door to more pragmatic AI development and more precise patient care.
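As a loose illustration of this idea, the sketch below links a few small, disparate patient-level sources into one analyzable view (the “wide” half) and then answers a targeted question from that handful of records (the “small” half). The sources, columns, and values are invented for the example.

```python
import pandas as pd

# Three small, disparate sources describing the same patients ("wide" data).
ehr = pd.DataFrame({"patient_id": [1, 2, 3, 4],
                    "baseline_severity": [4.2, 6.1, 5.0, 3.8]})
wearable = pd.DataFrame({"patient_id": [1, 2, 3, 4],
                         "mean_daily_steps": [5400, 2100, 7800, 4600]})
outcomes = pd.DataFrame({"patient_id": [1, 2, 3, 4],
                         "responder": [True, False, True, True]})

# "Wide": join the sources into a single patient-level view.
view = ehr.merge(wearable, on="patient_id").merge(outcomes, on="patient_id")

# "Small": a targeted insight drawn from a handful of records,
# rather than a correlation mined from millions.
print(view.groupby("responder")[["baseline_severity", "mean_daily_steps"]].mean())
```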
ML has already made its mark on the healthcare and life sciences industries: many have reaped the benefits of more efficient operations, a better understanding of patients’ response to treatment, improved forecasting, and more. The future, and the reputation, of these relatively novel solutions depends on addressing the foundational issues of regulation, transparency, and optimized data approaches. By serving sponsors’ and clinicians’ unique need for visibility while reducing the burden on developers, we can advance best practices and innovate with purpose.
Michelle Marlborough, Chief Product Officer, AiCure, LLC