In an interview with Nico Saraceno at DPHARM 2024, Munther Baara, VP of strategy and innovation at EDETEK, touches on the features an effective AI/ML model can provide and how they benefit data management in clinical trials.
ACT: You have an upcoming session on “Transforming Clinical Trials: From Reactive to Predictive.” What are some of the main points that will be discussed during the session?
Baara: I'll be talking about the metaphor of travelers. Back in the day—it depends how old you are—you might remember that when people traveled, they would have a map with them. There would be a passenger next to them with a highlighter trying to map the route, and if they missed an exit, it would probably take them another hour to get back. Then we progressed to MapQuest, where we printed the map, and then we ended up with the GPS, then the smartphone, and then it became gamified with Waze, and now we're getting into predictive analytics that basically predict how your next trip will go and probably tell you what you should be doing, etc. I see this in our clinical trials: there are some companies that are still using paper, navigating with the highlighter, and there are people at the GPS stage, and once they start a new study with different endpoints, it becomes harder and harder for them to navigate and see the big picture.
What I'm going to talk about is a very simple concept. We call it the four Cs—connect, collect, conform, and consume. We've built this whole architecture to allow you to connect to the different data providers at the site level or at the study level, so that gives you the ultimate flexibility. You could use different EDC vendors, central labs, ePRO, new digital endpoints, genomics data, etc.; it doesn't matter. The idea is that those data providers immediately push data into what we call our digital data pipeline. Then, once the data starts flowing in, you'll see the data coming in and conforming to the standard of your liking, whether it's for submission or for data review, and then we provide the final piece, which is what we call the consume piece. That's what's so powerful. Think about it like coming into a building: you expect light, electricity, and water, right? You don't think about it. Once we put these hooks into the building—into your clinical trial, into our digital data pipeline—things start flowing, and now you're seeing everything in one place, one version of the truth. There are no delays, because typically study managers tell me, "My study was great—green for six, nine months—then at the first DMC (data monitoring committee) meeting, when I need to do the cleanup or interim analysis, that's when things start going south. It went immediately from yellow to red, and I see I have the wrong patients in the study," etc. We solve this problem from day one. That's the key. The study starts, data starts flowing in, and you're seeing whether or not you are enrolling the right patients into the study.
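A minimal sketch of how the four Cs could fit together, assuming invented provider and field names (this is an illustration of the concept, not EDETEK's actual platform): providers are connected and their records collected, each record is conformed to a common shape, and reviewers consume one merged view.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of a "four Cs" flow: connect data providers,
# collect their records, conform them to one shape, consume a single view.
# Provider names and field names are illustrative assumptions only.

@dataclass
class DigitalDataPipeline:
    conform: Callable[[dict], dict]               # maps a raw record to the target shape
    records: List[dict] = field(default_factory=list)

    def connect(self, provider: str, fetch: Callable[[], List[dict]]) -> None:
        """Connect a provider (EDC, central lab, ePRO, ...) and collect its data."""
        for raw in fetch():
            self.records.append({"provider": provider, **self.conform(raw)})

    def consume(self) -> List[dict]:
        """One merged, ordered view of the study data for review."""
        return sorted(self.records, key=lambda r: (r["subject_id"], r["visit"]))


# Usage sketch with made-up records:
pipeline = DigitalDataPipeline(
    conform=lambda raw: {"subject_id": raw["subj"], "visit": raw["visit"], "value": raw["val"]}
)
pipeline.connect("central_lab", lambda: [{"subj": "001", "visit": "V2", "val": 7.1}])
pipeline.connect("epro", lambda: [{"subj": "001", "visit": "V2", "val": "mild nausea"}])

for row in pipeline.consume():
    print(row)
```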
ACT: From a technological standpoint, what’s required for the development of more accurate AI/ML models?
Baara: That's a good question. I struggle a lot of the time when people talk about AI and machine learning, because I feel like they have not put the foundational pieces in place. Of course you need AI and machine learning models, but you've got to think about it: you need to be able to curate the data, aggregate the data, and make sure the data is validated with high quality—so not only curating, but cleansing the data. The four Cs concept that I described does exactly that, because we bring data in in real time and run all these validation checks, and we have what we call flexible validation rules that you can set up through point and click. Zero programming involved. It eliminates SAS coding, et cetera, especially when you cross-check data across different data providers. For example, the exposure data says we adjusted the treatment, but then I look in the adverse events and there is no reason for an adjustment. I can basically cross-reference a lot of data. I can cross-reference lab collection with the actual lab results, or the ePRO data—how the patient reported safety—versus the SAEs, and I can do all of that without coding. Then we can apply algorithms as well to monitor the quality of the data. As the data comes in, you immediately point out where the weak points are or what needs to be addressed and tackled, and you're cleaning the data as you go. That gives you the foundation to have excellent AI and machine learning models in place to respond to this. Without it, your model is going to be weak. You're going to be missing a lot of pieces. And not only that—which I forgot to talk about—there's the metadata about the data. In order for machine learning and AI to increase their accuracy, you need to have the data about the data, and that's what our platform has. We can take raw data and transform it to any target, like from the EDC collection to the CDISC SDTM format that's ready for submission to the regulatory agencies. All of that together is needed in order to really get to the next level.
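As an illustration of the kind of cross-provider check described here, the sketch below flags subjects whose exposure data shows a dose adjustment but who have no recorded adverse event that would explain it. The field names (subject_id, dose_adjusted, ae_term) are assumptions for the example, not any particular system's schema.

```python
# Illustrative cross-source validation rule: exposure vs. adverse events.
def flag_unexplained_dose_adjustments(exposure_records, adverse_events):
    """Return subject IDs with a dose adjustment but no recorded adverse event."""
    subjects_with_ae = {ae["subject_id"] for ae in adverse_events}
    flagged = []
    for rec in exposure_records:
        if rec.get("dose_adjusted") and rec["subject_id"] not in subjects_with_ae:
            flagged.append(rec["subject_id"])
    return flagged


# Usage sketch with invented records:
exposure = [
    {"subject_id": "001", "dose_adjusted": True},
    {"subject_id": "002", "dose_adjusted": False},
]
aes = [{"subject_id": "003", "ae_term": "nausea"}]

print(flag_unexplained_dose_adjustments(exposure, aes))  # ['001']
```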