Maximizing the impact of data is only possible when strong fundamental data management capabilities are in place.
Data is more important than ever in today’s challenging R&D environment, but as most lab directors, research directors, and senior strategy leaders will attest, the strategic use of R&D data is as challenging as it is promising. Consider, for example, the opportunity that artificial intelligence and machine learning (AI/ML) present: organizations can leverage decades of proprietary R&D data to guide and accelerate new initiatives.
Experts tasked with bringing an AI/ML vision to fruition know that success hinges on first addressing a myriad of challenges, from identifying realistic use cases for applying AI/ML within the current research environment to readying the multimodal data needed to train models. While this type of groundwork is not often fodder for headlines, it is essential to success.
A similar narrative applies to almost any opportunity to use R&D data strategically. It’s never simply a matter of making a decision and flipping a switch. Rather, maximizing the impact of R&D data involves a great deal of behind-the-scenes work to first establish the data infrastructure needed to capture and process the vast volumes of diverse data generated by teams using many different tools and technologies.
This often means sourcing new technologies and trusted partners to help establish strong data processes, from the earliest steps of data collection and automation through to querying, modeling, and analysis.
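To make this concrete, the minimal Python sketch below illustrates the kind of normalize-at-capture step such infrastructure performs: raw output from different instruments is unified into one consistent, queryable record shape. The ExperimentRecord schema, field names, and instrument payloads are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class ExperimentRecord:
    """A common schema applied at capture time, so downstream querying,
    modeling, and analysis all see one consistent record shape."""
    instrument: str
    sample_id: str
    measurements: dict[str, float]
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    metadata: dict[str, Any] = field(default_factory=dict)

def ingest(raw: dict, instrument: str) -> ExperimentRecord:
    """Normalize one raw instrument payload into the shared schema.
    The payload keys here are illustrative; real formats vary by vendor."""
    return ExperimentRecord(
        instrument=instrument,
        sample_id=str(raw["sample_id"]),
        measurements={k: float(v) for k, v in raw["readings"].items()},
        metadata={"operator": raw.get("operator", "unknown")},
    )

# Two different instruments, one consistent record shape.
plate_reader = ingest(
    {"sample_id": "S-001", "readings": {"od600": 0.42}}, "plate_reader"
)
hplc = ingest(
    {"sample_id": "S-001", "readings": {"rt_min": 3.7}, "operator": "akc"},
    "hplc",
)
```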
In its semi-annual State of Science survey, LINUS polled nearly 400 scientists representing multiple geographies, organization types, application areas, and levels of responsibility. The results show that scientists are adapting to current scientific and economic pressures by turning to key levers such as automation and partnerships.
Teams are using automation to meet productivity goals despite budget cuts and layoffs, and as an initial step toward supporting reproducibility, guiding innovation, and laying a strong data foundation for AI/ML. LINUS survey respondents indicated that the number one technology they plan to purchase in 2024 is automated systems, which they see as a vital stepping stone toward another key priority: AI/ML.
Similarly, teams are looking externally to access the additional resources and data needed to innovate strategically, adopt technologies such as AI/ML, and further optimize productivity. For example, the LINUS survey results show that academia and biopharma/pharma alike aim to enhance their research by turning to external collaborations for help with wide-ranging challenges, such as ensuring the reproducibility of experimental results or performing complex analyses in areas such as translational research, bioinformatics, and clinical research.
Survey results show that 43% of respondents will prioritize new collaborations and partnerships in 2024. By introducing high-quality, externally sourced data, organizations aim to enrich their data pools and build more robust predictive, and potentially generative, models.
However, maximizing the impact of data, whether through strategic collaboration or technologies such as AI/ML, is only possible when strong fundamental data management capabilities are in place. For many organizations, the key obstacles fall into the categories of the “three V’s of big data”:

- Volume: the sheer quantity of data generated across instruments, experiments, and projects
- Velocity: the speed at which new data is produced, which can outpace teams’ ability to process it
- Variety: the diverse formats and modalities produced by different tools and technologies
Without self-service, permission-controlled tools to query across all available data and curate richly contextualized datasets, an organization’s data will remain FAIR (findable, accessible, interoperable, and reusable) in theory but not in practice. Many teams face these obstacles through no fault of their own, but because of the way their technology stack has gradually evolved.
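As an illustration of what “self-service, permission-controlled” querying means in practice, here is a minimal Python sketch; the User model, dataset catalog, and query function are hypothetical, not any particular product’s API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    projects: frozenset[str]  # projects this user is permitted to read

# Toy catalog standing in for an organization's dataset index.
DATASETS = [
    {"id": "ds-101", "project": "oncology", "assay": "elisa"},
    {"id": "ds-102", "project": "neuro", "assay": "elisa"},
    {"id": "ds-103", "project": "oncology", "assay": "hplc"},
]

def query(user: User, **filters: str) -> list[dict]:
    """Self-service query that enforces project-level permissions
    before applying the caller's attribute filters."""
    visible = (d for d in DATASETS if d["project"] in user.projects)
    return [d for d in visible
            if all(d.get(k) == v for k, v in filters.items())]

alice = User("alice", frozenset({"oncology"}))
print(query(alice, assay="elisa"))  # only ds-101; the neuro dataset is filtered out
```

The point of the sketch is that permission checks happen inside the query layer itself, so scientists can explore freely without waiting on a data team to hand-curate each extract.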
Scientists have always been keen to adopt new technologies and data collection solutions that enable them to do their research better. For example, 46% of LINUS survey respondents cited “adopting new techniques to acquire new types of data” as their top scientific work priority in 2024.
However, new R&D technologies are typically rolled out independently over time. As a result, many organizations struggle with a patchwork of disconnected platforms, automation tools, and data-collection systems, which makes it difficult for scientists to extract genuine intelligence from the R&D data they generate.
To address this, many organizations are now looking for solutions that unite these disparate technologies and interconnect their systems of record. They want to help their scientists analyze data in the context of other data generated across different research projects, ultimately enabling them to understand it more deeply and apply it more meaningfully.
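One way to picture this interconnection: each system of record exposes a small common interface, and a query layer gathers related records across all of them. The Python sketch below shows the idea; the ELN and LIMS classes are hypothetical stand-ins, not real integrations.

```python
from typing import Iterable, Protocol

class SystemOfRecord(Protocol):
    """Minimal interface each connected system exposes to the query layer."""
    def records_for(self, sample_id: str) -> Iterable[dict]: ...

class ELN:  # stand-in for an electronic lab notebook
    def records_for(self, sample_id: str) -> Iterable[dict]:
        yield {"source": "eln", "sample_id": sample_id,
               "note": "observed precipitate at 2 h"}

class LIMS:  # stand-in for a lab information management system
    def records_for(self, sample_id: str) -> Iterable[dict]:
        yield {"source": "lims", "sample_id": sample_id, "status": "qc-passed"}

def contextualize(sample_id: str, systems: list[SystemOfRecord]) -> list[dict]:
    """Gather everything each connected system knows about one sample,
    so a result can be read in the context of related records."""
    return [rec for system in systems for rec in system.records_for(sample_id)]

print(contextualize("S-001", [ELN(), LIMS()]))
```

Because each source only has to implement records_for, new systems can be connected without changing how scientists query, which is the essence of the interconnection described above.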
With an integrated data fabric, researchers can more easily contextualize their data, collaborate, and attain the insights they need to advance their work toward life-changing discoveries in the life sciences.
About the Author
Allister Campbell, VP Head of Science and Technology at Dotmatics.