Unlocking the full potential of artificial intelligence requires these stakeholders to ensure their data are accessible and secure.
Over the past two years, generative artificial intelligence (AI) has emerged as a top strategic priority in the life sciences industry, driving innovation across drug development and regulatory processes. With generative AI now capable of accelerating clinical documentation, regulatory submissions, and content generation, organizations are faced with the dual challenge of ensuring data and content readiness and managing the broader organizational changes required for successful generative AI implementation.
At the heart of this transformation lies the clinical study report (CSR), a cornerstone document in the drug development process. Generative AI offers the potential to automate the creation of CSRs, reducing manual workload, enhancing accuracy, and speeding up the time-to-market for new therapies. However, for organizations to fully leverage generative AI-driven automation, they must first ensure that their content ecosystems are optimized for this technology. This involves data readiness, content readiness, aligning documentation processes with regulatory requirements, and designing a content ecosystem in which each document serves a unique purpose.
Standardizing data formats is critical to ensuring accuracy, comparability, and regulatory compliance and is also the foundation for automation.
Except for patient-level datasets formatted in the standardized Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM), the summary-level data standards needed for CSRs vary across studies and therapeutic areas, both within and between companies. Guidelines for providing standardized clinical data have been available for several years, but efforts to implement them remain ongoing. Managing different study designs and data structures requires additional steps when implementing generative AI and putting it into production.
The best practice for an organization is to assess upcoming portfolios and identify early-adopting users and studies. In turn, this will help identify past studies that best represent future projects. Another option is to work on anonymized data and documents representing the future state or standards of its business.
Generative AI can now quickly draft some pharmaceutical regulatory documents. However, pharmaceutical companies will not realize the full benefits of using generative AI to automate medical regulatory document development unless they define and simplify the content ecosystem and remove extraneous effort.
Sponsors want to use generative AI to save resources, accelerate timelines, improve reviewers' experience, and enhance quality. However, these same groups often add a large volume of unnecessary information to regulatory documents, especially CSRs, leading to multiple manual reviews, revisions, and the risk of contradiction across related documents.
Therefore, sponsors need to focus on content and process readiness to realize the potential of generative AI.
This article will focus on content readiness exemplified by the CSR.
CSRs are comprehensive technical documents that present the results of individual clinical trials. They typically include a detailed account of the study's design, methodology, and outcomes, which can be conveyed through text, tables, figures, and listings. These elements work together to provide a clear and concise summary of the findings.
CSRs form the backbone of the clinical documentation required for marketing authorization applications, and regulatory reviewers rely heavily on them to evaluate the safety and efficacy of new drugs. However, CSRs are not standalone documents. They are part of a broader ecosystem of documentation that supports drug development and regulatory review. Notably, regulatory reviewers ordinarily assess the risk-benefit of new drugs and indications not based on individual CSR results but rather on the totality of data submitted with marketing authorization applications.
Regulatory guidance for CSRs is antiquated and does not account for what modern regulatory reviewers need to keep pace with medical advances. Regulatory agencies demand objective, unbiased data for a thorough assessment of the safety and efficacy of new medical products. Subjective descriptions can introduce bias and hinder informed decision-making, so subjective elements need to be removed to support regulatory compliance and evaluation.
TransCelerate modernized CSR writing with a new template that streamlines the preparation and review process in a structured format aligned with regulatory reviewers' requirements. The new template also incorporates feedback from industry experts, regulatory bodies, and other stakeholders. The template avoids redundancy by referring to appendices for detailed information about study design, operational details, and so on.
Despite the introduction of TransCelerate’s CSR template and technological advances, the time required to produce CSRs has not changed over the last decade. In 2013, the time required to produce a CSR, from receipt of final tables, figures, and listings (TFLs) to document approval, was between 8 and 15 weeks. A recent survey of 5 global pharmaceutical companies showed the range to be between 6 and 15 weeks.
Authoring teams often add large volumes of unnecessary information to regulatory documents, especially CSRs, leading to multiple manual reviews, revisions, and the risk of contradiction across related documents. Generative AI produces content based on how it is trained. If trained to write the way authoring teams currently operate, it will. Automating content generation without first changing the way content is written will only accelerate the process of generating masses of content that authoring teams will then pick over for weeks or months. Authoring teams may save time producing their first draft, but those time savings will remain modest.
The first step in automating a document is defining a clear scope, the writing style, and the content detail level that aligns with regulatory guideline compliance. This forms the base from which you work on an automated solution.
To maximize the return on investment in generative AI, authoring teams must first optimize their content. Content optimization involves three primary tasks:
Objective content is neutral and fact-based. CSR authoring teams use objective content to describe results associated with a priori statistical tests a given study was powered to assess. Such text often appears in the efficacy section. This type of content is suitable for CSRs. The scientific integrity of a clinical trial hinges on the objectivity with which data are portrayed. Objective descriptions facilitate transparent and impartial reporting, shielding the CSR from individual or institutional biases and preserving the trial's scientific reputation.
Conversely, subjective content in regulatory documents often reflects authoring team members’ individual opinions, interpretations, feelings, and biases. It can vary greatly between authoring team members and reviewers, and over the course of drug development. Subjective content is used to describe results unassociated with a priori statistical tests that a given study was powered to assess. Examples include commentary about the demographic balance between treatment groups, or whether certain adverse events or abnormal laboratory results occur at similar, higher, or lower rates between treatment groups. Such text often appears in the safety section. Inconsistencies in how authoring teams make these subjective calls can easily be found both within CSRs and across CSRs. This type of content is unsuitable for CSRs.
Objective data descriptions foster consistency and comparability across different sections of the CSR and between various studies. This not only aids regulatory reviewers in their assessments but also engenders confidence in the reliability and trustworthiness of the reported results.
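As an illustration of how an authoring team might screen a draft for the comparative language described above, the following minimal Python sketch flags sentences that contain terms such as "similar," "higher," or "well balanced." The term list, the function name `flag_subjective_sentences`, and the sample text are hypothetical and not drawn from any guideline or template. A flagged sentence is only a candidate for human review, since a comparative term can be appropriate when it reports a prespecified statistical test.

```python
import re

# Hypothetical, non-exhaustive list of comparative or evaluative terms.
SUBJECTIVE_TERMS = (
    "similar", "comparable", "higher", "lower",
    "well balanced", "notable", "generally",
)


def flag_subjective_sentences(text, terms=SUBJECTIVE_TERMS):
    """Return (sentence, matched_terms) pairs for sentences containing a listed term."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Naive sentence split on whitespace that follows ., !, or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        matches = sorted({m.group(1).lower() for m in pattern.finditer(sentence)})
        if matches:
            flagged.append((sentence, matches))
    return flagged


# Illustrative draft text (invented for this example).
draft = (
    "Adverse events are summarized in Table 14.3.1. "
    "The incidence of nausea was higher in the treatment group than in the placebo group. "
    "Demographic characteristics were well balanced between groups."
)

flagged = flag_subjective_sentences(draft)
for sentence, terms in flagged:
    print(terms, "->", sentence)
```

A keyword screen like this cannot judge intent. It simply narrows the text a reviewer needs to read when checking for subjective calls.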
Content reuse has its place in document development. However, since technology has replaced the paper-based content authoring and review environment, referring to content rather than reusing it is often the most efficient way to share information authoring teams would have reused in the past.
Two scenarios exemplify inappropriate content reuse:
Authoring teams reuse protocol information in the CSR introduction, objectives, and methods sections, in many cases changing its verbs to past tense, as one would in a manuscript. While a few elements from the protocol help regulatory reviewers to understand a study’s basic design elements and population, authoring teams all too often reuse minute and specific details that were more appropriate for investigative site staff conducting the study. Once the study is over and the CSR written, such details switch their purposes from being a user manual for site staff to being reference material for regulatory reviewers. Repeating protocol details in the CSR is unnecessary because the protocol is appended to the CSR. Regulatory reviewers can access it with a hyperlink.
Rewriting protocol text for a CSR’s introduction and methods sections leads to errors and oversimplification, which increases the burden on reviewers. By referring rather than reusing, authoring teams can save days on the authoring and review process, CSRs will be shorter, and reviewers won’t have to look out for inconsistencies between the protocol and the CSR.
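One way a team could spot protocol text that has been carried into a CSR with only a tense change is a simple similarity check. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The function name `find_reused_sentences`, the 0.8 threshold, and the sample sentences are assumptions chosen for illustration, not an established method.

```python
from difflib import SequenceMatcher


def find_reused_sentences(protocol_sentences, csr_sentences, threshold=0.8):
    """Return (csr_sentence, closest_protocol_sentence, ratio) for near-duplicates."""
    reused = []
    for csr in csr_sentences:
        best_ratio, best_match = 0.0, None
        for proto in protocol_sentences:
            # Character-level similarity, ignoring case.
            ratio = SequenceMatcher(None, csr.lower(), proto.lower()).ratio()
            if ratio > best_ratio:
                best_ratio, best_match = ratio, proto
        if best_ratio >= threshold:
            reused.append((csr, best_match, round(best_ratio, 2)))
    return reused


# Illustrative sentences (invented for this example).
protocol_sentences = [
    "Blood samples will be collected at 0, 1, 2, 4, and 8 hours after dosing.",
    "Participants will fast for at least 10 hours before each study visit.",
]
csr_sentences = [
    "Blood samples were collected at 0, 1, 2, 4, and 8 hours after dosing.",
    "The study design is described in the protocol (Appendix 16.1.1).",
]

reused = find_reused_sentences(protocol_sentences, csr_sentences)
for csr, proto, ratio in reused:
    print(f"{ratio}: {csr!r} closely matches protocol text {proto!r}")
```

In this example, the first CSR sentence is flagged as a candidate to replace with a reference to the appended protocol, while the second, which already refers rather than reuses, is not.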
Only key numbers should be presented in the CSR’s text, while additional numbers and information can be found in the tables and appendices. Authoring teams often use text in CSR results sections to repeat information that tables and figures already provide. This redundancy creates two problems.
Firstly, numbers in text are more difficult to interpret than numbers in a table’s orderly rows and columns. In text, the column width of the page dictates the placement of numbers, meaning numbers a reviewer wants to compare or otherwise understand are scattered all over the page.
Secondly, placing numbers in text requires an accuracy check for transcription errors. To prepare for data accuracy checks, authors must annotate each data-related statement to a source. Independent reviewers must then verify the fidelity between the two. This process is a common bottleneck for regulatory submissions.
Referring the reviewer to data in tables and figures, particularly those the statistician has validated, will make the data easier to interpret and eliminate the need for accuracy checks.
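To make the accuracy-check burden concrete, the sketch below mimics the verification step in Python: each data-related statement is annotated with the source cells it draws from, and every number in the statement is checked against those cells. The table contents, the cell keys, and the function names `extract_numbers` and `check_statement` are hypothetical. Real source tables and annotation practices will differ.

```python
import re

# Hypothetical validated source table, keyed by (row label, column label).
source_table = {
    ("Nausea", "Drug X n (%)"): "12 (10.0)",
    ("Nausea", "Placebo n (%)"): "6 (5.0)",
}


def extract_numbers(text):
    """Return all integer or decimal numbers in the text, as strings."""
    return re.findall(r"\d+(?:\.\d+)?", text)


def check_statement(statement, cells, table):
    """Return numbers in the statement that are absent from the annotated source cells."""
    source_numbers = set()
    for cell in cells:
        source_numbers.update(extract_numbers(table[cell]))
    return [n for n in extract_numbers(statement) if n not in source_numbers]


# The author's annotation: which cells the statement is meant to reflect.
annotation = [("Nausea", "Drug X n (%)"), ("Nausea", "Placebo n (%)")]

# A statement with a transcription error (5.5 instead of 5.0).
statement = (
    "Nausea was reported by 12 (10.0%) participants receiving Drug X "
    "and 6 (5.5%) receiving placebo."
)

mismatches = check_statement(statement, annotation, source_table)
print("Numbers not found in source:", mismatches)
```

Every number placed in text creates one more item to annotate and verify in this way. A sentence that only points the reviewer to the table has nothing to check.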
Table 1 shows the contrast between 3 CSR sections where the authoring team eliminated subjective content while limiting text that repeats data in tables and figures. In each of these examples, the data tables remained in the CSR, which provided all the numbers.
Certainly, discontinuing the practice of repeating masses of data in text that already exist in validated data tables and figures will save time and should be acceptable to authoring teams. However, the change proposed in this paper to write CSRs without subjective content is more radical. Authoring teams accustomed to writing subjective CSRs will want to know whether regulatory reviewers will prefer and accept this practice.
The TransCelerate CSR template instructions clearly advise not using the CSR to interpret the data but rather to provide an objective summary of the results: “The CSR is to be used as a mechanism to report the outcomes of the study. Discussions such as benefit/risk interpretations are better suited for summary documents.” TransCelerate built its CSR template in adherence with ICH E3 and CORE guidance (TransCelerate - Clinical Content & Reuse Assets - Clinical Studies).
One way to assess whether regulatory reviewers will prefer more objective CSRs is to ask ex-regulatory reviewers who work as consultants. These consultants can provide invaluable insight into how they use clinical documentation during the marketing authorization application assessment process and insights into how they would ideally like to see this documentation organized and written.
One exercise worth considering is to provide ex-regulatory reviewers with two versions of a CSR:
Questions to ask the consultant could include the following:
Answers to questions such as these will prove valuable for your change management plan and messaging, which will help authoring teams adopt the objective CSR approach.
Consider obtaining consultations from ex-reviewers from FDA and EMA, and potentially other regions.
The purpose of the CSR, the objective reporting of trial data, is tied to the purpose of the documents that put this information into a larger context, much as plants depend on one another in an ecosystem. For the CSR, this ecosystem consists of all the documents in a common technical document (CTD) submission.
Part of document automation is defining not just the scope of a specific document, such as the CSR, but looking at the collection of documents as a content ecosystem and clearly defining the scope of each document to serve a distinct strategic purpose.
Designing a CTD content ecosystem in which each document serves a unique purpose can improve the content experience for regulatory reviewers while increasing workflow efficiency. Furthermore, understanding the style of reporting for each component of the CTD, be it objective, descriptive, or persuasive, will facilitate a streamlined workflow for creating these documents.
CSRs should form the data basis for CTD Module 2 documents. They should not tell the story of the clinical program but contain the data building blocks of the story.
CTD Module 2 summary documents should feature the story of the clinical program where all the evidence amassed from the individual experiments conducted during the drug’s development are tied together to form the basis of product labeling.
Let’s consider how one section of a product label is created: drug interactions. During drug development, companies conduct individual in vitro, clinical pharmacology, and population pharmacokinetic studies to assess whether the experimental drug affects the pharmacokinetics of other drugs dependent on the same enzymatic pathway. The purpose of the individual reports is not to label the drug. Rather, the reports provide discrete pieces of evidence that authoring teams use to tell a story about whether the drug has an interaction liability. Authoring teams tell that story in CTD Module 2.7.2, Summary of Clinical Pharmacology Studies. Then, in CTD Module 2.5.3, Overview of Clinical Pharmacology, authoring teams write a critical analysis of the pharmacokinetic, pharmacodynamic, and in vitro data associated with that drug interaction, emphasizing any unusual results and known or potential problems, or noting the lack of these.
By assigning different purposes to individual reports, CTD Module 2 summaries, and CTD Module 2 overviews, authoring teams save time by not duplicating content. Messaging across these documents does not contradict but rather builds toward a story in the CTD Module 2 summaries and a critique in the CTD Module 2 overviews. Regulatory reviewers save time during review by not having to re-read the same content over and over across many documents. They also save time by knowing where to find specific levels of information in specific documents in a consistent, logical manner that follows the regulatory guidelines.
In conclusion, the deployment of generative AI at scale within the pharmaceutical industry has proven to be a strategic priority, demonstrating significant value across multiple therapeutic areas. However, implementing generative AI has challenges that must be addressed, such as data and content readiness.
For successful implementation, companies must ensure their data are accessible, secure, and machine-readable. Additionally, optimizing content by eliminating redundant information is crucial to maximize the return on investment in generative AI. By focusing on content and process readiness, pharmaceutical companies can leverage generative AI to save resources, accelerate timelines, improve reviewers' experience, and enhance the quality of regulatory documents.
Ultimately, the key to realizing generative AI's full potential lies in addressing these challenges and continuously refining the content ecosystem to meet regulatory reviewers' evolving needs.
John April, Senior Director, Adaptive Content Strategy, Global Scientific Communications, Eli Lilly and Company; Inger Ødum Nielsen, Team Lead, Content Management Team, Clinical Reporting at Novo Nordisk; and Vanessa de Langsdorff, Vice President, Customer Success at Yseop