Cross-sectional study observed artificial intelligence-generated responses to cancer care questions.
A cross-sectional study recently published in JAMA Network Open analyzed the quality of responses from an artificial intelligence (AI) large language model (LLM) in radiation oncology. The LLM’s answers were compared with established sources to determine how accurate and readable they were.1
“AI LLMs demonstrate potential in simulating human-like dialogue. Their efficacy in accurate patient-clinician communication within radiation oncology has yet to be explored,” the study authors wrote.
The study authors drew questions from patient-facing websites affiliated with the National Cancer Institute and the Radiological Society of North America and used them as queries for an AI LLM, ChatGPT version 3.5, to generate responses. Three radiation oncologists and three radiation physicists rated the responses for factual correctness, completeness, and conciseness, and the responses were compared with the expert answers posted online.
Of the 115 radiation oncology questions retrieved, the LLM performed the same as or better than the expert answers in 108 responses (94%) for relative correctness, 89 (77%) for completeness, and 105 (91%) for conciseness.
“To our knowledge, this study is one of the first to provide both domain-specific and domain-agnostic metrics to quantitatively evaluate ChatGPT-generated responses in radiation oncology,” the authors wrote. “We have shown via both sets of metrics that the LLM yielded responses comparable with those provided by human experts via online resources and were similar, and in some cases better, than answers provided by the relevant professional bodies on the internet. Overall, the responses provided by the LLM were complete and accurate.”
While the LLM performed well in answering the questions, it did so at a higher reading level than the expert answers. According to the authors, the mean difference was six grade levels for general radiation oncology answers, compared with a smaller mean difference of two grade levels for modality-specific and subsite-specific answers.
“The LLM generated more complex responses to the general radiation oncology questions, with higher mean syllable and word counts, and a higher mean lexicon score. These scores between expert and the LLM responses were similar for the modality-specific and site-specific answers,” the authors added.
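The article does not specify which readability formula the authors used, so the following is a minimal Python sketch, assuming a Flesch-Kincaid-style grade-level calculation driven by sentence length and syllable counts; the sample answers are invented for illustration.

```python
# Minimal sketch (not the study's code) of estimating a grade-level readability
# score. The Flesch-Kincaid grade-level formula is assumed here; the article
# does not state which readability metric the authors actually used.
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate a U.S. school grade level from words, sentences, and syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


if __name__ == "__main__":
    # Hypothetical expert-style and LLM-style answers, for illustration only.
    expert = "Radiation therapy uses high-energy rays to kill cancer cells."
    llm = ("Radiation therapy is a treatment modality that utilizes ionizing "
           "radiation to damage the DNA of malignant cells, thereby preventing "
           "their proliferation.")
    print(f"Expert answer grade level: {flesch_kincaid_grade(expert):.1f}")
    print(f"LLM answer grade level: {flesch_kincaid_grade(llm):.1f}")
```

Longer sentences and more syllables per word push the estimated grade level up, which is consistent with the six-grade-level gap the authors report for general radiation oncology answers.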
The study had limitations, the first of which was the high reading level of the responses, which may present an obstacle to patients. According to the authors, American Medical Association and National Institutes of Health recommendations call for patient education resources to be written at third- to seventh-grade reading levels, and not one LLM response met this criterion. However, the authors suggested that future research could examine the LLM’s ability to generate tailored responses through specific prompts; techniques using multiple prompts may yield simpler responses, as illustrated in the sketch below.
Second, variations in question phrasing could change the LLM’s responses, and patients differ in their comfort with technology such as AI, their backgrounds, and their language proficiency. Additionally, the LLM is still an experimental model, and its responses may change over time as the technology evolves.
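As an illustration of the multi-prompt simplification technique the authors allude to, the following is a minimal sketch assuming the OpenAI Python client; the model name, target grade level, and sample question are assumptions for illustration, not details from the study.

```python
# Minimal sketch (an assumption, not the authors' method) of a multi-prompt
# approach: ask the model the patient question, then ask it to rewrite its own
# answer at a simpler reading level. Model name, target grade, and the sample
# question are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo"  # stand-in for "ChatGPT version 3.5"

question = "What are the side effects of radiation therapy for prostate cancer?"

# Prompt 1: obtain the initial answer.
first = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": question}],
)
answer = first.choices[0].message.content

# Prompt 2: ask the model to simplify its own answer toward the
# AMA/NIH-recommended reading range cited in the article.
second = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
        {"role": "user", "content": "Rewrite your answer at a sixth-grade reading level, keeping it medically accurate."},
    ],
)
print(second.choices[0].message.content)
```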
“This cross-sectional study found that the LLM provided mainly highly accurate and complete responses in a similar format to virtual communications between a patient and clinicians in a radiation oncology clinical environment,” the authors concluded. “Accordingly, these results suggest that the LLM has the potential to be used as an alternative to current online resources.”
1. Yalamanchili A, Sengupta B, Song J, et al. Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions. JAMA Netw Open. 2024;7(4):e244630. doi:10.1001/jamanetworkopen.2024.4630