Cross-sectional study evaluated artificial intelligence-generated responses to cancer care questions.
A cross-sectional study recently published in JAMA Network Open analyzed the quality of responses from an artificial intelligence (AI) large language model (LLM) to radiation oncology patient care questions. The LLM's answers were compared with those from established expert sources to assess their accuracy and readability.1
“AI LLMs demonstrate potential in simulating human-like dialogue. Their efficacy in accurate patient-clinician communication within radiation oncology has yet to be explored,” the study authors wrote.
The study authors drew questions from patient-facing websites affiliated with the National Cancer Institute and the Radiological Society of North America and used them as queries for an AI LLM, ChatGPT version 3.5, to generate responses. Three radiation oncologists and three radiation physicists then rated the LLM-generated responses for factual correctness, completeness, and conciseness against the expert answers published online.
Of the 115 radiation oncology questions retrieved, the LLM's answers were rated the same as or better than the expert answers in 108 responses (94%) for relative correctness, 89 responses (77%) for completeness, and 105 responses (91%) for conciseness.
“To our knowledge, this study is one of the first to provide both domain-specific and domain-agnostic metrics to quantitatively evaluate ChatGPT-generated responses in radiation oncology,” the authors wrote. “We have shown via both sets of metrics that the LLM yielded responses comparable with those provided by human experts via online resources and were similar, and in some cases better, than answers provided by the relevant professional bodies on the internet. Overall, the responses provided by the LLM were complete and accurate.”
While the LLM performed well in answering the questions, it did so at a higher reading level than the experts. According to the authors, the mean difference was six grade levels for general radiation oncology answers, narrowing to two grade levels for modality-specific and subsite-specific answers.
“The LLM generated more complex responses to the general radiation oncology questions, with higher mean syllable and word counts, and a higher mean lexicon score. These scores between expert and the LLM responses were similar for the modality-specific and site-specific answers,” the authors added.
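For context, grade-level comparisons of this kind are typically computed with a standard readability formula built from the same sentence, word, and syllable counts the authors describe. The study summary does not name the exact formula used, so the following Python sketch of the Flesch-Kincaid grade level, with hypothetical counts, is illustrative only.

# Flesch-Kincaid grade level:
# 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59

def flesch_kincaid_grade(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Return the approximate US grade level of a passage."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Hypothetical counts for an expert answer and an LLM answer:
expert_grade = flesch_kincaid_grade(total_words=120, total_sentences=8,
                                    total_syllables=180)   # about grade 8.0
llm_grade = flesch_kincaid_grade(total_words=150, total_sentences=7,
                                 total_syllables=260)      # about grade 13.2
print(f"Expert: grade {expert_grade:.1f}; LLM: grade {llm_grade:.1f}")

As the hypothetical counts show, longer sentences and more syllables per word push an answer several grade levels higher, which is consistent with the complexity gap the authors report for the general radiation oncology questions.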
The study had limitations, the first being that the high reading level of the responses may present an obstacle to patients. According to the authors, the American Medical Association and the National Institutes of Health recommend that patient education resources be written at a third- to seventh-grade reading level; not one LLM response met this criterion. However, the authors suggested that future research could examine the LLM's ability to generate tailored, simpler responses through specific prompts, and techniques using multiple prompts may further simplify responses (see the sketch following these limitations).
Second, variations in question phrasing could change the LLM's responses, and patients differ in their comfort with technology such as AI, as well as in their backgrounds and language proficiency. Additionally, it should be noted that the LLM remains an experimental model, and its responses may change over time as the technology evolves.
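As an illustration of the prompt-based tailoring the authors propose, the following is a minimal sketch using the OpenAI Python client. The model name, system instruction, and sample question are assumptions for demonstration and are not the study's protocol.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed stand-in; the study used ChatGPT version 3.5
    messages=[
        {
            "role": "system",
            "content": (
                "Answer patient questions about radiation oncology at a "
                "sixth-grade reading level, consistent with AMA/NIH "
                "patient-education guidance."  # assumed instruction, not the study's
            ),
        },
        {"role": "user",
         "content": "What are the side effects of radiation therapy?"},  # sample question
    ],
)
print(response.choices[0].message.content)

A multi-prompt variant of the same idea would feed the model's first answer back with a follow-up instruction to rewrite it more simply, one example of the multiple-prompt techniques the authors mention.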
“This cross-sectional study found that the LLM provided mainly highly accurate and complete responses in a similar format to virtual communications between a patient and clinicians in a radiation oncology clinical environment,” the authors concluded. “Accordingly, these results suggest that the LLM has the potential to be used as an alternative to current online resources.”
1. Yalamanchili A, Sengupta B, Song J, et al. Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions. JAMA Netw Open. 2024;7(4):e244630. doi:10.1001/jamanetworkopen.2024.4630