AI chatbot responses on urologic cancers lack readability, actionability

News
Urology Times Journal, Vol. 51, No. 11

“Our findings show that AI chatbots provided accurate information with little misinformation. However, the information was provided at a college reading level and with little actionability,” says Abdo E. Kabarriti, MD, FACS.

Artificial intelligence (AI) chatbots generally provide accurate information for queries regarding urological cancers, but the information tends to lack readability, understandability, and actionability, according to findings recently published in European Urology.1

“This article is the first of its kind to examine how AI chatbots perform with regard to cancer-related inquiries. This is of great importance as more people turn to AI to obtain information,” said senior author Abdo E. Kabarriti, MD, FACS, in correspondence with Urology Times®. Kabarriti is an assistant professor at SUNY Downstate Health Sciences University in Brooklyn, New York.

For the study, the investigators assessed the quality of information from 4 AI chatbots (ChatGPT, Perplexity, Chat Sonic, and Microsoft Bing AI) on queries related to prostate, bladder, kidney, and testicular cancer. The top 5 Google Trends search queries for each cancer from 2021 to 2023 were submitted, and the responses were analyzed for quality, understandability, actionability, misinformation, and readability.

Findings showed moderate to high scores regarding the quality of information, with a median DISCERN score of 4 out of 5 (range, 2 to 5). Responses specifically related to cancer treatment tended to have lower DISCERN scores among all the chatbots, given a lack of information regarding treatment mechanisms, risks, benefits, and their effects on quality of life. Among all 4 chatbots, ChatGPT demonstrated the lowest median DISCERN score of 3.

Responses from all chatbots contained little misinformation, as demonstrated by a median Likert score of 1 out of 5, where a lower score indicates less misinformation. The most cited sources of information for 3 of the 4 chatbots were the Mayo Clinic, Cleveland Clinic, and the American Cancer Society. ChatGPT was the only chatbot that did not cite any sources.

Further, understandability was only found to be moderate, with a median score of 66.7% (range, 44.4% to 90.9%). The information also tended to be presented at a difficult reading level, with a median Flesch-Kincaid grade level of 11.8 (range, 5.7 to 38.1). Chatbot responses tended to use medical terminology, and responses were often brief (median word count of 100), which the authors noted “may not explain information in a comprehensive manner.”
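For context, the Flesch-Kincaid grade level estimates the US school grade needed to understand a passage from its average sentence length and average syllables per word, so a median of 11.8 corresponds roughly to a high-school senior or entry-level college reader. The sketch below is a minimal illustration of that standard formula; it is not the study's analysis code, and the simple vowel-group syllable counter is an assumption used here only for demonstration.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels; real readability
    # tools use pronunciation dictionaries, so this is only an approximation.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Example: a terse, terminology-heavy sentence scores at a high grade level.
print(flesch_kincaid_grade(
    "Radical prostatectomy carries risks of urinary incontinence and erectile dysfunction."
))
```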

Actionability was also found to be moderate to poor across all 4 chatbots, with a median PEMAT-P score of 40% (range, 0% to 40%). Responses on prostate cancer queries showed the lowest median understandability and actionability scores compared with the other urologic cancers included in the study.

Kabarriti concluded, “Our findings show that AI chatbots provided accurate information with little misinformation. However, the information was provided at a college reading level and with little actionability. AI chatbots will undoubtedly play a big role in the future, and our study suggests that they should be used to complement the patient-clinician discussions.”

Reference

1. Musheyev D, Pan A, Loeb S, Kabarriti AE. How well do artificial intelligence chatbots respond to the top search queries about urological malignancies? Eur Urol. Published online August 9, 2023. Accessed September 1, 2023. doi:10.1016/j.eururo.2023.07.004
