May 26, 2026

Why RAG-based AI models may be better suited for patient education

Archan Khandekar, MD, highlights key findings from a study evaluating LLM architectures for answering patient-facing urologic questions.

At the 2026 American Urological Association (AUA) Annual Meeting in Washington, DC, Archan Khandekar, MD, discussed a study evaluating the safety and reliability of large language models (LLMs) for answering patient-facing urologic questions, highlighting the advantages of retrieval-augmented generation (RAG) architectures grounded in guideline-based evidence [1]. Khandekar is an assistant professor of urologic oncology at the Desai Sethi Urology Institute in Miami, Florida.

Khandekar explained that the study was designed to address the growing number of patients who rely on consumer-facing artificial intelligence (AI) platforms for health information, despite uncertainty about the accuracy and reliability of those responses. The investigators evaluated 3 distinct domains of AI-generated answers: guideline fidelity, verifiability, and contextual accuracy. They used 5 standardized high-stakes urologic scenarios based on AUA and National Comprehensive Cancer Network guidelines. The study compared a general conversational LLM (Gen-LLM) with 2 RAG-based systems (RAG-Evidence and RAG-Hybrid). According to the study findings, the RAG-Evidence and RAG-Hybrid models substantially outperformed the Gen-LLM, achieving scores of 25/25 and 24/25, respectively, compared with 14/25 for the Gen-LLM.

Khandekar addressed the major concern that general-purpose AI systems “hallucinate” incorrect medical information while presenting answers with high confidence. He pointed to an example in which the Gen-LLM confused transurethral resection of the prostate (TURP) for benign prostatic hyperplasia with transurethral resection of bladder tumor (TURBT) for urothelial carcinoma, underscoring the potential risk of misleading patients seeking medical guidance online. He noted that these errors can be inconsistent and difficult to predict, as the same question posed multiple times may generate different incorrect responses.

Khandekar emphasized that RAG-based architectures may offer a safer framework for future clinical and patient education applications because they restrict AI systems to curated, verifiable sources such as clinical guidelines, textbooks, or institutional databases. Rather than allowing unrestricted access to constantly changing internet information, these systems retrieve answers directly from predefined medical information stored locally within health systems, potentially minimizing both hallucinations and privacy concerns related to protected health information. He suggested that as computational power and local deployment capabilities continue to improve, RAG-based systems could become increasingly important for clinical decision support and patient education, particularly in ensuring that AI-generated recommendations remain guideline-concordant and traceable to established evidence.
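To make the idea of restricting answers to curated, traceable sources concrete, the following is a minimal Python sketch of the retrieval step only, not the architecture used in the study. It assumes a tiny in-memory corpus, simple keyword overlap instead of a real embedding search, and no generation step. The names `Passage`, `CORPUS`, `retrieve`, and `answer`, the source labels, and the passage wording are all hypothetical placeholders rather than guideline text.

```python
import re
from dataclasses import dataclass

# Common words ignored when matching a question to a passage (illustrative list).
STOPWORDS = {"a", "an", "the", "is", "are", "for", "of", "what", "does", "do", "to", "in", "and"}


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical label identifying the curated document
    text: str       # placeholder wording, not actual guideline text


# Hypothetical curated corpus; a real system would index vetted guideline documents.
CORPUS = [
    Passage("placeholder-doc-A", "TURP is a procedure for benign prostatic hyperplasia."),
    Passage("placeholder-doc-B", "TURBT is a procedure for urothelial carcinoma."),
]


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, corpus, min_overlap=1):
    """Return the passage sharing the most keywords with the question, or None."""
    question_tokens = tokenize(question)
    best, best_score = None, 0
    for passage in corpus:
        score = len(question_tokens & tokenize(passage.text))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None


def answer(question, corpus=CORPUS):
    """Answer only from the curated corpus and always report the source used."""
    passage = retrieve(question, corpus)
    if passage is None:
        # Nothing in the curated corpus matches, so decline rather than guess.
        return {"answer": None, "source": None}
    return {"answer": passage.text, "source": passage.source_id}


print(answer("What is TURBT for?"))
print(answer("What is the weather today?"))
```

In this sketch, every returned answer carries the identifier of the passage it came from, and a question with no match in the corpus yields no answer at all. Those two properties correspond to the traceability and restriction to predefined sources described above; a production system would add a language model that generates text from the retrieved passages.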

REFERENCE

1. Khandekar A, Sharma N, Freitas P, et al. Evaluating the fidelity of large language model outputs for patient-facing urologic queries: a comparative analysis of general vs. retrieval-augmented generation (RAG) architectures. J Urol. 2026;215(5S):e1163. doi:10.1097/01.JU.0001191624.06470.8f.27