
- Vol 53 No 09
AI and prostate cancer: Balancing hope and caution
Key Takeaways
- AI tools like PathomIQ can predict prostate cancer metastasis by analyzing whole-slide images, independent of Gleason scores and gene expression profiling.
- Current AI applications in prostate cancer are mainly adjunctive, with limited clinical use, highlighting the need for further validation and integration.
Christopher Weight, MD, MS, discusses the current state and directions of AI and prostate cancer.
The potential applications of artificial intelligence (AI) are vast and span many fields, including urologic oncology. In this interview, Weight provides an overview of a recent validation study of the PathomIQ AI tool for prostate cancer prognosis.1 The study analyzed whole-slide prostate cancer images from 344 men to determine whether AI-derived risk scores, independent of Gleason score and Decipher gene expression profiling, could predict metastasis. The algorithm segmented each slide image into patches, assessing architecture, epithelium, and tumor microenvironment to generate a risk score from 0 to 1. Fifteen of the 16 patients who developed metastases were flagged as high risk.
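To make that pipeline concrete, here is a minimal Python sketch of how a patch-based whole-slide risk scorer of this general kind could be organized. It is not PathomIQ's implementation; the patch size, threshold, pooling rule, and `score_patch` stand-in are all assumptions for illustration.

```python
# A minimal sketch (not PathomIQ's code) of a patch-based whole-slide
# risk scorer: tile the H&E image into patches, score each patch, and
# aggregate into a single 0-to-1 risk that is then thresholded.
import numpy as np

PATCH = 256             # patch edge length in pixels (assumed)
HIGH_RISK_CUTOFF = 0.5  # illustrative threshold, not the published one

def score_patch(patch: np.ndarray) -> float:
    """Hypothetical stand-in for a trained model; returns a 0-1 patch risk."""
    return float(patch.mean() / 255.0)  # placeholder logic only

def slide_risk(slide: np.ndarray) -> float:
    """Tile a whole-slide image and pool the per-patch risks."""
    h, w = slide.shape[:2]
    scores = [
        score_patch(slide[y:y + PATCH, x:x + PATCH])
        for y in range(0, h - PATCH + 1, PATCH)
        for x in range(0, w - PATCH + 1, PATCH)
    ]
    return float(np.mean(scores))  # mean pooling; the real tool may differ

slide = np.random.randint(0, 256, size=(1024, 1024, 3), dtype=np.uint8)
risk = slide_risk(slide)
print(f"risk={risk:.2f}", "high risk" if risk >= HIGH_RISK_CUTOFF else "low risk")
```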
Weight noted that AI’s current clinical use in prostate cancer is mainly limited to research and adjunctive interpretation, with few validated tools in routine practice. AI shows promise in addressing variability and reproducibility issues in multiparametric MRI and prostate-specific membrane antigen (PSMA) PET interpretation by standardizing segmentation and grading. However, fully replacing existing clinical systems such as Prostate Imaging Reporting and Data System (PI-RADS) remains a future goal.
AI could enhance risk stratification beyond traditional models (eg, those of the National Comprehensive Cancer Network [NCCN], the European Association of Urology [EAU]), but validation across diverse populations is essential to avoid bias. Weight suggested AI may eventually supplant Gleason grading by detecting histologic patterns beyond current scoring, such as “unfavorable pathology” identified at the Cleveland Clinic.
Promising AI applications include computer vision analysis of MRI and biopsy slides to better identify low-risk patients suitable for active surveillance, as well as the potential to predict gene expression from histology without consuming tissue.
Current limitations include the lack of prospective, randomized controlled trials and the risk of bias if algorithms are trained on narrow patient populations. Weight emphasized caution, critical evaluation, and broad, multiethnic training data before widespread clinical adoption. Although optimistic about AI’s potential to improve patient selection and reduce overtreatment, he stressed the importance of rigorous validation to ensure accuracy, reproducibility, and equitable application. Weight is the vice chair of research and the center director for urologic oncology at the Cleveland Clinic in Ohio.
Urology Times: Could you provide an overview of the PathomIQ test and Cleveland Clinic’s recent research?
Weight: This is a validation study of previous work…done by a group in New York. This was a study [evaluating] 344 men who had prostate cancer, and it was trying to take a look at [whether there were] details in their images that [were] independent of Gleason score. And actually, this cohort also had a Decipher expression profile, and [the investigators wanted to assess whether the test] could predict with reliability, down the road, what might happen to these patients after treatment for their prostate cancer.

The way the PathomIQ study works is you take a whole-slide image of the prostate cancer, and then it divides it into little patches of the slide, and then it evaluates each patch. There are several things that it does: looking at the architecture, the epithelium, the way the cells are lined up, [and] the tumor microenvironment. It creates, essentially, a risk from 0 to 1, and then that risk is how concerning it is that there will be some kind of recurrence. It’s trained on actual events. It was originally developed…to identify people who had developed metastasis, which is a really good outcome to train to. That’s what we really want to know. I think that’s a better outcome than…training to Gleason score, which we know has some issues. It’s training to solid outcomes, like [whether] a patient develops metastasis down the road.
This was the validation study, so the model has not seen any of the Cleveland Clinic patients.... It went through and created a risk score purely based on the computer’s interpretation of how risky that whole-slide image was, so an H&E [hematoxylin and eosin] stain of a sample of the prostate cancer, and then we compared that in multivariate models with known predictors such as Gleason score and gene expression profile, et cetera. The take-home was that it was very good at identifying those who are going to go on to develop metastasis. There were 16 [patients] in the cohort, so not a lot of events where you can do multivariable studies. You need a lot of events if you’re going to put multiple predictors into the model, but it did find that of the 16 that developed metastasis down the road, 15…had a high-risk profile, identified by the PathomIQ AI algorithm…. So it was really reassuring that most of the patients who would eventually go on to develop metastatic disease were classified as high risk in our study.
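As a hedged illustration of the multivariable analysis described here, the sketch below fits a Cox proportional hazards model with the lifelines package. Only the cohort size (344) and the approximate event count (16) are taken from the interview; the column names, follow-up times, and all values are simulated.

```python
# Illustration of the multivariable survival analysis described: does an
# AI risk score add prognostic value beyond Gleason score and a gene
# expression score? All data below are simulated, not study data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 344  # cohort size from the study; every value below is invented
df = pd.DataFrame({
    "ai_risk": rng.uniform(0, 1, n),           # 0-1 AI-derived risk score
    "gleason": rng.integers(6, 11, n),         # Gleason sum, 6-10
    "decipher": rng.uniform(0, 1, n),          # gene expression score
    "time_months": rng.exponential(60, n),     # follow-up time
    "metastasis": rng.binomial(1, 16 / n, n),  # rare events, as in the study
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="metastasis")
cph.print_summary()  # hazard ratio per predictor, adjusted for the others
```

As Weight notes, with only around 16 events a model can support very few predictors, which is why the adjusted analysis matters more than any single unadjusted association.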
Urology Times: How is AI currently being integrated into the diagnosis and management of prostate cancer in clinical practice?
Weight: The current usage of AI is more in meetings [and] research, and the current number of people…using a validated AI tool is very limited. The promise is great, but right now, there’s more hoopla than…AI algorithms that are really making a difference. There have been many studies. There have been studies that have tried to replicate the Gleason scoring system, and those have shown some promise. There have been studies…trying to help interpret and read multiparametric MRIs, and there are tools that are starting to be implemented as adjuncts. There has been no replacing, so far, of any physicians in terms of radiologists, urologists, radiation oncologists, pathologists, etc. But [the tools] are starting to be integrated into adjuncts, although it is [more] the promise of AI, rather than the actual implementation in the current landscape, that is really changing care.
Urology Times: What role does AI play in interpreting imaging modalities such as multiparametric MRI or PSMA PET in prostate cancer detection?
Weight: Interpreting any medical image has always been fraught with a lot of challenges, particularly multiparametric MRI. We know there’s a lot of variability in the quality of the MRI, so not all MRIs yield the same quality of images for interpretation. We know there are dramatic differences; for example, if you get a 1.5-Tesla MRI vs a 3-Tesla MRI. We know there are dramatic differences in the way radiologists interpret, so there is interobserver variability. If you have 2 radiologists looking at the same film, their agreement is only around 0.7, with 1 being perfect agreement. That’s somewhat shocking, so you might get a different result going from one radiologist to another radiologist. Finally, even more shocking to many people is [that] even their own radiologists don’t agree with themselves perfectly. If they read a film 6 months later, not knowing that they’d already read it, they only have an agreement statistic around 0.8, so it’s better than radiologist to radiologist, but it’s not perfect either.
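The agreement figures quoted here are typically reported as Cohen's kappa, where 1.0 is perfect agreement and 0 is chance-level agreement. A minimal sketch, assuming 2 hypothetical readers scoring the same 10 scans on a 1-to-5 scale:

```python
# The agreement statistics quoted above are typically Cohen's kappa,
# where 1.0 is perfect agreement and 0 is chance-level agreement.
# Hypothetical 1-to-5 scores from 2 readers on the same 10 scans:
from sklearn.metrics import cohen_kappa_score

reader_a = [3, 4, 2, 5, 3, 1, 4, 2, 3, 5]
reader_b = [3, 5, 2, 4, 3, 2, 4, 2, 4, 5]

print(f"kappa = {cohen_kappa_score(reader_a, reader_b):.2f}")
```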
It gets to this idea of [having] 2 challenges. One, quality is an issue. Two, reproducibility is an issue, and this is where I think AI can…potentially help us, because it can reproducibly grade films. There’s been a lot of work in this space because of the challenges that we already face. Furthermore, we know that expert MRI readers—radiologists who read a lot of prostate MRIs—tend to do better and more reliably interpret these MRIs, and it correlates more reliably with pathology. So we’d really like to standardize this and make it available to everyone. There has been a lot of effort. Usually, the first step is just getting [an algorithm] to recognize exactly where the prostate is [on the MRI], so you put in an MRI that it’s never seen before and try to get it to outline the prostate. That work has been pretty reliably accomplished now, and that can be done fairly reliably in an automated way, even when there are variations in anatomy, patient size, or other things that may distort the image, like hip replacement. Hip replacement can cause so much distortion [that] not even a human can outline it. Segmentation has been a problem [where] a lot of work has been done, and that has been largely solved, meaning it can be done reliably. The next step is segmenting within the prostate. Can you identify the central zone or the transition zone from the peripheral zone? That also can be done fairly reliably by many different algorithms, which is really useful for setting the groundwork for what we really want, which is for the MRI to be interpreted to some degree that it helps us find how risky this man’s prostate is in terms of prostate cancer.
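Segmentation quality of the kind described here is conventionally scored with the Dice coefficient, which measures how well an automated outline overlaps a reference outline. A minimal sketch with toy 2D masks standing in for real prostate segmentations:

```python
# Segmentation quality is conventionally scored with the Dice
# coefficient: 2 * |overlap| / (|prediction| + |reference|), where 1.0
# is a perfect match. Toy 2D masks stand in for real MRI segmentations.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Overlap between a predicted mask and a reference mask."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

truth = np.zeros((64, 64), dtype=bool)
truth[20:44, 20:44] = True   # reference prostate outline
pred = np.zeros((64, 64), dtype=bool)
pred[22:46, 21:45] = True    # automated outline, slightly shifted

print(f"Dice = {dice(pred, truth):.2f}")  # ~0.88 here
```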
The current clinical system…we use is called the PI-RADS…, and this…is used to help interpret prostate MRIs. This is a clinically derived system where radiologists will give a score from 1 to 5, with 1 being the lowest risk and 5 being the highest risk. We know that this scoring system is better than what we’ve had in the past, but it still has a lot of problems. We know that scores in certain parts of the prostate don’t yield the same risk of prostate cancer down the road. We also know there’s some interobserver variability, but there has been a lot of work on trying to replicate that system, and a lot of headway has been made.
The holy grail would be to get an algorithm that doesn’t try to replicate systems that are problematic, that have trouble with reliability and interobserver variability, but [that] would [identify] patients who have prostate cancer that is biologically lethal or significant, and ignor[e] those [with] low-risk prostate cancer. That might be done with something called deep learning or computer vision, where the algorithm just makes its own conclusions based on a bunch of representations of patients who go on to have aggressive prostate cancer. The challenge with that, of course, is we don’t know how it’s making its decisions, and that black box decision-making can sometimes be tricky to manage. [However], if we have a lot of reliability and if it correlates with other things that we already know, we start to gain confidence that that can aid us in our decision-making. That one is a little further off, but certainly we see signals in the tools…we already have that that might be a possibility in the not-too-distant future.
Urology Times: How reliable are AI-driven biopsy decision tools or risk stratification models compared with traditional clinical guidelines, such as the NCCN or EAU risk group classifications?
Weight: This is an area where there’s a lot of research going on as well. There are some publications that indicate that they might be a little bit more reliable, but they have not been validated in multiple different populations and countries and ethnicities. This is one area where we have a lot of understanding from these traditional risk calculators, and they have guided clinical practice for a long time. We know over time that these clinically derived prediction models do have some drift, and they may not be as effective over time [as] when they were initially developed. It underscores that we have to continually evaluate these. I think there are some exciting options that might come into play in the near future, but we do need to make sure they are validated in multiple different populations. We do know that…traditional things, like logistic regression models, can encode bias if they’re only developed in a certain type of man. For example, we know a lot of our studies are done on White males who live in the United States. They might not apply quite as well [elsewhere]. It’s something that we need to continue to study and evaluate, [for] both our traditional models and these new AI models. Because [AI models] are more scalable, we can move them into clinical practice much faster than we’ve historically been able to; we just need to make sure that they apply equally well in different populations of men and different ethnicities.
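One concrete form of the subgroup validation Weight calls for is checking that a model discriminates equally well within each population, not just overall. The sketch below does this with a logistic regression; the predictors, the two-group label, and the performance gap are all simulated for illustration.

```python
# Check that a risk model discriminates well within each subgroup, not
# just overall. Everything here is simulated: the predictors, the
# group label, and the performance gap are fabricated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                   # clinical predictors
group = rng.integers(0, 2, n)                 # two hypothetical populations
# outcome depends on group, so a single model fits one group better
y = (X[:, 0] + 0.5 * group * X[:, 1] + rng.normal(size=n) > 0).astype(int)

model = LogisticRegression().fit(X[:1000], y[:1000])   # develop on half
probs = model.predict_proba(X[1000:])[:, 1]            # validate on the rest
for g in (0, 1):
    mask = group[1000:] == g
    print(f"group {g}: AUC = {roc_auc_score(y[1000:][mask], probs[mask]):.2f}")
```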
Urology Times: Can AI help improve accuracy or consistency in Gleason grading or histopathologic interpretation? Is there evidence to support this?
Weight: I don’t know if it will ever get perfect with Gleason grading, because Gleason grading, although it’s our gold standard, has its own issues of reliability, reproducibility, and, quite honestly, correlation with outcomes that are important. It’s been a great tool that has served us well for a long time, and it will continue to serve us well for a long time, but I don’t think AI should try to replicate Gleason grading because of its challenges. I think it will be supplanted eventually by some of these tools that are being worked on right now, where there are details in that histologic slide that we are not capturing in the Gleason score. We’ve done some work on this already at Cleveland Clinic. Dr [Jesse] McKenney and Dr [Jane] Nguyen, our GU pathologists, have developed a term called unfavorable pathology. It’s a different classification [system for] H&E slides that, independent of Gleason score, seems to be predictive, and I think they have identified this through their experience. I think AI will be able to replicate that and maybe even find other patterns that aren’t incorporated into the Gleason score that will provide us even more valuable and personalized information when we have a patient in front of us and we’re trying to decide how to treat this patient and what their course might look like.
Urology Times: What are the most promising AI applications in prostate cancer screening, especially in addressing overdiagnosis or overtreatment?
Weight: I think [there are] 2 areas that I see [as] really promising. One would be computer vision analysis of the multiparametric MRI. There seem to be a lot of data there. There’s some indication that if the MRI is truly normal—and that’s a hard definition to come by, and not all radiologists agree on what is truly normal—…it’s very unlikely to have a very aggressive prostate cancer. There are always exceptions to these rules, but I think using computer vision to standardize the understanding and the reading of these MRIs might help us to identify patients who are [candidates] for active surveillance, [meaning] they’re very unlikely to have a cancer that’s going to progress in the near future, and reassure us. I think we will see those same sorts of techniques used for the biopsy samples as well; if we don’t see any of the dangerous signals on that biopsy sample, for example, I think those are going to be quite reassuring.
And then finally, there’s some really interesting work that’s going on, [such as whether] we can use AI [to predict] gene expression from the biopsy sample. Similar to what we found in the PathomIQ study, the nice thing about computer vision studies is that you don’t have to destroy any tissue. When you do a gene expression profile, you have to take a little bit of that tissue, and it gets used up in creating the gene expression [profile]. There’s some really exciting information that maybe you can predict gene expression just by the way the cells look on the H&E slide. That’s an area that I think is really exciting, that might be able to give us some added personalization, where we can predict the gene expression just based on how the images look. I think a combination of these tools is really going to help us classify patients appropriately, and I would really see us watching more patients rather than treating more, because I think we’ll have more confidence that the patient sitting in front of us has a very low risk of progressing to any significant prostate cancer in the immediate future. And then, as we continue to watch, that can always change, but we will have more confidence that the tools we have identify patients [who] are at risk and the patients who are safe to continue on active surveillance.
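The tissue-sparing idea described here can be framed as a regression problem: predict a gene expression score from image-derived features. A minimal sketch, with simulated features standing in for real histology embeddings:

```python
# The tissue-sparing idea framed as regression: predict a gene
# expression score from image-derived features, so no tissue is consumed.
# Features and the target score are simulated stand-ins for real data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
features = rng.normal(size=(300, 64))   # e.g., per-slide embedding features
expr = features[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=300)

r2 = cross_val_score(Ridge(alpha=1.0), features, expr, cv=5, scoring="r2")
print(f"cross-validated R^2 = {r2.mean():.2f}")  # predictability from images
```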
Urology Times: What are the current limitations or risks associated with implementing AI tools in urologic oncology, and how should clinicians critically assess these tools?
Weight: The main limitations [are that] we have almost no randomized controlled trials, so almost all the data are retrospective. You take a cohort of patients from the past, you know their outcomes, and you use some sort of computer vision AI tool to predict what’s going to happen to that patient. But we don’t have interventional studies where you say, “I’m going to use AI in a prospective way and see if it really makes a meaningful difference to patients.” We have almost no studies in that space. That’s where I think there’s a lot of hype but not a lot of solid evidence that tells us it’s actually going to make our lives better.
I think we should [be very cautious]. You can see a lot of things online [along the lines of], “This is going to break medicine. This is going to eliminate physicians. This is going to disrupt the medical field as we know it.” But I think we…have yet to see these…prospective, randomized controlled trials where we implement AI and it really makes a substantial difference. I think that is incumbent on us as researchers to make sure we do that, but also, as a general clinician, be wary. These AI tools are subject to all the same problems [that] all our past research is subject to. They’re subject to bias. They’re subject to being overly trained in a population that doesn’t apply to the population of men in front of you. They’re subject to a lack of reproducibility and a lack of consistency, so although I’m in this field, I’m excited about it, [and] I’m doing research in this area, I think we have to [be very cautious about just applying] them blindly. And bias, I think, is…very likely going to be baked into these algorithms unless we make a concerted effort to have the algorithms trained on multiethnic, multinational, multi-institutional cohorts. If it’s just trained in 1 hospital, like this recent one that we talked about earlier, the 344 men at Cleveland Clinic, it works for 344 men at Cleveland Clinic. It may not work for men who live in LA; it may not work for men who live in Mexico. It may not…work for men who are in Asia. We need to have caution. We need to be continually evaluating these and making sure they don’t perpetuate biases and/or potentially even [cause] harm to our patients, because we might be trying to apply them to a population where it doesn’t work very well.
REFERENCE
1. Fay M, Lio RS, Lone ZM, et al. Artificial intelligence–based digital histologic classifier for prostate cancer risk stratification: independent blinded validation in patients treated with radical prostatectomy. JCO Clin Cancer Inform. 2025;9:e2400292. doi:10.1200/CCI-24-00292