"The main thing that we were surprised by was that at this point in time, ChatGPT couldn't give us consistent results," says Athena Barrett.
In this video, Athena Barrett and Kristin G. Baldea, MD, highlight key findings from the publication, “Utilization of ChatGPT for appraising letters of recommendation in urology residency applications: Ready for prime time?” Barrett is a medical student at Loyola University Chicago Stritch School of Medicine, and Baldea is an assistant professor of surgery and urology at Loyola Medicine in Chicago, Illinois.
Barrett: The main thing that we were surprised by was that at this point in time, ChatGPT couldn't give us consistent results. We would feed ChatGPT 1 letter of recommendation for 1 applicant, written by the same person, and ask it to score the letter from 0 to 100 [and] give us some adjectives. We would then start a new chat, give it the exact same prompt and the same letter, and we found that there were discrepancies in what ChatGPT gave us. With that, it's hard to publish a scientific article if the data aren't reproducible. But I think that in and of itself is interesting, to know what ChatGPT is capable of now, because I believe that in the future it will be capable of being more consistent and of being used for data analysis.
Baldea: The other thing that surprised me is that I had read about the idea that ChatGPT can have hallucinations, which is the term for when it simply makes things up. But I was surprised at the degree to which it did that with what seemed like a fairly simple task. If we fed it multiple letters at once and asked ChatGPT to compare them, it was unable to do that; it actually mixed the letters together and made up applicants who had never been submitted to it. So it really was not useful in that way.
This transcription has been edited for clarity.