A powerful new machine learning tool developed by experts at the University of California San Diego (UCSD) has discovered a pattern of DNA mutations that links bladder cancer to tobacco smoking. The tool, called SigProfilerExtractor, used de novo extraction to find the link.1
The tool analyzed mutational signatures found in participants. These signatures are specific patterns of mutations generated when an environmental exposure alters a person's DNA. The extraction found strong epidemiological ties between bladder cancer and tobacco smoking, whereas a link to tobacco smoking was previously known only for lung cancer.
In a news release about the study, senior author Ludmil Alexandrov, PhD, explained how the tool works by comparing it to conversations at a party. “You have multiple groups of people talking all around you, and you are only interested in hearing certain individuals speaking. Our tool essentially helps you do that, but with cancer genetic data. You have multiple people around the world exposed to different environmental mutagens, and some of those exposures are leaving imprints on their genomes. This tool goes through all those data to pick out what are the processes that cause the mutations.” Alexandrov is a professor of bioengineering and cellular and molecular medicine at UCSD.2
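The party analogy describes a source-separation problem: each tumor's mutation counts are a mixture of several underlying processes, and the goal is to pull the individual processes back out. The sketch below illustrates that idea with non-negative matrix factorization (NMF), a common technique for this kind of decomposition. It is a toy illustration only, not the SigProfilerExtractor implementation: the function name `nmf_extract`, the 6 mutation categories, the 2 planted signatures, and the simulated counts are all assumptions made for the example.

```python
import numpy as np

def nmf_extract(counts, n_signatures, n_iter=500, seed=0, eps=1e-9):
    """Factor a (mutation types x samples) count matrix into
    signatures (types x k) and exposures (k x samples), all non-negative.
    Hypothetical helper for illustration; not part of any real library."""
    rng = np.random.default_rng(seed)
    n_types, n_samples = counts.shape
    W = rng.random((n_types, n_signatures)) + eps   # candidate signatures
    H = rng.random((n_signatures, n_samples)) + eps  # per-sample exposures
    for _ in range(n_iter):
        # Multiplicative updates that reduce the squared reconstruction error
        H *= (W.T @ counts) / (W.T @ W @ H + eps)
        W *= (counts @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1; exposures absorb the scale
    scale = W.sum(axis=0, keepdims=True) + eps
    return W / scale, H * scale.T

# Simulated data (assumption): 2 planted signatures over 6 mutation types
rng = np.random.default_rng(1)
true_sigs = np.array([[0.5, 0.0],
                      [0.3, 0.0],
                      [0.2, 0.1],
                      [0.0, 0.2],
                      [0.0, 0.3],
                      [0.0, 0.4]])
true_exposures = rng.integers(50, 500, size=(2, 40)).astype(float)
counts = rng.poisson(true_sigs @ true_exposures).astype(float)

W, H = nmf_extract(counts, n_signatures=2)
rel_error = np.linalg.norm(counts - W @ H) / np.linalg.norm(counts)
print("relative reconstruction error:", round(float(rel_error), 3))
```

In this sketch the columns of `W` play the role of the individual voices at the party, and `H` records how loudly each voice speaks in each sample. A production tool would also have to choose the number of signatures and check that the solution is stable, which this example does not attempt.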
For the study, the tool looked at 23,827 sequenced human cancers, comprising 4643 whole genome– and 19,184 whole exome–sequenced cancers. After extraction, the tool found 4 novel mutational signatures, including one linking bladder cancer to tobacco smoking. The study team found that this mutational signature is different from the one found in lung cancer. It was also found in tobacco smokers who had not developed bladder cancer. The signature was not found in the bladder tissue of nonsmokers.
“What this signature tells us is that certain mutations in your DNA are due to exposure to tobacco smoke,” said study author Marcos Díaz-Gay, PhD, postdoctoral scholar in cellular and molecular medicine at UCSD. “It doesn’t necessarily mean that you have cancer. But the more you smoke, the more mutations accumulate in your cells, and the more you increase your risk for developing cancer.”
The new tool was also benchmarked against 13 existing bioinformatics tools, all of which analyzed mutational signatures from 80,000 synthetic cancer samples. The tool developed by the team at UCSD detected 20% to 50% more true positive signatures than the others and had a false positive rate 5 times lower than the others. SigProfilerExtractor also performed well on datasets containing high levels of random noise, whereas the other tools did not.
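Because synthetic samples have known, planted signatures, a benchmark of this kind can count true and false positives directly. The sketch below shows one plausible way to score a tool's output: an extracted signature counts as a true positive if it closely matches a planted signature that has not already been matched, and as a false positive otherwise. The cosine-similarity measure, the 0.9 threshold, and the helper name `score_extraction` are assumptions for illustration; the article does not describe the study's scoring procedure.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every column of a and every column of b."""
    a = a / np.linalg.norm(a, axis=0, keepdims=True)
    b = b / np.linalg.norm(b, axis=0, keepdims=True)
    return a.T @ b

def score_extraction(extracted, truth, threshold=0.9):
    """Return (true positives, false positives, false negatives).
    Hypothetical scoring helper; the threshold is an assumed value."""
    sim = cosine_similarity_matrix(extracted, truth)
    matched = set()
    tp = 0
    for i in range(sim.shape[0]):
        # Greedily pair each extracted signature with its best unmatched truth
        for j in np.argsort(-sim[i]):
            if sim[i, j] >= threshold and int(j) not in matched:
                matched.add(int(j))
                tp += 1
                break
    fp = extracted.shape[1] - tp   # extracted signatures with no match
    fn = truth.shape[1] - tp       # planted signatures that were missed
    return tp, fp, fn

# Toy example: 3 planted signatures over 4 mutation types
truth = np.eye(4)[:, :3]
extracted = np.array([[0.95, 0.0, 0.0],
                      [0.05, 1.0, 0.0],
                      [0.00, 0.0, 0.5],
                      [0.00, 0.0, 0.5]])
result = score_extraction(extracted, truth)
print(result)  # (2, 1, 1)
```

Here two extracted signatures match planted ones, the third is an unmatched mixture that counts as a false positive, and one planted signature goes undetected.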
This work may help investigators find other links between environmental factors and cancer, which could lead to more customized treatment for patients. Moving forward, the team hopes that the tool can be used at a more individual level to profile patients for bladder cancer. For that to happen, they would need to create a more user-friendly interface so that investigators can use the tool without relying on bioinformatics expertise.
1. Islam SMA, Wu Y, Díaz-Gay M, et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. Published online September 23, 2022. doi:10.1016/j.xgen.2022.100179
2. Mutational signatures linking bladder cancer and tobacco smoking found with new AI tool. News release. University of California San Diego. September 26, 2022. Accessed October 7, 2022. https://www.newswise.com/articles/mutational-signature-linking-bladder-cancer-and-tobacco-smoking-found-with-new-ai-tool?sc=mwhr&xy=10016681