Embracing and understanding new and emerging molecular techniques will improve patient outcomes.
Have you ever wondered what exactly happens to a patient sample when it disappears into a laboratory’s ether? Suddenly, a report filled with results magically shows up in your patient’s file, but what happens during that unknown period? The answer is: a lot. That sample goes through a complex molecular journey.
This article walks you through the history of genetic sequencing and polymerase chain reaction (PCR) and where they stand today, touches on microarrays, explains some standard terminology and the questions those terms address, and describes the clinical laboratory’s workflow.
DNA and Sanger sequencing
In today’s molecular testing world, there are 2 very common applications: sequencing and quantitative polymerase chain reaction (qPCR). The first method of sequencing, known as Sanger sequencing, was developed by Frederick Sanger, PhD, 1 of 2 people to win a Nobel Prize twice in the same category. He is considered a pioneer of DNA sequencing alongside Walter Gilbert, PhD, with whom he shared the 1980 Nobel Prize in Chemistry.1 Prior to his work, most research was done on RNA, which is single-stranded and was easily manipulated with RNase enzymes that cut at very specific nucleotide sequences.
With their discovery of the structure of DNA, Watson and Crick showed that nucleotides form the building blocks and that adenine pairs with thymine and cytosine pairs with guanine to form a base pair (bp). These nucleotides are called deoxynucleotides, meaning they are missing a hydroxyl group. In sequencing, in order to interrogate and “read” the genes and DNA of interest, the base pairs are read and identified. Sanger’s method throws a wrench into this process by swapping a small portion of those nucleotides for dideoxynucleotides, which are missing a second hydroxyl group. When a dideoxynucleotide is added, extension of that DNA strand immediately stops. Imagine you’re building a Lego tower out of regular-sized 2-by-4 pieces, but about 10% of the pieces mixed in are flat on top, with no bumps to attach any more pieces. That’s how dideoxynucleotides work. They stop your Lego tower from growing.
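The chain-termination idea can be sketched in a few lines of Python. This is a simplified illustration rather than laboratory software: it assumes one reaction per dideoxynucleotide and that every possible terminated fragment is produced, and the function names are hypothetical.

```python
def terminated_fragments(strand: str, dd_base: str) -> list[str]:
    """Return every fragment that ends where the dideoxy version of dd_base was added."""
    return [strand[: i + 1] for i, base in enumerate(strand) if base == dd_base]


def read_by_fragment_length(strand: str) -> str:
    """Recover the sequence by sorting all terminated fragments by length."""
    fragments = []
    for dd_base in "ACGT":  # one reaction per dideoxynucleotide
        for fragment in terminated_fragments(strand, dd_base):
            fragments.append((len(fragment), dd_base))
    # Sorting by length mimics separating fragments by size;
    # the terminating base of each successive fragment spells out the sequence.
    return "".join(base for _, base in sorted(fragments))


print(terminated_fragments("GATTACA", "A"))
print(read_by_fragment_length("GATTACA"))
```

Each fragment is one Lego tower that stopped at a flat piece; lining the towers up from shortest to tallest reveals the order of the pieces.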
Sanger sequencing was so revolutionary that it remained important for the next 20 years and was used in the Human Genome Project.2 By overlapping the sequences, the human genome was built about 500 to 600 bp at a time, with Sanger sequencing being a critical aspect of the entire project. The method is still used to this day for fast and inexpensive sequencing results and for difficult-to-sequence portions of the genome.
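To illustrate what “overlapping the sequences” means, here is a minimal Python sketch that joins 2 reads where the end of one matches the start of the other. It is a toy example under stated assumptions (error-free reads, a hypothetical function name, and an arbitrary minimum overlap of 3 bases), not the assembly method used by the Human Genome Project.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads using the longest suffix of `left` that matches a prefix of `right`."""
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            # Keep the shared bases once, then append the new bases from `right`.
            return left + right[size:]
    raise ValueError("reads do not overlap enough to merge")


print(merge_overlapping("ATGCGTAC", "GTACCTGA"))
```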
As with any technology, companies are looking to improve the speed, accuracy, and cost of an assay. There have been a few iterations of the next step beyond Sanger, but the one that has become dominant is called next-generation sequencing (NGS), also known as massively parallel sequencing. Nowadays, there are 2 versions of sequencing: short reads (Illumina) and long reads (Pacific Biosciences and Oxford Nanopore). Short reads cover up to 600 bp at a time, whereas long reads can go beyond 10,000 bp at once, with some reads in the millions of bp. To put this in perspective, the BRCA2 protein is approximately 3000 amino acids in length. An amino acid is coded for by 3 base pairs, and each base pair consists of 2 nucleotides. Thus, there are over 9000 bp (18,000 nucleotides) in the protein-coding sequence of the BRCA2 gene.
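The BRCA2 arithmetic above can be written out as a short Python sketch. The 3000-amino-acid figure and the 600 bp and 10,000 bp read lengths come from the text; the function names are hypothetical, and the read count is only the minimum needed to span the sequence end to end once, ignoring overlap and coverage.

```python
import math


def coding_length_bp(amino_acids: int) -> int:
    """Three base pairs (one codon) code for each amino acid."""
    return amino_acids * 3


def nucleotide_count(base_pairs: int) -> int:
    """Each base pair is made of 2 nucleotides, one on each strand."""
    return base_pairs * 2


def min_reads_to_span(base_pairs: int, read_length_bp: int) -> int:
    """Fewest end-to-end reads needed to cover the sequence once."""
    return math.ceil(base_pairs / read_length_bp)


brca2_bp = coding_length_bp(3000)
print(brca2_bp, nucleotide_count(brca2_bp))
print(min_reads_to_span(brca2_bp, 600), min_reads_to_span(brca2_bp, 10_000))
```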
Short-read NGS hybridizes short pieces of DNA to a flow cell, replicates that DNA, and then copies it over and over, sometimes hundreds of times. Each nucleotide carries a fluorescent label that lights up as it is read, allowing that nucleotide to be added to the sequence. Remember Lite-Brites when you were a kid? You’d put little pieces on a black board with holes, and the pieces would subsequently glow. Imagine having 1 Lite-Brite as a template and trying to copy the same image hundreds of times; each time you add a piece, it glows a specific color assigned to that light, or in this case, nucleotide. Suppose that, along the way, the copies consistently differ from the template in the exact same spot. Because that light is consistently different, it’s not just a mistake. Instead, it becomes an interesting diagnostic possibility because that patient sample may have a mutation.
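The distinction between a random error and a consistent difference can be sketched in Python. This is an illustrative toy, not a clinical variant caller: the function name and the 20% threshold are assumptions chosen for the example.

```python
from collections import Counter
from typing import Optional


def call_position(observed_bases: str, reference_base: str,
                  min_fraction: float = 0.2) -> Optional[str]:
    """Return the non-reference base if it shows up consistently, else None."""
    if not observed_bases:
        return None
    counts = Counter(observed_bases)
    alternatives = {b: n for b, n in counts.items() if b != reference_base}
    if not alternatives:
        return None
    alt_base, alt_count = max(alternatives.items(), key=lambda item: item[1])
    # A base seen in a large share of reads is unlikely to be random error.
    if alt_count / len(observed_bases) >= min_fraction:
        return alt_base
    return None


# 100 reads over one position: about half disagree with the reference "A".
print(call_position("A" * 50 + "G" * 48 + "T" * 2, reference_base="A"))
# A single stray "G" in 100 reads is treated as noise.
print(call_position("A" * 99 + "G", reference_base="A"))
```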
There are 2 common ways to use short-read NGS: whole-genome sequencing (WGS) and targeted sequencing, also known as amplicon sequencing or panel sequencing. WGS refers to just that: sequencing the whole genome at a certain level of coverage, which is how many times, on average, each base pair of the reference genome is read. Most of the time, 30× coverage, meaning each base pair on average was read 30 times, is sufficient for nondisease applications. Gene panel sequencing looks at a specific subset of known disease genes at a much greater coverage, up to 1000× but mostly 500× to 600×. For example, a provider may want to run a gene panel on a patient with a history of colorectal cancer to determine whether there is a hereditary component. This panel would include MLH1, MSH2, MSH6, PMS2, and EPCAM, all of which are associated with Lynch syndrome, as well as APC and MUTYH, which are associated with other syndromic patterns where colorectal cancers are common.3 Prostate cancer–specific panels will often include BRCA1, BRCA2, ATM, CHEK2, PALB2, HOXB13, and others.
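Average coverage is simple arithmetic, sketched here in Python. The genome size of 3 billion bp and the 150 bp read length are round-number assumptions for illustration, and the function names are hypothetical.

```python
import math


def mean_coverage(read_count: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average number of times each base pair of the target is read."""
    return read_count * read_length_bp / target_size_bp


def reads_needed(coverage: float, read_length_bp: int, target_size_bp: int) -> int:
    """Number of reads required to reach a given average coverage."""
    return math.ceil(coverage * target_size_bp / read_length_bp)


GENOME_BP = 3_000_000_000  # assumed round figure for the human genome
print(mean_coverage(600_000_000, 150, GENOME_BP))
print(reads_needed(30, 150, GENOME_BP))
```

The same formula shows why panels can afford far deeper coverage: the target is a small set of genes rather than the whole genome, so the same number of reads is spread over far fewer base pairs.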
Long reads act a little differently from short reads. Instead of creating many short copies on a chip, long reads use a very large circular sequence of DNA and continuously run it through a mechanism, such as a protein pore, to consistently read the same DNA over and over. By comparison, short reads are like copying a Lego tower repeatedly using the same colors in the same order, whereas long reads are like riding a Ferris wheel and having the operator check each cab every time it passes the bottom. Long reads will catch the same errors as short reads but also provide some structural variant support and help in getting through areas that are more difficult to read. This allows for a deeper understanding of possible disease states and their proximity to other possible issues.
Kary Mullis, PhD, was a chemist at Cetus Corporation. One night while driving around Mendocino County with his girlfriend, also a chemist at Cetus, he recognized that DNA base pairs were constant in their pairing and had the idea of hybridizing short pieces of DNA to complementary regions of a longer piece of DNA and letting DNA polymerase add nucleotides from there.4 This allowed a short piece of DNA to be amplified repeatedly using different temperatures, creating billions of copies over many cycles, which would then be studied on an agarose gel (Figure 1). Hence, PCR was invented. Mullis was awarded the Nobel Prize in Chemistry in 1993 for this groundbreaking invention, which led to so many discoveries in science. Around the same time, Higuchi et al showed that increasing amounts of DNA could be directly studied using a fluorescent marker without the need for agarose gel.5 And voilà! qPCR was invented.
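The “billions of copies” figure follows from doubling, as this short Python sketch shows. It assumes ideal conditions in which every copy is duplicated in every cycle (an efficiency of 1.0); real reactions fall short of that, and the function names are hypothetical.

```python
def copies_after(cycles: int, initial_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies present after a number of cycles; efficiency 1.0 means perfect doubling."""
    return initial_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies: int, initial_copies: int = 1) -> int:
    """Smallest number of perfect-doubling cycles that reaches the target."""
    cycles = 0
    copies = initial_copies
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles


print(copies_after(30))                 # one molecule after 30 ideal cycles
print(cycles_to_reach(1_000_000_000))   # cycles needed to pass 1 billion copies
```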
In order to use qPCR in diagnostics, the clinic has to know which specific gene is of interest, as the primers have to flank the target sequence to be amplified. The high specificity of this type of assay is both a blessing and a hindrance. It’s a blessing because the clinic can answer a diagnostic question with high confidence but a hindrance because it may miss other possible disease states that are outside the targeted region. Most of the time, a sample will be split up to run multiple different assays at the same time to cover a wider array of diseases. qPCR is also very commonly used to get to the heart of urinary tract infections (UTIs) and their persistent nature. Many companies are offering qPCR diagnostics for UTIs, prostatitis, and more.6 Most of the time, those companies will also offer an NGS panel in addition to cover all diagnostic bases.
Microarrays are small chips imprinted with specific DNA targets of interest (Figure 2). The test is run with a reference sample, often labeled with a green fluorescent dye, and the targeted DNA sample, labeled with a red fluorescent dye. Both are then hybridized to the chip, and a comparative analysis is done. If the targeted DNA is expressed at a higher rate, that small area will glow red. If the control DNA is expressed at a higher rate (that is, expression is decreased in the target DNA), it will glow green. Finally, if both are expressed in equal amounts, meaning essentially no difference, the square will glow yellow.8 The sensitivity is generally low, but the ability to study many targets at once is a highlight. Microarrays are a common technique of many companies that offer ancestry testing. Those companies will study up to 700,000 targets at once. However, microarrays are slowly declining because of the greater adoption of NGS assays. These direct-to-consumer tests are fun and interesting for those who are seeking information about their ancestry but should not be used for cancer risk assessment.
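The red, green, and yellow readout amounts to comparing 2 intensities at each spot, which can be sketched in Python. The function name and the twofold cutoff (a log2 ratio of 1) are assumptions for illustration; actual platforms apply their own normalization and thresholds.

```python
import math


def spot_color(sample_red: float, reference_green: float,
               log2_threshold: float = 1.0) -> str:
    """Classify one spot from its red (sample) and green (reference) intensities."""
    if sample_red <= 0 or reference_green <= 0:
        raise ValueError("intensities must be positive")
    log2_ratio = math.log2(sample_red / reference_green)
    if log2_ratio >= log2_threshold:
        return "red"      # higher in the patient sample
    if log2_ratio <= -log2_threshold:
        return "green"    # higher in the reference
    return "yellow"       # roughly equal in both


print(spot_color(800, 200), spot_color(200, 800), spot_color(500, 480))
```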
What happens to a sample in the clinical lab?
When a patient has a sample sent for molecular testing, whether it’s tissue, urine, blood, or saliva, that specimen is immediately tagged with a number specific to that patient. The sample is transported to the lab under appropriate storage conditions. The lab receives the sample and enters it into its system, a step called accessioning. Then, the molecular journey is as follows:
1. Nucleic acid isolation: convert RNA to DNA if needed
2. Library preparation:
a. Sonication or enzyme digestion to create uniform DNA segments
b. Enrich the target DNA if needed, as in the target enrichment and amplicon generation workflows used for gene panels
c. Barcode ligation: adding unique markers per patient sample so samples can be mixed together (multiplexing), then parsed upon software analysis
d. Adapter ligation: allows the DNA to bind to the flow cell
3. Sequence the DNA: 0.5 days to several days, depending on sequencing type and instrument used
4. Bioinformatics: Results are parsed and analyzed.
5. Report generated: Details around the disease state are provided, and sometimes potential treatment scenarios depending on the software’s FDA approvals
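The barcode step in the workflow above is what lets software sort pooled reads back to the right patient. Here is a minimal Python sketch of that parsing step. It assumes the barcode sits at the start of each read and matches exactly; the barcodes, sample names, and function name are all hypothetical.

```python
def demultiplex(reads: list[str], barcode_to_sample: dict[str, str],
                barcode_length: int) -> tuple[dict[str, list[str]], list[str]]:
    """Sort pooled reads by their leading barcode and strip the barcode off."""
    by_sample: dict[str, list[str]] = {name: [] for name in barcode_to_sample.values()}
    unassigned: list[str] = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown barcode: set aside, do not guess
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


barcodes = {"ACGT": "patient_001", "TGCA": "patient_002"}  # hypothetical
pooled = ["ACGTGATTACA", "TGCACCGGTTA", "ACGTTTTGGG", "NNNNGATTACA"]
assigned, leftover = demultiplex(pooled, barcodes, barcode_length=4)
print(assigned)
print(leftover)
```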
What does a clinical laboratory look for in their tests?
Although you’ve learned about generic methods and workflows, labs look for specific issues using molecular methods that cause various disease states. In this section, I’ll lay out a few of the more common terms in the lexicon of genomic testing.
Single-nucleotide polymorphism (SNP). As the name implies, this is a single-nucleotide change at a particular position that is present in at least 1% of the population. An SNP within a gene may or may not alter downstream protein function, depending on whether the change affects the amino acid for which it codes. There is significant interest in looking at a panel of SNPs to determine risk assessment for breast and prostate cancer.
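Whether a single-nucleotide change alters the protein depends on the codon it lands in, which can be checked against the standard genetic code. The Python sketch below builds that code table and classifies a change; the function name and the 3 labels are chosen for illustration, and real interpretation involves far more than this lookup.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(reference_codon: str, variant_codon: str) -> str:
    """Say whether a codon change keeps, swaps, or truncates the amino acid."""
    before = CODON_TABLE[reference_codon.upper()]
    after = CODON_TABLE[variant_codon.upper()]
    if before == after:
        return "synonymous"   # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"     # premature stop codon
    return "missense"         # different amino acid


print(classify_change("GAA", "GAG"))
print(classify_change("GAG", "GTG"))
print(classify_change("CAG", "TAG"))
```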
Copy number variation. This is a duplication or deletion of a sequence of nucleotides, not just a single nucleotide as with an SNP. Most genes in the human genome are present in 2 copies (alleles), 1 inherited from each parent. In rare instances, short sequences can be replicated many times. For example, the HTT gene codes for the protein huntingtin. In this case, the trinucleotide CAG can be repeated 36 times or more. The result is abnormal protein production, which can then lead to Huntington disease.7
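Counting a trinucleotide repeat is easy to sketch in Python. The 36-repeat threshold comes from the text; the function names and the example sequences are made up for illustration, and this is not a validated method for sizing repeats from sequencing data.

```python
import re

EXPANSION_THRESHOLD = 36  # repeat count cited in the text for HTT


def longest_cag_run(sequence: str) -> int:
    """Length, in repeats, of the longest uninterrupted stretch of CAG."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_expanded(sequence: str) -> bool:
    """True if the longest CAG run meets or exceeds the threshold."""
    return longest_cag_run(sequence) >= EXPANSION_THRESHOLD


typical = "ATG" + "CAG" * 20 + "CCG"   # made-up example sequences
expanded = "ATG" + "CAG" * 40 + "CCG"
print(longest_cag_run(typical), is_expanded(typical))
print(longest_cag_run(expanded), is_expanded(expanded))
```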
In other cases, entire genes can be repeated or deleted, causing overexpression or underexpression, as in the case of α-amylase 1 and its overexpression because of dietary differences.8 The largest example of this is trisomy, in which an entire extra chromosome is present; trisomy of chromosome 21 causes Down syndrome.
Gene fusions. Fusions occur when 2 genes join, creating a hybrid gene that causes expression issues. One of the earliest discovered examples of this is a reciprocal translocation in which the ABL1 gene of chromosome 9 is translocated and fuses to the BCR gene on chromosome 22, creating a BCR-ABL1 gene (on the Philadelphia chromosome), which drives chronic myeloid leukemia.9 This is difficult to detect using molecular testing because there are various fusion breakpoints on each gene, but it can be done with proper techniques, such as digital PCR and NGS.
This partial list of 3 common issues is just a sample of what a molecular lab can discover. Some tests are more involved than others from a workflow and difficulty perspective, whereas others are fairly straightforward. The most challenging part for a lab is determining whether a specific assay type can deliver the proper answer because some answers are much more difficult to come by.
The world of clinical diagnostics is changing. Targeted therapies are increasingly specific to the molecular changes that drive therapeutic resistance, and companion diagnostics must be developed so that patients can receive these new agents. In addition, the ability to detect and potentially mitigate disease at a much earlier stage, before it becomes systemic or metastatic, has clear upside potential. Therefore, embracing and understanding these new and emerging molecular techniques will improve your patient outcomes and enhance your practice.
Wright has been involved in biotech and clinical sales and marketing for nearly 20 years. He has a Master’s in molecular biology from Washington University in St. Louis and an MBA in strategy and operations from Boston University. He has worked for companies such as Thermo Fisher and Illumina and has started multiple companies outside of the biotech world.
1. Frederick Sanger - biographical. NobelPrize.org. 2005. Accessed April 7, 2021. https://bit.ly/3veQg8w
2. The Human Genome Project. NIH National Human Genome Research Institute. Updated December 22, 2020. Accessed April 7, 2021. https://www.genome.gov/human-genome-project
3. PMS2 gene. MedlinePlus. Updated August 18, 2020. Accessed April 7, 2021. https://medlineplus.gov/genetics/gene/pms2/#conditions
4. Kary B. Mullis – biographical. NobelPrize.org. 2005. Accessed April 7, 2021. https://bit.ly/3hKdNdG
5. Urology – urinary tract infections, prostatitis, & more. MicroGenDX. Accessed April 7, 2021. https://microgendx.com/urology-uti/
6. Microarray. Scitable. Accessed April 7, 2021. https://www.nature.com/scitable/definition/microarray-202/
7. HTT huntingtin [ Homo sapiens (human) ]. NCBI. Updated May 18, 2021. Accessed April 7, 2021. https://bit.ly/3bJUJZe
8. AMY1A amylase alpha 1A [ Homo sapiens (human) ]. NCBI. Updated May 18, 2021. Accessed April 7, 2021. https://bit.ly/3u9rEwG
9. Philadelphia chromosome. Wikipedia. Updated April 4, 2021. Accessed April 7, 2021. https://en.wikipedia.org/wiki/Philadelphia_chromosome https://www.ncbi.nlm.nih.gov/gene/25