Low frequency variants are hard to detect, but don't let that stop you from implying you can catch them all

Detecting variants can be challenging, especially the ones that aren't very abundant.


Unfortunately, a good number of labs run their sequencing analyses with default settings, and/or implement premade pipelines that they 'validate' without doing enough work to confirm that what they've built actually performs the way they say it does.

This gets extra tricky in oncology screening, where allele frequencies can drop well below 1% and the latest crop of sequencing-based tests advertises sub-0.1% detection capabilities.

But what does it mean to be able to call a heterozygous variant at 50% allele frequency in a germline sample, or a 0.1% frequency variant in a liquid biopsy?

Can the assay do it every time?

Can the assay do it in every sequence context?

How do you know?

There are a few really good ways to know, but most places don't do these things because they're not required to:

In silico decimation - this is an informatic technique where the data from a large number of samples is randomly reduced, i.e., take samples with a minimum coverage of 40x at each position and downsample them to 30x, 20x, and 10x to see where the assay starts losing the ability to detect specific types of variants. In the case of liquid biopsy samples, where the minimum coverage needed to call a sub-1% variant approaches 10,000x (depending on the quality/error-correction strategy), decimation through a much higher coverage range may be warranted. It is also possible to create contrived datasets where variants are randomly inserted into the data at a specific frequency, but these programmatic manipulations of frequencies are only good for evaluating informatics performance, not lab process performance.
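
Here's a minimal sketch of the downsampling half of a decimation experiment, assuming samtools is on the PATH and a starting BAM of roughly 40x mean coverage. The file names and the variant-calling step are placeholders for a lab's own pipeline, not anything prescribed:

```python
"""Downsample a ~40x BAM to a series of lower coverages with samtools,
so the caller can be re-run on each subset to see where detection fails."""
import subprocess

ORIGINAL_COVERAGE = 40           # mean coverage of the starting BAM (assumed)
TARGET_COVERAGES = [30, 20, 10]  # decimation levels to evaluate
SEED = 11                        # fixed seed keeps the subsampling reproducible

def downsample(bam_in: str, bam_out: str, fraction: float) -> None:
    """Keep roughly `fraction` of reads; samtools -s encodes SEED.FRACTION."""
    subprocess.run(
        ["samtools", "view", "-b",
         "-s", f"{SEED + fraction:.4f}",   # e.g. 11.7500 -> seed 11, keep 75%
         "-o", bam_out, bam_in],
        check=True,
    )

for coverage in TARGET_COVERAGES:
    fraction = coverage / ORIGINAL_COVERAGE
    downsample("sample_40x.bam", f"sample_{coverage}x.bam", fraction)
    # Next step (not shown): re-run the variant caller on each downsampled BAM
    # and tally which previously confirmed variants are still detected.
```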

Contrived synthetic controls - one way to test process performance is to synthesize a variant into a sequence fragment using a company like Integrated DNA Technologies and then spike or dilute that fragment into a sample to 'contrive' the mutation. This 'sample' can then be taken through the whole process and used to determine at what allele frequency the lab process begins to fail to detect the variant (preferably this is done for every gene/target/exon in the panel).
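
For the dilution math, a rough back-of-the-envelope sketch of how many mutant fragment copies to spike into wild-type genomic DNA to hit a target allele frequency. The ~300 locus copies per ng of human gDNA figure is a commonly used approximation (haploid genome ~3.3 pg), and the input amounts are made up:

```python
# Copies of a given locus per ng of human genomic DNA (~3.3 pg haploid genome,
# a commonly used approximation)
COPIES_PER_NG_GDNA = 303

def spike_in_copies(target_vaf: float, background_ng: float) -> float:
    """Mutant copies needed so that spike / (spike + background) = target_vaf."""
    background_copies = COPIES_PER_NG_GDNA * background_ng
    return target_vaf * background_copies / (1.0 - target_vaf)

for vaf in (0.05, 0.01, 0.005, 0.001):
    copies = spike_in_copies(vaf, background_ng=25.0)  # 25 ng wild-type input
    print(f"target VAF {vaf:.1%}: spike in ~{copies:.0f} mutant copies")
```

At a 0.1% target in 25 ng of input, that works out to fewer than ten mutant molecules, which is a useful reality check on sub-0.1% detection claims.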

Characterized mixture panels - many companies (and NIST) offer mixture panels. Some are contrived, others are mixtures of cell lines with well-known variants, but most importantly, these panels have been independently characterized using sequencing and/or droplet digital PCR to precisely determine the allele frequencies of the variants they contain. They allow for accurate benchmarking of an assay's performance against an independent resource.
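
And a small sketch of what benchmarking against such a panel can look like, assuming the panel's truth data is available as a tab-delimited file of chrom/pos/ref/alt/expected_vaf (the file layout, bin edges, and function names are hypothetical): bin the expected variants by their characterized allele frequency and report the fraction the assay detected in each bin.

```python
"""Bin a mixture panel's truth set by expected allele frequency and report
per-bin sensitivity for a set of variants the assay actually called."""
import csv
from collections import defaultdict

# Allele-frequency bins to report on (edges are illustrative, not prescriptive)
VAF_BINS = [(0.0, 0.001), (0.001, 0.01), (0.01, 0.05), (0.05, 1.0)]

def load_truth(path: str) -> dict:
    """Map (chrom, pos, ref, alt) -> independently characterized allele frequency."""
    truth = {}
    with open(path) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row["chrom"], int(row["pos"]), row["ref"], row["alt"])
            truth[key] = float(row["expected_vaf"])
    return truth

def sensitivity_by_vaf(truth: dict, called: set) -> None:
    """Print detected/expected counts per expected-VAF bin."""
    tallies = defaultdict(lambda: [0, 0])  # bin -> [detected, expected]
    for variant, vaf in truth.items():
        for low, high in VAF_BINS:
            if low <= vaf < high:
                tallies[(low, high)][1] += 1
                if variant in called:
                    tallies[(low, high)][0] += 1
                break
    for (low, high), (found, total) in sorted(tallies.items()):
        print(f"expected VAF {low:.2%}-{high:.2%}: "
              f"{found}/{total} detected ({found / total:.1%})")

# Usage sketch: truth = load_truth("panel_truth.tsv"), then compare against the
# assay's calls, e.g. sensitivity_by_vaf(truth, calls_parsed_from_the_vcf)
```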

However, none of these methods is perfect and it's always a good idea for labs to track assay performance post-launch, especially as interesting positive samples are gathered or become available through other sources.