Checking your metrics is an important piece of the sequencing process

High throughput sequencing metrics: Don't be a monster, review them before sending data to the triage team.

Checking your metrics is an important piece of the sequencing process

One of the most important things to avoid when doing high throughput sequencing is 'bias.'

Properly assessing your post-alignment and variant calling metrics is super important for ensuring that biased data is not used to generate a patient report.

Here are some of my favorite metrics to keep an eye on:

Percent High Quality (HQ) Reads Aligned - The percentage of HQ reads that actually align to the reference genome. Here HQ is defined as Q20 or better and this stat should be greater than 98%.

GC Bias Plot - This is one of my favorites and for short-read data it should look like an upside down U with high AT and high GC regions showing slight bias (because of amplification) and for long-read methods this plot is usually flat. Any major deviations can indicate bias either from over amplification or if this is target capture data, bias in the capture process.

Insert size - These plots show the size distribution of the sequenced inserts within your library. These should pretty closely mimic the distribution you see in fragment analysis.

Percent Duplication - This is a measure of the number of perfectly duplicated reads in a dataset. The ideal here is less than 5% and usually if you see problems with read duplication you'll also see issues in the GCbias plot.

Coverage - A measure of the average depth of coverage across the genome or your provided target capture probe set.

Transition/Transversion Ratio (TiTv) - Transitions are A<>G or C<>T (substitutions within the purines and pyrimidines) and Transversions A<>C, A<>T, C<>G, and G<>T (conversion of a purine to a pyrimidine, etc). For genomes the expected TiTv is 2 and for exon capture panels it's 3. Major deviations from these values could indicate a bias during sequencing or sample degradation during storage.

Strand Bias - A measure of the bias of the genotype calls made on the positive and negative strands. No bias means calls are the same on each of the complementary strands, high bias means the calls differ and high strand bias around variant calls could indicate an over-reporting of false positives.

Target capture specific metrics:

Fold80 Penalty - This is a measure of evenness or uniformity. The best captures are ones that have perfect uniformity. Fold80 penalty is defined as "fold over-coverage necessary to raise 80% of bases in targets to the mean coverage level." 1 is perfect, so any deviation from that indicates a bias in capture. The best captures are <1.5.

Percent Reads On Target - This is a measure of how much sequencing is being wasted on non-specific binding or off target. This value can vary greatly depending on the size of your capture from 60-70% for an exome down to <20% for smaller capture panels. Deviations from the expected value can indicate bias.