Everything you ever wanted to know about cluster-based and single-molecule sequencing

Clusters vs single molecules in DNA sequencing: Here’s the short and the long of it.

This post originally appeared in the Omic.ly Premium 37 newsletter. To get Omic.ly Premium in your inbox every Sunday, subscribe to the Premium tier or higher.

There are 6 short-read companies and only 2 long-read companies.

Why is that?

Well, short reads are cheaper, so the market is bigger, and long reads are really hard.

Short-read sequencing uses some version of random clustering, bead emulsion clustering, or whatever other slightly different iteration exists for patent purposes. Suffice it to say, cluster-based sequencing is really only good out to about 400 base pairs.

This is due to phasing!

If you’ve done short-read sequencing, you know that phasing is an important metric, but you might not know why.

Basically, when you run short-read sequencing, you start a little race at the beginning of each cycle.

And you do this with an amplified 'cluster' of many sequences that are clones from the same fragment so that you can efficiently detect the signal at each base!

The only problem is this race isn’t with people, it’s with cats. And cats don’t really care about your finish line, so some get way ahead (pre-phasing), some fall behind (phasing), and the rest cross at the same time.

Luckily, the sequencing cats are nice and only a fraction of a percent of them misbehave every cycle. But those losses compound: the out-of-sync molecules muddy the cluster’s signal a little more with each cycle, which limits how many cycles can be done, and you guessed it, that’s a couple hundred bases.
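
To put rough numbers on the cat race, here is a minimal sketch in Python. It assumes a constant, independent loss rate per cycle, and the 0.5% figure is a made-up illustration rather than a spec from any instrument. The function names are hypothetical too. Real phasing is messier than this (and correction algorithms claw some signal back), but the compounding is the point.

```python
import math


def in_phase_fraction(cycles: int, loss_per_cycle: float) -> float:
    """Fraction of a cluster still in sync after `cycles` cycles.

    Assumes the same fraction of molecules drops out of phase every cycle.
    """
    return (1.0 - loss_per_cycle) ** cycles


def max_cycles(loss_per_cycle: float, min_fraction: float) -> int:
    """Largest cycle count that keeps the in-phase fraction >= min_fraction."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - loss_per_cycle))


if __name__ == "__main__":
    loss = 0.005  # hypothetical: 0.5% of molecules fall out of phase per cycle
    for n in (50, 150, 300, 400):
        print(f"cycle {n:>3}: {in_phase_fraction(n, loss):.1%} still in phase")
    print("cycles until half the cluster is out of sync:", max_cycles(loss, 0.5))
```

Even a tiny per-cycle loss multiplies out, so the clean signal from the cluster shrinks steadily as the read gets longer.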

Very smart people have created algorithms to correct for this and have been able to push read lengths out to about 400 base pairs.

That’s why long-read sequencers don't use clustering but instead use very clever ‘single molecule’ schemes to generate data!

PacBio does this by watching an immobilized polymerase as fluorescently labeled bases are added to a single molecule. They use a powerful microscope and confocal techniques to view only the light emitted at the polymerase.

Alternatively, Oxford Nanopore Technologies (ONT) uses nanopores and senses the changes in current created as bases pass through the pore. Nanopore sequencing actually works by predicting the sequence content of a set of ~6 bases at a time, a k-mer, from a single molecule.
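
If the k-mer idea is new to you, this small Python sketch shows what "a set of ~6 bases" looks like as a window sliding along a strand one base at a time. The `kmers` helper and the example read are made up for illustration. This is only the windowing concept, not how ONT's basecalling software actually turns current into sequence.

```python
from typing import List


def kmers(sequence: str, k: int = 6) -> List[str]:
    """Return every overlapping window of k bases, sliding one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    read = "ACGTACGTAC"  # made-up sequence for illustration
    for window in kmers(read):
        print(window)
    # With 4 possible bases there are 4**k distinct k-mers to tell apart
    print("distinct 6-mers:", 4 ** 6)
```

Each base shows up in several neighboring windows, so consecutive signals overlap in the sequence they describe.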

But there's always an exception to the rule and there's one short-read company that dared to sequence single molecules.

That's Helicos Biosciences, and since you've probably never heard of them, you can guess how well this worked out.

They had a short stint in the spotlight before being eclipsed by Illumina in 2010.

However, their technology was resurrected in 2016 by SeqLL.

They've enjoyed about as much success as Helicos. Their technology uses total internal reflection fluorescence (TIRF) microscopy to image the fluorescence from single molecules. While this sounds cool, they are limited to sub-60 base pair read lengths and don't have a broad user base.

ONT has also made noise about being able to do short reads, but they're a very spendy option when it comes to short-read applications!

