Transcriptomics is better with long reads
Spoiler Alert: The holy grail of transcriptomics is long reads.
Transcriptomics is the least talked about of the omics in sequencing because, for whatever reason, DNA gets all the fanfare.
But we are able to function as living organisms because of the activities of DNA, RNA and proteins.
While the genome and DNA are the storage form of our genetic material, the transcriptome is made up of all of the RNA messages derived from the genome.
Now, the funny thing about the information in our DNA is that it's split up into sections called introns and exons.
The introns are removed from the final RNA message during a process called splicing.
On average, an exon is about 200bp and a fully spliced human RNA message is about 2,000bp.
But RNA is tricky.
That splicing process doesn't always happen the same way in every cell or even in every transcript.
So, cells also contain different versions of these messages, called isoforms, that can contain slightly different combinations of exons.
The inclusion or exclusion of an exon can have a dramatic effect on the function of a protein, so it's kind of important to keep tabs on these isoforms, especially as they relate to disease!
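To make the "different combinations of exons" idea concrete, here's a minimal Python sketch. The gene, exon names, lengths and which exons are optional are all made up for illustration, and it only models simple exon skipping (real splicing has more tricks than that).

```python
from itertools import product

# Hypothetical gene: (exon name, length in bp, is_optional).
# Lengths are invented, loosely around the ~200bp average mentioned above.
EXONS = [
    ("E1", 180, False),
    ("E2", 220, True),
    ("E3", 200, False),
    ("E4", 150, True),
    ("E5", 250, False),
]

def enumerate_isoforms(exons):
    """Return every exon combination made by keeping or skipping each optional exon."""
    optional = [name for name, _, opt in exons if opt]
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choices))
        included = [(name, length) for name, length, opt in exons
                    if not opt or keep[name]]
        names = tuple(name for name, _ in included)
        total = sum(length for _, length in included)
        isoforms.append((names, total))
    return isoforms

if __name__ == "__main__":
    for names, total in enumerate_isoforms(EXONS):
        print("-".join(names), f"{total}bp")
```

Two optional exons already give four possible messages from one gene, which is why the isoform count gets out of hand quickly.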
So what happens when you try to do transcriptomics using short-read sequencing?
Well, you start by turning the RNA into DNA, and then you fragment that DNA so you can sequence it using the traditional short-read process, which has a maximum read length of 300bp.
"But Brian, didn't you say that the average RNA is 2,000bp? Won't you lose information about what isoforms were present if you chop everything up?"
Fantastic observation!
You'll still pick up the splice junctions and get a rough idea of what exons were connected together, which can help in predicting what isoforms were present.
But long reads are definitely better: you capture all of those exons as they exist in a single message, without having to guess what was where!
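Here's a rough Python sketch of why the guessing happens. It assumes a hypothetical transcript of ten 200bp exons (2,000bp total, matching the averages above) and asks how many exons a single read can physically touch. The function names are mine, and it ignores paired-end reads and sequencing errors.

```python
READ_LEN = 300  # short-read maximum quoted above

def exon_spans(exon_lengths):
    """Map each exon to its (start, end) coordinates on the spliced transcript."""
    spans, pos = [], 0
    for length in exon_lengths:
        spans.append((pos, pos + length))
        pos += length
    return spans

def exons_in_read(spans, start, read_len):
    """Indices of exons overlapped by a read covering [start, start + read_len)."""
    end = start + read_len
    return [i for i, (s, e) in enumerate(spans) if s < end and e > start]

def max_exons_linked(exon_lengths, read_len):
    """Largest number of exons any single read of this length can touch."""
    spans = exon_spans(exon_lengths)
    total = spans[-1][1]
    last_start = max(total - read_len, 0)
    return max(len(exons_in_read(spans, s, read_len))
               for s in range(last_start + 1))

transcript = [200] * 10  # hypothetical: ten 200bp exons = 2,000bp

if __name__ == "__main__":
    print(max_exons_linked(transcript, READ_LEN))  # short read
    print(max_exons_linked(transcript, 2000))      # full-length long read
```

In this toy setup a 300bp read links at most 3 neighboring exons, while a 2,000bp read sees all 10 at once. So if two isoforms differ at exon 2 and exon 8, no single short read can tell you which versions travelled together.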
A typical transcriptomics experiment with short reads requires 100-200 million reads.
Since long reads are long, you need about 1/10th as many because each read gives you the full-length transcript.
Historically, long-read transcriptomics has been pricey, but PacBio has commercialized a transcript arraying method that combines 7-ish sequences into a single read with barcodes separating each in the array.
This uses PacBio's 16kb reads more efficiently, effectively reducing the cost per transcript and putting the method on par with short-read-based methods.
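The back-of-the-envelope math looks something like this in Python. All of the inputs are the round numbers quoted above. The 8-per-read ceiling is my own rough estimate that ignores barcode and adapter length, so treat it as a sanity check on "7-ish" and not as a spec.

```python
READ_LEN_BP = 16_000       # PacBio read length quoted above
AVG_TRANSCRIPT_BP = 2_000  # average spliced human message quoted above
ARRAY_SIZE = 7             # "7-ish" transcripts per arrayed read, from the text

def transcripts_per_read_upper_bound(read_len, transcript_len):
    """Rough ceiling on transcripts per read, ignoring barcodes and adapters."""
    return read_len // transcript_len

def sequencer_reads_needed(target_transcripts, array_size):
    """Arrayed reads needed to recover a target number of transcripts (ceiling division)."""
    return -(-target_transcripts // array_size)

# Short-read budget from the text, and the ~1/10th long-read equivalent.
SHORT_READ_BUDGET = (100_000_000, 200_000_000)
LONG_READ_TARGETS = tuple(n // 10 for n in SHORT_READ_BUDGET)

if __name__ == "__main__":
    print(transcripts_per_read_upper_bound(READ_LEN_BP, AVG_TRANSCRIPT_BP))
    for target in LONG_READ_TARGETS:
        print(target, sequencer_reads_needed(target, ARRAY_SIZE))
```

Under these assumptions, 10 to 20 million full-length transcripts need only about 1.4 to 2.9 million arrayed reads off the sequencer, which is where the cost savings come from.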
Another issue to consider around cost here is that the transcriptome actually represents an amplification of the genome.
Some RNAs are expressed A LOT, and because you're randomly sampling from these sequences, you'll pick up the really abundant ones and probably miss the rarer ones.
One way to get around this is to deplete all of those highly abundant messages using enzymatic targeting techniques (CRISPR, RNaseH, etc.).
Ultimately this makes your long-read dollars go much...longer!
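If you want to see the sampling effect in numbers, here's a small Python sketch. The abundances, read count and depletion share are hypothetical. It also assumes every read is an independent random draw, and that depletion removes only the abundant messages and removes them completely.

```python
def detection_probability(fraction, n_reads):
    """P(at least one read) if each read hits the transcript with probability `fraction`."""
    return 1.0 - (1.0 - fraction) ** n_reads

def fraction_after_depletion(fraction, depleted_share):
    """Rare transcript's share of the library once `depleted_share` of molecules is removed."""
    return fraction / (1.0 - depleted_share)

# Hypothetical numbers for illustration only.
RARE_FRACTION = 1e-7     # rare message: 1 in 10 million molecules
N_READS = 1_000_000      # reads sequenced
DEPLETED_SHARE = 0.8     # abundant messages assumed to be 80% of the library

if __name__ == "__main__":
    before = detection_probability(RARE_FRACTION, N_READS)
    enriched = fraction_after_depletion(RARE_FRACTION, DEPLETED_SHARE)
    after = detection_probability(enriched, N_READS)
    print(f"before depletion: {before:.3f}")
    print(f"after depletion:  {after:.3f}")
```

With these made-up inputs, the chance of catching the rare message at least once goes from roughly 10% to roughly 39% for the same number of reads.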