Don't let highly expressed transcripts get the best of your RNA-seq dataset

Whole transcriptome sequencing AKA RNA-seq: The good, the bad, the dynamic range?

DNA is the storage form of our genetic information, and transcription is the process that copies DNA into a message, ribonucleic acid (RNA), that codes for proteins!

During transcription, the double helix is unwound and the DNA is bound by a bunch of proteins called transcription factors that recruit an RNA polymerase to begin the conversion process.

The polymerase uses DNA as a template to create the complementary RNA message by attaching RNA bases together.

That's A, C, G, and U!

"Me?"

No, Uracil.

RNA is special: unlike DNA, it doesn't have T (Thymine). Instead it has Uracil, which is really just a Thymine that's missing a methyl group.
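
If you like seeing things in code, here is a minimal Python sketch of that base pairing. The function name `transcribe` and the toy template sequence are made up for illustration, and the sketch ignores the direction in which each strand is read.

```python
# Each DNA template base pairs with one complementary RNA base.
# Note that A pairs with U (Uracil) in RNA, not T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Return the RNA message complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())

# Hypothetical 9-base template, chosen only for illustration.
rna = transcribe("TACGGCATT")
print(rna)  # AUGCCGUAA
```

No T anywhere in the output, just A, C, G, and U.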

As the polymerase chugs along, it copies the exons (the parts that code for protein) and the introns (the parts that don't code for protein) into a single strand of RNA. The introns are removed during a process called splicing, in which sequences in the introns and at the ends of exons are recognized by a protein complex called the spliceosome. Ultimately, all of the exons are spliced together to create the final RNA message.
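
Splicing can be sketched the same way: keep the exon pieces, drop what lies between them, and join the rest. This is only a toy model. The sequence and exon coordinates below are invented, and a real spliceosome finds its boundaries from sequence signals rather than from a list of positions.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the (start, end) exon slices of a pre-mRNA, dropping everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy pre-mRNA (made up): exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaugacccuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]  # hypothetical exon coordinates

message = splice(pre_mrna, exons)
print(message)  # AUGGCCGAUUACUGGUAA

# Leaving out the middle exon gives a shorter, different message.
skipped = splice(pre_mrna, [exons[0], exons[2]])
print(skipped)  # AUGGCCUGGUAA
```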

This all happens in the nucleus of the cell, but the conversion of that RNA message to functional protein happens in the cytoplasm and on the endoplasmic reticulum (the cabbage-y network of membranes that wraps around the nucleus!). Here the RNA is bound by another protein complex called the ribosome, and this complex reads the RNA message to create proteins from amino acid building blocks (this is called translation).
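
To make the "reading" part concrete, here is a small Python sketch of translation. The ribosome reads the message three bases (one codon) at a time. The table below is a tiny subset of the standard genetic code, just enough for the made-up message used here, and the function name `translate` is an assumption for illustration.

```python
# Tiny subset of the standard genetic code; None marks a stop codon.
CODONS = {
    "AUG": "Met", "GCC": "Ala", "GAU": "Asp",
    "UAC": "Tyr", "UGG": "Trp", "UAA": None,
}

def translate(mrna: str) -> list[str]:
    """Read an mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the ribosome lets go
            break
        protein.append(amino_acid)
    return protein

# Hypothetical message, chosen to match the small table above.
protein = translate("AUGGCCGAUUACUGGUAA")
print(protein)  # ['Met', 'Ala', 'Asp', 'Tyr', 'Trp']
```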

Why did I go through all of the effort to tell you this?

Because each cell typically has only two copies of each gene in its DNA, but that DNA can be turned into thousands of copies of RNA as a result of transcription!

AND all of those RNA messages together make up the transcriptome.

But this also means that RNA is actually a reasonable readout of the biological function of a cell, and we can learn even more by capturing those molecules and sequencing them!

RNA-seq can tell us which sequences of DNA actually end up in an RNA message, how much of each message is made, and what isoforms of each RNA message are present (splicing can result in multiple different RNAs being generated depending on which exons are included in the final message).

Unfortunately, since the cells in our body express slightly different RNAs depending on their function, we need to sequence a lot of cells to truly understand the transcriptome!

And double unfortunately, because transcription can create a lot of copies of the same RNA, sometimes very abundant RNAs, like ribosomal RNA or the messages for hemoglobin, can dominate the resulting sequencing data, causing us to miss messages that have fewer copies present.

Luckily, we can fix this dynamic range issue using ribo-depletion or globin-reduction protocols to get rid of these overabundant RNAs and focus more on the unique RNAs that matter most!
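
A quick back-of-the-envelope sketch shows why this helps. Every number below is invented for illustration. The sketch also assumes that reads are sampled in proportion to how many copies of each RNA are present, and that depletion removes the abundant RNA completely, which real protocols only approximate.

```python
def read_share(copies: dict[str, int]) -> dict[str, float]:
    """Fraction of sequencing reads each RNA would get if reads track abundance."""
    total = sum(copies.values())
    return {name: count / total for name, count in copies.items()}

# Hypothetical copy numbers in one sample (not real measurements).
copies = {"rRNA": 900_000, "abundant_mRNA": 99_000, "rare_mRNA": 1_000}

before = read_share(copies)
# Idealized depletion: drop the rRNA entirely before sequencing.
after = read_share({name: n for name, n in copies.items() if name != "rRNA"})

total_reads = 10_000_000  # assumed sequencing depth
rare_before = round(before["rare_mRNA"] * total_reads)
rare_after = round(after["rare_mRNA"] * total_reads)
print(rare_before, rare_after)  # 10000 100000
```

With the same number of reads, the rare message in this toy sample goes from about 10,000 reads to about 100,000 once the rRNA is out of the way.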