How genomic mapping finally helped to "complete" the human genome, kinda

Did you know the human genome wasn't actually completed until 2022? No, seriously!

The Human Genome Project declared in 2003 that the human genome was completed, but it had a Dwayne Johnson-sized hole in it: the assembly actually covered only ~92% of the genome.

Back when the human genome was first sequenced, old-school Sanger sequencing was used, which had a maximum effective read length of ~800-1,000 bp.

But even 800 bp wasn’t long enough to span, and therefore to order and orient, the highly repetitive stretches of DNA found at the tips (telomeres) and the centers (centromeres) of chromosomes.
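To make the repeat problem concrete, here's a minimal Python sketch. Everything in it is made up for illustration: the toy genome, the GATTACA repeat unit, and the helper name unique_read_fraction are assumptions, not real data or a real tool. It counts how many reads of a given length occur exactly once in the sequence, since only those can be placed unambiguously.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of all possible reads of length read_len that occur exactly once."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


# Toy genome (hypothetical): unique flank + tandem repeat + unique flank.
left = "ACGTTGCA"
repeat = "GATTACA" * 6  # a 42 bp tandem repeat
right = "TTGACCGT"
genome = left + repeat + right

short_fraction = unique_read_fraction(genome, 5)   # reads shorter than the repeat
long_fraction = unique_read_fraction(genome, 50)   # reads longer than the repeat

print(f"5 bp reads placed uniquely:  {short_fraction:.2f}")
print(f"50 bp reads placed uniquely: {long_fraction:.2f}")
```

In this toy case, every 50 bp read reaches into at least one unique flank, so all of them can be placed. Many of the 5 bp reads fall entirely inside the repeat and match several positions equally well. Real centromeric repeats run to millions of bases, which is why reads of a few hundred bases were never going to cut it.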

These regions were not completely resolved until early 2022, when the Telomere-to-Telomere (T2T) consortium closed the gaps using long-read sequencing and genomic mapping (optical mapping and Hi-C).

But this fact raises an important point about the current state of genomics.

The organization of our chromosomes and the locations of genes within them control how the cells in our bodies access and use that genetic information.

While short-read sequencing is very good at determining the order of individual bases, it's not very good at finding the larger structural arrangements of the genome that span thousands (kilo) or millions (mega) of bases.

This means short reads miss important variants like inversions, translocations, or those that fall in repetitive elements.

So, how can we better understand the longer range organization of the genome?

Historically, this has required us to actually, physically, look at chromosomes with a microscope via karyotyping.

While this works, the process is technically challenging and requires us to catch cells in metaphase, the phase of the cell cycle where chromosomes compact into their classic X-shaped structures. Like 99% of the time, chromosomes actually look like a giant ball of yarn, which isn’t very useful for viewing purposes.

But genomic mapping allows us to get useful information out of those giant balls of yarn with much better resolution than classical imaging techniques.

Optical Genomic Mapping (OGM): Optical maps are generated by enzymatically attaching fluorescent labels at a specific sequence motif and then threading ultra-high-molecular-weight DNA through what looks like a miniature Plinko machine. This stretches out and linearizes the DNA. The fluorescent labels are then imaged, and the label patterns from many molecules are aligned with one another to create genomic maps.
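Here's a rough in-silico sketch of why label patterns are useful, under some loud assumptions. The sequences are toy strings, the function names are hypothetical, and the motif CTTAAG is picked purely for illustration because it is its own reverse complement (real platforms use their own labeling chemistry). The idea is that an optical map is essentially a list of distances between labels, and a structural variant like an inversion changes that list even though the bases inside it are all still there.

```python
import re

MOTIF = "CTTAAG"  # illustrative label site; palindromic, so it survives an inversion


def label_positions(seq: str, motif: str = MOTIF) -> list[int]:
    """Start positions of every motif occurrence (lookahead allows overlaps)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def label_spacings(seq: str, motif: str = MOTIF) -> list[int]:
    """Distances between consecutive labels: the 'barcode' an optical map records."""
    pos = label_positions(seq, motif)
    return [b - a for a, b in zip(pos, pos[1:])]


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy reference molecule: five labels separated by filler of different lengths.
reference = (MOTIF + "A" * 10 + MOTIF + "A" * 30 + MOTIF
             + "A" * 20 + MOTIF + "A" * 40 + MOTIF)

# Toy sample: same molecule with the segment [16, 84) inverted.
sample = reference[:16] + reverse_complement(reference[16:84]) + reference[84:]

print("reference spacings:", label_spacings(reference))
print("sample spacings:   ", label_spacings(sample))
```

The inverted segment keeps its label sites, but the two inner spacings swap order. That change in the barcode is the kind of signal a map alignment can pick up without reading a single base.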

Hi-C: A genome-wide sequencing technique that captures the conformation of chromosomes. Basically, when chromatin (DNA and structural proteins) is in that ball of yarn, it comes into contact with other parts of itself. These interactions can be 'fixed' with formaldehyde and then detected by sequencing the fragments that were stuck together. Interaction heat maps turn into genomic maps because sequences that sit closer to each other on a chromosome will be 'hotter' since they 'touch' more often.
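The heat map idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real Hi-C pipeline: the read pairs below are invented coordinates on a hypothetical 400 bp "chromosome", and contact_matrix and mean_contacts_by_distance are made-up helper names. Each pair of positions that was stuck together adds one count to the matrix cell for its two bins.

```python
def contact_matrix(pairs, genome_len: int, bin_size: int):
    """Bin contact pairs into a symmetric matrix of counts."""
    n = -(-genome_len // bin_size)  # ceiling division
    matrix = [[0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def mean_contacts_by_distance(matrix):
    """Average count along each diagonal, i.e. per bin-to-bin distance."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]


# Invented contact pairs (position_1, position_2), for illustration only.
pairs = [
    (10, 60), (20, 80), (5, 95),        # within bin 0
    (110, 180), (120, 190), (130, 170), # within bin 1
    (210, 290), (215, 260), (240, 280), # within bin 2
    (310, 390), (320, 370), (305, 395), # within bin 3
    (20, 150), (30, 140),               # bin 0 with bin 1
    (120, 250), (160, 230),             # bin 1 with bin 2
    (220, 330), (270, 340),             # bin 2 with bin 3
    (40, 250),                          # bin 0 with bin 2
    (150, 360),                         # bin 1 with bin 3
]

matrix = contact_matrix(pairs, genome_len=400, bin_size=100)
for row in matrix:
    print(row)
print("mean contacts by bin distance:", mean_contacts_by_distance(matrix))
```

In this deliberately tidy toy, counts fall off as bins get farther apart. Scaffolding tools lean on that same distance decay: if two pieces of an assembly show lots of contacts, they probably sit next to each other on the chromosome.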

But, as with every new development, when you look at the genome in a way that you haven't looked at it before, you find things.

And that means we're probably going to hear a lot more about structural variants in the coming years.