Pangenomes: the coolest new thing in human genetics got its start in bacteria. No, seriously!
Bacteria are complicated. We invented pangenomes to better understand them.
If you're not quite sure what a pangenome is, don't worry, it's pretty simple!
It's the full set of genes found across all of the genomes of a species, from every ‘clade’ within it.
And clade is just an evolutionary term for a group descended from a common ancestor, which in practice means a genetically distinct population of individuals.
In humans, those are genetic ancestries, but in bacteria we're talking about strains.
For example, when you hear about an E. coli ‘outbreak,’ what usually gets left out of the headlines is that it’s an outbreak of a pathogenic strain of E. coli.
You should know by now that not ALL E. coli are bad guys.
In the case of food poisoning outbreaks, we’re usually talking about the O157:H7 strain of E. coli.
But E. coli isn't the only bacterium whose strains have such opposite personalities.
Another one is Streptococcus agalactiae (or group B strep), which is commonly found in the vaginas of healthy women.
However, it can cause serious problems during pregnancy and childbirth, including infections that can kill newborns. And the idea of a pangenome was first proposed in a study of pathogenic strains of S. agalactiae!
With the falling sequencing costs that came with the invention of massively parallel sequencing, researchers were finally able to obtain full genomes from multiple strains of bacteria and compare them.
What researchers found is that each bacterial species shares a ‘core’ set of essential genes, but that gene content can vary significantly from strain to strain, and those differences help explain why some strains cause disease while others don't!
We’ve since sequenced a bunch more bacteria and expanded on this concept of a ‘core’ set of genes (found in >95% of genomes) to also include:
Shell genes - Found in 10-95% of genomes, these genes are typically shared by more than one strain
Cloud genes - Found in <10% of genomes, these are rare, mostly strain-specific genes
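To make the core/shell/cloud split a bit more concrete, here's a minimal Python sketch. It assumes you already know which genes each genome carries; real pangenome tools first have to group similar genes into families, which this skips entirely. It uses the thresholds above, though different studies and tools pick slightly different cutoffs. The strains and the two accessory gene names are made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: 20 genomes that all carry three housekeeping genes.
# Half of them also carry an (illustrative) accessory gene, and only one
# carries an (illustrative) unique gene.
HOUSEKEEPING = {"dnaA", "gyrB", "recA"}
genomes = {}
for i in range(20):
    genes = set(HOUSEKEEPING)
    if i < 10:
        genes.add("capsule_type_III")  # hypothetical, in 10/20 = 50% of genomes
    if i == 0:
        genes.add("phage_toxin_X")     # hypothetical, in 1/20 = 5% of genomes
    genomes[f"strain_{i:02d}"] = genes

# The pangenome is simply every gene seen in at least one genome.
pangenome = set().union(*genomes.values())

def classify_pangenome(genomes, core_min=0.95, cloud_max=0.10):
    """Sort genes into core (>95%), shell (10-95%) and cloud (<10%) by frequency."""
    n = len(genomes)
    # Count how many genomes carry each gene.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    categories = {"core": set(), "shell": set(), "cloud": set()}
    for gene, count in counts.items():
        freq = count / n
        if freq > core_min:
            categories["core"].add(gene)
        elif freq >= cloud_max:
            categories["shell"].add(gene)
        else:
            categories["cloud"].add(gene)
    return categories

categories = classify_pangenome(genomes)

if __name__ == "__main__":
    for name in ("core", "shell", "cloud"):
        print(name, sorted(categories[name]))
```

Running it prints the three housekeeping genes as core, `capsule_type_III` as shell and `phage_toxin_X` as cloud. Notice that with only a handful of genomes, nothing could ever land in the cloud (one genome out of five is already 20%), which is one reason these categories only really make sense once lots of genomes have been sequenced.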
All of these genetic detours can be visualized as a ‘genome graph’ (or ‘pangenome graph’) that overlays every genome and shows where they differ from one another.
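Real genome graphs are built from whole-genome alignments by specialized software, but the basic idea fits in a few lines. This hypothetical Python sketch writes each genome as an ordered path through shared sequence ‘segments’ (the segment and strain names are invented), links consecutive segments together, and reports the spots where paths split apart, which are exactly those genetic detours.

```python
from collections import defaultdict

# Hypothetical toy data: each genome as the ordered list of segments it
# passes through. Segments shared by several genomes become shared nodes.
PATHS = {
    "strain_A": ["s1", "s2", "s4", "s5"],
    "strain_B": ["s1", "s3", "s4", "s5"],  # detours through s3 instead of s2
    "strain_C": ["s1", "s2", "s4"],        # lacks the s5 segment
}

def build_graph(paths):
    """Map each edge (segment -> next segment) to the genomes that use it."""
    edges = defaultdict(set)
    for genome, path in paths.items():
        for left, right in zip(path, path[1:]):
            edges[(left, right)].add(genome)
    return edges

def branch_points(edges):
    """Return segments with more than one successor, i.e. where genomes diverge."""
    successors = defaultdict(set)
    for left, right in edges:
        successors[left].add(right)
    return {node: nxt for node, nxt in successors.items() if len(nxt) > 1}

edges = build_graph(PATHS)
branches = branch_points(edges)

if __name__ == "__main__":
    for (left, right), genomes in sorted(edges.items()):
        print(f"{left} -> {right}: {sorted(genomes)}")
    print("branch points:", {k: sorted(v) for k, v in branches.items()})
```

Here the only branch point is `s1`, where strain_B heads off through `s3` while the others go through `s2`, before everyone meets back up at `s4`.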
So, why’s it important to characterize all of the genetic variation found in a species?
Because understanding strain-level genetics better can help explain why these bacteria behave differently in different environments.
But things get extra weird in bacteria because not only do they have their own genome, they can also carry plasmids (small, usually circular pieces of extra DNA that can be shared between bacteria).
They can also be infected by ‘bacteriophages’ (viruses that infect bacteria), which can change the genetics AND the behavior of these bacteria!
That’s in addition to the mutations that occur to their genomes as they grow and replicate.
Things get only moderately less complicated as we transition the concept of the pangenome to other organisms like humans.
We may not have plasmids that can do goofy things with our genetics, but viruses and environmental pressures have helped shape our genetic ancestry in ways we’re only just beginning to appreciate!