In this new age of genomic epidemiology, so-called “Microbe Hunters” are combining whole genome sequencing technology with epidemiological data. This combination has greatly improved our understanding of the natural history of pathogens and other drivers of disease. A recent paper published in Cell highlights how this knowledge is being applied to inform strategies and priorities for global disease control and eradication.
The History of Epidemiology:
In 1854, London faced a severe cholera outbreak. During this time, physician John Snow pioneered some of the first epidemiological disease-tracking techniques by combining interview data with disease-event information. Snow concluded that a significant number of cases occurred among people who drew water from the Broad Street pump. Although the authorities met his findings with much scepticism, the pump handle was removed and the outbreak was finally halted.
In 1995, more than 140 years later, Haemophilus influenzae, a respiratory pathogen, became the first bacterium to have its complete genome sequenced. The ensuing advances in this technology and the falling cost of genomic sequencing have driven a new era of microbe hunting. Whole-genome analysis has now been combined with these historical techniques, allowing epidemiology to be understood in unprecedented detail.
The use of sequencing technologies is revolutionising diagnostics and outbreak investigation. The ability to recognise different pathogen subtypes can help explain why some people show more severe symptoms than others, or why some outbreaks are smaller than others. Furthermore, mapping genome sequences and identifying sequence variants makes it possible to measure the relatedness among a set of pathogen genomes. This variant data can then be used to build phylogenetic trees, which give an overview of the evolutionary relationships between different pathogens. Finally, epidemiological variables must also be considered. These variables encompass all known data about a given pathogen, which may include its source (animal, human or environment), pathogen features (such as resistance to antibiotics) and many other factors. Thanks to recent advances in inference methods, phylogenetic trees can now be analysed together with these epidemiological variables.
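As a rough illustration of how relatedness can be measured from sequence variants, the sketch below counts the positions at which pairs of aligned genomes differ. It is a minimal example under stated assumptions, not the method of any particular study: the isolate names and sequences are made up, the sequences are assumed to be already aligned to the same length, and the function names are hypothetical. Real pipelines work from variant calls across whole genomes and typically use dedicated phylogenetics software to build trees from such distances.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where two sequences carry different bases.

    Positions with gaps or ambiguous bases (anything other than A, C, G, T)
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def distance_matrix(aligned):
    """Return pairwise SNP distances keyed by (name_a, name_b) tuples."""
    return {
        (name_a, name_b): snp_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Toy, made-up alignment used purely for illustration.
toy_alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTTT",
    "isolate_D": "TCGAACGTTT",
}

distances = distance_matrix(toy_alignment)
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, dist)
```

Pairs with the fewest differences are the most closely related in this toy data. A distance matrix like this is the kind of input that distance-based tree-building methods start from.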
Whole Genome Sequencing:
Whole genome sequencing has greatly advanced the resolution of pathogen subtyping, and this technology has been used on a global scale. For example, in 2010, a devastating cholera outbreak in Haiti led to the loss of 9,000 lives. The outbreak was caused by a serological subtype of Vibrio cholerae, the bacterium that causes cholera, which was present globally, making it hard to track. However, the enhanced resolution of whole genome sequencing revealed that the outbreak subtype was closely related to known subtypes from South Asia, including Nepal. This eventually led to the conclusion that cholera had been unintentionally introduced by Nepalese UN workers. Similar epidemiological studies that cross-reference time, place and person are currently being used to determine the origin and trace the spread of SARS-CoV-2 across the world.
Beyond increasing the resolution of subtyping, whole genome sequencing provides widespread access to the entire genomes of countless pathogens. Such data allows associations to be drawn between genetic factors and epidemiological patterns. For example, which genetic factors are frequently observed in pathogenic strains that cause more severe phenotypes? And which mutations occurred before a subtype went on to cause a large disease outbreak?
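One simple way to picture such an association is a two-by-two comparison of how often severe disease occurs with and without a given genetic variant. The sketch below computes an odds ratio from made-up records. The data, the function name and the record layout are all assumptions for illustration, and a real analysis would also need significance testing and correction for population structure.

```python
def odds_ratio(records):
    """Odds ratio for severe disease in isolates with vs. without a variant.

    `records` is an iterable of (has_variant, severe) boolean pairs.
    """
    counts = {
        (True, True): 0,
        (True, False): 0,
        (False, True): 0,
        (False, False): 0,
    }
    for has_variant, severe in records:
        counts[(bool(has_variant), bool(severe))] += 1

    a = counts[(True, True)]    # variant present, severe
    b = counts[(True, False)]   # variant present, not severe
    c = counts[(False, True)]   # variant absent, severe
    d = counts[(False, False)]  # variant absent, not severe

    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell so that an
        # empty cell does not cause a division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Toy, made-up records: 16 hypothetical isolates.
toy_records = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, True)] * 2
    + [(False, False)] * 6
)

toy_or = odds_ratio(toy_records)
print(f"odds ratio: {toy_or:.1f}")
```

An odds ratio well above 1 in real data would suggest that the variant is associated with severe disease, although association alone does not show that the variant is the cause.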
This kind of genomic comparison has been used to understand cases of Escherichia coli (E. coli) infection. E. coli is a common cause of invasive diseases such as urinary tract and bloodstream infections. Genomic comparisons between normal ‘baseline’ E. coli and a novel subtype revealed that the new, more infectious strain had developed resistance to fluoroquinolone antibiotics. Treatment was amended accordingly and the outbreak was suppressed. Increasing our knowledge of the pathogenic subtypes that cause disease, as well as the genetic factors that may influence their emergence and severity, is therefore highly important. The usefulness of this technology is constantly increasing, and many research institutions and clinical laboratories are shifting to genomics-based microbiological surveillance as a result.
New technologies have already been harnessed to further improve the resolution of pathogen subtyping. For example, long-read and single-cell sequencing can resolve differences between genomes in fine detail, such as differences in the structural arrangement of two pathogens' genomes. By comparison, standard whole genome sequencing simply produces draft genomes that represent the consensus of thousands of individual pathogenic organisms. Such detailed analyses could provide many clinical benefits.
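To see why a consensus genome can hide detail, consider the toy sketch below, which takes a majority vote at each position across a handful of aligned reads. The reads and function names are made up for illustration, and real assemblers are far more sophisticated. The point is only that a variant carried by a minority of organisms disappears from the consensus.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the most common base at each position of equal-length reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def minority_sites(aligned_reads):
    """Map each position where reads disagree to its base frequencies."""
    sites = {}
    for position, column in enumerate(zip(*aligned_reads)):
        counts = Counter(column)
        if len(counts) > 1:
            sites[position] = {
                base: n / len(column) for base, n in counts.items()
            }
    return sites


# Toy, made-up reads: one of the four carries a T at the third position.
toy_reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACTTAC"]

print(consensus(toy_reads))
print(minority_sites(toy_reads))
```

Here the consensus reports only the majority base, while the per-position frequencies show that a quarter of the reads differ at one site. Single-cell and long-read approaches aim to recover this kind of within-population variation.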
In the long term, there are arguments for microbe hunting at both larger and smaller scales. For example, in some cases of epidemiological tracing, drug resistance has been found to be transmissible at the plasmid level. Thus, for certain diseases where resistance genes can be acquired by horizontal gene transfer, the unit of surveillance should be the plasmid, or even the gene, rather than the pathogen itself.
On the other hand, large-scale sequencing of microbial communities could provide a form of microbiome surveillance. This would allow a large, shared database to include data from pathogens that cannot be cultured in the laboratory, analyses of the involvement of individual genes (rather than the pathogens that contain them) and other complexities. Perhaps such a multi-causality model would be the best way forward for genomic epidemiology.