Benjamin Georgi et al., From Mouse to Human: Evolutionary Genomics Analysis of Human Orthologs of Essential Genes, PLoS Genet 9(5), 2013
Evolutionary and genomic characteristics of human essential genes have never been directly studied on a genome-wide scale. Here we use detailed phenotypic resources available for the mouse and deep genomics sequencing data from human populations to characterize patterns of genetic variation and mutational burden in a set of 2,472 human orthologs of known essential genes in the mouse.
A major challenge in analyzing NGS data for clinical applications is the identification of mutations likely implicated in disease among the hundreds of thousands of harmless variants, and only a few, modestly powered strategies have been described. These strategies rest on the premise that variants disrupting gene function are more likely to have fitness consequences and thus are more likely pathogenic. A complementary approach to these methods would instead infer sets of genes that are evolutionarily constrained in human populations directly, based on polymorphism data.
It has been shown that human disease genes whose mouse orthologs are lethal when disrupted tend to be highly connected in protein-protein interaction networks, and are more likely to demonstrate a dominant mode of inheritance than other human disease genes.
In this study, by taking advantage of available sequence data in humans from large-scale sequencing studies, we aim to address two basic questions: first, are genes identified as ‘essential’ in the mouse also evolutionarily conserved in humans, and second, how is this reflected in their mutational burden and impact on human disease? Our results show strong and consistent signatures of purifying selection within the set of essential genes, including increased sequence conservation, a reduced number of exonic missense variants and an overall shift in allele frequency towards rare alleles.
In addition to evolutionary constraint across species, we hypothesized that genes identified as essential in the mouse should also be subject to significant purifying selection in recent human history. This pressure would be expected to leave a signature of (a) a reduction in overall polymorphism levels, particularly in the levels of missense and loss-of-function mutations, and (b) a skewing of the allele frequency distribution towards increasingly rare variants in essential genes (EG) relative to genes with non-lethal mouse phenotypes (NLG). Using data from the 1000 Genomes Project Phase 1 release, and after controlling for the total exon length in each gene, we observed a significant reduction in the level of exonic single nucleotide polymorphisms (SNPs) in EG relative to NLG or to all genes (ALL), as well as a shift in the distribution of allele frequencies towards rare variants.
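The two signatures described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene records, the per-kilobase normalization, and the 1% cutoff for calling a variant "rare" are all assumptions chosen for illustration, and the numbers are toy data.

```python
from statistics import mean

def snp_density_per_kb(n_snps, exon_length_bp):
    """Exonic SNPs per kilobase of exon sequence for one gene."""
    return 1000.0 * n_snps / exon_length_bp

def rare_fraction(allele_freqs, threshold=0.01):
    """Fraction of variants with allele frequency below `threshold`.

    The 1% default is an assumed cutoff, not one taken from the paper.
    """
    if not allele_freqs:
        return 0.0
    return sum(f < threshold for f in allele_freqs) / len(allele_freqs)

# Hypothetical toy genes: gene set label, total exon length, and the
# allele frequency of each exonic SNP observed in the gene.
toy_genes = {
    "geneA": {"set": "EG", "exon_bp": 2000, "afs": [0.001, 0.002, 0.004, 0.2]},
    "geneB": {"set": "EG", "exon_bp": 4000, "afs": [0.001, 0.003]},
    "geneC": {"set": "NLG", "exon_bp": 1000, "afs": [0.05, 0.3, 0.002]},
    "geneD": {"set": "NLG", "exon_bp": 2000, "afs": [0.1, 0.005, 0.4, 0.02]},
}

def summarize(genes, gene_set):
    """Return (mean SNP density per kb, pooled rare-variant fraction)."""
    members = [g for g in genes.values() if g["set"] == gene_set]
    density = mean(snp_density_per_kb(len(g["afs"]), g["exon_bp"]) for g in members)
    pooled = [f for g in members for f in g["afs"]]
    return density, rare_fraction(pooled)

eg_density, eg_rare = summarize(toy_genes, "EG")
nlg_density, nlg_rare = summarize(toy_genes, "NLG")
```

In this toy setup the EG set has a lower SNP density and a higher rare-variant fraction than the NLG set, which is the direction of effect the text reports; a real analysis would add a significance test on top of these summaries.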
Under the model that a subset of mutations in essential genes are subject to purifying selection at a population level, we hypothesized that across the set of essential genes, individual genomes should also exhibit reduced mutational load. When comparing the mutational load in essential genes for each sample in the 1000 Genomes Phase 1 data, we observed a significant reduction in the ratio of non-synonymous to synonymous substitution within EG compared to NLG, as well as an overall reduction in the number of missense variants.
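As a rough illustration of the per-sample comparison, the sketch below computes the ratio of non-synonymous to synonymous variant counts within each gene set for each individual. The sample identifiers and counts are invented, and the simple "how many samples show a lower ratio in EG" tally stands in for whatever statistical test the authors actually used.

```python
def ns_ratio(n_nonsyn, n_syn):
    """Ratio of non-synonymous to synonymous variant counts."""
    if n_syn == 0:
        raise ValueError("no synonymous variants; ratio is undefined")
    return n_nonsyn / n_syn

# Hypothetical per-sample counts: (non-synonymous, synonymous) per gene set.
sample_counts = {
    "S1": {"EG": (90, 150), "NLG": (160, 200)},
    "S2": {"EG": (84, 140), "NLG": (150, 200)},
    "S3": {"EG": (100, 160), "NLG": (170, 200)},
}

# Per-sample ratio within each gene set.
ratios = {
    sample: {gene_set: ns_ratio(*counts) for gene_set, counts in sets.items()}
    for sample, sets in sample_counts.items()
}

# Number of samples whose EG ratio is below their NLG ratio.
n_lower = sum(r["EG"] < r["NLG"] for r in ratios.values())
```

Comparing the two ratios within the same individual keeps sample-level factors such as sequencing depth from confounding the comparison between gene sets.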
Hypothesizing that mutations in essential genes are more likely to predispose to a neurodevelopmental disorder such as ASD, we computed the rate of de novo mutations from these four studies in affected probands relative to family-based controls. Considering mutations across coding transcripts and splice sites, and using a gene-based permutation procedure matching total exon lengths and %GC content, we observed an enrichment of de novo mutations in affected individuals at essential genes, but not in family-based controls.
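One way a gene-based permutation with matching on exon length and %GC could work is sketched below. The binning scheme, the bin widths, sampling with replacement, and all names (`bin_key`, `matched_permutation_p`, the toy gene table) are assumptions for illustration; the source does not describe the procedure at this level of detail.

```python
import random
from collections import defaultdict

def bin_key(gene, length_step=1000, gc_step=0.05):
    """Coarse (exon length, GC fraction) bin; the bin widths are assumptions."""
    return (gene["exon_bp"] // length_step, int(gene["gc"] / gc_step))

def matched_permutation_p(genes, target_ids, dnm_counts, n_perm=1000, seed=0):
    """Empirical p-value for de novo mutation enrichment in a target gene set.

    Each permutation replaces every target gene with a gene drawn at random
    (with replacement) from the same length/GC bin, then counts mutations.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for gene_id, gene in genes.items():
        by_bin[bin_key(gene)].append(gene_id)

    observed = sum(dnm_counts.get(g, 0) for g in target_ids)
    hits = 0
    for _ in range(n_perm):
        # A bin always contains at least the target gene itself.
        draw = [rng.choice(by_bin[bin_key(genes[g])]) for g in target_ids]
        if sum(dnm_counts.get(g, 0) for g in draw) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 20 hypothetical genes in one bin, the first 5 labeled essential.
toy_genes = {f"g{i}": {"exon_bp": 1500, "gc": 0.45} for i in range(20)}
toy_targets = [f"g{i}" for i in range(5)]
case_dnms = {g: 2 for g in toy_targets}        # mutations concentrated in targets
control_dnms = {g: 1 for g in toy_genes}       # mutations spread evenly

case_obs, case_p = matched_permutation_p(toy_genes, toy_targets, case_dnms, n_perm=200)
ctrl_obs, ctrl_p = matched_permutation_p(toy_genes, toy_targets, control_dnms, n_perm=200)
```

Running the same procedure separately on proband and control mutations mirrors the comparison in the text: the matched null absorbs the fact that longer or more GC-rich genes accumulate more mutations regardless of essentiality.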
The estimate of a total load of ~12 predicted damaging exonic variants per individual in 2,472 human orthologs of essential genes represents the first attempt to directly estimate the individual mutational burden in putative human essential genes at a molecular level.