Soumya Raychaudhuri, et al., Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions, PLoS Genetics 5(6), 2009
Translating a set of disease regions into insights about pathogenic mechanisms requires identifying not only the key disease genes within them, but also the biological relationships among those genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts.
The GRAIL statistical framework consists of four steps. First, given a set of disease regions we identify the genes overlapping them; for SNPs we use LD (linkage disequilibrium) characteristics to define the region. Second, for each overlapping gene we score all other human genes by their relatedness to it. Third, for each gene we count the number of independent regions with at least one highly related gene. Fourth, for each disease region we select the single most connected gene as the key gene.
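The four steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name, the `relatedness` callback, and the fixed `threshold` used to decide what counts as "highly related" are all assumptions. Step 1 (mapping regions to overlapping genes) is taken as already done, and genes are scored only against genes in the other input regions rather than against all human genes.

```python
from typing import Callable, Dict, List


def select_key_genes(
    regions: Dict[str, List[str]],
    relatedness: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, str]:
    """Pick one candidate key gene per region.

    `regions` maps a region name to the genes overlapping it (step 1 is
    assumed to have been done already). `relatedness` and `threshold`
    are hypothetical stand-ins for the relatedness scoring.
    """
    key_genes: Dict[str, str] = {}
    for region, genes in regions.items():
        connections: Dict[str, int] = {}
        for gene in genes:
            # Steps 2-3: count the other regions that contain at least
            # one gene whose relatedness to `gene` meets the threshold.
            connections[gene] = sum(
                1
                for other, other_genes in regions.items()
                if other != region
                and any(
                    relatedness(gene, g) >= threshold
                    for g in other_genes
                    if g != gene
                )
            )
        if connections:
            # Step 4: keep the most connected gene (first one wins ties).
            key_genes[region] = max(connections, key=connections.get)
    return key_genes


# Toy data: region and gene names are invented for illustration only.
toy_regions = {"r1": ["A", "B"], "r2": ["C"], "r3": ["D", "E"]}
related_pairs = {frozenset(p) for p in [("A", "C"), ("A", "D"), ("C", "D")]}


def toy_relatedness(g1: str, g2: str) -> float:
    # Binary toy score: 1.0 for listed pairs, 0.0 otherwise.
    return 1.0 if frozenset((g1, g2)) in related_pairs else 0.0


key_genes = select_key_genes(toy_regions, toy_relatedness, threshold=0.5)
```

Counting regions rather than individual related genes keeps a single gene-dense region from dominating the score, which matches the "independent regions" wording of the third step.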
The most critical technical difference between GRAIL and other strategies is that it does not rely on strict definitions of gene functions or interactions. Instead, it uses a metric of relatedness that leaves relatively broad freedom in how genes can be connected.
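To make the idea of a loose, text-derived relatedness metric concrete, here is one possible sketch. The text above does not specify the metric, so this assumes cosine similarity between word-count vectors built from the abstracts that mention each gene. The helper names and the toy abstracts are hypothetical.

```python
import math
from collections import Counter
from typing import List


def word_vector(abstracts: List[str]) -> Counter:
    """Build a bag-of-words count vector from a gene's abstracts."""
    counts: Counter = Counter()
    for text in abstracts:
        counts.update(text.lower().split())
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented one-line "abstracts" for three made-up genes.
toy_abstracts = {
    "G1": ["t cell receptor signaling"],
    "G2": ["t cell activation signaling"],
    "G3": ["lipid metabolism enzyme"],
}
vectors = {gene: word_vector(texts) for gene, texts in toy_abstracts.items()}

sim_g1_g2 = cosine(vectors["G1"], vectors["G2"])  # share three of four words
sim_g1_g3 = cosine(vectors["G1"], vectors["G3"])  # share no words
```

A score of this kind needs no curated pathway or interaction annotation: two genes are connected whenever the literature describes them in similar terms, which is the kind of freedom the paragraph above refers to.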