Donald F. Conrad, et al., Origins and functional impact of copy number variation in the human genome, Nature vol. 464, 2010
Having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we concluded that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
In this study we use the term CNV to describe collectively all quantitative variation in the genome, including tandem arrays of repeats as well as deletions and duplications.
Genome re-sequencing studies have shown that most bases that vary among genomes reside in CNVs of at least 1 kilobase.
CNVs are generated by diverse mutational mechanisms – including meiotic recombination, homology-directed and non-homologous repair of double-strand breaks, and errors in replication – but the relative contribution of these different mechanisms is not well appreciated.
We identified an average of 1,098 validated CNVs, and a cumulative CNV locus length of 24 Mb (0.78% of the genome) when comparing two genomes by CGH. The 8,599 validated CNVs discovered in these 41 individuals cover a total of 112.7 Mb (3.7%) of the genome.
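The percentages quoted above follow from dividing each CNV span by the genome length. The passage does not state the genome length used, so the short sketch below assumes a value of about 3,080 Mb (an assumption chosen for illustration, as is the helper name `percent_of_genome`) and checks that both figures are reproduced to the precision given.

```python
# Assumed reference genome length in Mb; NOT stated in the passage.
ASSUMED_GENOME_MB = 3080.0


def percent_of_genome(span_mb, genome_mb=ASSUMED_GENOME_MB):
    """Return the share of the genome (in percent) covered by a span in Mb."""
    return 100.0 * span_mb / genome_mb


# Spans quoted in the passage.
pairwise_pct = percent_of_genome(24.0)      # two genomes compared by CGH
cumulative_pct = percent_of_genome(112.7)   # all 8,599 validated CNVs

print(f"pairwise: {pairwise_pct:.2f}%  cumulative: {cumulative_pct:.1f}%")
```

Under that assumed length, the two spans round to the 0.78% and 3.7% reported in the passage.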
There was also a bias of CNVs away from enhancers and ultra-conserved elements, but not from promoters or DNase I hypersensitive sites. Indeed, duplications seem to be significantly enriched among promoters and stop codons, perhaps corroborating a previous observation of indel enrichment at either end of genes.
Gene ontology analysis showed an enrichment of genes involved in extracellular biological processes such as cell adhesion, recognition and communication in CNVs. However, genes involved in intracellular processes such as biosynthetic and metabolic pathways were underrepresented in CNV regions.
We found the relative contribution of NAHR- and VNTR-mediated CNV formation to be largely dependent on CNV size. NAHR was estimated to be 7 times more likely than VNTR to be the underlying mechanism for CNVs in the largest size decile, whereas VNTR were 3.5 times more frequent in the bottom decile. Overall, NAHR and VNTR contribute similarly.
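To make the size-decile comparison concrete, here is a minimal sketch that ranks CNVs by length, splits them into ten equal-sized bins, and reports the NAHR-to-VNTR count ratio in each bin. It is a simplified illustration using raw counts, not the estimation procedure used in the study. The function name, the input layout and the toy data are all hypothetical.

```python
from collections import Counter


def nahr_vntr_ratio_by_decile(cnvs):
    """cnvs: list of (length_bp, mechanism) tuples (hypothetical layout).

    Returns ten NAHR/VNTR count ratios, smallest size decile first.
    A decile with no VNTR calls yields None instead of dividing by zero.
    """
    ordered = sorted(cnvs, key=lambda c: c[0])
    n = len(ordered)
    counts = [Counter() for _ in range(10)]
    for rank, (_, mechanism) in enumerate(ordered):
        decile = min(rank * 10 // n, 9)  # rank-based decile index, 0..9
        counts[decile][mechanism] += 1
    return [c["NAHR"] / c["VNTR"] if c["VNTR"] else None for c in counts]


# Toy data (invented for illustration): 80 CNVs, 8 per decile.
# The largest decile holds 7 NAHR calls and 1 VNTR call;
# every other decile holds 4 of each.
toy = []
for d in range(10):
    for j in range(8):
        length = d * 8 + j + 1
        if d == 9:
            mechanism = "NAHR" if j < 7 else "VNTR"
        else:
            mechanism = "VNTR" if j % 2 == 0 else "NAHR"
        toy.append((length, mechanism))

ratios = nahr_vntr_ratio_by_decile(toy)
print(ratios)
```

In the toy data the top decile gives a ratio of 7, mirroring the direction of the reported effect; real data would need mechanism calls per CNV, which this sketch simply takes as given.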
We found that duplications are more likely to be formed by NAHR, VNTR and retrotranspositions, and are more enriched for breakpoint-associated sequence motifs than deletions.
Although some sequence motifs (for example, some non-B-DNA structures) were more mutagenic than others, the sequence context was not strongly predictive of the location of CNVs, unlike the link between segmental duplications and larger CNVs mediated by NAHR.
We observed that non-B-DNA forming sequences that are enriched in promoter regions are also enriched in CNV breakpoints, suggesting that the same properties that enable regulation of transcription may also be mildly mutagenic for the formation of CNVs, and as a consequence, CNVs may influence the evolution of gene regulation.