Ryan E. Mills et al., Mapping copy number variation by population-scale genome sequencing, Nature 470 (2011), doi:10.1038/nature09708
We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validation. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications.
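The integration criteria used for the map are not given in this section. One common way to combine deletion calls from several discovery methods is to cluster calls whose intervals show sufficient reciprocal overlap. The sketch below assumes a 50% reciprocal-overlap threshold and greedy clustering. The function names `reciprocal_overlap` and `merge_calls` and the caller labels are hypothetical and illustrative only; this is not the paper's own procedure.

```python
def reciprocal_overlap(a, b):
    """Return min(overlap/len(a), overlap/len(b)) for half-open intervals (start, end)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def merge_calls(calls_by_method, min_ro=0.5):
    """Greedily cluster deletion calls from several (hypothetical) methods.

    A call joins the first cluster whose representative interval it overlaps
    with reciprocal overlap >= min_ro (an assumed threshold); otherwise it
    starts a new cluster. Each cluster records which methods support it.
    """
    clusters = []  # each cluster: {"rep": (start, end), "methods": set of labels}
    for method, calls in calls_by_method.items():
        for call in calls:
            for cluster in clusters:
                if reciprocal_overlap(call, cluster["rep"]) >= min_ro:
                    cluster["methods"].add(method)
                    break
            else:
                clusters.append({"rep": call, "methods": {method}})
    return clusters


calls = {
    "caller_A": [(1000, 2000)],
    "caller_B": [(1100, 2050), (5000, 6000)],
    "caller_C": [(1005, 1995)],
}
for cluster in merge_calls(calls):
    print(cluster["rep"], sorted(cluster["methods"]))
```

A deletion supported by several independent methods, such as the first cluster here, would usually be considered higher confidence than a single-method call. Experimental validation remains the arbiter.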
Advances in sequencing technology have made it possible to map SVs at fine scale using sequence-based approaches. These approaches include: (1) paired-end mapping (or read-pair, ‘RP’, analysis), based on sequencing and analysing pairs of clone ends or high-throughput sequencing fragments that map abnormally; (2) read-depth (‘RD’) analysis, which detects SVs by analysing read depth of coverage; (3) split-read (‘SR’) analysis, which evaluates gapped sequence alignments to detect SVs; and (4) sequence assembly (‘AS’), which enables fine-scale discovery of SVs, including insertions of novel (non-reference) sequence. Sequence-based SV discovery approaches had previously been applied to only a small number (<20) of genomes, leaving the fine-scale architecture of most common SVs unknown.
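The read-pair and read-depth signals described above can be illustrated with a toy sketch. It is not the algorithm of any particular SV caller. It assumes a single library with a known mean insert size and standard deviation, a cutoff of k = 3 standard deviations for discordant pairs, and a depth ratio of 0.75 for flagging windows. The function names and parameters are hypothetical.

```python
from statistics import mean


def discordant_pairs(insert_sizes, mean_size, sd, k=3.0):
    """Read-pair (RP) signal: return indices of pairs whose mapped insert size
    deviates from the library mean by more than k standard deviations.
    Pairs mapping much farther apart than expected suggest a deletion."""
    return [i for i, s in enumerate(insert_sizes) if abs(s - mean_size) > k * sd]


def low_depth_windows(depths, window, ratio=0.75):
    """Read-depth (RD) signal: average per-base coverage over non-overlapping
    windows and return the start of each window whose mean falls below
    ratio * genome-wide mean. A heterozygous deletion is expected to show
    roughly half the normal coverage."""
    genome_mean = mean(depths)
    flagged = []
    for start in range(0, len(depths) - window + 1, window):
        if mean(depths[start:start + window]) < ratio * genome_mean:
            flagged.append(start)
    return flagged


# One pair spans ~2 kb more than expected, consistent with a ~2 kb deletion.
print(discordant_pairs([400, 410, 395, 2400, 405], mean_size=400, sd=30))

# Coverage drops to half over positions 20-29, as for a heterozygous deletion.
depths = [30] * 20 + [15] * 10 + [30] * 20
print(low_depth_windows(depths, window=10))
```

RP evidence localises breakpoints to within roughly the insert-size spread, whereas RD evidence gives copy number but coarse boundaries. Split-read and assembly approaches are typically used to refine breakpoints to base-pair resolution.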
We report here the results of analyses undertaken by the Structural Variation Analysis Group of the 1000 Genomes Project (1000GP). The group’s objectives were to discover, assemble, genotype and validate SVs 50 base pairs (bp) or larger in size, and to assess and compare different sequence-based SV detection approaches. The group initially focused on deletions, a variant class often associated with disease, for which rich control data sets and diverse ascertainment approaches exist. Insertions and duplications received less attention, and balanced SV forms (such as inversions) were not considered.