TelomereHunter: telomere content estimation and characterization from whole genome sequencing data

Lars Feuerbach, et al., TelomereHunter: telomere content estimation and characterization from whole genome sequencing data, bioRxiv, 2016

Here, we present TelomereHunter, a new computational tool for determining telomere content that is specifically designed for matched tumor and control pairs. In contrast to existing tools, TelomereHunter takes alignment information into account and reports the abundance of variant repeats in telomeric sequences.


The criterion of searching for six non-consecutive repeats in 100bp long reads has been proposed previously (Lee, et al., 2014) and was also found suitable for the data presented in the present study.

Estimating telomere length from whole genome sequence data

Zhihao Ding, et al., Estimating telomere length from whole genome sequence data, Nucleic Acids Research 42(9), 2014

Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data.


We defined reads as telomeric if they contained k or more TTAGGG repeats, with a default threshold value of k = 7.
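
The threshold rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not TelSeq's actual implementation: the function names are hypothetical, and it assumes that repeats are counted as non-overlapping occurrences anywhere in the read and that reads from the complementary strand (CCCTAA) should be counted as well.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_repeats(read: str, motif: str = "TTAGGG") -> int:
    """Count non-overlapping motif occurrences on whichever strand has more.

    Assumption: occurrences need not be consecutive.
    """
    read = read.upper()
    return max(read.count(motif), read.count(reverse_complement(motif)))


def is_telomeric(read: str, k: int = 7) -> bool:
    """Apply the 'k or more TTAGGG repeats' rule with the default k = 7."""
    return count_repeats(read) >= k
```

For example, a read made of seven back-to-back TTAGGG units passes the default threshold, while a read with only six does not.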


However, in practice with current technology, a typical exome sequencing output contains some fraction (typically 10-50%) of sequence that is off-target, i.e. not exonic. This fraction represents information on the rest of the genome and can be used to estimate relative telomere length by our method.

Variant repeats are interspersed throughout the telomeres and recruit nuclear receptors in ALT cells

Dimitri Conomos, et al., Variant repeats are interspersed throughout the telomeres and recruit nuclear receptors in ALT cells, J. Cell Biol. 199, 2012

Telomeres in cells that use the recombination-mediated alternative lengthening of telomeres (ALT) pathway elicit a DNA damage response that is partly independent of telomere length.

Here we used next generation sequencing to analyze the DNA content of ALT telomeres. We discovered that variant repeats were interspersed throughout the telomeres of ALT cells. We found that the C-type (TCAGGG) variant repeat predominated and created a high-affinity binding site for the nuclear receptors COUP-TF2 and TR4.

We used next generation sequencing to quantitatively determine the identity and extent of variant sequence within telomere arrays of WI38-VA13/2RA as well as the telomerase-positive HeLa cell line. Samples were paired-end sequenced, and reads containing greater than six nonconsecutive telomeric repeats in the format TBAGGG were considered to be reads derived from telomeres.
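
A minimal sketch of the TBAGGG filter follows, assuming that B is the IUPAC code for "C, G, or T" and that "greater than six" means at least seven non-overlapping matches on the read as given. The function names are hypothetical and this is not the authors' pipeline; it also tallies which repeat type each match is, so that the share of variants such as the C-type (TCAGGG) can be inspected.

```python
import re
from collections import Counter

# T, then any base except A (IUPAC "B"), then AGGG.
TBAGGG = re.compile(r"T[CGT]AGGG")


def repeat_types(read: str) -> Counter:
    """Tally each non-overlapping TBAGGG hexamer found in the read."""
    return Counter(m.group(0) for m in TBAGGG.finditer(read.upper()))


def is_telomere_derived(read: str, min_repeats: int = 7) -> bool:
    """True if the read has more than six (i.e. >= 7) TBAGGG repeats.

    Assumption: repeats may be non-consecutive and only the given
    strand is searched.
    """
    return sum(repeat_types(read).values()) >= min_repeats
```

Calling `repeat_types` on a read built from four TTAGGG units followed by three TCAGGG units returns a tally of four canonical and three C-type repeats, and the read is classified as telomere-derived.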

Structure and Variability of Human Chromosome Ends

Titia de Lange, et al., Structure and Variability of Human Chromosome Ends, Molecular and Cellular Biology 10, 1990

The minimal size of the subtelomeric repeat is 4 kilobases (kb); it shows a high frequency of restriction fragment length polymorphisms and undergoes extensive de novo methylation in somatic cells. Distal to the subtelomeric repeat, the chromosomes terminate in a long region (up to 14kb) that may be entirely composed of TTAGGG repeats. This terminal segment is unusually variable. Although sperm telomeres are 10 to 14kb long, telomeres in somatic cells are several kilobase pairs shorter and very heterogeneous in length.

Human telomeres contain at least three types of G-rich repeat distributed non-randomly

Robin C. Allshire, et al., Human telomeres contain at least three types of G-rich repeat distributed non-randomly, Nucleic Acids Research 17, 1989

Human telomeres do not contain a pure uniform 6 base pair repeat unit; rather, there are at least three types of repeat.

The distribution of each type of repeat appears to be non-random. Each human telomere has a similar arrangement of these repeats relative to the ends of the chromosome.

Analysing the change in length of the telomeric repeat region between an individual's blood and germline DNA reveals that this is due to variable amounts of the TTAGGG repeat and not the other repeat types.

There can only be, on average, a maximum of 8.3kb of TTAGGG-like repeats per sperm telomere. In addition, approximately 1.2kb of TTGGGG-like repeats must be distal to all MnlI and HphI sites. The most proximal 1.9kb of each telomeric repeat region is made up of TTGGGG- and TGAGGG-like repeats; this leaves approximately 3.6kb of the human telomeric repeat region unaccounted for.
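
The bookkeeping behind these figures can be checked with a short calculation. The segment labels below are hypothetical shorthand, and the total is an inference from the quoted numbers rather than a value stated in the excerpt: the three assigned segments sum to 11.4kb, so an unaccounted 3.6kb implies a repeat region of about 15kb.

```python
# Segment sizes in kb, taken from the quoted passage.
assigned_kb = {
    "TTAGGG-like (maximum, sperm telomere)": 8.3,
    "TTGGGG-like (distal to MnlI/HphI sites)": 1.2,
    "TTGGGG- and TGAGGG-like (most proximal)": 1.9,
}
unaccounted_kb = 3.6

# Sum of the segments that the excerpt assigns to a repeat type.
accounted_kb = round(sum(assigned_kb.values()), 1)

# Inferred, not stated in the excerpt: total size of the repeat region.
implied_total_kb = round(accounted_kb + unaccounted_kb, 1)

print(f"accounted: {accounted_kb} kb, implied total: {implied_total_kb} kb")
```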