Satellite DNA and related diseases

Satellite DNA, also known as tandemly repeated DNA, consists of clusters of repeated sequences and represents a diverse class of highly repetitive elements. Satellite DNA can be divided into several classes according to the size of an individual repeat: microsatellites, minisatellites, midisatellites, and macrosatellites. Originally considered «junk» DNA, satellite DNA has more recently been recognized as having various functions. Moreover, due to the repetitive nature of its constituent elements, its presence in the genome is associated with high-frequency mutations, epigenetic changes and modifications in gene expression patterns, with the potential to lead to human disease. The study of satellite DNA will therefore be beneficial for developing treatments for satellite-related diseases, such as FSHD, neurological and developmental disorders, and cancers.

The non-coding DNA: is it «junk» or functional? Only a tiny percentage of human DNA codes for proteins, whereas non-coding DNA, transposons and transposon-derived elements make up the majority of the genome [1]. This fact was first discovered in the 1970s. Later, the Human Genome Project led to the estimate that non-coding DNA makes up 98-99 % of the human genome.
This somewhat surprising fact led Susumu Ohno to suggest the notion of «junk DNA», referring to the idea that most genomic DNA has no use for the organism [2]. According to Richard Dawkins, one of the leading proponents of the «junk» concept, «the greater part of the genome might as well not be there, for all the difference it makes» [3]. This claim found support in recent work showing that a megabase-sized deletion of non-coding sequences in mice has no phenotypic effect [4]. There were, indeed, good reasons (based on mathematical population genetics) to expect that most of the sequences in a typical eukaryotic genome (all but about 30000 loci, according to Ohno) could not be under selection, and thus could not be important. The concept of «junk DNA» was also supported by the so-called C-value enigma, according to which genome size does not correlate with the complexity of the organism [5]: for example, human genomes are much smaller than those of some single-cell eukaryotes.
However, with the completion of the Human Genome Project and the launch of the ENCODE (Encyclopedia of DNA Elements) project by the US National Human Genome Research Institute (NHGRI), the prevailing view that most of the human genome has no function whatsoever has become less and less tenable.
The ENCODE project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data allowed researchers to assign biochemical functions to 80 % of the human genome, in particular to non-coding regions, providing evidence for the functionality of non-protein-coding DNA. One demonstration is the transcription of most non-protein-coding DNA into RNA [6]. Only 4 % of the 65000 RNAs produced by the genome come from exons [7]. It would make little sense for the organism to waste precious resources transcribing such DNA if it were not functional. Moreover, hundreds of non-coding regions of DNA have recently been found to be 100 % conserved between humans and mice (the so-called ultra-conserved elements, UCEs) [8]. The perfect conservation of these regions argues for their evolutionary importance.
The changes in our views on the functional importance of non-coding DNA also concern the «poster child» of the junk DNA concept: the so-called satellite DNA, the main subject of this review.
Satellite DNA in genome organization. In humans, recent estimates based on the results of the Human Genome Project suggest that about 70 % of the genome is represented by repetitive and repeat-derived DNA elements [21].
The repeated sequences can be divided into two categories (Fig. 1):
- The «interspersed repeated sequences», when the repeating copies are dispersed over the genome.
- The «tandem repeated sequences» (also called satellite DNA), when the repeating copies are adjacent to each other.
Satellite DNA is a diverse class of highly repetitive units, accounting for approximately 10-15 % of all repetitive DNA sequences in the human genome [22]. The size of one tandem repeat unit can vary dramatically, i.e., from one base pair (bp) to several hundred bp. According to the orientation of the repeat units in tandem, two types of repeats can be distinguished: (i) direct repeats, which are oriented head-to-tail, and (ii) inverted repeats, which are oriented head-to-head. However, only direct repeats are frequently found in genomes, whereas inverted repeats are rare, presumably because they can induce double-strand breaks [23].
Initially thought to be an artefact, satellite DNA was discovered in 1961 as additional peaks that appeared during the fractionation of DNA by buoyant density in caesium salt gradients [24,25]. Ten years later, the enrichment of highly repetitive satellite DNA in constitutive heterochromatin regions was described [26]. Satellite DNA can be located in different parts of chromosomes, with a preference for telomeric and centromeric regions. Examples of satellite DNA include the repeats of the telomeres, the centromeres and the ribosomal genes. Despite numerous studies, the functional significance of satellite DNA is still poorly understood.
Classification of satellite DNA. Satellite DNA is divided into several categories according to size, structure and localization. However, no universally accepted classification exists so far, and the scheme varies among authors [23,27,28]. According to the size of an individual repeat, satellite DNA can be divided into (1) microsatellites, (2) minisatellites (10-60 bp per repeat), (3) satellites (up to hundreds of bp per repeat) and (4) macrosatellites (several kb per repeat). To avoid potential confusion, we propose naming the third class of satellite repeats midisatellites, leaving the term «satellite DNA» to encompass all tandemly repeated DNA.
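The size-based scheme above can be sketched as a small lookup. This is only an illustration: the 10-60 bp minisatellite range comes from the text, whereas the function name `classify_satellite` and the other cut-offs (below 10 bp for microsatellites, 1 kb as the boundary between midisatellites and macrosatellites) are assumptions, since the text itself notes that no universally accepted classification exists.

```python
def classify_satellite(unit_bp: int) -> str:
    """Return a size class for a tandem repeat, given its repeat unit length in bp.

    Only the 10-60 bp minisatellite range is taken from the text;
    the remaining boundaries are illustrative assumptions.
    """
    if unit_bp < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_bp < 10:       # assumed: anything shorter than a minisatellite unit
        return "microsatellite"
    if unit_bp <= 60:      # 10-60 bp per repeat, as stated in the text
        return "minisatellite"
    if unit_bp < 1000:     # assumed reading of "up to hundreds of bp"
        return "midisatellite"
    return "macrosatellite"  # assumed reading of "several kb"


# Repeat unit sizes quoted elsewhere in this review
for name, size in [("alpha satellite", 171), ("gamma satellite", 220), ("D4Z4", 3303)]:
    print(name, size, classify_satellite(size))
```

With these assumed boundaries, the alpha- and gamma-satellite units fall into the midisatellite class and D4Z4 into the macrosatellite class, in line with how they are described in this review.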

The most abundant human satellites are the simplest repeats (microsatellites), which are also called simple sequence repeats (SSR) or short tandem repeats (STR) in the literature. These repeats are widespread in plant and animal genomes; for example, there is, on average, one microsatellite every 2 kb in the human genome [29]. Microsatellites are frequently involved in neurodegenerative diseases caused by the strong expansion of an SSR in a single gene [30]. One of their main characteristics is a high rate of instability, manifested as a loss or gain of repeat units, which is explained by (i) non-homologous recombination between homologous chromosomes and (ii) replication errors [31]. Microsatellites have emerged as ubiquitous genetic markers for many eukaryotic genomes [32][33][34], which makes them useful in evolutionary studies of genetic relationships.
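To make the notion of a microsatellite concrete, the following minimal sketch scans a DNA string for perfect tandem runs of a short motif. The function name `find_microsatellites` and the parameters `max_unit` and `min_copies` are hypothetical choices made for illustration; dedicated repeat-finding tools also handle imperfect repeats, which this sketch does not.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 6, min_copies: int = 4):
    """Yield (start, unit, copies) for perfect tandem runs of a short motif.

    max_unit and min_copies are illustrative thresholds, not values from the text.
    """
    seq = seq.upper()
    # Lazy capture of a 1..max_unit bp unit, followed by at least
    # (min_copies - 1) further exact copies of the same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    for match in pattern.finditer(seq):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


example = "TT" + "CAG" * 5 + "GA"
print(list(find_microsatellites(example)))
```

Because the unit is captured lazily, the shortest repeating unit is reported, so a run such as ACACACAC is described as four copies of AC rather than two copies of ACAC.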
Minisatellites exhibit some similarities with the sequence of lambda-phage (GCTGTGG), and the majority of them are GC-rich with a strong strand asymmetry. Among others, this class of tandem satellite DNA contains hypermutable repeats, which are flanked by double-strand break (DSB) hot spots and show a high rate of meiotic instability. Because of their length polymorphism, minisatellites are used for DNA fingerprinting in forensic science to identify individuals by their DNA profiles. In addition, they have been proposed to serve as markers for genotoxicity [35].
Very little is known about the midisatellite non-coding DNA, which is mostly located in the telomeric or centromeric regions of chromosomes. This kind of satellite DNA is represented by alpha-, beta- and gamma-satellite repeats. Alpha-satellite repeats (ASR), composed of a tandem array of a 171 bp repeat unit, are located at the centromeres of all human chromosomes. ASR play a crucial role in the chromosome segregation of normal and human artificial chromosomes (HACs) [36]. The beta-satellite DNA was described by Waye and Willard and consists of tandem arrays of divergent Sau3A monomer repeat units of 67-69 bp [37]. Beta-satellite repeats (BSR) show a predominantly heterochromatic distribution, which includes the short arms of acrocentric chromosomes and the pericentromeric parts of chromosomes 1, 3 and 9 [38,39]. In these regions, beta-satellite arrays are adjacent to LSau 3.3 kb macrosatellites (D4Z4-like repeats). The D4Z4 macrosatellite repeats and BSR are also located in the subtelomeric regions of chromosomes 4 and 10 [40,41]. Recently, the presence of BSR next to a newly created telomere has been shown to retard its replication timing [42]. Gamma-satellite repeats (GSR), tandem arrays of 220-bp GC-rich repeating units, were identified in the pericentromeric regions of human chromosomes 8, X and Y. GSR usually form 10-200 kb clusters flanked by alpha-satellite DNA. It has been proposed that the primary role of gamma-satellite DNA is to protect pericentromeric genes from epigenetic silencing [43].
The macrosatellite repeat (MSR) DNA is the only satellite DNA that can contain an open reading frame (ORF) and thus could produce protein-coding RNA from every repeat unit, as illustrated by the D4Z4 (DUX4), CT47, RS447, TAF11-Like and PRR20 repeats. In contrast to the telomeric or centromeric satellites, which can be present on many chromosomes, MSR are often specific to only one or two chromosomes. Additionally, MSR are mostly expressed in the germ line, with their expression being affected by the methylation status of the DNA. Moreover, MSR can be associated with both transcriptionally active and silenced chromatin [28,44]. Other, non-coding MSR, such as DXZ4, have been shown to play a mechanical role and are involved in the formation of MSR chromatin topology [45]. Overall, the role of MSR is not well understood. D4Z4 is probably one of the most studied macrosatellites because of its association with facioscapulohumeral muscular dystrophy (FSHD).
Clinical relevance of satellite DNA. In recent years, much attention has been paid to the role of repeat sequences in various pathologies, such as epilepsy, embryonic lethality and cancers [46,47]. An increase in the number of repeat copies in genomic DNA is the single most important cause of nearly 30 hereditary disorders [48], including fragile X syndrome [49], myotonic dystrophy [50], Huntington's disease [51] and Friedreich's ataxia [52]. Scott et al. provided the first evidence of an insertion of 18 beta-satellite units into the gene encoding a transmembrane serine protease, resulting in autosomal recessive deafness [53].
Most of these diseases are due to microsatellites, whose structural features often disrupt DNA replication, repair and recombination, leading to expanded or contracted DNA structures. Microsatellite expansion has been implicated in serious myopathies, as well as in neuromuscular and neurodegenerative hereditary diseases. The majority of the disorders are caused by the expansion of the triplet repeats (CGG)*(CCG), (CAG)*(CTG), (GAA)*(TTC) and (GCN)*(NGC). Nevertheless, diseases can also result from the expansion of a tetranucleotide, a pentanucleotide or even a dodecanucleotide repeat [48].
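The paired notation used above, such as (CAG)*(CTG), names the same repeat as read on the two complementary strands. As a hedged illustration, the sketch below measures the longest uninterrupted run of a given triplet on either strand of a sequence. The helper names `reverse_complement` and `longest_repeat_run` are hypothetical, and the degenerate base N of (GCN)*(NGC) is not handled.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string containing A, C, G, T only."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_repeat_run(seq: str, motif: str) -> int:
    """Longest uninterrupted run of motif copies, read on either strand."""
    seq = seq.upper()
    best = 0
    # The motif on one strand appears as its reverse complement on the other.
    for unit in {motif.upper(), reverse_complement(motif)}:
        for match in re.finditer("(?:%s)+" % re.escape(unit), seq):
            best = max(best, len(match.group(0)) // len(unit))
    return best


print(reverse_complement("CAG"))                             # the CTG partner of CAG
print(longest_repeat_run("AT" + "CAG" * 7 + "T", "CAG"))     # run read on the given strand
print(longest_repeat_run("GG" + "CTG" * 3 + "A", "CAG"))     # run read on the opposite strand
```

Counting only uninterrupted copies is a simplification; it is meant to show what "expansion" refers to, not to reproduce any clinical sizing method.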
Microsatellite expansion diseases can result from a gain and/or loss of function. Among disorders caused by a gain-of-function mechanism, the main cause is an alteration of protein conformation, leading to changes in protein activity or abundance. As an example, the polyglutamine diseases, one of the nine classes of gain-of-function disorders, are due to the expansion of a CAG repeat. These include Huntington disease (HD), Kennedy disease and spinocerebellar ataxia (SCA) [54,55]. Affected individuals develop nuclear inclusion bodies containing aggregated proteins with expanded polyglutamine stretches.
Alternatively, repression of gene transcription caused by microsatellite expansion leads to a loss of function. For example, the fragile X mental retardation 1 gene (FMR1) is associated with a (CGG)n satellite expansion, resulting in transcriptional silencing and the loss of FMRP, the protein product of the FMR1 gene [56]. Other examples of loss of protein expression are Jacobsen syndrome [57] and Friedreich's ataxia [58]. Microsatellite expansion can also act through both gain- and loss-of-function mechanisms, as has been demonstrated for spinocerebellar ataxia type 1 (SCA1) [59].
Intriguingly, for microsatellite-expansion-related pathologies, a correlation has been reported between the size of the tandem repeat and the severity of the disease [60]. This correlation underlies the phenomenon of genetic anticipation, in which a progressive increase in the repeat number inherited by subsequent generations results in increased severity and earlier manifestation of the disease [61]. Importantly, the expandable microsatellite repeats responsible for a particular disease are usually located within the affected gene, in either its coding (ORF) or its non-coding (promoters, introns and UTRs) regions [47].
In cases other than microsatellites, the exact molecular mechanisms by which satellite DNA can be involved in pathology remain largely unexplored, particularly when the satellite DNA is not located in the coding region of the affected gene. Among the unresolved issues are (i) the relative role of satellite DNA presence and (ii) the contribution of genetic and epigenetic factors, i.e., whether structural modifications of satellite DNA (tandem size, nucleotide sequence, etc.) provide the starting point or whether epigenetic modifications play the primary role in the mechanism of pathology. The slow progress in this matter could be explained by the methodological difficulties of studying large repetitive DNA sequences, as well as by the lack of animal models that replicate the symptoms of hereditary satellite-related diseases in humans.
Cancer as a satellite-related disease. «Breaking of satellites' silence» is now a new paradigm in carcinogenesis [62]. While gene-specific loci can be either hypo- or hypermethylated in cancer, highly repeated DNA sequences are only hypomethylated in this disease. Moreover, the global DNA hypomethylation observed so frequently in cancers is mostly due to satellite DNA. Ehrlich et al. were the first to demonstrate the hypomethylation of tandem minisatellite centromeric DNA (namely centromere-adjacent satellite 2, Sat2) in breast adenocarcinomas [63], ovarian epithelial tumors [64] and Wilms tumors [65]. A highly significant difference in methylation level was found in satellite repeats (SATR1 and ARLa) in neurosarcoma [66]. In 76 % of hepatocellular carcinoma cases, a reduction in Sat2 methylation has also been demonstrated [67]. More recently, Zhu et al. showed that loss of the BRCA1 tumour suppressor gene provoked satellite DNA derepression in breast and ovarian tumours of mice and humans [68]. Satellite DNA hypomethylation has been postulated as the mechanism underlying the induction of pericentromeric instability in many human cancers [69]. Altogether, these results suggest that satellite repeats are the main target of hypomethylation.
Recently, a 40-fold increase in the expression of ASR tandem sequences has been observed in cancers of the lung, kidney, ovary, colon and prostate. Even more strikingly, a 131-fold increase in pericentromeric satellite Sat2 transcripts has been demonstrated in pancreatic tumors in comparison with normal pancreas [70]. Interestingly, other satellites, such as BSR, GSATII and TARI, were not affected.
However, some tandem repeats, such as NBL2 and D4Z4, can be hypomethylated in some cancers and hypermethylated in others, which could be explained by de novo methylation processes [71,72].
There are no data in the literature concerning the potential mechanisms of the selective demethylation of satellite DNA in cancer. Nevertheless, this phenomenon and the overexpression of satellite transcripts in this disease could potentially be useful as biomarkers for cancer detection and for evaluating the efficacy of anti-cancer treatment. In addition, epigenetic cancer therapy directed against global methylation changes needs to be reconsidered so that it targets only the affected DNA, in order to avoid side effects. Knowing which satellite DNA is implicated in a particular cancer will be important for cancer research.
One can expect that the search for links between satellite DNA, tumor suppressors and genomic instability will open new routes for cancer therapy. However, the molecular connections between satellite hypomethylation and cancer development remain obscure. The opening of chromatin at satellite repeat regions, resulting in mislocalization of the transcription machinery, could lead to aberrations in gene regulation with pathological consequences [73]. Additional mechanisms explaining the changes in transcriptional regulation may involve the remodelling and looping of chromatin [74,75], as described in our discussion of FSHD below, or long non-coding RNA (lncRNA) [76]. It has been shown that repeated sequences can massively produce lncRNA, which can further contribute to the epigenetic changes leading to pathology [70,77,78]. In every type of cancer analyzed so far, at least 200 lncRNA have been found to be affected [79]. Among those lncRNA, some important validated candidates for prognostic indicators and/or functionally relevant universal «drivers» and «suppressors» of drug-resistant metastasis have been identified [80]. One can expect that these figures will increase in the near future, thanks to the efforts of the ENCODE project in deciphering the 98 % of the human genome that is non-protein-coding DNA, including satellite DNA; the project aims to characterize the non-coding transcriptional landscape and to analyze the possible expression of extremely short peptides encoded by lncRNA [81].
The role of satellite DNA in cancer illustrates again that, although it was first regarded as «junk» DNA, it in fact plays an important role in genome functioning. Moreover, it has been suggested in the literature that the purifying selection responsible for the evolutionary conservation of satellite DNA sequences is due to the oncological consequences of mutations in satellite DNA [82].
Facioscapulohumeral muscular dystrophy as a satellite-related disease. Another example of the clinical relevance of satellite DNA is FSHD, which is associated with macrosatellite (D4Z4) and beta-satellite (4qA allele) DNA sequences. It is an autosomal dominant muscular dystrophy ranking second after Duchenne muscular dystrophy, with an incidence of 1:14,000 throughout the world (2010, The FSH Society). FSHD is clinically characterized by progressive muscular weakness that spreads from top to bottom, involving the face, pectoral girdle, upper limbs, lower limbs and hips [83,84].
The FSHD locus was mapped in the 1990s by linkage analysis to the subtelomeric region of the long arm of chromosome 4 (4q35) [85][86][87]. The causal molecular anomaly is a contraction of the macrosatellite tandem array consisting of D4Z4 repeats. In the normal population, the number of copies is polymorphic and varies from 11 to 150. In 90-95 % of FSHD patients, this number is decreased to 1-10 units [88]. The lower the repeat number, the earlier the onset and the higher the severity of the disease [89]. The presence of at least one D4Z4 repeat is necessary to develop the disease, as patients with a 4qter deletion have no FSHD symptoms [90].
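The repeat-number ranges quoted above can be summarized in a short sketch. The function name `d4z4_array_status` and its return labels are hypothetical, and the code only restates the thresholds given in this paragraph (1-10 units contracted, 11 or more not contracted, complete deletion treated separately). It is not a diagnostic rule: as discussed further in this review, contraction is associated with disease only in the presence of the 4qA allele.

```python
def d4z4_array_status(repeat_units: int) -> str:
    """Label a 4q35 D4Z4 array by its number of repeat units.

    Thresholds restate the ranges quoted in the text; the labels are illustrative.
    """
    if repeat_units < 0:
        raise ValueError("the number of repeat units cannot be negative")
    if repeat_units == 0:
        # Complete loss of the array: reported without FSHD symptoms
        return "deleted"
    if repeat_units <= 10:
        # Range reported in 90-95 % of FSHD patients
        return "contracted"
    # 11 units and above: range reported in the normal population (11-150)
    return "non-contracted"


for units in (0, 4, 10, 11, 150):
    print(units, d4z4_array_status(units))
```

Keeping the complete deletion as its own category reflects the observation that at least one D4Z4 unit must remain for the disease to develop.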
D4Z4 is a macrosatellite repeat of 3303 bp (Fig. 2). Each unit contains LSau and hhspm3 interspersed repeats and an open reading frame of 1173 bp, named DUX4, which contains two homeobox domains [91,92]. LSau is a long Sau3A-repeated element of 340 bp, whereas hhspm3 (a human DNA insert showing sperm-specific hypomethylation, Sp-0.3-8) is a GC-rich low-copy repeat sequence of 467 bp. Within the D4Z4 repeat unit, the open reading frame has neither introns nor a polyadenylation site, and it is preceded by a promoter (TACAA box) located 149 bp upstream [93]. The DBE (D4Z4 Binding Element), located 181 bp upstream of DUX4, is a binding site for the YY1 transcriptional repressor, HMGB2 (High-Mobility Group Protein B2) and nucleolin, and was shown to be responsible for epigenetic repression [94,95]. In addition, the presence of a strong enhancer was demonstrated in each D4Z4 repeat [96]. Both the DBE repressor element and the transcriptional activator were shown to mediate the transcriptional control of 4q35 genes.
The 4qter and 10qter regions share 99 % homology, which extends over more than 200 kb [97]. The repeats located at 10qter can be distinguished by the presence of a BlnI restriction site [97]. A repeat exchange between the 10qter and 4qter regions occurs in 20 to 30 % of the population [98], which creates 4q/10q hybrid sequences. FSHD is specific to chromosome 4, because contraction at 10q26 is not associated with the disease.
The second satellite DNA sequence associated with FSHD is a BSR tandem array of 6.2 kb (4qA allele) in the 4q35 locus, downstream of D4Z4 [40]. The 4qA allele is present in about half of the normal population. In contrast to 4qA, the 4qB allele does not contain the BSR tandem, and on this allele a reduction of the D4Z4 MSR tandem does not result in manifestation of the disease. Evolutionary analysis by Van Geel et al. indicates that the 4qA allele is older than 4qB [99]. Moreover, in evolutionary terms, FSHD appears to be a very young disease, present only in humans: outside humans, the linked D4Z4-BSR clusters from the FSHD region were found only in chimpanzees, and even this primate has never been found to suffer from FSHD [100].
The FSHD-like type of the disease, or FSHD2, represents 5 % of FSHD cases and is characterised by a high frequency of sporadic cases (70 %) and the absence of macrosatellite contraction in 4q. Nevertheless, the presence of the BSR 4qA allele remains a necessary condition for FSHD2 development [101]. Recently, we have demonstrated that the BSR fragment from the 4qA allele possesses the properties of a transcriptional activator [102]. These observations strongly suggest the significance of the BSR presence in FSHD development.
Macrosatellites can provoke epigenetic changes. Chromatin. First studies showed that the D4Z4 macrosatellite has, in most cases, the features of «unexpressed euchromatin» [103]. Some specific changes in the post-translational modifications (PTM) of histones in FSHD have been described [104]. In healthy individuals, D4Z4 carries both heterochromatic marks (trimethylation of H3K9 and H3K27) and euchromatic marks (dimethylation of H3K4 and acetylation of H3). FSHD patients show a loss of H3K9 trimethylation on both chromosomes 4q and 10q.
Later on, Ottaviani et al. showed that the D4Z4 macrosatellite repeat acts as a CTCF insulator protecting the adjacent genes, activated in pathology, from heterochromatinization in FSHD patients (Fig. 2) [105]. The loss of this feature was observed as the number of D4Z4 repeat units increased. This is an example of how a change in tandem repeat copy number (i) switches its function from repressor to insulator and (ii) provokes dramatic downstream biological effects.
DNA methylation. Significant hypomethylation of D4Z4 CpG dinucleotides was observed in FSHD1 patients compared with healthy individuals [106]. FSHD1 patients show hypomethylation of the contracted D4Z4 allele, with the degree of methylation depending on the number of repeats. In particular, the hypomethylation is more pronounced in patients with a short D4Z4 tandem (3-6 repeat units) than in patients with an MSR of moderate size (6-9 repeat units). This observation suggests a correlation between the severity of the disease, the number of D4Z4 repeats and their methylation level [107]. On the other hand, hypomethylation might contribute to the disease independently of contraction, as FSHD2 patients, who show no decrease in the number of D4Z4 repeat units, also present strong hypomethylation of the satellite on chromosomes 4q and 10q [106,108]. In both cases (FSHD1 and FSHD2), the hypomethylation takes place in the macrosatellite region (D4Z4) and does not extend to the adjacent region towards the centromere. Nothing is known about the methylation status of the adjacent region extending towards the telomere.
D4Z4 hypomethylation may provide the missing link between DNA changes and transcriptional derepression in FSHD. Whatever the exact nature of this mechanism, D4Z4 hypomethylation in FSHD individuals strongly supports a central role for epigenetic events in the pathogenic pathway of FSHD.
Satellite DNA: therapeutic targets? Several publications have highlighted the existence of crosstalk between MSR DUX4 expression and the differentiation state of cells. The MSR gene is expressed in germ line and embryonic cells and is epigenetically repressed in somatic cells [44,109]. In FSHD patients, it has been proposed that the DUX4 transcript is transiently expressed at a pre-myoblast stage in the affected regions of skeletal muscles and possibly among certain subsets of muscle satellite cells. This could result (i) in the upregulation of adjacent genes and (ii) in the dampening of the expression of many muscle lineage-associated genes during regenerative myogenesis. This, in turn, could decrease the efficiency of the late stages of regenerative myogenesis or affect muscle function in a manner consistent with the usually slow progression of FSHD.
Recently, another mechanism of MSR DUX4 expression has been proposed. It consists in the expression of a stable full-length DUX4 RNA, which is transcribed from the last unit of D4Z4 and includes the 3'UTR region containing two facultative introns and a polyadenylation site. The canonical polyadenylation site is only present in the pLAM sequence associated with the BSR of the 4qA allele [41,93]. This long DUX4 transcript was found in half of the examined biopsy samples from FSHD patients. Additionally, a high level of this long DUX4 transcript was observed in special pools of muscle cells representing 0.1 % of cultured FSHD muscle cells [110]. According to these two models of FSHD development, DUX4 expression is considered the main therapeutic target for FSHD.
Fig. 3. Architecture of the 4q35 FSHD locus. The reduction of the macrosatellite tandem provokes drastic changes in chromatin organization. 3C analysis of the chromatin organization of control (left) and FSHD (right) myoblasts [102].
Furthermore, Dixit and collaborators reported that, in the muscles of FSHD patients, the transcription of DUX4 was associated with an increase in the homeodomain transcription factor PITX1 [93]. They showed that the DUX4 protein can act as a transcriptional activator of PITX1. Another study showed the existence of partial transcripts of small RNAs (mi/siRNA) derived from DUX4 [109]. These findings provide additional therapeutic targets for FSHD.
More recently, however, the DUX4-centric paradigm of FSHD development has been challenged. The inappropriate expression of the stable DUX4 transcript, thought to take place in FSHD pathogenesis, was also observed in FSHD fibroblasts [110], which are not affected in FSHD, as well as in muscles and myogenic cells of some unaffected individuals without D4Z4 deletion [111]. These somewhat controversial observations suggest that the role of DUX4 expression in FSHD and its potential as a therapeutic target require more thorough exploration.
As for BSR, the second satellite repeat involved in FSHD development, little is known about the functional role of the 4qA allele. Previously, we have shown that a 1.5 kbp fragment of the 4qA allele contains a transcriptional activator [102], allowing us (i) to suggest that the presence of BSR could influence the transcriptional activation of adjacent or juxtaposed sequences and (ii) to propose the protein activator of 4qA as a new therapeutic target for FSHD.
Altogether, recent progress in the study of FSHD, including our own data, suggests that specific satellite DNA sequences, namely the macrosatellite (D4Z4) and beta-satellite (4qA) sequences, are essential for the development of this disease and could be considered potential targets for gene therapy in FSHD treatment.
Conclusions. Much of the experimental evidence reviewed in this manuscript indicates that the study of satellite DNA promises to provide important insights into human diseases, as satellite sequences are related to transcription control and often have an impact on chromatin structure and genomic instability [112]. Despite the many technical challenges due to their repeated nature, the manipulation of satellite DNA might have great therapeutic potential in the treatment of FSHD, neurological and developmental disorders, and cancers. These arguments warrant further investigation of the various mechanisms underlying satellite DNA functions in the genome.
Acknowledgements. We thank Mr. R. Willett (CNRS) for the copy editing.