Transcription factor CTCF and mammalian genome organization

The transcription factor CTCF is thought to be one of the main participants in various gene regulatory networks, including transcription activation and repression, formation of independently functioning chromatin domains, and regulation of imprinting. Sequencing of the human and other genomes has made it possible to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In this review, we summarize recent data on CTCF functioning within the framework of the chromatin loop domain hypothesis of large-scale regulation of genome activity. Its fundamental properties allow CTCF to serve as a transcription factor, an insulator protein and a dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s).

Introduction. CCCTC-binding factor (CTCF) is an evolutionarily conserved transcription factor of vertebrates. It binds to different functional elements of the genome and performs various regulatory functions (see recent reviews [1][2][3]).
The transcription factor CTCF was first identified as a protein able to recognize three direct CCCTC repeats in the regulatory region of the chicken MYC gene [4,5]. Concurrently, the NeP1 protein, which binds to the F1 element of the chicken lysozyme silencer, was described [6]. NeP1 and CTCF were later found to be the same protein [7].
The human CTCF polypeptide chain consists of 727 amino acids (a. a.). The central DNA-binding domain contains eleven zinc fingers and is flanked by lysine- and arginine-rich positively charged N- and C-terminal domains of 267 and 150 a. a. [12]. The DNA-binding domain is followed by a glycine-rich motif typical of both ATP- and GTP-binding proteins. The nuclear localization signal is located closer to the C-terminus of the polypeptide, and the phosphorylation sites are next to it [9,11].
The variety of CTCF functions implies that CTCF activity is regulated, inter alia, via post-translational modifications. Noteworthy modifications include phosphorylation by casein kinase 2 [13], ADP-ribosylation [14], and sumoylation [15][16][17].
In the course of evolution, CTCF appears in bilaterian metazoa and is absent in plants and protozoa [18]. CTCF is constitutively expressed and is required for the functioning of many vertebrate cell types in the multicellular organism. It is likely that CTCF is not required for the growth of mammalian cells in culture [2].
However, the significance of CTCF for vertebrate development is evidenced by the fact that murine embryos homozygous for a knocked-out CTCF gene died prior to implantation [19,20]. Knock-out of the CTCF gene in murine oocytes prevented normal development of the blastocyst after fertilization [21].
The functional properties of CTCF. CTCF binding site. The development of technologies such as ChIP-chip (chromatin immunoprecipitation on chip) and ChIP-seq (chromatin immunoprecipitation followed by sequencing) [22][23][24] made it possible to derive a number of consensus DNA sequences that CTCF binds preferentially [22,23,25]. Both the amino acid sequence of CTCF and the nucleotide sequence of its binding site were found to be highly conserved among vertebrate species.
Later the ChIP-seq procedure was extended with an exonuclease treatment step (ChIP-exo) [26], which allowed more precise localization of CTCF binding sites in the genome. It was demonstrated that the CTCF binding site may be subdivided into four blocks, each having its own consensus sequence. About half of the CTCF binding sites contained only the two central blocks. Other CTCF binding sites contained either all four blocks, a combination of two or three blocks, or just one block. Further studies led to the notion of the CTCF core binding motif [27], which forms the basis of the majority of binding sites (Fig. 1).
The sets of CTCF binding sequences always include a small number of sequences without any detectable consensus [26,27]. These sequences are assumed to bind CTCF via intermediate proteins rather than directly [27]. Experimentally detected CTCF binding sequences are collected in the CTCF Binding Sites Database [28].
The interaction of CTCF and DNA. The electrophoretic mobility shift assay (EMSA) was used to study the interaction of fragments of the CTCF zinc finger domain with some known CTCF binding sequences, including the site from the chicken beta-globin locus [8], sites from the murine Igf2/H19 locus [29,30] and the APBbeta site, located in the promoter of the human beta-amyloid precursor gene [31].
Since zinc fingers are a characteristic DNA-binding structure of many nuclear proteins [32], it was assumed that the zinc finger (ZnF) domain is responsible for the CTCF-DNA interaction. It was demonstrated that four zinc fingers (4 to 7) are the minimal set required for the specific interaction of CTCF with its binding site in vitro. Further reduction in the number of zinc fingers resulted in a sharp decrease in the DNA-binding ability of the CTCF fragment [30].
CTCF may use different combinations of zinc fingers for binding to DNA. In particular, binding to the regulatory site of the chicken MYC gene is mediated by zinc fingers 2 to 7, while the site located close to the P2 promoter of the human MYC gene is bound by CTCF via zinc fingers 3 to 11 [11]. Binding of the chicken lysozyme silencer fragment F1 in vitro requires zinc fingers 5 to 8 [7].
Later these data were confirmed and extended using high-throughput sequencing [27]. It was demonstrated that zinc finger 8 is not required for the specific recognition of the binding site by CTCF; rather, it stabilizes the CTCF-DNA complex via non-specific interaction. Zinc fingers 9-11 are responsible for the interaction with a small region of the CTCF binding sequence separated from the core unit by a spacer. This fragment, called the U-element (upstream element), corresponds to block 1 identified in [26]. It is noteworthy that the removal of zinc fingers 8-11 leads to a complete loss of CTCF interaction with a binding site containing the U-element [27]. It is likely that when the core sequence is far from the consensus, the DNA-protein complex is stable only if CTCF additionally interacts with the U-element. Zinc finger 3 is involved in the interaction with DNA in the absence of the U-element, while zinc fingers 1 and 2 probably do not participate in the specific interaction with DNA but contribute to the overall stabilization of the DNA-protein complex.
Hypothetical mechanisms of loop formation (Fig. 2), through which CTCF may participate in the formation of chromatin domains and function as a protein component of insulators, have been suggested [33,34].
Fig. 1. The general nucleotide structure of the CTCF binding site [27]. The arrows show the potential sites of CpG methylation.
The impact of DNA CpG methylation on CTCF binding. It was demonstrated that CpG methylation of CTCF binding sites suppresses their binding to CTCF in vitro [30,35-37]. Additionally, CTCF does not bind to the methylated imprinting control region (ICR) of the paternal allele in vivo [38,39], and CTCF binding to the ICR of the Igf2/H19 locus on the maternal chromosome hinders its methylation in the course of development [39]. Wang et al. [40] investigated the in vivo differential methylation of CTCF binding sequences in 19 types of cultured cells and human tissues using ChIP-seq and high-throughput bisulfite sequencing. 36 % of CTCF binding sites were occupied in all 19 cell types. An inverse relation between the degree of methylation and CTCF occupancy was demonstrated for 67 % of differentially methylated CTCF binding sites. Therefore, DNA methylation is an important factor determining whether CTCF binds to a particular nucleotide sequence. It was also demonstrated that about 29 % of CTCF binding sequences in the genome contain CpG in at least one of the two positions -1 and 11, which correspond to positions 4 and 14 of the core sequence of the CTCF binding site (Fig. 1, see [27]) and to positions -5 and 5 in [26].
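The share of binding sites that carry a CpG at given motif positions can be obtained with a simple scan over aligned site sequences. The following Python sketch is purely illustrative: the function names `has_cpg_at` and `fraction_with_cpg` and the toy sequences are assumptions made for this example and are not taken from [26], [27] or [40]. Positions are 1-based and counted within the aligned core sequence, and positions 4 and 14 are used as in the text.

```python
def has_cpg_at(site: str, position: int) -> bool:
    """Return True if a CpG dinucleotide starts at the 1-based `position`."""
    i = position - 1
    return site[i:i + 2].upper() == "CG"


def fraction_with_cpg(sites, positions):
    """Fraction of sites with a CpG at one or more of the given positions."""
    if not sites:
        return 0.0
    hits = sum(any(has_cpg_at(s, p) for p in positions) for s in sites)
    return hits / len(sites)


# Toy, made-up aligned sequences (hypothetical, not real CTCF sites).
toy_sites = [
    "CCACGAGGTGGCACTA",  # CpG at position 4
    "CCAGCAGGGGGCGCTA",  # CpG only at position 12, which is not counted
    "CCAGCAGAGGGCACGA",  # CpG at position 14
    "TCAGTAGATGGCACTG",  # no CpG at all
]

share = fraction_with_cpg(toy_sites, positions=(4, 14))
print(share)  # 0.5 for this toy input
```

Applied to real data, the input would be the motif-aligned sequences of experimentally detected sites; the scan only reports whether a CpG is present and says nothing about whether it is methylated.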
Interaction of CTCF with proteins. During immunoprecipitation from the cell lysate, CTCF is co-extracted with the nucleolar protein nucleophosmin. Conversely, chromatin immunoprecipitation with antibodies to nucleophosmin was used to demonstrate the interaction of the latter with CTCF-dependent insulators in vivo [41]. The chromodomain-containing helicase CHD8 also binds to CTCF and CTCF-dependent insulators. In the absence of CHD8, the ICR region of the Igf2/H19 locus loses its insulator function, although the binding of CTCF to this region is preserved [42]. The protein complexes of CTCF-dependent insulators were demonstrated to contain the large subunit of RNA polymerase II with both the phosphorylated and the dephosphorylated forms of its C-terminal domain [43].
Co-immunoprecipitation revealed an interaction between CTCF and the DNA-binding protein YY1. The N-terminal domain of CTCF is likely to participate in this interaction. The interaction of CTCF and YY1 is required during the inactivation of one X-chromosome to maintain the active state of the second X-chromosome [44].
To date, further proteins capable of interacting with CTCF have been detected by various methods. YB1 (Y-box-binding protein 1) is a multifunctional DNA- and RNA-binding protein that participates in the regulation of DNA replication and repair, transcription and RNA processing, and is capable of interacting with YY1 [45]. Kaiso is a transcription factor capable of binding DNA regions with an increased content of methylated CpG sites. Possibly, it can replace CTCF when the CTCF binding site is methylated [46]. The transcription corepressor Sin3A [47], histone H2A.Z [22,41], PARP [48], p68 (DDX5) [49] and other proteins [2,50] also interact with CTCF.
Fig. 2. [...] by means of a subset of its zinc fingers; B: using a subset of zinc fingers, the protein binds to its site, and the remaining zinc fingers are available for loop formation (DNA bends upon binding CTCF, which allows the available zinc fingers to close the loop [33]); C: the formation of a loop by two CTCF molecules bound to two sites located at the borders of the functional domain (the formation is possible with and without an intermediate protein (green oval); free zinc fingers may participate in protein-protein interactions [20,34]; the regions within the box correspond to the insulators or border elements of the domain).
Another important protein interacting with CTCF is cohesin. Cohesin is responsible for holding sister chromatids together, which is required for successful mitosis and meiosis [51][52][53]. Cohesin is a protein complex consisting of four subunits: Smc1, Smc3, Scc1 (Rad21) and Scc3 (SA1 or SA2). The four subunits form a ring structure, which most likely encircles the chromatids and holds them together [54]. In addition to its role in sister chromatid cohesion, cohesin is involved in the regulation of gene expression [55][56][57]. ChIP-seq analysis demonstrated that about half of the cohesin binding sites overlap with CTCF binding sites [58][59][60]. Cohesin binds to a region in the C-terminal part of CTCF via its SA2 subunit [61].
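A statement such as "about half of the cohesin binding sites overlap with CTCF binding sites" comes down to an interval-overlap count between two peak sets. The sketch below shows one minimal way to compute such a fraction in Python. It assumes half-open (start, end) coordinates on a single chromosome; the function names and the toy coordinates are invented for illustration and do not reproduce the analyses in [58][59][60].

```python
def overlaps(a, b):
    """Two half-open intervals (start, end) overlap if they share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(query, reference):
    """Fraction of `query` intervals overlapping any `reference` interval."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


# Toy, made-up peak coordinates on one chromosome (hypothetical data).
cohesin_peaks = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
ctcf_peaks = [(150, 250), (580, 620), (2000, 2100)]

print(fraction_overlapping(cohesin_peaks, ctcf_peaks))  # 0.5 for this toy input
```

The pairwise comparison is quadratic in the number of peaks, which is acceptable for a sketch; genome-scale peak sets are normally sorted and swept, or handled with dedicated interval tools.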
The spatial proximity of the regions of interphase chromatin that simultaneously interact with cohesin and CTCF was demonstrated using the 3C (chromosome conformation capture) method. The region of the chromosome between the cohesin binding sites forms a loop [56].
The role of the transcription factor CTCF in the regulation of DNA-dependent processes. CTCF binding to DNA may affect gene expression in different ways: in some cases CTCF acts as a transcription activator, in others as a repressor, or it ensures insulator activity.
Insulators are DNA fragments that hinder the interaction between the regulatory elements they separate. In particular, when located between a promoter and an enhancer, an insulator blocks the activating effect of the latter, whereas at the borders of a genetic construct integrated into eukaryotic genomic DNA it suppresses the position effect [62]. All known vertebrate insulators, with rare exceptions [63], bind the transcription factor CTCF.
Additionally, CTCF participates in X-chromosome inactivation and in genomic imprinting, and is likely to regulate RNA splicing. The development of new methods for studying interactions between remote regions of the eukaryotic genome has provided multiple lines of evidence for a vital role of CTCF in the formation of the three-dimensional structure of the eukaryotic genome [34,64-68]. Understanding the mechanisms behind these numerous and varied functions of CTCF could clarify its role in the regulation of DNA-dependent processes.
The hypothesis on the functional domains of chromatin. Insulators. In the late 1980s a hypothesis was put forward that all the chromatin of the eukaryotic cell is subdivided into structural/functional domains. It stated that a chromatin domain is a loop containing one or several genes, the ends of which are attached to the nuclear matrix. Each loop has its own independent DNA supercoiling. The chromatin of one domain may adopt an open (transcriptionally active) or closed (inactive) conformation independently of the chromatin of other domains [69,70].
Since the formulation of the chromatin domain hypothesis, numerous data have been obtained that support, elaborate and amend it. At present, it is established that the main factor controlling decompaction of the chromatin fiber, and hence the possibility of initiating transcription in a given chromosome region, is histone acetylation [71,72]. Locus control regions (LCRs), i. e. DNA regions determining the transcriptional status of a domain, have been defined [73][74][75].
The study of the chicken beta-globin locus demonstrated that in beta-globin-expressing cells the chromatin of the locus is in a decompacted state and the level of histone acetylation is increased. At the same time, the chromatin outside the locus is in a condensed state [76][77][78]. The ends of the beta-globin locus interact with the nuclear matrix and are close to each other, so that the locus forms a loop [79][80][81][82][83]. It was also demonstrated that loops may be formed not only by bringing together the ends of functional domains, but also by joining specific DNA regions, for instance enhancers and promoters, inside the same domain [81,84,85].
Recently it has been established that the transcriptional activity of a gene depends both on the regulatory elements inside the domain and on the location of the gene in a specific part of the nucleus [64][65][66][67][68]. A gene may also be exposed to regulatory elements located in other domains and even on other chromosomes [86][87][88]. Therefore, the hypothesis of structural/functional chromatin domains, amended and elaborated, remains viable.
Given the existence of chromatin domains, there should be functional elements that protect the genes of one domain against the influence of the regulatory elements of other domains. These functional elements of the genome are insulators [89,90]. Insulators prevent undesirable activation or repression of genes under the influence of their environment. Undesirable gene activation by an enhancer is suppressed by blocking its effect on the promoter, but only when the insulator is located between them. The insulator also prevents undesirable gene repression by limiting the spread of condensed chromatin along the chromatin fiber [89,91]. Some authors call insulators border elements, since they are often located at the borders of domains (Fig. 2) [92].
The great majority of insulators found in vertebrate genomes are capable of binding the transcription factor CTCF [8,63,89]. CTCF plays a vital role in the formation of chromatin loops. It was demonstrated that in vertebrate beta-globin loci the CTCF binding sites located at the borders of the loci are in contact with each other [80,81,93], and no activation of promoters in the adjacent domains by the locus enhancers is detected in beta-globin-expressing cells. Similarly, the regulatory elements of the adjacent domains do not affect the expression of globin genes in erythroid cells [76-78, 94, 95].
Analysis of the human genome showed that the distribution of CTCF binding sites correlates strongly with gene density but only weakly with chromosome length [24]. Although the number of CTCF binding sites depends on the number of genes, a large proportion of these sites (46 %) is located very far from promoters, 48 kbp on average [24,96]. This distinguishes the distribution of CTCF binding sites from that of the binding sites of most other transcription factors and is in good agreement with the insulator function of CTCF.
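The comparison between "strong" and "weak" correlation can be made concrete with a correlation coefficient computed per chromosome. The Python sketch below uses the Pearson coefficient as one possible measure; whether [24] used this particular statistic is not stated here, so this choice is an assumption. The function name `pearson` and all numbers are made-up toy values, not genome data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy, made-up per-chromosome values (hypothetical, arbitrary units).
ctcf_sites = [10, 20, 30, 40]
gene_density = [1, 2, 3, 4]
chrom_lengths = [5, 1, 4, 2]

r_density = pearson(ctcf_sites, gene_density)
r_length = pearson(ctcf_sites, chrom_lengths)
print(f"sites vs gene density: r = {r_density:.2f}")
print(f"sites vs chromosome length: r = {r_length:.2f}")
```

With real data, each list would hold one value per chromosome in the same order; a rank-based coefficient could be substituted if the relationship is monotonic but not linear.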
Genome regions depleted of CTCF binding sites mostly contain gene families with shared regulation of transcription, whereas domains enriched in CTCF binding sites contain genes with alternatively regulated promoters. These observations are also in agreement with the insulator function of CTCF [24].
Enhancer blocking. The first data on the enhancer-blocking properties of CTCF binding sites were obtained using constructs containing a promoter-driven reporter gene and an enhancer. The CTCF binding sequence was introduced between the enhancer and the promoter [8, 97]. Comparing the expression of the reporter gene in the presence and in the absence of the CTCF binding site made it possible to estimate the enhancer-blocking activity of this DNA fragment. The majority of CTCF binding sequences demonstrate enhancer-blocking activity in such constructs both upon transient transfection and upon integration into the genome [98][99][100].
Estimating reporter gene expression after transient transfection has some drawbacks. Normally the CTCF binding sites function as a part of chromatin, and a plasmid vector does not completely imitate the chromatin environment.
Additionally, in many cases the plasmid vector is a heterologous system in which the enhancer, promoter, CTCF binding site and reporter gene may originate from different organisms.
The mechanism by which a CTCF-dependent insulator functions was studied using an artificial minichromosome containing the enhancer from the human beta-globin locus LCR and the epsilon-globin gene with its own promoter [101]. The CTCF-dependent chicken 5'HS4 insulator was cloned between the enhancer and the promoter of the epsilon-globin gene and, as a control, close to the enhancer outside the promoter-enhancer unit. When the insulator was placed between the enhancer and the promoter, the following effects were observed: a CTCF-dependent enhancer-blocking effect, a CTCF-dependent decrease in the amount of promoter-bound RNA polymerase II, and an increase in the amount of polymerase interacting with the enhancer and the CTCF-binding sequence. This may be explained by the CTCF-containing DNA-protein complex hindering the movement of RNA polymerase from the enhancer towards the promoter. A CTCF-dependent reduction in the acetylation of histones H3 and H4 between the 5'HS4 insulator and the gene, including the promoter region, was also noted when the 5'HS4 insulator was located between the enhancer and the promoter [101].
The introduction of several core sequences of the chicken 5'HS4 insulator into a plasmid construct between the enhancer and the promoter resulted in a more pronounced enhancer-blocking effect than the introduction of a single sequence [102]. The same effect is observed when several CTCF-dependent insulators HS1 or HS2 from the ICR of the murine Igf2/H19 locus are placed between the enhancer and the promoter [37,38]. Therefore, several CTCF-containing DNA-protein complexes hinder the movement of RNA polymerase II from the enhancer to the promoter more efficiently than a single complex. However, signal transmission along the chromatin fiber can hardly be considered the only mechanism of enhancer function. Many enhancers are separated from their target promoters by millions of base pairs [103], and the enhancer and the promoter are often separated by several chromosome loci with different chromatin structures. Moreover, activation of a promoter by an enhancer located on another chromosome has been described [86,88].
Some authors believe that this mechanism operates only in the enhancer-like elements of prokaryotes and not in genuine eukaryotic enhancers [104].
Recently, using 3C (chromosome conformation capture) technology and its extensions [105], direct interaction of an enhancer with its corresponding promoter was demonstrated [106,107]. However, the mechanism by which chromatin loop formation, bringing the enhancer and the promoter closer to each other, activates the promoter is still unknown [108]. Possibly, the promoter is drawn into a nuclear compartment with conditions favorable for active transcription [108]. The ability to stabilize chromatin loops was demonstrated for different proteins, including CTCF, by the ChIA-PET method [34,87].
According to the decoy model [62,109], CTCF-binding insulators may compete with promoters for interaction with enhancers, i. e. they may trap the enhancer and prevent its binding to the promoter. It was demonstrated [34] that a significant number of the CTCF binding sequences participating in remote interactions are bound to sequences with enhancer properties. On the other hand, if an insulator functions as a decoy, there should be some signal transmission between the enhancer and the promoter along the chromatin fiber; otherwise the decoy would have the same effect regardless of its position relative to the enhancer, which contradicts the definition of an insulator. The 3C method was used in vertebrate beta-globin loci to show that enhancers and promoters are in close physical proximity [80], which is evidence that enhancers function via a combined tracking-looping mechanism.
Insulator as a border element. When transgene-containing constructs are introduced into the genome, the transgene is initially expressed at approximately the same level in all cells, but later the majority of the constructs inserted into the genome become inactive due to chromatin condensation. This inhibition is reversible in Drosophila and yeast, but in vertebrates it becomes irreversible due to DNA methylation [110]. The 5'HS4 insulator, located at the 5'-end of the beta-globin locus, hinders the spreading of the adjacent condensed chromatin over the whole locus.
It was demonstrated by the ChIP method [111] that some CTCF binding sites are located in regions of transition from compacted to decompacted chromatin. These regions were detected by the change in the level of trimethylation of lysine 27 of histone H3 (H3K27me3) and acetylation of lysine 5 of histone H2A (H2AK5ac). The H3K27me3 modification is characteristic of compact chromatin and H2AK5ac of decompacted chromatin. The proportion of CTCF binding sequences located in these transition regions is small, but the non-random distribution of these sequences was demonstrated with high significance. This is evidence for a probable role of CTCF binding sites in the separation of chromosome domains in vivo.
A considerable enrichment of the border regions of so-called topological domains in CTCF binding sites was demonstrated [112]. The authors assumed that the border regions of topological domains may represent insulators blocking the spread of heterochromatin. Indeed, the border elements of the topological domains in differentiated cells correspond to regions with a reduced level of H3K9me, characteristic of the transition of heterochromatin into a less compact state. During the formation of the final structure of the genome domains, stabilization of the chromatin state is likely to occur after the formation of the loop structure.
The CTCF binding sequences may not be directly involved in blocking the spread of heterochromatin. However, the distinct division between regions of compacted and decompacted chromatin in the genome may result from the formation of loop structures by CTCF-containing border elements.
CTCF as a transcription regulator. CTCF is capable of both activating and inhibiting transcription depending on the target gene. The mechanisms of this regulation may differ. Firstly, CTCF may act as a transcription factor promoting or hindering the formation of the initiation complex. In other cases, the binding of CTCF may result in the formation or destruction of a chromatin domain with a corresponding change in chromatin structure, which, in turn, changes the expression of the genes in this domain. It should be noted that it is not always easy to discriminate between these mechanisms. The MYC gene is likely to be repressed according to the first mechanism [5], and the PUMA gene according to the second [113]. Some examples of regulation involving CTCF are presented below.
The binding of CTCF to the 5'-regulatory region of the APP (amyloid beta protein precursor) gene resulted in activation of this gene's promoter during in vitro transcription in a nuclear extract of HeLa cells [31,114]. The addition of oligonucleotides competing for CTCF binding to the nuclear extract led to a decrease in promoter activity. The same effect was observed when CTCF was removed from the nuclear extract by immunoprecipitation, whereas the addition of CTCF protein to the depleted nuclear extract restored the activity of the promoter. The region of CTCF between amino acid residues 1 and 248 is responsible for the activation [114].
It was demonstrated that the HS5-1 DNA region, which is hypersensitive to DNase I and located in the cluster of protocadherin genes, has enhancer properties [115] and, according to chromatin immunoprecipitation data, interacts with CTCF in murine brain cells. The deletion of this CTCF binding sequence from the genome of transgenic mice leads to a reduction in promoter activity [116]. Chromatin immunoprecipitation followed by high-throughput sequencing demonstrated that CTCF and cohesin interact with the promoters of the alternative isoforms of protocadherin alpha. The binding directly correlates with the expression of the alternative isoforms [117]. The 3C method was used to demonstrate the spatial proximity of the promoters of alpha-protocadherins 4, 8 and 12 and the potential enhancers HS5-1 and HS7 [84].
The chicken lysozyme silencer consists of two modules, F1 and F2. Module F2 binds the thyroid hormone receptor, whereas F1 contains the CTCF binding site, and both modules can suppress transcription independently of each other. When each of them binds its protein factor, they act synergistically [6]. Possibly, the mechanism of suppression and activation of the chicken lysozyme gene involves the corepressor Sin3A and histone acetylase and deacetylase complexes [47,91,118].
The interaction of CTCF with two CpG islands in the first intron of the Bcl6 gene prevents active expression of this gene. These CpG islands are methylated in some lymphomas, which prevents CTCF binding to DNA and leads to an increase in the cellular level of Bcl6 mRNA [119].
Binding of CTCF to a site in the first exon of the human telomerase gene hTERT inhibits its transcription. Methylation of the CTCF binding sequence is likely to play a vital role in the regulation of the expression of this gene. Within a plasmid construct, the first exon of the telomerase gene has a suppressing effect on some promoters in both telomerase-expressing and non-expressing cells. In telomerase-expressing cells, methylation of the CTCF binding sequence in the first exon and the absence of CTCF binding are observed [120].
After treatment with 5-aza-2'-deoxycytidine, which leads to DNA demethylation, binding of CTCF to the site in the first exon of the hTERT gene and inhibition of hTERT mRNA synthesis are observed in cell lines expressing hTERT. Inhibition of CTCF expression using short hairpin RNAs led to an increase in the level of hTERT mRNA in the cells [120, 121].
The participation of CTCF in genomic imprinting. Genomic imprinting is defined as a mode of inheritance in which only one parental allele of a gene is expressed in the progeny. The choice of the allele to be expressed is determined by its paternal or maternal origin. CTCF interacts with the DNA region that controls imprinting (imprinting control region, ICR). The most studied example of CTCF participation in imprinting is the regulation of the genes of the mouse Igf2/H19 locus [37,125]. The Igf2 (insulin-like growth factor 2) gene encodes an embryonic mitogen [37], whereas the H19 gene is transcribed into a non-coding RNA that slows down fetal growth [62,126,127]. Igf2 is expressed only from the paternal chromosome and H19 only from the maternal one, owing to the ICR located between these genes. The ICR on the maternal chromosome of mice contains two sites capable of binding CTCF and having insulator properties. The ICR on the paternal chromosome is methylated regardless of tissue and stage of development and is not capable of binding CTCF. Therefore, the CTCF-dependent insulators on the paternal chromosome are inactive, the promoter freely interacts with the enhancers and Igf2 is expressed [128].
While interacting with the ICR of the maternal chromosome, CTCF is likely to protect the adjacent regions of the locus, including the promoter and the inner regions of the H19 gene, from methylation. If the maternal ICR is mutated and does not bind CTCF, methylation spreads to the promoter and the intragenic regions of H19, and the expression of this gene decreases [129,130]. Therefore, by inhibiting CTCF binding, methylation of the paternal ICR leads to methylation of the H19 promoter and to inhibition of the expression of the paternal allele [129][130][131].
CTCF and the inactivation of the X-chromosome. CTCF participates in the inactivation of the mammalian X-chromosome [132]. The X-chromosome contains the X-inactivation center (Xic), which controls the inactivation of one X-chromosome in each cell during embryogenesis and the maintenance of its silenced state. Xic includes the genes Xist (X-inactive specific transcript), Tsix (the name is Xist reversed, reflecting the reverse orientation of the Tsix gene relative to Xist) and Xite (X-inactivation intergenic transcription element), from which non-coding RNAs are transcribed. The X-chromosome carrying the actively expressed Xist allele is inactivated. The Tsix and Xite genes are transcribed from the active X-chromosome. The Tsix transcript suppresses the expression of Xist located on the same chromosome, and the expression of Xite activates the expression of Tsix, which inhibits the expression of Xist and thereby maintains the activity of the X-chromosome carrying the transcriptionally active Tsix and Xite alleles [50,133].
The process of X-inactivation proceeds in several stages [133][134][135][136]. The first is the "counting" of the number of X-chromosomes in the cell, or rather the determination of the ratio between the number of X-chromosomes and autosomes [133,134]. The next stage is the homologous pairing of the two X-chromosomes and the choice of the one to be inactivated. The interaction between X-chromosomes takes place in the Xic region, and the pairing requires the fragment of Xic containing the Xite and Tsix genes [137,138]. This region is rich in binding sites for CTCF as well as for the protein YY1 [44,132,139]. It was demonstrated that in the absence of CTCF in the cell the pairing of X-chromosomes is suppressed. The protein YY1, which is capable of interacting with CTCF, is not required at this stage [139]. A model has been proposed according to which the pairing of X-chromosomes leads to an irreversible transfer of protein factors such as CTCF and Oct-4 from one X-chromosome (the future inactive one) to the other (the future active one) [44,138-142]. If this model is accurate, CTCF is also involved in the selection of the X-chromosome to be inactivated. The last stage is the inactivation of the X-chromosome itself. As a result of active expression of the Xist gene, the Xist transcript coats the chromosome from which it was synthesized, and the chromatin of this X-chromosome becomes compacted [134][135][136].
X-chromosome inactivation involves the interaction of the repressor complex PRC2 with the 5'-region of the Xist gene and the enrichment of chromatin in this region in histone H3 trimethylated at lysine 27 [143,144]. Inactivation of the X-chromosome requires the transcription of a small region of Xist into RepA RNA [145,146]. RepA is likely to interact with the PRC2 complex [144]. It was demonstrated that an X-chromosome whose Xist promoter contains a CTCF binding site with enhanced affinity for this protein is inactivated more often than an X-chromosome in which this DNA fragment has lower affinity for CTCF [147]. Together with YY1, CTCF is likely to participate in activating Tsix gene expression [44]. CTCF also ensures the functioning of insulator elements that separate the active genes on the inactivated X-chromosome from their transcriptionally inactive environment [148].
The co-transcriptional regulation of alternative splicing. Recent evidence suggests that CTCF may be involved in the co-transcriptional regulation of alternative splicing. It was demonstrated that the interaction of CTCF with a DNA fragment located in the fifth exon of the CD45 gene leads to more frequent inclusion of this exon into the mature mRNA. Depletion of CTCF in the cell by RNA interference, or methylation of the CTCF binding site in the fifth exon of the CD45 gene, inhibited the formation of the splice form containing the fifth exon. CTCF binding to this site was shown to cause pausing of RNA polymerase II within the region of CTCF binding. Comparison of genome-wide chromatin immunoprecipitation data obtained with antibodies to CTCF and RNA polymerase II with transcriptome analysis data confirmed that DNA-bound CTCF stalls the transcribing RNA polymerase II [149,150].
These data gave rise to the hypothesis that CTCF plays a role in the regulation of alternative splicing. The initial stages of RNA splicing take place during transcription, and the preferential formation of a particular splice form may depend on the elongation rate. If RNA polymerase II is stalled by DNA-bound CTCF after a relatively weak splice site has been transcribed but before a stronger one appears in the transcript, the weaker splice site gains an advantage. Thus, CTCF may affect the elongation rate and thereby regulate alternative splicing [149,151].
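The kinetic argument can be illustrated with a toy calculation. The sketch below is not taken from the cited studies. It assumes that commitment to the weak splice site is a first-order process with a hypothetical rate constant `k_weak_per_s`, and that the weak site is unopposed only during the time the polymerase needs to reach the strong site, that is, the transit time plus any CTCF-induced pause. The function name, the parameters and all numerical values are illustrative assumptions.

```python
import math


def weak_site_usage(distance_nt, elongation_rate_nt_per_s, pause_s, k_weak_per_s):
    """Probability that the weak splice site is committed before the strong
    site has been transcribed.

    Toy model (assumption, not from the source): commitment to the weak site
    is first-order with rate k_weak_per_s, and it can only happen during the
    window between transcription of the weak site and of the strong site.
    """
    if elongation_rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    if pause_s < 0 or distance_nt < 0 or k_weak_per_s < 0:
        raise ValueError("distance, pause and rate constant must be non-negative")
    # Window of opportunity: transit time between the two sites plus the pause.
    window_s = distance_nt / elongation_rate_nt_per_s + pause_s
    return 1.0 - math.exp(-k_weak_per_s * window_s)


if __name__ == "__main__":
    # Hypothetical values: 1000 nt between sites, 50 nt/s elongation,
    # weak-site commitment rate of 0.01 per second.
    for pause in (0.0, 30.0, 120.0):
        p = weak_site_usage(1000, 50.0, pause, 0.01)
        print(f"pause {pause:5.0f} s -> weak-site usage {p:.2f}")
```

In this simplified picture a longer pause always increases the use of the weak site, which is the qualitative behaviour the hypothesis describes. Real splice-site choice depends on many additional factors that the model ignores.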
Conclusions. The main functions of CTCF studied thus far are as follows:
- direct regulation of the transcription of some genes;
- organization of the domain structure of chromatin;
- organization of insulators (enhancer-blocking and boundary elements);
- participation in genomic imprinting and inactivation of the X-chromosome;
- regulation of alternative splicing.
In most cases the precise mechanisms underlying these functions remain to be elucidated.
The large number and variety of CTCF binding sites, the multitude of its functions, and the fact that CTCF is present only in relatively advanced metazoa and is absent from plants and protozoa suggest that its main task is to regulate the development of the organism and to organize the cellular genome so that the cell can function as a constituent of a multicellular organism.
Funding. The work of the authors was supported by the grant program for the leading scientific schools of Russia (Project NSh_1674.2012.4).