C-methods to study 3D organization of the eukaryotic genome

It is becoming increasingly evident that the spatial organization of the eukaryotic genome plays an important role in the regulation of gene expression. Three-dimensional (3D) genome organization can be studied using different types of microscopy, in particular those coupled with fluorescence in situ hybridization. However, for the analysis of spatial interactions between specific genomic regions, chromosome conformation capture (3C) methods perform much better. They are based on the proximity ligation approach, which relies on the preferential ligation of the ends of DNA fragments that have been joined via protein bridges by formaldehyde fixation of living cells. Such bridges are assumed to link DNA fragments located in close spatial proximity in the cell nucleus. In this review we describe current 3C-based approaches, from 3C and ChIP-loop to Hi-C and ChIA-PET, collectively known as C-methods.


Introduction.
A characteristic feature of the eukaryotic genome, in comparison with the prokaryotic genome, is its packaging into chromatin, a complex of DNA and various proteins placed in a special cellular compartment, the cell nucleus. Genomic DNA, whose total length can reach several meters, must be folded hundreds of thousands of times to fit within the comparatively small volume of the nucleus. There are several levels of DNA compaction: wrapping of naked DNA around histone octamers to form nucleosomes, organization of nucleosomes into a 30-nm chromatin fiber, and further folding of this fiber into loop domains of 50-200 kb attached to the nuclear matrix [1][2][3]. Besides reducing the linear size of DNA, packaging the genome into chromatin provides a fundamentally new means of regulating DNA transcription, replication, repair and recombination. This regulation relies on the ability to change the relative positions of genomic elements in the nucleus and the degree of DNA compaction, which influences the accessibility of different genomic regions to regulatory factors.
The problem of genome spatial organization is tightly linked to the problem of functional compartmentalization of the nucleus, in particular to the existence of active and repressive nuclear compartments. The latter include the perilaminar compartment and heterochromatin regions (for example, centromeric heterochromatin). Some nuclear compartments can be observed with a light microscope (for example, the nucleolus and heterochromatin) or an electron microscope (replication factories). However, most compartments can be visualized only by immunofluorescence microscopy (speckles, Cajal bodies, the perinucleolar compartment, transcription factories) [4]. Immunofluorescence methods also allow analysis of the spatial organization of replication.
Many DNA-dependent processes are controlled by cis-acting regulatory DNA elements. For example, the active transcriptional status of a gene frequently depends on direct interaction of its promoter with upstream enhancers, resulting in the assembly of an active chromatin hub [5][6][7]. Conversely, interaction between insulators may place a gene in a chromatin loop, contributing to functional isolation of this gene from external regulatory signals [8]. In addition to enhancer and insulator loops, recent studies have demonstrated interactions between the start and the end of a gene [9,10]. More distant spatial interactions between genomic regions also occur, for example interchromosomal associations between a gene and a regulatory region situated on another chromosome [11], or juxtaposition of genes located far apart in the genome, which may then share a common transcription factory [12] or become partners in malignant translocations [13].
Until recently, fluorescence in situ hybridization (FISH) was the only tool for studying interactions of distant genomic elements in nuclear space. This experimental approach allowed researchers to address only a rather limited number of questions. Although FISH is helpful in identifying contacts between very remote genomic regions, it cannot be used to probe medium- and short-range interactions, so many regulatory interactions (such as promoter-enhancer communication) could not be studied. This limitation arises because in FISH experiments any DNA sites located less than 150 kb apart in the DNA sequence produce a merged signal even if the sites do not interact with each other [14]. The problem was solved with the invention of the chromosome conformation capture (3C) technique [15]. It relies on the idea that digestion and religation of cross-linked chromatin, followed by quantification of the ligation junctions, allows determination of the interaction frequencies of different pairs of genomic elements. Since its development in 2002, the technique has given rise to a number of derivative methods that together form a powerful toolkit for studying the spatial organization of the genome and its functional output. Below we describe and compare different 3C-based methods and provide examples of biological questions addressed with them.
Searching for interactions within specific loci.
3C. The 3C protocol includes the following steps. Cells are treated with formaldehyde to cross-link proteins to nearby proteins and to DNA. After lysis of the nuclei with SDS and solubilization of proteins that were not cross-linked, the resulting DNA-protein network is cleaved with one or more restriction enzymes, followed by ligation at low DNA concentration. Under such conditions, ligation between DNA fragments cross-linked via protein bridges is strongly favored over ligation between random fragments. After ligation, the cross-links are reversed, and individual ligation products are detected and quantified one by one by polymerase chain reaction (PCR). PCR primers are designed to anneal at the ends of the restriction fragments of interest, facing outwards (Figure, A). Current 3C protocols recommend real-time PCR with TaqMan probes to improve the performance of the assay [16]. The cross-linking frequency of two specific restriction fragments, as measured by the amount of the corresponding ligation product, is, to a first approximation, proportional to the frequency with which these two genomic sites interact. Thus, 3C analysis provides information about the spatial organization of chromosomal regions in vivo [15,17].
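As an illustration of how such real-time PCR measurements might be turned into relative cross-linking frequencies, the sketch below relies on assumptions that are not stated above: a fixed amplification factor per PCR cycle, a control template in which all ligation products of the locus are present in equal amounts, and a loading-control primer pair that corrects for differences in the amount of template. All function names are hypothetical.

```python
def relative_amount(ct: float, efficiency: float = 2.0) -> float:
    """Template amount (arbitrary units) implied by a qPCR threshold cycle.

    Assumes the product is multiplied by `efficiency` in every cycle
    (2.0 corresponds to 100% PCR efficiency).
    """
    return efficiency ** (-ct)


def crosslinking_frequency(ct_3c: float, ct_control: float,
                           ct_load_3c: float, ct_load_control: float,
                           efficiency: float = 2.0) -> float:
    """Relative cross-linking frequency of one pair of restriction fragments.

    ct_3c, ct_control: Ct of the junction-specific assay on the 3C library
        and on an equimolar control template (hypothetical inputs).
    ct_load_3c, ct_load_control: Ct of a loading-control assay on the
        same two templates.
    """
    # Junction amount in the 3C library relative to the control template;
    # this cancels out differences in primer/probe performance.
    junction_ratio = relative_amount(ct_3c, efficiency) / relative_amount(ct_control, efficiency)
    # Correction for the total amount of template loaded.
    loading_ratio = relative_amount(ct_load_3c, efficiency) / relative_amount(ct_load_control, efficiency)
    return junction_ratio / loading_ratio


# A junction detected 3 cycles later in the 3C library than in the control,
# with equal loading, corresponds to 1/8 of the control signal.
print(crosslinking_frequency(25.0, 22.0, 20.0, 20.0))  # 0.125
```

Computing this value for a fixed "anchor" fragment paired with each fragment along a locus would give an interaction profile; in practice the additional controls referred to in [16,40] would still be required.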
Originally developed to analyze chromosome conformation in yeast [15], the 3C technology was quickly adopted to study the spatial organization of genomic loci in higher eukaryotes. It is now a routine method in studies of chromatin, transcription and gene regulation.
RNA-TRAP. At about the same time that 3C was developed, a very different biochemical approach for analyzing the spatial proximity of genomic elements was described. RNA-TRAP (tagging and recovery of associated proteins), as it was called, involves targeting horseradish peroxidase activity to the primary transcripts of an actively transcribed gene. This is achieved by in situ hybridization of a digoxigenin-labeled, gene-specific intron probe to the primary transcripts, followed by immunodetection of the probe with digoxigenin-specific antibodies conjugated to horseradish peroxidase. After addition of biotin-tyramide, the localized horseradish peroxidase activates the tyramide, which mediates covalent deposition of the linked biotin tag on chromatin proteins in the immediate vicinity of the gene. DNA fragments linked to biotinylated proteins are isolated by affinity chromatography and analyzed by real-time PCR [18]. The technique was originally applied to the actively transcribed mouse β-globin gene and, in agreement with 3C analysis of the same locus [17], a peak of biotin deposition was observed 50 kb away, at hypersensitive site 2 (HS2) of the locus control region (LCR). This implied that HS2 was in close spatial proximity to the actively transcribed β-globin gene [18].
An apparent disadvantage of RNA-TRAP compared with 3C is that it can be applied only to transcribed sequences. Moreover, the technique seems to be limited to genes such as β-globin that are transcribed at very high levels, so that there are enough primary transcripts at the locus for efficient biotin deposition. This partly explains why RNA-TRAP has not been widely used. It should also be noted that RNA-TRAP is not based on proximity ligation and therefore does not fall into the category of C-methods.
ChIP-loop assay. The ChIP-loop assay combines 3C with chromatin immunoprecipitation (ChIP). It allows one simultaneously to determine which genomic sites interact in the nucleus and to suggest candidate proteins mediating the interaction. In this method, after formaldehyde fixation and cell lysis, the cross-linked chromatin is purified from free proteins by urea gradient ultracentrifugation [19]. The purified chromatin is digested with a restriction enzyme and immunoprecipitated with specific antibodies following a standard ChIP protocol. The beads carrying the precipitated chromatin are then resuspended in ligation buffer, and the chromatin is ligated directly on the beads. Ligation products are purified and analyzed as in conventional 3C experiments (Figure, F).
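Because ligation is performed on the immunoprecipitated chromatin, a ChIP-loop product can arise only from chromatin in which the protein is bound and the two sites are in contact at the same time. The toy model below is a hypothetical illustration, not part of the published protocol: each cell is described by two yes/no properties, and cross-linking and ligation efficiencies are ignored. It contrasts what 3C, ChIP and ChIP-loop would each score in the same population.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cell:
    looped: bool  # the two sites are in spatial contact in this cell
    bound: bool   # the protein of interest is cross-linked to these sites


def fraction(cells: Sequence[Cell], predicate: Callable[[Cell], bool]) -> float:
    """Fraction of cells satisfying `predicate`."""
    return sum(1 for c in cells if predicate(c)) / len(cells)


def signal_3c(cells: Sequence[Cell]) -> float:
    # 3C scores every looped cell, whether or not the protein is bound.
    return fraction(cells, lambda c: c.looped)


def signal_chip(cells: Sequence[Cell]) -> float:
    # ChIP scores every bound cell, whether or not the sites are looped.
    return fraction(cells, lambda c: c.bound)


def signal_chip_loop(cells: Sequence[Cell]) -> float:
    # Only chromatin that is both precipitated and looped yields a junction.
    return fraction(cells, lambda c: c.looped and c.bound)


# Two subpopulations: looped but unbound, and bound but linear.
population = [Cell(looped=True, bound=False)] * 5 + [Cell(looped=False, bound=True)] * 5
print(signal_3c(population), signal_chip(population), signal_chip_loop(population))
# 0.5 0.5 0.0
```

In this scenario 3C and ChIP are both positive, yet the protein never sits on a looped locus; only the combined assay reveals this.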
The ChIP-loop assay allows one to identify, from a panel of tested proteins, those that may take part in DNA loop organization. For example, it was shown that the transcriptional repressor MeCP2 is important for the organization of silent chromatin loops [19]. In contrast, the SATB1 protein participates in the formation of the higher-order structure of active chromatin [20]. It should be kept in mind, however, that cross-linking of a protein to interacting DNA fragments is not sufficient to conclude that the protein participates in loop formation: the protein may bind DNA near the interacting sites without mediating the interaction. Additional experiments can help resolve this, for example testing whether a transient knockdown of the protein affects the characteristic spatial configuration of the DNA region under study [21]. Nevertheless, in some respects the ChIP-loop assay provides better insight than 3C and ChIP used separately. This applies to situations in which a positive ChIP signal originates from a cell subpopulation in which the genomic locus under study has a linear configuration, whereas a positive 3C signal originates from another cell subpopulation in which the protein does not bind the corresponding DNA sites.
M3C. In our laboratory we developed a variant of 3C that allows analysis of the spatial proximity of DNA fragments bound to the nuclear matrix [22]. The nuclear matrix is an operationally defined skeletal structure that underlies the nucleus [23]. Many reports indicate that multi-enzyme complexes and various DNA regulatory elements are associated with the nuclear matrix [24,25]. Our goal was to use the new approach to test whether the nuclear matrix serves as a platform for interactions between genomic elements and for chromatin hub assembly.
The protocol, referred to as Matrix 3C (M3C), includes high-salt extraction of nuclei (which removes histones and unfolds the DNA loops bound to the nuclear matrix), removal of the distal parts of DNA loops by restriction enzyme treatment, ligation of the nuclear matrix-bound DNA fragments, and subsequent analysis of ligation frequencies (Figure, H). Importantly, in contrast to 3C, the M3C protocol does not include formaldehyde fixation.
Using the M3C procedure, we demonstrated that the promoters of at least three housekeeping genes surrounding the chicken α-globin gene domain were assembled into a complex (presumably, a transcription factory) integrated into the nuclear matrix. In erythroid cells, the regulatory elements of the α-globin genes were recruited to this complex. Based on these observations, we proposed a model according to which mixed transcription factories, which mediate transcription of both housekeeping and tissue-specific genes, are composed of a permanent compartment containing housekeeping gene promoters integrated into the nuclear matrix and a «guest» compartment to which promoters and regulatory elements of tissue-specific genes can be temporarily recruited [22].
Searching for interactions throughout the genome.
4C. 4C was the first C-method to allow genome-wide analysis of DNA-DNA interactions. It was developed independently in two variants that differ in their full names but share the same abbreviation. The first is designated Circular Chromosome Conformation Capture. This strategy is aimed at amplifying circular DNA molecules formed when both ends of cross-linked restriction fragments are ligated to each other (Figure, D). Two PCR primers are designed to anneal at the opposite ends of the restriction fragment of interest, facing outwards. In this way, all DNA fragments ligated to the fragment of interest at both ends are amplified. The resulting 4C DNA library, representing the whole set of partners of the DNA fragment of interest, is analyzed by cloning and sequencing [26].
The second variant of 4C, which has become more popular among researchers, is designated «Chromosome Conformation Capture on Chip». In this closely related technique, ligation products obtained by the standard 3C procedure are digested with a frequently cutting secondary restriction enzyme and then ligated to form small circular DNA molecules, which are amplified with primers specific to the restriction fragment of interest, called the «bait» or «viewpoint» (Figure, C). Originally, the resulting 4C DNA library was analyzed with DNA microarray (chip) technology [27]. Deep sequencing is now used for this purpose instead, and the method is therefore more often referred to as 4C-seq.
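Computationally, a 4C-seq read consists of bait-derived sequence followed by the sequence of a captured partner fragment. The sketch below assumes a simplified, hypothetical read layout (the bait-specific primer, immediately followed by the primary restriction site, then the captured sequence) and returns the captured part, which would then be mapped to the reference genome. The function name, the read layout and the minimum-length threshold are illustrative assumptions, not a published pipeline.

```python
from typing import Optional


def captured_sequence(read: str, primer: str, site: str,
                      min_len: int = 15) -> Optional[str]:
    """Return the partner-derived part of a 4C-seq read, or None.

    Assumed (hypothetical) read layout: primer + site + captured sequence.
    """
    if not read.startswith(primer):
        return None  # not a bait-derived read
    site_start = len(primer)
    if read[site_start:site_start + len(site)] != site:
        return None  # restriction site not where the assumed layout expects it
    captured = read[site_start + len(site):]
    # Very short captured sequences cannot be mapped unambiguously.
    return captured if len(captured) >= min_len else None


primer = "ACGTACGTAC"   # hypothetical bait-specific primer
site = "GATC"           # recognition site of a 4-cutter such as DpnII
read = primer + site + "TTTTGGGGCCCCAAAATT"
print(captured_sequence(read, primer, site))  # TTTTGGGGCCCCAAAATT
```

Counting how many captured sequences map to each genomic region would then give the interaction profile of the bait.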
4C technology has proved its potential for addressing many biological questions. With this method, the DNA interaction profiles of tissue-specific and housekeeping genes were analyzed [27]. 4C was also used to determine how transcription or the presence of an enhancer affects the position of a locus in the nucleus [28][29][30]. An allele-specific 4C strategy was used to study dosage compensation of the mammalian X chromosome, showing that the inactive X chromosome adopts a unique three-dimensional configuration that depends on Xist RNA [31].
5C. 5C stands for «Chromosome Conformation Capture Carbon Copy». In this protocol the 3C ligation products are mixed with a set of special primers designed to anneal at the very ends of all restriction fragments in the genomic region under study, some facing outwards and the others inwards, so that one end (either 5' or 3') of each primer covers exactly half of a restriction site. In this way, outward and inward primers anneal next to each other across the ligation junction of specific ligation products and are then ligated to each other (Figure, B). The primers also carry universal tails for amplification. After amplification, the resulting 5C DNA library is analyzed using either microarrays or deep sequencing. The original 3C library determines the spectrum and frequencies of the final 5C products. As a result, the 5C library is a quantitative «carbon copy» of the part of the 3C library defined by the collection of 5C primers [32].
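The primer geometry described above can be illustrated on a single ligation junction. In the sketch below (a hypothetical function; it omits the universal amplification tails and the 5' phosphorylation that real 5C oligonucleotides would need), the junction is split in the middle of the restriction site, so that one oligonucleotide ends with the first half of the site and its partner starts with the second half. Both have the sequence of the same strand and therefore anneal side by side on the complementary strand, where ligase can join them.

```python
def five_c_pair(junction: str, site: str, arm: int = 20) -> tuple[str, str]:
    """Return the two annealing arms that meet in the middle of `site`.

    `junction` is the 5'->3' sequence of one strand across a 3C ligation
    junction; both returned arms have the sequence of that strand.
    """
    pos = junction.find(site)
    if pos == -1:
        raise ValueError("restriction site not found in junction")
    mid = pos + len(site) // 2          # centre of the restriction site
    if mid - arm < 0 or mid + arm > len(junction):
        raise ValueError("junction too short for the requested arm length")
    upstream_oligo = junction[mid - arm:mid]    # 3' end covers the first half of the site
    downstream_oligo = junction[mid:mid + arm]  # 5' end covers the second half of the site
    return upstream_oligo, downstream_oligo


left = "CCCCCGGGGG" * 3       # end of fragment i (made-up sequence)
right = "TTTTTAAAAA" * 3      # start of fragment j (made-up sequence)
junction = left + "AAGCTT" + right   # HindIII site reconstituted by ligation
up, down = five_c_pair(junction, "AAGCTT")
print(up[-3:], down[:3])  # AAG CTT
```

Because each arm is tied to one fragment end, a set of such oligonucleotides designed for all fragment ends in a region can, in principle, interrogate every pairwise junction between them.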
5C technology has been successfully applied to study the configuration of the human α-globin locus [32], the human β-globin locus [33], and the human HOX gene cluster [34]. It is worth noting that although 5C provides a matrix of interaction frequencies for many pairs of sites, it cannot cover the whole genome, only selected regions of it, and in this respect resembles 3C. Truly genome-wide C-methods are described below.
Hi-C. In this modified 3C procedure, an extra step is introduced between restriction enzyme digestion and ligation: the DNA ends are filled in with nucleotides, one of which is biotinylated. After blunt-end ligation, the DNA is purified and sheared, and ligation junctions marked by biotin are isolated by affinity chromatography on streptavidin beads and analyzed by deep sequencing (Figure, E). Hi-C data thus allow construction of a matrix of ligation frequencies between all fragments in the genome [35].
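After sequencing, each Hi-C read pair can be reduced to two genomic positions, and counting these pairs gives the interaction matrix. The sketch below makes simplifying assumptions not stated in the text: a single chromosome, fixed-size bins instead of individual restriction fragments, and no filtering or normalization of the raw counts. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def contact_matrix(pairs: Iterable[Tuple[int, int]], chrom_length: int,
                   bin_size: int) -> List[List[int]]:
    """Count read pairs into a symmetric bin-by-bin contact matrix.

    Each pair holds the 0-based positions of the two ends of one
    ligation junction on the same chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in pairs:
        if not (0 <= a < chrom_length and 0 <= b < chrom_length):
            raise ValueError(f"position out of range: {(a, b)}")
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


pairs = [(100, 2500), (150, 120), (2900, 1200)]
for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
    print(row)
# [1, 0, 1]
# [0, 0, 1]
# [1, 1, 0]
```

Binning trades resolution for counting depth; a fragment-level matrix, as described in the text, would use the restriction fragment map instead of fixed bins.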
Hi-C was successfully used to analyze the general principles of the genome folding in different taxonomic groups of organisms, from yeasts [36] to mice [37] and humans [35].
ChIA-PET. ChIA-PET stands for Chromatin Interaction Analysis by Paired-End Tag sequencing. It is a genome-wide version of the ChIP-loop assay. In ChIA-PET, soluble cross-linked chromatin complexes are obtained by sonication instead of restriction enzyme digestion. Complexes containing a protein of interest are isolated by immunoprecipitation (as in the ChIP-loop assay). Specially designed linkers carrying an MmeI recognition site at one end are then ligated to the ends of the DNA fragments (Figure, G). At this first ligation step only one end of the linker (the one close to the MmeI recognition site) is phosphorylated, so after ligation the MmeI sites are always located close to the junction between the linker and a DNA fragment. The linkers contain biotinylated nucleotides to facilitate subsequent purification. After linker ligation, the free ends of the linkers are phosphorylated and proximity ligation is carried out in a highly diluted solution. The so-called PET fragments (Paired-End Tags) are then released by MmeI digestion. This enzyme cuts DNA 20 bp downstream of its recognition site, so each PET fragment contains a common internal part (the joined linkers) and two 20-bp sequence tags derived from the DNA fragments that were joined by proximity ligation. After affinity purification on a streptavidin column, the PET fragments are analyzed by deep sequencing. The resulting ChIA-PET sequences are mapped to the reference genome to reveal relationships between remote chromosomal regions brought into close spatial proximity by protein factors [38].
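The PET structure described above (two 20-bp tags flanking the joined linkers) suggests a simple way to recover the tag pair from a sequenced fragment before mapping. The sketch below assumes that the joined-linker sequence is known exactly and appears once in the read; the linker sequence and the function name are hypothetical, and a practical pipeline would also have to tolerate sequencing errors in the linker.

```python
from typing import Optional, Tuple


def split_pet(pet: str, linker: str, tag_len: int = 20) -> Optional[Tuple[str, str]]:
    """Return the two genomic tags flanking the joined linkers, or None."""
    pos = pet.find(linker)
    if pos == -1:
        return None  # no linker found: not a valid PET
    left = pet[max(0, pos - tag_len):pos]
    right = pet[pos + len(linker):pos + len(linker) + tag_len]
    if len(left) != tag_len or len(right) != tag_len:
        return None  # one of the tags is truncated
    return left, right


linker = "GTTGGATAAGATATCGC"           # hypothetical joined-linker sequence
tag1 = "ACGTACGTACGTACGTACGT"          # 20 bp from one interacting fragment
tag2 = "TTGGCCAATTGGCCAATTGG"          # 20 bp from its ligation partner
print(split_pet(tag1 + linker + tag2, linker))
# ('ACGTACGTACGTACGTACGT', 'TTGGCCAATTGGCCAATTGG')
```

Each tag of the returned pair would then be mapped separately; two tags that map far apart in the genome indicate a candidate protein-mediated interaction.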
ChIA-PET was employed to map the chromatin interaction network mediated by oestrogen receptor alpha in the human genome [38].It was also used to reveal CTCF-mediated chromatin interactome in mouse embryonic stem cells [39].
Concluding remarks and outlook.
Beyond question, the development of 3C and derivative methods has brought the field of genome spatial organization to a new level. It has become evident that the 3D organization of the genome can bring together distant regulatory elements and thus plays an important role in controlling genome functions. Uncovering the general principles of genome folding can provide insight into the complex relationships between chromatin structure, gene activity, and the functional state of the cell.
Still, 3C has a number of weak points. It is recognized as a rather complicated method based on many assumptions, and correct interpretation of the results requires numerous control experiments [16,40]. In addition, this method (like other C-methods based on proximity ligation) has several important limitations. First, it does not allow estimation of the proportion of cells in which two particular DNA sequences are in close proximity. Second, 3C cannot directly demonstrate simultaneous interaction of several genomic elements. Results obtained by 4C analysis suggested that active chromatin hubs including more than two chromatin elements might exist in the cell [26]. However, it is still unclear whether hub formation occurs in a considerable proportion of the cells in which a particular locus is activated. Thus, the active chromatin hub model remains hypothetical. Finally, 3C can only determine an average interaction pattern for a given cell population. To gain further insight into the nature of long-range chromatin interactions, studies should shift from a mythic «average cell» [41] to individual cells and even individual chromosomes. New approaches will be needed to address these questions.
Acknowledgements.
Figure. Schematic representation of the principles of different C-methods: A: 3C; B: 5C; C: 4C (circular chromosome conformation capture); D: 4C (chromosome conformation capture on chip); E: Hi-C; F: ChIP-loop; G: ChIA-PET; H: M3C (see the text for details). Encircled are the basic steps of the 3C procedure that appear in various forms in all C-methods («C-core»). The question marks in panels C and D indicate DNA fragments to be analyzed by deep sequencing.