Variability of DNA structure and protein-nucleic acid recognition

Revealing the molecular mechanisms of sequence-specific recognition of DNA by proteins is one of the key tasks of biology. This review presents the results of statistical analyses of structural databases, obtained by different research groups studying the conformational features of free and protein-bound DNA fragments, that could be used to clarify the mechanisms of protein-nucleic acid recognition. Analysis of the published data allowed us to make the following generalizations. The ability of the DNA double helix, including its sugar-phosphate backbone, to adopt alternative conformations is an intrinsic characteristic of certain DNA sequences. Such conformational transitions are potential sources of the unique geometry of dinucleotide steps and/or individual nucleotides; they alter base stacking and/or change the accessible surface area of atoms, and can thus serve as criteria for recognition of DNA by proteins. Changes in physical properties that depend on DNA structure, i.e. the polar/nonpolar profile and the electrostatic potential of the grooves, can also be used by proteins for DNA readout.


Introduction.
Uncovering the molecular mechanisms of sequence-specific recognition of DNA by proteins is one of the important tasks of biology. It is necessary both for understanding the mechanisms that regulate the storage, readout and transfer of genetic information and for controlling these processes [1][2][3]. Data on the atomic structure of DNA fragments, proteins and their complexes obtained by X-ray structure analysis are widely used to study the principles of protein-nucleic acid recognition [4][5][6].
Rapid development of crystallographic analysis of DNA structure started in 1979, when a detailed structure of the hexamer d(CGCGCG)2 was obtained [7]. The results were quite unexpected, because the double helix of the hexamer differed radically from both canonical A- and B-forms of DNA described earlier by diffraction analysis of DNA fibers. The d(CGCGCG)2 oligomer formed a zigzag left-handed helix, named the Z-form of DNA. At about the same time, crystalline oligomers with other nucleotide sequences, corresponding to A-DNA (octamer d(GGTATACC)2) [8] and B-DNA (dodecamer d(CGCGAATTCGCG)2) [9,10], were obtained. The dodecamer d(CGCGAATTCGCG)2, which contains the binding site of the EcoRI restriction enzyme, proved to be a convenient object for exploring the dependence of DNA helix parameters on nucleotide sequence thanks to its mixed nucleotide content. During the following decade, 22 variants of the d(CGCGAATTCGCG)2 fragment obtained under different crystallization conditions were examined in various laboratories [11]. The results showed that both the width of the minor groove and the bend of the DNA helix axis were sequence-specific.
Significant narrowing of the minor groove (down to 3.5 Å) appeared to be characteristic of AT-sites [12] and not inherent in GC-fragments of DNA. As a result of the small width of the minor groove, an ordered structure of hydration water, the so-called hydration "spine", is formed on AATT-fragments [13], which in turn contributes to the abnormally narrow minor groove in these regions. The conclusion that deformation of the DNA double helix axis is sequence-specific was made after a pronounced bend (10°-20°) was revealed at GC/AT junctions [10].
Dickerson and some other authors [10][11][12][13] suggested that a thorough analysis of the local helical parameters of various nucleotide sequences, together with data on the contacts between bases and amino acid residues in protein-nucleic acid complexes, would make it possible to determine how the double helix structure depends on the nucleotide sequence and to formulate general rules for the recognition of certain DNA sequences by proteins. However, these rules have not been established to date because of the high conformational mobility of DNA and the unpredictability of amino acid-base interactions in binding sites.
In the present decade, the mechanisms of protein-nucleic acid recognition have been studied by statistical analysis of X-ray structural data on crystals of DNA, proteins and their complexes [14][15][16], which are available thanks to the creation of online structural databases (PDB, NDB, Swiss-Prot, etc.) [17,18]. In addition, in silico experiments using numerical simulation methods (see [19][20][21][22][23][24] for details) made it possible to assess changes in various physicochemical parameters of proteins and nucleic acids upon complex formation. These include changes in the solvent-accessible surface area [16], variation in the conformational parameters and packing of DNA upon interaction with proteins [2][3][4], and deformation of the relief of protein and DNA surfaces upon complex formation [25]. Application of this combined approach identified two basic mechanisms of protein-nucleic acid recognition:
1. Direct, or specific, readout implies that proteins interact with DNA sequences by forming direct contacts (H-bonds) between amino acid residues and unique atomic groups of bases (the substituents of pyrimidines in the C4 position and of purines in the C6 and N6 positions) situated in the major groove [1,6].
2. Indirect readout implies that a protein binds non-specifically to the base and sugar-phosphate backbone moieties of DNA. Recognition is determined by the conformational features of a certain DNA fragment or by its deformability [1,6,17][26][27][28]. In these cases, the criteria of recognition can be: alterations in the local geometry of base pairs or of the sugar-phosphate backbone; bends or kinks of DNA fragments; peculiarities of the relief of the major and minor grooves; an ordered hydration shell. Such features can pre-exist in a given DNA fragment or appear as a result of binding of ligands (other proteins, ions, biologically active small molecules) and dehydration of the binding site preceding the interaction.
As a result of combined analysis of structural data and physicochemical properties of the complexes, the main types of contacts between proteins and DNA were determined: electrostatic, van der Waals and hydrophobic interactions [29][30][31][32]; H-bonds [15,33], including C-H···O bonds [34]; cation-π interactions [35]; and interactions mediated by bound water molecules [19,36].
The main conclusions drawn from these investigations are the following:
1. DNA-protein complexation generally occurs via both direct and indirect mechanisms.
2. The contribution of these mechanisms to protein-nucleic acid interaction varies depending on the protein family, the DNA sequence in the binding site and other factors.
3. The relative contribution is determined by the specific protein type and the structure of the DNA fragment. Thus, it is difficult to formulate general rules of protein-nucleic acid recognition [37].
In the present review, we focus on the results of statistical analyses of structural databases obtained by various research groups studying the conformational features of both naked DNA and DNA-protein complexes. Systematizing this information will allow us to formulate more accurately the problems that arise in describing the mechanisms of protein-DNA interaction, including the more diverse mechanism of indirect recognition.
Variability of the DNA double helix.
DNA variability and the problem of protein-nucleic acid recognition. DNA polymorphism and the structural changes that accompany complex formation with proteins are considered an important biological problem. To solve it we need (1) to estimate the conformational mobility of the DNA structure and (2) to compare it with data on rearrangements of the DNA double helix induced by interactions with proteins. To date, the DNA double helix is known to retain its overall structure upon complexation with various biologically significant molecules. At the same time, the considerable variability revealed for short DNA fragments produces various conformational states characterized by high sequence specificity. It is still unclear whether the structural flexibility of DNA is an inherent capacity of certain sequences or is induced by interactions with proteins.
The ability of DNA to change its double helix structure is especially important for reliable recognition of certain DNA sites by proteins during protein-nucleic acid complex formation [38]. In general, protein binding alters the arrangement of bases and the structure of the sugar-phosphate backbone. Such local conformational rearrangement of DNA can result in bends or kinks, as in the sequence-specific CAP-DNA complex [39] or in nucleosome formation, when DNA winds around the histones [40]. Sequence-specific deviations of the DNA structure from the canonical B-form are evidently the rule rather than the exception and play a crucial role in protein-nucleic acid recognition [24].
The fundamentals of sequence-specific protein-nucleic acid recognition, analogous to the principles of complementarity governing the formation of the DNA double helix, have not yet been formulated [41]. Therefore, thorough investigation of the mechanisms of DNA conformational variability upon complex formation with proteins is a pressing scientific challenge. The idea of a "recognition code" between amino acids and nucleotides, however, has not been confirmed so far [42]. The impossibility of establishing such a code is explained, in particular, by the vast number of degrees of freedom in protein-DNA contact sites during complex formation [29].
Forms of the DNA double helix and their parameters. As stated above, protein-nucleic acid recognition is directly associated with the ability of the DNA molecule to change its conformation. One of the most important biological issues is how DNA conformational variability at the local level (arrangement of nucleotide pairs, configuration of distinct double helix "steps", i.e. dinucleotides, and sugar-phosphate backbone conformation) influences both the global structure of the DNA molecule and its interaction with proteins [3]. Therefore, criteria are required to establish whether the conformational rearrangements in question occur and, if so, to identify their type.
The structural differences between A- and B-DNA double helices, obtained by X-ray structure analysis of fibers, are characterized by a number of conformational parameters [43][44][45][46][47][48][49]: helix parameters (twist, the angle of helical rotation between two neighboring residues; rise, the distance between nucleotides along the helical axis; the helical step (axial rise); tilt, the angle of base pair inclination); displacement of the base pairs relative to the helical axis (x-displacement); width and depth of the grooves; virtual interchain distances (dPP, dC1'C1') (Table 1); the phase angle of sugar pseudorotation, P; and the sugar-phosphate backbone conformation.
The sugar-phosphate backbone conformation is described by the preferred conformations of the sugar residues and by the torsion angles of the polynucleotide chains. Analysis of nucleotide structures showed that sugar residues adopt one of the two most probable conformations: C3'-endo with 0° < P < 36° (A-DNA) or C2'-endo with 144° < P < 190° (B-DNA) [43]. Switching of the sugar between the C3'-endo and C2'-endo conformations is accompanied by a change in the position of the base relative to the deoxyribose (Fig. 1). As a result, the availability of hydrophobic atoms for contacts with proteins differs between A- and B-DNA [52].
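The phase angle P can be illustrated with a short sketch. It assumes the commonly used Altona-Sundaralingam definition of P from the five endocyclic sugar torsions ν0 to ν4 (this formula is not given in the text above), and it uses the P ranges quoted in the paragraph for C3'-endo (the pucker typical of A-DNA) and C2'-endo (typical of B-DNA). The function names are hypothetical.

```python
import math

def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Phase angle P in degrees (0-360) from the endocyclic torsions nu0..nu4 (degrees).

    Assumes tan P = ((nu4 + nu1) - (nu3 + nu0)) / (2 * nu2 * (sin 36 + sin 72)).
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    # atan2 selects the correct quadrant, so no separate 180-degree shift is needed for nu2 < 0.
    return math.degrees(math.atan2(numerator, denominator)) % 360.0

def classify_sugar_pucker(phase_deg):
    """Label a phase angle using the two ranges quoted in the text."""
    p = phase_deg % 360.0
    if 0.0 < p < 36.0:
        return "C3'-endo"   # typical of A-DNA
    if 144.0 < p < 190.0:
        return "C2'-endo"   # typical of B-DNA
    return "other"          # outside both quoted ranges
```

Values of P outside both ranges are reported as "other" rather than forced into one class, since the text only names the two most probable conformations.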
X-ray structural analysis of DNA fibers has shown that sugar conformation [53,54], the helical parameters rise and twist [43], and the width and depth of the grooves [55] are important criteria for assigning a DNA structure to either the A- or the B-type of double helix. Obviously, the values of these parameters are interrelated.
Different conformations of deoxyribose determine the variations in the distances between phosphorus atoms of neighbouring phosphate residues in the same polynucleotide chain of A- and B-DNA double helices [48].

Table 1. Structural parameters of A- and B-DNA for fibers and crystal DNA structures (X-ray structure analysis).
A- and B-DNA helices are twisted differently, which corresponds to different angles of helical twist. The tilt value also differs significantly between the A- and B-types and thus correlates with the sugar conformation (C2'-endo or the equivalent C3'-exo in B-DNA and C3'-endo in the A-form) [43].
In addition, A- and B-DNAs differ in the displacement of base pairs relative to the helical axis (x-displacement). In B-DNA the helical axis passes through the nucleotide pairs (displacement ~0.2 Å). Because of the larger displacement of base pairs from the helical axis in A-DNA (-4.4 to -4.9 Å), the polynucleotide chains wind around the axis and leave a hollow cylinder inside. The nucleotide pairs are situated on the periphery of the helix, which leads to a very deep and narrow major groove together with a shallow and wide minor groove. In B-DNA the grooves are less pronounced and their depths are approximately identical; however, they differ in width, the major groove being wider than the minor one (Table 1).
More subtle features of A- and B-DNA double helices have been determined by high-resolution X-ray structure analysis [56]. At the level of discrete dinucleotide "steps", the DNA structure proved to be irregular and significantly dependent on the nucleotide sequence, each "step" having characteristics that differ from the mean values for the canonical A- and B-forms.
The primary data on crystal structures of A- and B-DNA revealed significant differences in three of the six local dimer step parameters: twist, roll and slide (Fig. 2) [57].
The most prominent distinctions in the slide and roll parameters were revealed by statistical analysis of crystal structures of A- and B-DNAs. For A-DNA, slide is -1.57 (± 0.38) Å and roll is 7.9 (± 5.6)°, whereas for B-DNA slide is -0.21 (± 0.074) Å and roll is -0.2 (± 5.7)° [59]; i.e. the B→A transition is accompanied by untwisting of base pairs, an increase in roll and a decrease in slide (Fig. 3).
An additional discriminatory factor for identifying the dinucleotide type may be the mean distance between phosphorus atoms of neighbouring nucleotides along the z axis, Zp (Table 1); it correlates with the value of angle χ, with a correlation coefficient of approximately -1.0. Additionally, "A-philic" (or "A-like") dimer steps (GG-CC, GT-AC), including those found in B-DNA, can have helical parameters specific for A-DNA, i.e. negative slide, positive Zp, a large value of roll and a lower value of twist. "A-phobic" (or "B-like") dimer steps (AA-TT and GA-TC) have the same values of these parameters as in the B-form [3].
Fig. 2. Local dimer step parameters [58]
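The step families named above lend themselves to a simple lookup. The sketch below only encodes the four families listed in the text (GG-CC and GT-AC as A-philic, AA-TT and GA-TC as A-phobic); any other step is left unassigned, and the function name is hypothetical.

```python
# Step families as listed in the text. Each family holds both strand readings,
# e.g. GG on one strand corresponds to CC on the complementary strand.
A_PHILIC = {"GG", "CC", "GT", "AC"}
A_PHOBIC = {"AA", "TT", "GA", "TC"}

def label_dimer_steps(sequence):
    """Return (step, label) for every dinucleotide step of one strand, read 5'->3'."""
    seq = sequence.upper()
    labels = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step in A_PHILIC:
            label = "A-philic"
        elif step in A_PHOBIC:
            label = "A-phobic"
        else:
            label = "unassigned"  # the text does not classify this step
        labels.append((step, label))
    return labels
```

Applied to the octamer sequence GGTATACC mentioned in the Introduction, this marks the GG, GT, AC and CC steps as A-philic and leaves the central TA and AT steps unassigned.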
Stacking in A- and B-DNA is also different. In the B-type double helix, stacking is generally limited to interactions between bases of the same polynucleotide chain (intrachain stacking). In the A-helix, bases from different chains participate in stacking as well, i.e. there are both inter- and intrachain types of stacking. There are two reasons for this [43]. On the one hand, the angle between neighbouring nucleotides (twist) is smaller in the A-DNA helix (30°-32.7°) than in the B-type helix (36°-45°), which facilitates the formation of both stacking types. On the other hand, the angle of base pair tilt, which is positive in the A-type helix and negative in the B-type, increases base pair overlap in the first case and decreases it in the second.
DNA in complex with proteins is known to contain more A-like nucleotides, arranged as small clusters in one of the DNA chains at the sites of contact between DNA and proteins. It remains unclear whether such a B→A transition is induced by interaction with the protein or occurs independently and plays the role of a recognition factor for the protein [52].
Local variations of the sugar-phosphate backbone can also act as "signals" in the indirect mechanism of protein-nucleic acid recognition and participate in structural "matching" of the DNA binding site [62,63].
The conformation of the sugar-phosphate backbone of the double helix in the direction P → O5' → C5' → C4' is described by the torsion angles α, β, γ, δ, ε and ζ, by the five endocyclic torsion angles of the sugar ν0-ν4, and by angle χ, which determines the arrangement of the base relative to the sugar ring (Fig. 4).
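Each of these torsion angles is a dihedral angle defined by four consecutive atoms. As a minimal sketch, assuming Cartesian coordinates in any consistent length unit and the usual sign convention (positive for a clockwise rotation of the front bond onto the rear bond when looking along the central bond), a torsion can be computed as follows. The helper names are hypothetical, and the result is reported on the 0-360° scale used for the angle values in this review.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_angle(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, mapped to the range 0-360."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)            # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)            # normal to the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / b2_len   # signed component along the central bond
    return math.degrees(math.atan2(y, x)) % 360.0
```

For example, passing the coordinates of O3' of the preceding residue, P, O5' and C5' in that order would give angle α for the O3'P-O5'C5' torsion.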
Mean values of all torsion angles were determined for A- and B-DNAs (Table 2). Comparison of the results of numerous studies shows that no single angle can be considered a criterion of the A- or B-form.
Based on the analysis of significant deviations of the torsion angle values observed in dimer steps of the dodecamer d(CGCGAATTCGCG)2 [10], it was assumed that the prime cause of the high conformational flexibility of B-DNA can be rotations around O3'P-O5'C5' (angle α) and C3'O3'-PO5' (angle ζ). However, it was found that angle ζ differs significantly between dinucleotide steps (311° to 170°), while angle α varies within the range of the g- conformation (319° to 280°) (Table 2) [64].
Various authors proposed combinations of two angles hypothetically specific for one of the double helix forms. Significant correlations were found for the pairs χ-δ, χ-ζ, δ-ζ, ε-ζ, ε-β and ζ-β in the B-DNA structure (X-ray structural analysis of the dodecamer d(CGCGAATTCGCG)2) [64], and for the pairs χ-δ, χ-ε, χ-α, ε-α, ζ-α, ζ-γ, α-β and α-γ in the A-DNA structure (X-ray structure analysis of the tetramer d(CCGG)2) [53].
The values of χ and δ were found to correlate with the sugar conformation and thus could be recommended as a criterion of A- or B-DNA [65]. This was confirmed by the analysis of numerous crystal structures and by the results of molecular dynamics simulations [66,67].
Subtypes of B- and A-DNA forms. The results presented in Table 2 show only a slight difference between A- and B-DNA in the values of angles β and γ of the sugar-phosphate backbone, which remain within the t and g+ conformations, respectively. Angle β (PO5'-C5'C4') is more variable, and its value correlates with the values of angles ε and ζ along the chain. Angle δ, associated with the sugar conformation, is the most variable, whereas the pair of angles ε and ζ switches between two discrete states: (t, g-) or (g-, t). Because of these two conformations of the ε, ζ pair, the concept of two B-DNA subtypes was introduced: BI (ε, ζ = t, g-) and BII (ε, ζ = g-, t).
Subtype BI corresponds to the classical B-form of DNA [68] and is characterized by the C2'-endo sugar pucker with angle δ ≈ 135°, g- conformations of torsion angles ζ and α+1 (the value of ζ ≈ 260° is lower than in A-DNA), and values of χ typical for B-DNA (≈ 260°) (Table 3).
The BI and BII conformations differ in the position of the phosphate groups relative to the grooves of the DNA double helix. In the BI subtype the phosphate is situated almost symmetrically with respect to both grooves, while in the BII subtype it is turned towards the minor groove (Fig. 5, a, b).
The transition between the BI and BII conformations takes place through a simultaneous change of the two torsion angles ε and ζ. Therefore, the subtypes of B-DNA can be described by the value of the difference ε - ζ: ≈ -90° for BI and ≈ +90° for BII [67]. In BII-DNA the torsion angles χ, α+1 and β+1 are altered as well; however, these changes usually compensate each other.
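The ε - ζ criterion translates directly into a small classifier. In the sketch below the difference is wrapped into the interval from -180° to 180° so that angles given on a 0-360° scale are handled consistently. Using 0° as the dividing line between BI and BII is a simplifying assumption, not a threshold stated in the text, and the function names are hypothetical.

```python
def wrap_degrees(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def classify_b_subtype(epsilon, zeta):
    """Return (subtype, epsilon - zeta) for one phosphate linkage.

    Assumed rule: a negative difference (near -90) means BI,
    a positive difference (near +90) means BII.
    """
    difference = wrap_degrees(epsilon - zeta)
    subtype = "BI" if difference < 0.0 else "BII"
    return subtype, difference
```

For instance, an illustrative linkage with ε = 184° (t) and ζ = 265° (g-) gives a difference of -81° and is labelled BI, while ε = 246° (g-) and ζ = 174° (t) gives +72° and is labelled BII.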

Table 2. Values of torsion angles for A- and B-DNAs
Fig. 4. Atom numbering system and torsion angle determination [43]
... combined steps YR and RY. On the other hand, the combined steps are preferable for the BII conformation, in particular the YR step (Table 4).
Two or more BII conformers rarely follow one another, because the BII conformation is accompanied by local and global distortion of the DNA structure; e.g. a tetranucleotide in which all four phosphate groups are of the BII type has a bend similar to that observed in complexes with proteins [71]. That is, BII-BII steps require stabilization by external forces, such as crystal packing or interaction with proteins.
The phosphate group conformation corresponding to the BII subtype affects the helical parameters of the bases [71]: the roll values are only negative, and twist has high values.
Mean x-displacement values are sensitive to the number of BII dinucleotide steps in a fragment with the classical B-form structure. At ~20% BII steps, x-displacement is positive, i.e. the bases are shifted towards the major groove, whereas for the B-form the mean value of x-displacement is -1.5 Å (see Table 1). The grooves of BII-DNA and BI-DNA differ only slightly in width, with an apparent tendency towards opening of the major groove [68].
A-DNA also has two subtypes: AI, the classical A-form, and AII. The alternative conformation AII was first observed for the central dinucleotide step C-G in the crystal structure of the duplex d(CCCCGGGG)2 [72]. The angles α, β and γ of the sugar-phosphate backbone of this step correspond to the t conformation, and the local twist (25°) is much smaller than its value for the A-form (33.5°).
Mean values of all torsion angles for subtypes AI and AII are presented in Table 3. For different DNA fragments with the A-form double helix, a relatively small spread of values is recorded for only three of the seven sugar-phosphate backbone torsion angles: α, γ and, to a lesser extent, β. Therefore, the criteria for the AI and AII subtypes are the values of torsion angles α (O3'P-O5'C5') and γ (O5'C5'-C4'C3'). Thus, the two conformations of A-DNA differ in the orientation of the sugars relative to the bonds P-O5'-C5'-C4' [38].
Subtype AI is characterized by the g-/g+ conformation of torsion angles α/γ, while for AII the values of α and γ are within the t region. This determines a nearly planar arrangement of atoms O3'-P-O5'-C4'-C3' of the 3'-terminal nucleotides, which is easily adopted by purines and less readily by pyrimidines. Such values of α and γ can be reached from the classical A-form with α/γ = 300°/60° by a crankshaft-like rotation around the respective bonds, which effectively compensates the switching of the torsion angles so that the general direction of the backbone remains practically unchanged [73,74]. That is, the AI → AII transition is accompanied by a correlated change in torsion angles in the direction P → O5' → C5' → C4', which leads to a relative reorientation of the deoxyriboses while the bases remain almost unchanged (Fig. 5, c, d) [61].
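The α/γ criterion for the A-DNA subtypes can be sketched in the same way. The division of the torsion circle into three equal regions (g+ for 0-120°, t for 120-240°, g- for 240-360°) is a simplifying assumption made for this illustration, and the function names are hypothetical.

```python
def torsion_region(angle):
    """Assign a torsion angle in degrees to g+, t or g- (assumed 120-degree sectors)."""
    a = angle % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def classify_a_subtype(alpha, gamma):
    """Return 'AI' for alpha/gamma = g-/g+, 'AII' for t/t, otherwise 'other'."""
    regions = (torsion_region(alpha), torsion_region(gamma))
    if regions == ("g-", "g+"):
        return "AI"
    if regions == ("t", "t"):
        return "AII"
    return "other"
```

With the classical A-form values α/γ = 300°/60° quoted above the function returns AI, and a step with both angles near 180° is returned as AII.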
The AI-AII equilibrium, like the BI-BII equilibrium, depends on the nucleotide composition and base sequence: the AII subtype is more common for YR steps and less frequent for RY and YY steps (Table 4) [38].
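The step notation used here (R for a purine, A or G; Y for a pyrimidine, C or T) can be derived from a sequence mechanically. The following sketch, with hypothetical function names, lists the step type for every dinucleotide of one strand read 5'→3'.

```python
PURINES = {"A", "G"}       # R
PYRIMIDINES = {"C", "T"}   # Y

def base_class(base):
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError("unexpected base: " + repr(base))

def step_types(sequence):
    """Return (step, type) pairs such as ('CG', 'YR') for each dinucleotide step."""
    seq = sequence.upper()
    return [(seq[i:i + 2], base_class(seq[i]) + base_class(seq[i + 1]))
            for i in range(len(seq) - 1)]
```

Note that an RR step on one strand is a YY step on the complementary strand, whereas YR and RY steps read the same on both strands. For the dodecamer sequence CGCGAATTCGCG the function reports four YR steps, all of them CG.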
Nonclassical conformations of the DNA sugar-phosphate backbone in the sites of protein binding. According to numerous recent crystallographic studies of DNA fragments and protein-nucleic acid complexes, the number of nucleotides with nonclassical conformations of the sugar-phosphate backbone is substantially higher in DNA-protein complexes than in naked DNA. For example, the frequency of alternative conformations of torsion angle γ in DNA interacting with proteins is approximately 5 times higher than in naked B-DNA [75,76].
Moreover, A- and B-forms, as well as BI and BII conformations, in naked DNA and in DNA complexed with proteins differ not only in the conformation of the sugar-phosphate backbone but also in the values of the helical parameters roll, twist and tilt (Table 5). Below we present several typical examples of DNA-protein complexes in which nucleotides with nonclassical conformations of the sugar-phosphate backbone can be used for DNA readout by proteins.
Table 3. Mean values of torsion angles for the subtypes of A- and B-DNAs [38]
Transition of the central dinucleotide step C-G of the duplex d(CCCCGGGG)2 to the alternative conformation AII can serve as a signal for a protein to recognize its "own" binding site, for example for the restriction enzyme MspI, which precisely cleaves the fragment CCGG independently of the nucleotide composition of the neighbouring sequences [72].
Protein BPV-1 E2 recognizes its DNA sequence (hexamer d(GACGTC)2) by the bend toward the minor groove and by the BII conformation of the central C-G step [77]. For the complex of the insect heterodimeric nuclear receptor, consisting of the ecdysone receptor (EcR) and ultraspiracle (USP) (PDB ID 1R0O [78]), with the palindromic duplex d(5'-AGGTCAATGACCT-3') as a binding site, it was found [76] that the adenine at the start of the binding site had an A-like deoxyribose pucker and the t conformation of torsion angle γ (Fig. 6). In most similar complexes (the mammalian steroid and non-steroid hormone receptors bind to DNA according to this principle), the adenine at the start of the site has a B-like deoxyribose pucker and the g+ conformation of angle γ typical for the classical B-type of DNA [79]. The adenine with the alternative conformation can probably serve as a signal by which the proteins involved in forming this highly specific complex recognize their binding sites.
Nonclassical conformations of the sugar-phosphate backbone are very common in the nucleosome structure (in particular, nucleosome 1KX5 [36]), where a regular alternation of BI and BII conformations is observed. At the sites of direct protein-nucleic acid interactions these conformations are transformed into more deformed B-conformers, characterized by switching of torsion angles α+1 and γ+1 ("switched" BI-DNA) and by the presence of nucleotides with a wide dispersion of the values of ζ and α+1. In this case, it can be supposed that nucleotides with alternative conformations of the sugar-phosphate backbone in the nucleosome are essential not only for ensuring optimal conditions for interactions with histones but also for allowing other proteins to recognize indirectly their "own" sites on the nucleosomal DNA.
In summary, the following conclusions can be drawn. The ability of the DNA double helix, including the sugar-phosphate backbone, to adopt alternative conformations can be considered an "intrinsic" property of certain DNA sequences. Such conformational changes can give rise to the unique geometry of various dimer steps and/or individual nucleotides, thus disturbing base stacking and/or altering the accessible surface area of atoms, and can therefore serve as criteria of protein-DNA recognition. Conformational transitions induce changes in some structure-dependent physical properties, i.e. the polar/nonpolar profile and the electrostatic potential of both the minor and major grooves, which can also be used by proteins for specific DNA readout [15,75]. The latter appeared to be important for the functioning of Hox-group proteins [80]. However, further studies are required to understand the general principles of indirect DNA readout by proteins.
Table 5. Mean values of helical parameters of A- and B-DNAs [69] and of dinucleotide steps with BI and/or BII conformations [71]
Fig. 6. A-like pucker of deoxyribose and t conformation of angle γ in the adenine at the start of the protein binding site in the complex of the heterodimeric receptor with the duplex d(5'-AGGTCAATGACCT-3') (PDB ID 1R0O) [78]

Fig. 3. Two-stage transition of B-DNA into the A-form through changes of the helical parameters slide and roll: a - idealized B-DNA; b - transition into an intermediate form as a result of a mutual shift of base pairs by 1.5 Å (slide alteration); c - alternative intermediate form obtained by rotation of base pairs by 12° (roll alteration); d - A-type obtained by simultaneous alteration of slide and roll (base pair displacement and rotation) [56]

Fig. 5. Diagram of the two main conformations of B-DNA: a - subtypes BI and BII differ in orientation relative to the C3'O3'-O5'P bond, determined by torsion angles ε and ζ [61]; b - in the BI conformation (dotted line) angles ε, ζ have the (t, g-) conformations; in the BII subtype angles ε, ζ have the (g-, t) conformations, respectively [64]; c - superposition of A-DNA subtypes corresponding to correlated bond restructuring in the sugar-phosphate backbone: subtypes AI and AII differ in orientation relative to the PO5'-C5'C4' bond, determined by the angles α, β and γ (d) [61].