The molecular mechanism of the spontaneous substitution mutations caused by tautomerism of bases: Post Hartree-Fock study of the DNA rare base pairs

Gas-phase gradient optimization of the DNA rare base pairs containing lactam-lactim and amino-imino tautomers was carried out using the Hartree-Fock (HF), Density Functional Theory (DFT) and second-order Møller-Plesset perturbation (MP2) methods with the 6-31G(d, p) basis set. It is shown that full geometry optimization at the MP2 level leads to an intrinsically nonplanar, propeller-twisted and buckled geometry of the G*-T and G-T* base pairs. The nonplanarity of the pairs is caused by pyramidalization of the amino nitrogen atoms, which is underestimated by the HF and DFT methods. This justifies the importance of geometry optimization at the MP2 level for obtaining reliable predictions of the charge distribution, molecular dipole moments and geometrical structure of the base pairs. The comparison of the formation energies of the rare base pairs shows the energetic preference of the G*-T and A-C* base pairs over the G-T* and A*-C ones, respectively. The stabilization energies of the G-T* and A*-C base pairs, which describe the interaction between the monomers, are found to be substantially larger than those of the G*-T and A-C* base pairs, respectively. An analysis of the components of the molecular HF interaction energies by the Morokuma-Kitaura (MK) and Reduced Variational Space (RVS) methods showed that the larger stability of the G-T* and A*-C base pairs as compared to the G*-T and A-C* ones is due to electrostatic interactions by 60-65 % and to polarization and charge transfer interactions by 35-40 %.

Introduction. One of the possible molecular mechanisms for the formation of spontaneous mutations is conditioned by the tautomerism of DNA bases [1]. The tautomerism of bases can play an important role in the formation of Watson-Crick-like mismatched base pairs. There are two ways in which rare base pairs can form from the tautomeric forms of DNA bases: at the template level (replication errors) and at the substrate level (insertion errors). If the template residue or the incoming substrate (nucleoside triphosphate) is in the wrong tautomeric form when DNA undergoes semi-conservative replication, then an incorrect base may be inserted. If this error is not corrected before the next replication cycle, the two daughter duplexes will have different base pairs at the position of the original mispair.
When forming a DNA double helix, guanine (G) forms an H-bonded pair with cytosine (C). On the other hand, the rare lactim form of guanine (G*) pairs with thymine (T) instead of C. Similarly, the rare imino form of cytosine (C*) pairs with adenine (A) instead of G. After strand separation, the counterbases will form pairs with A and T instead of G and C, respectively.
Thus, the scheme postulated in ref. [1] leads to a spontaneous G-C → A-T transition in subsequent rounds of replication, unless the mispair is detected by the methyl-directed mismatch DNA repair system. In its turn, A forms a hydrogen-bonded pair with T. The rare imino tautomer of adenine (A*) pairs with C instead of T, and the rare lactim form of thymine (T*) pairs with G instead of A. After strand separation, the counterbases pair with G and C instead of A and T, respectively. As a result, this leads to a spontaneous A-T → G-C transition. In these cases the frequency of mutations is governed by the concentration of template bases in DNA or of free substrate nucleoside triphosphates in their minor tautomeric forms in solution.
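The pairing scheme described above can be traced through two replication rounds with a short sketch. It assumes, for illustration only, that the rare tautomer sits in the template strand during the first round and that every base is in its usual form during the second round; the names `PARTNER` and `mutant_pair` are hypothetical.

```python
# Pairing partners as stated in the text; '*' marks a rare tautomer.
PARTNER = {
    "A": "T", "T": "A", "G": "C", "C": "G",       # usual Watson-Crick pairing
    "G*": "T", "T*": "G", "A*": "C", "C*": "A",   # pairing of rare tautomers
}


def mutant_pair(rare_template_base):
    """Follow one template-level mispairing event through two replication rounds.

    Round 1: the rare tautomer in the template directs insertion of the wrong base.
    Round 2: that inserted base (in its usual form) serves as a template.
    Returns (base inserted in round 2, base inserted in round 1), i.e. the
    base pair that replaces the original one in the mutant daughter duplex.
    """
    inserted = PARTNER[rare_template_base]   # e.g. G* directs insertion of T
    new_partner = PARTNER[inserted]          # e.g. T then pairs with A
    return (new_partner, inserted)


for rare in ("G*", "C*", "A*", "T*"):
    print(rare, "->", "-".join(mutant_pair(rare)))
```

In this simplified model G* and C* in the template both turn a G-C position into an A-T one, while A* and T* turn an A-T position into a G-C one.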
Therefore, spontaneous point mutations can arise from errors during semi-conservative replication. Such errors, however, are very rare because the exonuclease functions as a proofreading mechanism that recognizes mismatched base pairs and excises them, although mispairs can escape correction at some frequency.
A tautomeric mechanism that does not require the presence of free rare tautomers in solution may also occur. In this case the rare tautomers are formed in the template via the simultaneous transfer of two protons in the H-bonds of the DNA base pairs. Based upon the Watson-Crick model of DNA, Löwdin [2] pointed out that there is a certain intrinsic probability of proton movement in DNA. Namely, the protons in two H-bonds between paired bases change their positions in time from the most favourable position to the next most favourable one. This spontaneous shift of positions, which is characteristic of and inherent to a quantum mechanical particle, transforms both bases into their tautomeric forms. The tautomeric form can pair only with a base different from its normal partner.
This would cause an error in the genetic information, and the accumulated errors of this kind could be responsible for mutation, aging and spontaneous tumors.
It should be noted that this modified tautomeric mechanism, like the usual tautomerism, assumes that these tautomers remain stable during DNA unwinding and strand separation, which are the prerequisite steps for the synthesis of a new DNA strand by polymerase. However, there has never been any convincing evidence demonstrating that rare tautomeric forms of the bases are responsible for spontaneous mutagenesis. Recently Goodman suggested [3] that the rare base pairs perhaps exist in the polymerase active site and later shift to the ionized, protonated and wobble base pair structures observed by NMR and X-ray crystallography.
Therefore, the ability to identify the rare base pairs is not only interesting in itself but may also open the way to investigating their properties in relation to ionized and wobble base pairs. This holds irrespective of whether the rare base pairs are a direct or an indirect cause of spontaneous mutations.
Very little information from quantum chemical studies of the rare base pairs containing lactam-lactim and amino-imino tautomers has been obtained up to now. Some geometrical and electronic properties, as well as interaction energies uncorrected for the basis set superposition error (BSSE), obtained at the HF/STO-3G and HF/4-31G levels, were presented in ref. [4]. During the geometry optimization of the base pairs, only the distances between the bases and their mutual orientations were optimized, whereas the coplanarity of the base pairs was maintained and the base geometries obtained by the HF/STO-3G method were kept rigid.
The interaction energies of the bases in the G*-T and A-C* base pairs calculated by the B3LYP/6-31++G(d, p) method were presented in other works [4,5]. At the same time, such data for the other two rare base pairs (G-T* and A*-C) are absent in [5]. Besides, the authors of these works [4,5] proposed that the interaction energy of the bases in a base pair is an index of the energetic preference of its formation.
As a result, the geometrical structure of the DNA rare base pairs, the energetic aspects of their formation and the physical picture of their H-bonded pairing have not yet been determined with a standard basis set. Moreover, the influence of electron correlation on different properties of the rare base pairs remains obscure. In particular, a detailed analysis of the interactions between nucleic acid bases at the HF and post-HF levels of theory is extremely important.
This work presents the results of HF, DFT and MP2 ab initio studies at the 6-31G(d, p) level of a number of properties of the rare base pairs (dipole moments, optimal geometries and interaction energies) that have not previously been studied at these levels of quantum-mechanical theory.
DANILOV V. I., HOVORUN D. I., NORIYUKI KURITA
Methods. To elucidate the above-mentioned questions, we studied the G*-T, G-T*, A*-C and A-C* rare base pairs at the HF level, at the DFT level with the B3LYP and B3PW91 functionals, and at the MP2 ab initio level in order to find stationary points on the potential energy surfaces. In addition, single-point calculations were performed for the studied systems at the MP2/6-311++G(d, p)//MP2/6-31G(d, p) level of theory.
The correlated calculations were performed within the frozen core approximation [6]. The standard split-valence 6-31G basis set augmented by a set of Cartesian d-polarization functions on heavy atoms and p-polarization functions on hydrogen atoms [6] was used for geometry optimization of the rare base pairs at the HF, DFT and MP2 levels of theory.
The MP2 geometry optimization was started from the HF/6-31G(d, p) optimized geometry of the rare base pairs [7]. Geometry optimization was continued until the largest component of the gradient was smaller than 0.00003 Hartree/Bohr and the root mean square gradient was less than 1/3 of the maximal gradient component. Since the intermolecular interaction energy calculated in a finite basis set is subject to BSSE, the calculated energy term should be appropriately corrected. The counterpoise corrected MP2 formation energy for the H-bonded AB complex, $E^{T}$, from the usual form of the free base and the rare tautomeric form of the free base is given by [7,8]
$$E^{T} = E^{TAU} + E^{DEF} + E^{INT}, \qquad (1)$$
where
$$E^{TAU} = E^{\mathrm{rare}}_{MP2} - E^{\mathrm{usual}}_{MP2} \qquad (1a)$$
is the tautomerization energy of the free base that enters the complex in its rare tautomeric form;
$$E^{DEF} = \left[E^{A}_{MP2,A}(AB\_MP2) - E^{A}_{MP2,A}(A\_MP2)\right] + \left[E^{B}_{MP2,B}(AB\_MP2) - E^{B}_{MP2,B}(B\_MP2)\right] \qquad (1b)$$
is the deformation energy describing the effects of the geometry relaxation of subsystems A and B in the dimer;
$$E^{INT} = E^{AB}_{MP2,AB}(AB\_MP2) - E^{A}_{MP2,AB}(AB\_MP2) - E^{B}_{MP2,AB}(AB\_MP2) \qquad (1c)$$
is the MP2 stabilization energy of the base pair describing the interaction between monomers.
According to the Møller-Plesset perturbation theory [9]
$$E^{INT} = E^{HF} + E^{COR}, \qquad (2)$$
where
$$E^{HF} = E^{AB}_{HF,AB}(AB\_MP2) - E^{A}_{HF,AB}(AB\_MP2) - E^{B}_{HF,AB}(AB\_MP2) \qquad (2a)$$
is the HF interaction energy between the bases;
$$E^{COR} = E^{INT} - E^{HF} \qquad (2b)$$
is the correlation interaction energy within the framework of second-order Møller-Plesset perturbation theory.
In the above expressions the following designations are used: $E^{X}_{Y,Z}$ is the energy of a system X computed by the Y method with basis set Z. The (AB_MP2) symbol indicates the geometry of the complex AB optimized by the MP2 method; likewise, (A_MP2) and (B_MP2) indicate the MP2-optimized geometries of the isolated bases.
The $E^{T}$ term was corrected by the conventional counterpoise correction method, which eliminates the BSSE. The counterpoise corrected DFT formation energy for the H-bonded AB complex is determined by analogy with the MP2 formation energy. The $E^{T}$ term in our analysis includes the deformation energy $E^{DEF}$ because the monomers change their geometry upon formation of the complex. $E^{DEF}$ was calculated as the difference between the energies of the bases in the optimized dimer geometry and those of the optimized isolated bases.
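The bookkeeping behind the formation energy can be sketched as follows. This is a minimal illustration that assumes the formation energy is the plain sum of the tautomerization, deformation and stabilization terms described above; the function names and all numerical values are hypothetical placeholders, not results of this work.

```python
def stabilization_energy(e_ab, e_a_ghost, e_b_ghost):
    """Counterpoise-corrected interaction energy, as in (1c).

    All three energies refer to the geometry of the optimized complex and to
    the full basis set of the complex (monomers computed with ghost functions).
    """
    return e_ab - e_a_ghost - e_b_ghost


def deformation_energy(e_a_in_complex, e_a_isolated, e_b_in_complex, e_b_isolated):
    """Energy needed to distort the isolated bases into their geometries in the complex."""
    return (e_a_in_complex - e_a_isolated) + (e_b_in_complex - e_b_isolated)


def formation_energy(e_tau, e_def, e_int):
    """Formation energy as the sum of tautomerization, deformation and stabilization terms."""
    return e_tau + e_def + e_int


# Placeholder values in kcal/mol, chosen only to illustrate the bookkeeping.
e_int = stabilization_energy(-100.0, -40.0, -30.0)       # -30.0
e_def = deformation_energy(-39.0, -40.0, -28.0, -30.0)   # 3.0
e_tau = 12.0
print(formation_energy(e_tau, e_def, e_int))              # -15.0
```

The sketch makes explicit that a strongly negative stabilization energy can be offset by the positive tautomerization and deformation costs.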
It should be noted that in expressions (1), (2) and (2a) the terms of the Hartree-Fock interaction energy for the base pairs were calculated at the MP2-optimized geometry, since the Hartree-Fock solution appears as the zero-order approximation in the MP2 method (see [9]).
To elucidate the nature of the hydrogen bonding and the stability of the rare base pairs, the Morokuma-Kitaura (MK) [10] and Reduced Variational Space (RVS) [11] methods for the decomposition of the molecular HF interaction energies ($E^{HF}$) were used. The different energy contributions that determine hydrogen bonding in the rare base pairs, namely the electrostatic energy, $E^{ES}$, exchange repulsion energy, $E^{EX}$, polarization energy, $E^{PL}$, charge transfer energy, $E^{CT}$, and the higher order coupling term, $E^{MIX}$, were evaluated by these methods. Since the intermolecular interaction energy calculated in a finite basis set is subject to BSSE, the calculated energy terms should be appropriately corrected. The $E^{HF}$, $E^{EX}$, $E^{CT}$ and $E^{MIX}$ energies, the correlation interaction energy, $E^{COR}$, and the total complex formation energy, $E^{T}$, were corrected by the conventional counterpoise correction method, which eliminates the BSSE. It should be noted that the BSSE correction is partially generated by the RVS energy decomposition scheme for the $E^{EX}$, $E^{CT}$ and $E^{MIX}$ terms (see [12]).
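How the decomposition terms combine can be illustrated with a small sketch. It assumes the MK-style additivity in which the HF interaction energy is the sum of the electrostatic, exchange repulsion, polarization, charge transfer and coupling terms (the RVS scheme groups some terms differently); the class name, function name and numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Decomposition:
    es: float   # electrostatic energy
    ex: float   # exchange repulsion energy
    pl: float   # polarization energy
    ct: float   # charge transfer energy
    mix: float  # higher order coupling term

    def hf_interaction(self):
        """HF interaction energy as the sum of the five components."""
        return self.es + self.ex + self.pl + self.ct + self.mix


def stabilization(decomp, e_cor):
    """Add the MP2 correlation interaction energy to the HF interaction energy."""
    return decomp.hf_interaction() + e_cor


# Placeholder values in kcal/mol, not results from the paper.
d = Decomposition(es=-30.0, ex=25.0, pl=-5.0, ct=-6.0, mix=-1.0)
print(d.hf_interaction())      # -17.0
print(stabilization(d, -3.0))  # -20.0
```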
Results and Discussion. Table 1 summarizes the bond lengths and bond angles of the hydrogen bonds between the bases of the G*-T, G-T*, A*-C and A-C* rare base pairs, as well as the dipole moments, based on the HF/6-31G(d, p) and MP2/6-31G(d, p) geometry optimizations.
As seen from Table 1, the dipole moments predicted by the MP2 method are noticeably different from those predicted at the HF level. This agrees with the known fact that the HF approximation overestimates dipole moments. According to the MP2/6-31G(d, p) results, the inclusion of electron correlation during the geometry optimization reduces the dipole moments of the G*-T and G-T* base pairs by 10 % and 11 %, respectively, in comparison with the values calculated by the HF method. A larger reduction of the dipole moment is predicted by the MP2 method for the A*-C and A-C* base pairs: 27 % and 21 %, respectively.
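The percentages quoted above are relative reductions with respect to the HF value. A minimal sketch of that arithmetic, with a hypothetical function name and placeholder dipole moments rather than the Table 1 data:

```python
def percent_reduction(mu_hf, mu_mp2):
    """Relative decrease of the dipole moment on going from HF to MP2, in percent."""
    return 100.0 * (mu_hf - mu_mp2) / mu_hf


# Placeholder dipole moments in Debye, not values from Table 1.
print(percent_reduction(5.0, 4.5))  # 10.0
```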
The data obtained show that the neglect of electron correlation distorts the molecular structures of the rare base pairs. Electron correlation brings both subsystems closer to each other. This can be seen from the decrease in the distances between the H and Y atoms in the X-H...Y intermolecular H-bonds of the base pairs. At the same time, according to the MP2/6-31G(d, p) calculations, the distance between the X and Y atoms in the intermolecular H-bonds of the G*-T and G-T* rare base pairs for the studied basis set decreases by 0.10-0.14 Å and 0.10-0.11 Å, respectively, in comparison with the HF optimized geometry. The analogous decreases in the A*-C and A-C* base pairs are 0.02-0.05 Å and 0.14-0.15 Å. This is a rather significant shortening caused by the dispersion attraction, which is taken into account by the MP2 method.
*A value equal to 360° corresponds to a planar amino group. The deviation from the planar state serves as a measure of the pyramidalization.
It is interesting to note that the lengths of the same H-bonds in the stereoisomeric mispairs containing rare tautomers of bases are noticeably different. This especially concerns the O-H...O hydrogen bond of the G-T* base pair and the N-H...N hydrogen bond of the A*-C base pair, which are shorter by 0.2 Å and 0.1 Å, respectively.
The geometric data characterizing the amino groups of the bases in the rare base pairs optimized by the post-HF methods are given in Table 2 (the numbering system corresponds to the IUPAC recommendations on Nomenclature of Organic Chemistry). It can be seen that, according to the MP2 calculations, the amino group of guanine in the G*-T and G-T* base pairs is essentially nonplanar. At the same time, the amino groups of cytosine in the A*-C base pair and of adenine in the A-C* base pair are planar.
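One common way to quantify amino group pyramidalization is the sum of the three valence angles around the amino nitrogen, which equals 360° for a planar group. The sketch below assumes that this is the quantity meant by the 360° criterion; the function names and coordinates are hypothetical and only illustrate the calculation.

```python
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees for three points given as (x, y, z)."""
    v1 = [a[i] - vertex[i] for i in range(3)]
    v2 = [b[i] - vertex[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    cos = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # guard against rounding
    return math.degrees(math.acos(cos))


def amino_angle_sum(n, c, h1, h2):
    """Sum of the C-N-H1, C-N-H2 and H1-N-H2 angles around the amino nitrogen."""
    return angle_deg(c, n, h1) + angle_deg(c, n, h2) + angle_deg(h1, n, h2)


# Placeholder coordinates (arbitrary units): three substituents in the xy plane.
c = (1.0, 0.0, 0.0)
h1 = (-0.5, math.sqrt(3) / 2, 0.0)
h2 = (-0.5, -math.sqrt(3) / 2, 0.0)
planar = amino_angle_sum((0.0, 0.0, 0.0), c, h1, h2)      # nitrogen in the plane
pyramidal = amino_angle_sum((0.0, 0.0, 0.3), c, h1, h2)   # nitrogen lifted out of the plane
print(planar, pyramidal)
```

The further the sum falls below 360°, the stronger the pyramidalization.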
The analysis of the optimized molecular structure of the G*-T and G-T* base pairs shows that the amino group hydrogens of guanine deviate from the base plane in the direction opposite to the shift of the amino group nitrogen atom (the sp3-hybridized structure of the nitrogen atom valence shell of an NH2 group, or the partial pyramidalization of the amino group of the DNA base). Examination of the base pair structures suggests that the source of the nonplanarity is a geometrical peculiarity of the NH2 group of guanine.

Table 3. The geometrical and energetic (kcal/mol) characteristics of hydrogen-bonded rare base pairs, obtained at the DFT(B3LYP/6-31G(d, p)), DFT(BPW91/6-31G(d, p)), MP2/6-31G(d, p) and MP2/6-311++G(d, p)//MP2/6-31G(d, p)-optimized geometries
At the same time it should be noted that the DFT approach suggests only a very weak nonplanarity of the amino groups of the bases (see Table 2). As a result of the underestimation of the pyramidalization effect, the DFT method leads to an almost or perfectly planar structure for the rare (see Table 2) and canonical base pairs (see, for example, [7]). It should also be noted that our geometry optimizations of the rare base pairs by the HF/4-31G method without any geometrical constraints (compare with ref. [4]), as well as by HF/6-31G(d, p), led to planar structures.
Meanwhile, the question of the nonplanarity of the nucleic acid bases and base pairs may have important consequences for the realization of particular structures of these compounds in various molecular complexes. The potential biological importance of interactions involving nonplanar amino groups of bases was repeatedly stressed by Šponer et al. in the corresponding reviews [14,15]. The data characterizing the molecular structure of the rare base pairs obtained in our calculations are therefore of interest. The geometry optimization conducted at the DFT and MP2 levels leads to intrinsically nonplanar G*-T and G-T* base pairs (see Table 3). In other words, the bases in a base pair are not coplanar; instead, they are twisted about the hydrogen bonds that connect them.
As is well known, the orientation of the individual bases within each base pair can be described in terms of the propeller twist (PT) and buckle parameters [16], which characterize the rotation of the bases around the long axis of a base pair and the inclination of the mean planes of the bases with respect to each other. In fact, the PT angle is the dihedral angle that defines the noncoplanarity, and the buckle angle is the dihedral angle between the bases along their short axis. The PT and buckle angles are secondary parameters, which simply describe the imperfections, i.e. the nonplanarity, of a given base pair.
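A simplified way to estimate these two angles from atomic coordinates is sketched below. It is not the standard reference-frame procedure used by nucleic acid analysis programs: it assumes each base plane is represented by a unit normal built from three of its atoms, and it measures the rotation between the two normals about the long axis (propeller) and about the short axis (buckle). All function names and the test geometry are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three atoms of a base."""
    return _unit(_cross(_sub(p2, p1), _sub(p3, p1)))


def rotation_about_axis(n1, n2, axis):
    """Signed angle (degrees) that rotates normal n1 into normal n2 about axis.

    Both normals are first projected onto the plane perpendicular to the axis.
    """
    axis = _unit(axis)
    p1 = _sub(n1, tuple(_dot(n1, axis) * x for x in axis))
    p2 = _sub(n2, tuple(_dot(n2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(p1, p2), axis), _dot(p1, p2)))


# Placeholder geometry: the second base plane is rotated by 10 degrees about x.
long_axis, short_axis = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
t = math.radians(10.0)
n1 = (0.0, 0.0, 1.0)
n2 = (0.0, -math.sin(t), math.cos(t))
propeller = rotation_about_axis(n1, n2, long_axis)   # rotation about the long axis
buckle = rotation_about_axis(n1, n2, short_axis)     # rotation about the short axis
print(propeller, buckle)
```

In this test geometry the whole distortion is a rotation about the long axis, so it shows up as propeller twist while the buckle stays at zero.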
Especially noticeable are the angular characteristics of the two base pairs (the magnitudes of the propeller and buckle angles) obtained by the MP2 method (see Figure, a and b). The propeller angles between the base planes amount to 10° and 5° for the G*-T and G-T* base pairs, respectively, whereas the buckle angles are 7° and 2°. Supposedly, the main reason for the nonplanarity of the G*-T and G-T* base pairs is the pyramidalization of the amino group of guanine. The conventional planar structure of the bases, as would be expected, gives only the coplanar type of H-pairing. Among other possible factors distorting the planarity of the pairs are a wide variety of secondary long-range electrostatic interactions involving the hydrogen atoms bonded to ring carbon atoms, and steric reasons (for more information see [14,15]). Therefore, the nonplanarity of the isolated DNA rare base pairs, like that of the isolated Watson-Crick ones (see [7]), is their intrinsic property. The propeller twisting and buckling of the bases in isolated rare base pairs have been obtained without invoking the stacking perturbation hypothesis, as was noted earlier [7,17].

Structures of the rare base pairs determined by the MP2/6-31G(d, p) method: a - G*-T; b - G-T*
According to the geometric selection mechanism of bases as a principal determinant of DNA replication fidelity [17-20], the geometrical and electrostatic properties of the polymerase active site are likely to have a profound influence on nucleotide-insertion specificities. This influence would strongly favor the insertion of bases having an optimal geometry, such that the C1'(N9)-C1'(N1) distances and bond angles most closely approximate those of the Watson-Crick base pairs. A detailed study of the geometric characteristics obtained for the optimized rare and Watson-Crick base pairs showed the following. The distance between the bonds joining the bases to the deoxyribose groups in the G*-T and G-T* rare base pairs is close to the corresponding canonical distance in the G-C base pair, while this distance in the A*-C and A-C* base pairs is close to that in the A-T base pair. Moreover, in each pair of stereoisomers the C1'-N9 and C1'-N1 bonds make an angle with the C1'(9)-C1'(1) line that is close to the corresponding values in one of the Watson-Crick canonical base pairs. An analogous conclusion was reached earlier by Topal and Fresco [21], who studied each of the above-mentioned rare base pairs by model building and showed that these pairs are sterically compatible with a Watson-Crick helix. Therefore, the formation of the DNA rare base pairs with such geometry is compatible with the geometric constraints of the standard double helical DNA. If these mispairs were incorporated into a standard Watson-Crick double helix, the helix would not likely be highly distorted and its stability would not be significantly reduced.
At the same time it should be noted that the experimentally detected G-T, G-A and C-A mispairs have markedly different C1'(9)-C1'(1) distances and glycosyl bond angles from those of the A-T and G-C pairs. The striking geometric identity of the Watson-Crick A-T and G-C base pairs is not matched by the A-C protonated wobble and G-T wobble base mispairs or by the G(anti)-A(syn) base mispair. Therefore, the geometric constraints imposed on the substrate and template bases at the polymerase active site are not the only factors governing the incorporation of non-Watson-Crick base pairs into DNA.
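The two geometric criteria discussed here, the C1'...C1' distance and the angles that the glycosidic bonds make with the C1'...C1' line, can be computed from Cartesian coordinates as in the sketch below. The function name and the coordinates are hypothetical placeholders; real input would be the C1' and N9/N1 positions of an optimized base pair.

```python
import math


def glycosidic_geometry(c1_a, n_a, c1_b, n_b):
    """Return the C1'...C1' distance and the two angles (degrees) between
    each glycosidic bond (C1'-N) and the C1'...C1' line."""
    def angle(vertex, p, q):
        v1 = [p[i] - vertex[i] for i in range(3)]
        v2 = [q[i] - vertex[i] for i in range(3)]
        cos = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

    return math.dist(c1_a, c1_b), angle(c1_a, n_a, c1_b), angle(c1_b, n_b, c1_a)


# Placeholder coordinates in arbitrary units, not taken from the optimized structures.
dist, ang_a, ang_b = glycosidic_geometry(
    (0.0, 0.0, 0.0), (1.0, 1.0, 0.0),    # C1' and N9 of the purine
    (10.0, 0.0, 0.0), (9.0, 1.0, 0.0),   # C1' and N1 of the pyrimidine
)
print(dist, ang_a, ang_b)
```

Comparing these three numbers between a mispair and a Watson-Crick pair gives a quick check of how closely the mispair mimics the canonical geometry.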
Table 3 also gives the values of the tautomerization energy $E^{TAU}$ (see (1a)), deformation energy $E^{DEF}$ (see (1b)), stabilization energy $E^{INT}$ (see (1c)) and formation energy $E^{T}$ (see (1)). The comparison of the formation energies of the canonical [7] and rare base pairs (see Table 3) shows that the formation of the Watson-Crick G-C base pair is the most preferable among all studied base pairs. At the same time, the formation of the G*-T base pair is more preferable than that of the A-T one. The A-C* and G-T* base pairs are only slightly less preferable energetically than the A-T base pair. The direct comparison of the computed energies for the two pairs of stereoisomers also shows the energetic preference of the G*-T and A-C* base pairs over the G-T* and A*-C ones, respectively. Our results led to the interesting fact that the stability of the G-T* base pair is much larger than that of the canonical Watson-Crick G-C base pair (see [7]), which is at present considered the most stable among all studied base pairs. It is also seen from Table 3 that the stabilization energy of the G-T* base pair is substantially larger than that of the G*-T one, irrespective of the computation method. However, for the formation of the G-T* rare base pair the usual form of T must be replaced by the rare tautomeric form T*, which requires an energy consumption of 11.92-13.35 kcal/mol. Furthermore, the formation of such a base pair is accompanied by an expense of 6.13-8.09 kcal/mol of deformation energy, describing the effects of the geometry relaxation of its G and T* bases. As a result, the formation of the G-T* base pair becomes energetically less preferable than that of the G*-T one. Table 3 also shows that the stabilization energy of the A*-C base pair is substantially larger than that of the A-C* one. On the other hand, for the formation of the A*-C base pair the usual form of A must be replaced by the rare tautomeric form A*, which requires an energy consumption of 12.06-13.72 kcal/mol. Moreover, the geometry relaxation of the A* and C bases in this base pair requires an expense of 2.68-3.62 kcal/mol. So the formation of the A*-C base pair is energetically less preferable than that of the A-C* one.

Note. ED: energy decomposition scheme; MK: Morokuma-Kitaura energy decomposition scheme; RVS: reduced variational space energy decomposition scheme; $E^{T'} = E^{T} - E^{TAU}$: complex formation energy disregarding the tautomerization energy.
Therefore, the obtained data directly show that the calculated interaction energies of the bases in the rare base pairs are insufficient to characterize the relative ease or difficulty of incorporating base pairs into a double helix, as was done, in particular, in refs. [4,5]. The appreciably larger stability of the G-T* and A*-C base pairs is of particular interest in view of the close similarity of their molecular structures to those of the corresponding stereoisomers. To understand this result, the MK and RVS analysis of the molecular interaction energy components for the DNA rare base pairs was carried out. The calculated values of these components are given in Table 4. The analysis of the decomposition terms showed that the larger stability of the G-T* and A*-C base pairs, as compared to that of the G*-T and A-C* ones, respectively, is due by 60-65 % to the electrostatic interactions. At the same time, polarization and charge transfer interactions also make a considerable contribution (35-40 %) to the larger stability of the above-mentioned base pairs. It should be emphasized that, according to Table 4, the correlation interaction makes a noticeably larger contribution to the stability of the G*-T and A-C* base pairs than to that of the G-T* and A*-C ones. Therefore, the larger stability of the G-T* and A*-C base pairs is not related to the correlation interaction.
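Percentage shares of this kind can be obtained by comparing the attractive components of two stereoisomers term by term. The sketch below assumes that each share is the difference in one component divided by the summed differences of the electrostatic, polarization and charge transfer terms; this is one plausible reading rather than a procedure stated in the text, and the function name and numbers are hypothetical placeholders.

```python
def contribution_shares(stable, reference, terms=("es", "pl", "ct")):
    """Percent share of each component in the extra stabilization of one pair
    relative to another, using only the listed attractive terms."""
    diffs = {t: stable[t] - reference[t] for t in terms}
    total = sum(diffs.values())
    return {t: 100.0 * d / total for t, d in diffs.items()}


# Placeholder component energies in kcal/mol, not values from Table 4.
more_stable = {"es": -40.0, "pl": -8.0, "ct": -10.0}
less_stable = {"es": -34.0, "pl": -6.0, "ct": -8.0}
print(contribution_shares(more_stable, less_stable))
# {'es': 60.0, 'pl': 20.0, 'ct': 20.0}
```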
As a result of our calculations, the following conclusions can be drawn. From the energetic point of view, the formation of the G*-T and A-C* base pairs must lead to spontaneous mutations more often than that of the G-T* and A*-C ones, respectively. The analysis of the possible transitions during replication and insertion errors, together with the obtained formation energies of the base pairs, shows that energetically the most probable transitions are those for which tautomeric transformations occur in G. The transitions for which tautomeric transformations occur in C and T are less probable. In other words, in the case of lactam-lactim and amino-imino tautomerism the replication errors mostly lead to the G-C → A-T transitions, whereas the insertion errors mostly lead to the A-T → G-C transitions. The transitions for which tautomeric transformations occur in T are even less probable. The least probable transitions are those conditioned by tautomeric transformations in A.
The obtained data might testify in favour of the possibility that spontaneous mutations originate from the tautomerism of bases. Unfortunately, the origin of spontaneous mutations is conditioned not only by energy factors but also by other ones. Among them are such important factors as the entropy and the possible geometric selection mechanism in the exonuclease active site, which enhances the excision of non-Watson-Crick base pairs. So the discussed problem needs further investigation.