The study of the canonical Watson-Crick DNA base pairs by the Møller-Plesset perturbation method: the nature of their stability

Gas-phase gradient optimization was carried out on the canonical Watson-Crick DNA base pairs using the second-order Møller-Plesset (MP2) perturbation method with the 6-31G* and 6-31G*(0.25) basis sets. Full geometry optimization at the MP2 level is found to lead to an intrinsically nonplanar, propeller-twisted and buckled geometry of the G-C and A-T base pairs. The Morokuma-Kitaura (MK) and reduced variational space (RVS) schemes for decomposing molecular Hartree-Fock interaction energies were used to investigate the hydrogen bonding in these base pairs. It is shown that the stability of the hydrogen-bonded DNA base pairs originates mainly from electrostatic interactions. At the same time, the polarization, charge transfer and dispersion interactions also make a considerable contribution to the attraction energy of the bases.

Introduction. The energetic aspects and molecular structure of hydrogen bonding and base stacking of nucleic acid bases were extensively studied by Sponer et al. (see reviews [1,2]) at different levels of quantum mechanical theory.
According to quantum mechanical calculations, the stabilization of the base pairs is dominated by the Hartree-Fock (HF) interaction energy, while the stacked complexes of the DNA bases are stabilized mostly by the London dispersion interaction, which is an effect of the correlation of electron motions. The main achievement of these calculations was an improvement of our knowledge about the structure and electronic properties of nucleic acid bases and base pairs. These calculations basically provided the ultimate picture of the origin and magnitude of the interactions of DNA bases in the gas phase.
Although great progress has been achieved in the study of nucleic acid bases and their H-bonded and stacked complexes, the origin of the binding energy [...] geometries optimized at the HF level. Electron correlation is also important for obtaining accurate charge distributions and molecular dipole moments of the DNA bases. Finally, studying the influence of the optimized geometries on the stabilization energies of the base pairs due to electron correlation is especially important, since the London dispersion attraction is a function of molecular geometry (induced dipole-dipole interactions in the classical picture).
On the whole, a large amount of computational work on these effects for the canonical Watson-Crick nucleic acid base pairs has been carried out with the DFT method. It should be noted, however, that the study of the bases by the DFT method with different functionals shows that the amino groups should be nonplanar in order to obtain true minima, i. e. this method predicts an incorrect geometry for the amino groups of the bases. At best, the DFT approach suggests a weak nonplanarity of the amino groups of the bases, while the MP2 quantum chemical calculations indicate a rather strong amino group pyramidalization [1,2]. Therefore the DFT approach is not able to reproduce the amount of pyramidalization and the order of NH₂ nonplanarity. As a result, the DFT method leads to a planar structure for the base pairs [5-10]. Meanwhile, the question of the planarity of the nucleic acid bases may have important consequences for the structure of DNA and for molecular recognition processes.
At the same time, results of a detailed study of the base pairs by the MP2 method with geometry optimization are absent. Only limited information from quantum-chemical studies of these base pairs at the MP2/6-31G** and MP2/6-31G* levels has been published [11-14], and therefore the influence of electron correlation effects on the different properties of the base pairs still remains obscure.
Thus, despite the significant success achieved in studying the association between bases and their derivatives, the exact physical picture of H-bonded pairing in the Watson-Crick base pairs, as well as their geometrical structure, is not yet fully understood. Therefore a detailed analysis of nucleic acid base interactions at post-HF correlated levels of theory is extremely important.
Methods. In order to elucidate the above-mentioned questions, we carried out a study of the canonical Watson-Crick A-T and G-C base pairs at the HF and MP2 ab initio levels to find stationary points on the potential energy surfaces.
In this paper a study of some properties of these base pairs (interaction energies, relative roles of the various energy contributions and optimal geometries) using the MP2 nonempirical ab initio technique [15] is presented. The correlated calculations were done within the frozen core approximation [16].
The standard split-valence 6-31G basis set augmented by a set of Cartesian d-polarization functions on heavy atoms [16] and the standard split-valence 6-31G basis set augmented by a set of diffuse d-polarization functions with an exponent of 0.25 added to the second-row elements (designated 6-31G* and 6-31G*(0.25), respectively) were used for geometry optimization at the HF and MP2 levels.
MP2 geometry optimization was performed starting from the HF/6-31G* optimized geometries of the A-T and G-C base pairs. The geometry optimization was continued until the largest component of the gradient was smaller than 0.00003 Hartree/Bohr and the root mean square gradient was less than 1/3 of the maximal permitted gradient component. All ab initio calculations were performed using the GAMESS US set of programs ([17]; Granovsky A. A., http://classic.chem.msu.su/gran/gamess/index.htm).
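The two convergence criteria stated above can be written as a short check. This is a minimal sketch, not the code used by the program package: the function name `is_converged` and the representation of the gradient as a flat list of Cartesian components are assumptions made here for illustration, and the root mean square is assumed to be taken over all components.

```python
import math

# Threshold quoted in the text, in Hartree/Bohr.
MAX_GRAD_TOL = 3.0e-5

def is_converged(gradient, max_tol=MAX_GRAD_TOL):
    """Return True when both criteria from the text are satisfied.

    gradient: flat sequence of Cartesian gradient components (Hartree/Bohr).
    The flat-list layout is an assumption of this sketch.
    """
    largest = max(abs(g) for g in gradient)
    rms = math.sqrt(sum(g * g for g in gradient) / len(gradient))
    # Largest component below the threshold, RMS below one third of it.
    return largest < max_tol and rms < max_tol / 3.0
```

Note that a gradient may pass the first criterion and still fail the second when the residual force is concentrated on a few coordinates.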
Different energy contributions determining hydrogen bonding in the canonical Watson-Crick base pairs, namely the electrostatic energy, $E_{ES}$, the exchange repulsion energy, $E_{XR}$, the polarization energy, $E_{PL}$, the charge transfer energy, $E_{CT}$, and a higher-order coupling term among the various interaction components, $E_{MIX}$, were evaluated by the MK [18] and RVS (see [19]) methods for decomposition of the HF interaction energy, $E_{HF}$.
The $E_T$ term includes the deformation energy, $E_{DEF}$, because the stability of the base pair is influenced by the deformation of the monomers upon formation of the complex. $E_{DEF}$ was calculated as the difference between the energies of the bases in the optimized dimer geometry and the energies of the optimized isolated bases.
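The definition of the deformation energy given above amounts to a simple sum over the monomers. The sketch below is illustrative only: the function name `deformation_energy`, the input layout (one energy per base, in Hartree) and the conversion to kcal/mol are assumptions introduced here.

```python
# Conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL = 627.5095

def deformation_energy(e_in_dimer_geometry, e_relaxed):
    """Deformation energy in kcal/mol, summed over the monomers.

    e_in_dimer_geometry: energies (Hartree) of each isolated base frozen
        in the geometry it adopts in the optimized pair.
    e_relaxed: energies (Hartree) of each fully optimized isolated base.
    """
    if len(e_in_dimer_geometry) != len(e_relaxed):
        raise ValueError("one energy per monomer is required in both lists")
    return HARTREE_TO_KCAL * sum(
        frozen - relaxed
        for frozen, relaxed in zip(e_in_dimer_geometry, e_relaxed)
    )
```

With this sign convention the result is non-negative, since the relaxed monomer geometry is by construction the lower-energy one.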
The intermolecular interaction energy calculated in a finite basis set is subject to the basis set superposition error (BSSE). The $E_{HF}$, $E_{XR}$, $E_{CT}$ and $E_{MIX}$ terms, the correlation interaction energy, $E_{COR}$, and the total complex formation energy, $E_T$, were corrected by the conventional counterpoise correction method, which eliminates the BSSE. It should be noted that the BSSE correction is partially generated by the RVS energy decomposition scheme since only the $E_{XR}$, $E_{CT}$ and $E_{MIX}$ terms are corrected for BSSE.
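The counterpoise scheme mentioned above can be summarized as follows. This is a sketch of the standard recipe, in which each monomer is recomputed in the full basis of the pair at the dimer geometry; the function and argument names are assumptions made here, and all energies are assumed to be in the same units.

```python
def counterpoise_interaction(e_pair, e_a_pair_basis, e_b_pair_basis):
    """Counterpoise-corrected interaction energy.

    e_pair: energy of the base pair.
    e_a_pair_basis, e_b_pair_basis: energies of bases A and B at their
        geometry in the pair, computed in the full basis set of the pair
        (the partner's basis functions present without its nuclei and
        electrons).
    """
    return e_pair - e_a_pair_basis - e_b_pair_basis

def bsse(e_a_own_basis, e_b_own_basis, e_a_pair_basis, e_b_pair_basis):
    """Basis set superposition error at the geometry of the pair.

    The partner's basis functions can only lower a monomer energy,
    so this quantity is non-negative.
    """
    return (e_a_own_basis - e_a_pair_basis) + (e_b_own_basis - e_b_pair_basis)
```

The uncorrected interaction energy plus the value returned by `bsse` equals the counterpoise-corrected one, so the correction always makes the computed binding weaker.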
Calculation of the energy terms of the HF interaction energy for the base pairs in question was performed on the geometry optimized at the MP2 level with the 6-31G* and 6-31G*(0.25) basis sets, since the HF solution appears as the zero-order approximation in the MP2 method. The calculated values of these energy components are also given in the table.
Results and Discussion. The table presents the components of the molecular interaction energy for the canonical Watson-Crick DNA base pairs.
The analysis of the attraction energies shows that the electrostatic forces are the main factor of attractive intermolecular interaction in the A-T and G-C base pairs with the 6-31G* and 6-31G*(0.25) basis sets, irrespective of the energy decomposition scheme. They provide 62-66 % and 67-71 % of all attractive interactions for the A-T and G-C base pairs, respectively. At the same time, the polarization and charge transfer energies also make a considerable contribution to the attraction energy of the bases. These terms constitute 26-31 % for the A-T and G-C base pairs.
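The percentages quoted above are shares of the summed attractive (negative) terms. A minimal sketch of that bookkeeping is given below; the function name `attractive_shares`, the dictionary layout and the convention that attractive terms carry a negative sign are assumptions of this illustration.

```python
def attractive_shares(components):
    """Percentage share of each attractive term in the total attraction.

    components: mapping from a term label (e.g. "ES", "XR", "PL", "CT",
        "COR") to its energy in kcal/mol. Terms with a negative sign are
        treated as attractive; repulsive terms are left out of the sum.
    """
    attractive = {name: e for name, e in components.items() if e < 0.0}
    total = sum(attractive.values())
    if total == 0.0:
        raise ValueError("no attractive terms were supplied")
    return {name: 100.0 * e / total for name, e in attractive.items()}
```

Because the exchange repulsion is excluded, the shares of the remaining terms always add up to 100 %.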
At the same time it should be noted that the use of the 6-31G*(0.25) basis set in the MK analysis at the HF and MP2 optimized geometries identifies $E_{CT}$ and $E_{MIX}$, respectively, as the dominant term (data not shown here). This result does not look trustworthy. It is related to the fact that in the MK scheme all wavefunctions, with the exception of the one describing the exchange repulsion energy, violate the Pauli exclusion principle. The RVS analysis, by contrast, is more robust owing to the inclusion of Pauli repulsion. This is the reason why the results of the MK analysis obtained with the 6-31G*(0.25) basis set are not included in the table.
It can be seen from the table that the correlation interaction makes a small contribution to the stabilization of the A-T and G-C base pairs (7 and 2 %, respectively). It should be emphasized that the electron correlation contribution to the interaction energy is not identical to the dispersion term. The reason is that the correlation interaction energy in the MP2 method contains not only the intersystem component corresponding to the dispersion energy, but also the second-order intrasystem correlation correction to the electrostatic energy ($E_{12}$) and a series of other inessential coupling terms [20]. The actual dispersion energy is most likely larger, since the intrasystem correlation interaction component is a repulsive term caused by a reduction of the electrostatic interaction due to the correlation of electron motions.
Indeed, our approximation of the second-order intrasystem correlation correction by the multipole expansion through the quadrupole-quadrupole term showed that its magnitude is +3.4 kcal/mol for the A-T base pair and +6.9 kcal/mol for the G-C base pair. As a result, the contribution of the $E_{ES}$ term to all attractive interactions for the A-T and G-C base pairs decreases by 9-12 % on the assumption that the dispersion term is unchanged. However, this assumption is improbable. Most likely the dispersion energy is increased by a value equal to the difference between $E_{COR}$ and $E_{12}$, which leads to a dispersion term comparable to the $E_{PL}$ and $E_{CT}$ terms. In this case the attractive contribution of the electrostatic energy will amount to only 50 % of all attractive interactions.
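One way to read this argument is as a simple partition of the MP2 correlation interaction energy. The relation below is a sketch under the assumption that the other coupling terms are neglected; the symbol $E_{DISP}$ for the dispersion energy is introduced here for illustration and does not appear in the decomposition schemes above.

```latex
E_{COR} \approx E_{DISP} + E_{12}
\quad\Longrightarrow\quad
E_{DISP} \approx E_{COR} - E_{12},
\qquad
E_{12} = +3.4\ \text{kcal/mol (A-T)},\quad +6.9\ \text{kcal/mol (G-C)}.
```

Since $E_{12}$ is positive, the dispersion energy estimated in this way is more negative than $E_{COR}$ by 3.4 and 6.9 kcal/mol for the A-T and G-C base pairs, respectively.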
Nevertheless, the data in the table show that the correlation interaction energy for the canonical Watson-Crick DNA base pairs is significantly smaller in absolute value than the London dispersion energy found with empirical formulas (see [21]). The fact that the empirical London dispersion energy does not reproduce the correlation interaction energy has already been emphasized by Sponer et al. [22].
Thus, the electrostatic interaction provides at most 62-66 % and 67-71 % of all attractive interactions for the A-T and G-C base pairs, respectively. At the same time, a considerable contribution to the stability of the systems considered is provided by the remaining attractive interactions (at least 34-38 % for the A-T base pair and 29-33 % for the G-C base pair). Indeed, the polarization, charge transfer and dispersion interactions between the DNA bases in the Watson-Crick pairs are of a strength comparable with the electrostatic interactions. Therefore, the hydrogen bonds in DNA base pairs are not a purely electrostatic phenomenon.
Our gradient optimizations at the MP2 level with both basis sets lead to an intrinsically nonplanar canonical G-C base pair with a propeller-twisted optimal geometry: the angles between the base planes are 8° and 6° for the 6-31G* and 6-31G*(0.25) basis sets, respectively, whereas the buckle angles are 4° and 3°. The main reason for the nonplanarity of the G-C base pair is presumably the pyramidalization of the amino group of guanine.
Hessian calculations were performed on the optimized geometries of the canonical Watson-Crick base pairs at the MP2/6-31G* and MP2/6-31G*(0.25) levels to verify the nature of the stationary points obtained. The calculations of the force constant matrix for these MP2/6-31G* and MP2/6-31G*(0.25) wavefunctions were carried out by the numerical (finite-difference) method.
Inspection of the harmonic frequencies for the normal vibrations of the nonplanar G-C base pair shows that it possesses only real vibrational wavenumbers. The calculated intermolecular vibrational frequencies in the canonical Watson-Crick G-C base pair (38.5 and 34.6 cm⁻¹ for the 6-31G* and 6-31G*(0.25) basis sets, respectively) correspond to the propeller-twist motions. Frequency analysis of the G-C pair produced six positive trivial modes, the largest of which is smaller than 20 cm⁻¹, which is within the accuracy of the numerical calculation of the Hessian. This confirms that the optimized nonplanar structure found represents a local minimum on the MP2/6-31G* and MP2/6-31G*(0.25) potential energy surfaces of the G-C base pair. The gradient optimization of the G-C base pair was also performed within $C_s$ symmetry by the MP2/6-31G* and MP2/6-31G*(0.25) methods. The harmonic vibrational analysis performed for the planar G-C base pair with both basis sets showed one imaginary vibrational mode, which is evidence of its transition state nature.
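The criterion used above to assign the nature of a stationary point can be expressed compactly. This is an illustrative sketch only: the function name is hypothetical, imaginary modes are assumed to be encoded as negative wavenumbers (a common output convention), and the six trivial translational and rotational modes are assumed to have been removed from the list beforehand.

```python
def classify_stationary_point(wavenumbers):
    """Classify a stationary point from its vibrational wavenumbers (cm^-1).

    wavenumbers: vibrational modes only; an imaginary mode is assumed to be
        given as a negative number.
    """
    n_imaginary = sum(1 for w in wavenumbers if w < 0.0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "transition state"
    return "higher-order saddle point"
```

A list containing only real wavenumbers, as found for the nonplanar G-C pair, is classified as a minimum, whereas a single imaginary mode, as found for the planar structure, gives a transition state.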
The comparison of the energies of the optimized nonplanar and optimized planar G-C base pairs shows that the energy improvement due to the nonplanarity is rather small and amounts to 0.22 and 0.24 kcal/mol for the 6-31G*(0.25) and 6-31G* basis sets, respectively.
In contrast to the G-C base pair, the geometry optimization of the canonical A-T base pair at the MP2 level with both basis sets and with the default gradient threshold of the program leads to a planar structure. The flat geometry of the A-T pair gave rise to one imaginary mode with a value of 30 cm⁻¹, indicating the transition state character of the structure. This mode corresponds to out-of-plane vibrations of the nitrogen atom of the adenine amino group. The geometry was perturbed from the transition state along the imaginary mode. The subsequent geometry optimization resulted in a minimum lower in energy by 0.023 kcal/mol. This geometry corresponds to a small propeller-twist orientation of Ade and Thy, with the amino group of Ade being slightly nonplanar. The following Hessian calculation confirmed the energy minimum character of the structure.
It should be noted that very recently, with the help of the SCC-DFTB-D theoretical-experimental method, the authors of [23] found that the isolated N-methylated Watson-Crick A-T base pair has a propeller-twist angle of 4°, whereas the sum of the amino group valence angles is equal to 360°.
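The planarity criterion quoted above, a sum of amino group valence angles equal to 360°, can be evaluated directly from Cartesian coordinates. The sketch below is illustrative; the function names and the atom ordering (amino nitrogen, ring carbon, two hydrogens) are assumptions made here.

```python
import math

def _angle_deg(center, a, b):
    """Angle a-center-b in degrees from Cartesian coordinates."""
    va = [x - c for x, c in zip(a, center)]
    vb = [x - c for x, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(va, vb))
    norm_a = math.sqrt(sum(p * p for p in va))
    norm_b = math.sqrt(sum(q * q for q in vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def amino_angle_sum(n, c, h1, h2):
    """Sum of the C-N-H1, C-N-H2 and H1-N-H2 valence angles in degrees.

    A value of 360 corresponds to a planar amino group; smaller values
    indicate pyramidalization at the nitrogen atom.
    """
    return _angle_deg(n, c, h1) + _angle_deg(n, c, h2) + _angle_deg(n, h1, h2)
```

The deviation of this sum from 360° gives a single number by which the degree of pyramidalization obtained with different methods can be compared.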
Thus, the geometry optimizations of the canonical Watson-Crick base pairs have shown that they are extremely flexible with respect to out-of-plane deformations, which explains the origin of their propeller-twisted structure. Hence the nonplanarity of these base pairs in the gas phase is their inherent feature. This fact was first detected by us for the G-C base pair [13,14].
It should be noted that the energy improvement obtained due to the nonplanarity of the A-T and G-C base pairs is very small. In real DNA double strands the effect of temperature may be rather large compared with 0.24 kcal/mol, so that the base pairs may vibrate around the nonplanar structure. That provides a larger energy advantage as compared with the planar structure. However, the formation of the nonplanar structure of the base pairs immediately induces some important effects. In particular, the amino group hydrogens can participate in out-of-plane H-bonds, where the hydrogens are bent away from the molecular plane of the bases. Besides, the amino group nitrogen atom can serve as a weak H-acceptor because of the partial sp³ hybridization of the amino group nitrogen atoms. In other words, there is a striking correlation between the conformational flexibility of a dinucleotide step and the level of propeller twist in the base pairs. The potential biological importance of the interactions involving nonplanar amino groups of bases was repeatedly stressed by Sponer et al. [1,2].
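The comparison between the energy gain of 0.24 kcal/mol and the thermal energy can be made quantitative with a short estimate. This is a sketch only: the choice of 298.15 K as the reference temperature and the function names are assumptions introduced here.

```python
# Molar gas constant in kcal/(mol K).
GAS_CONSTANT = 1.987204e-3

def thermal_energy(temperature_k):
    """Thermal energy RT in kcal/mol at the given temperature in kelvin."""
    return GAS_CONSTANT * temperature_k

def ratio_to_thermal(delta_e_kcal, temperature_k=298.15):
    """Energy difference expressed in units of RT (dimensionless)."""
    return delta_e_kcal / thermal_energy(temperature_k)
```

At 298.15 K the thermal energy is about 0.59 kcal/mol, so the stabilization of 0.24 kcal/mol corresponds to roughly 0.4 RT, and the 0.023 kcal/mol found for the A-T pair to a far smaller fraction.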
It should be noted that optimizations carried out at the HF/6-31G* level without geometry constraints (see [24]) showed that the optimized structures of the methyl derivatives of the Watson-Crick base pairs were perfectly planar. In the opinion of Gould and Kollman [24] this is not surprising, since planar amino groups of the bases in standard Watson-Crick pairs give the greatest overlap between the orbitals involved in hydrogen bonding.
The data of Sponer et al. [22] obtained by the HF/6-31G** method confirm that the canonical base pairs are planar. According to these data, the hydrogen bonding between the bases efficiently eliminates the pyramidalization of the amino groups of the DNA bases. In other words, the amino group becomes planar as a result of the formation of H-bonds. Up to now Sponer et al. have considered that the electronic structure of the amino group in the base pairs is changed to the anticipated sp² arrangement due to strong primary in-plane H-bonding (see the very recent paper [25]), in spite of the absence of detailed data from MP2 calculations for the canonical base pairs.
The HF calculations with the 6-31G* and 6-31G*(0.25) basis sets performed by us also produce a planar structure of the complexes and show that H-bonding is not the sole reason for the planarization. Therefore the HF/6-31G** method does not reproduce the nonplanar geometry of these base pairs. Meanwhile, our study of the canonical planar base pairs by the MP2 method shows that the amino groups should be nonplanar in order to obtain true minima. As a result the base pairs become nonplanar, since the pyramidalization of the amino groups is one of the most prominent sources of their nonplanarity.
At the same time, according to the Watson-Crick model of the B-DNA helix the ideal base pairs are coplanar, i. e. they lie in the same plane. Actually the Watson-Crick base pairs in nucleic acid structures are not really coplanar. In real DNA structures (single crystal structures of DNA oligomers) there is some twist of one base relative to the opposite base within a base pair [26]. The A-T pairs can show a higher degree of propeller twisting than the G-C pairs: the angle between the planes of the two paired bases is 12° for A-T and 7° for G-C base pairs. The propeller twist in the base pairs makes stacking into a dinucleotide step more awkward than for planar base pairs, enhances the stacking of the bases on the strand and increases the stability of the helix.
It should be noted that experimental data based on X-ray crystallography of DNA oligomers give evidence that the base pairs are nonplanar owing to propeller twisting. Therefore, energy would have to be expended to change the planar structure of the base pairs into the nonplanar one. At the same time, according to our computations, even the isolated base pairs have a propeller-twisted geometry.
The results of our study of the canonical Watson-Crick base pairs obtained by the MP2 method with expanded basis sets will be presented elsewhere.
The authors are grateful to Dr. N. Kurita (Toyohashi University of Technology, Tempaku-cho, Toyohashi, Japan) for kindly providing some computational results before publication and to Dr. M. Bickelhaupt (Vrije Universiteit, De Boelelaan 1083, NL-1081 HV Amsterdam, The Netherlands) for the discussion on the nature of the hydrogen bond in DNA base pairs.