Phylogenetic study on structural elements of HIV-1 poly(A) region. 1. PolyA and DSE hairpins

The genome of human immunodeficiency virus type 1 (HIV-1) is highly heterogeneous. Aim. A phylogenetic study of structural elements of the HIV-1 poly(A) region, in particular the polyA and DSE hairpins, which compose the core poly(A) site. Methods. The secondary structure of the HIV-1 core poly(A) site was predicted with the UNAFold program. Results. The structure of the polyA and DSE hairpins was analysed in 1679 HIV-1 genomes of group M and 18 genomes of simian immunodeficiency virus SIVcpzPtt. We found 244 and 171 different sequences for the HIV-1 polyA and DSE hairpins, respectively. However, 70 % of the HIV-1 isolates studied contain one of 7 variants of the polyA hairpin that occur with a frequency ≥ 5 % (main variants), and 79 % of the isolates contain one of 7 main variants of the DSE hairpin. We also revealed subtype- and country-specific mutations in these hairpins. We found that the SIV polyA hairpin most closely resembles that found in HIV-1 genomes of B/C subtypes. Conclusions. The results of our large-scale phylogenetic study support some structural models of the HIV-1 5' UTR, in particular the tertiary interaction between the polyA hairpin and the matrix region in HIV-1 gRNA. The DSE hairpin possibly appeared in the course of the evolution of HIV-1 group M. Exposure of the U/GU-rich element in the apical loop of the DSE hairpin could significantly increase the efficiency of pre-mRNA polyadenylation in this HIV-1 group.

Introduction. Polyadenylation of pre-mRNAs in mammals and their viruses depends on at least two sequence elements: the poly(A) signal (most often the AAUAAA hexamer) and the U/GU-rich downstream sequence element (DSE). These elements compose the core poly(A) site. Multiple protein factors assemble onto this site; in particular, the cleavage and polyadenylation specificity factor (CPSF) binds to the AAUAAA hexamer and the cleavage stimulation factor (CstF) binds to the U/GU-rich DSE [1,2]. The efficiency of the polyadenylation process can be modulated by upstream sequence elements (USEs) and/or auxiliary downstream sequence elements (AuxDSEs) [3]. In the HIV-1 retrovirus, identical sequences encompassing AAUAAA and the DSE are present at both the 5' and 3' ends of the HIV-1 pre-mRNA, and strict regulation is needed to repress premature polyadenylation at the 5' end of the transcript and to stimulate the reaction at the 3' end. In particular, usage of the 3' poly(A) site is promoted by the USEs, which are present exclusively at the 3' end of the transcript [4].
In HIV-1 pre-mRNA, the AAUAAA hexamer is partly occluded by base pairing in the upper part of the polyA hairpin [5], the stability of which is delicately balanced to allow regulation of the polyadenylation reaction at both ends [6]. Two putative DSEs, UGUGU and GUUGUGU, are located 6 and 19 nt downstream of the cleavage site, respectively. We recently presented the first structural model for the core poly(A) site at the 3' end of the HIV-1 pre-mRNA [7] (Fig. 1). The tracts interacting with the polyadenylation factors, as well as other functionally important elements, are indicated in this figure. The other elements are functional at the 5' end of HIV-1 genomic RNA (gRNA); because they are located in the long terminal repeats of proviral DNA, they are also duplicated at the 3' end of the transcript.
The HIV-1 genome is highly heterogeneous. The aim of this work was a large-scale phylogenetic study of the structural elements of the HIV-1 poly(A) region. First, we investigated how mutations in the polyA and DSE hairpins affect their secondary structure, and we compared the structure of these elements in HIV-1 pre-mRNAs and in pre-mRNAs of simian immunodeficiency virus of the chimpanzee Pan troglodytes troglodytes (SIVcpzPtt).
Materials and methods. The sequences encompassing the complete poly(A) region in HIV-1 and SIV genomes were extracted from the Entrez Nucleotide database of NCBI. We examined all HIV-1 genomic sequences available in this database by the end of 2010 and all corresponding genomic sequences from SIVcpzPtt and SIVgor (gorilla) available by the end of 2012. The secondary structure of the poly(A) site was predicted with the UNAFold program [11]. The base changes in the sequences of HIV-1 and SIV pre-mRNAs were determined by comparison with RefSec (the HXB2 isolate, GenBank accession number K03455). Nucleotide numbering starts with 1 at the first nucleotide of each individual structural element.
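The base-change notation used throughout (for example C39U, C39del, 35_36insA) can be derived mechanically once a hairpin sequence has been aligned to the reference. The following is a minimal sketch, not the procedure used by the authors: it assumes the two sequences are already aligned with `-` as the gap character, and the function name `base_changes` and the toy sequences are hypothetical.

```python
def base_changes(ref_aln: str, qry_aln: str) -> list[str]:
    """List base changes of an aligned query relative to an aligned reference.

    Both strings must have equal length; '-' marks an alignment gap.
    Positions are 1-based and count reference nucleotides only, so
    numbering starts at the first nucleotide of the structural element.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    pos = 0  # position of the last reference nucleotide seen so far
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            pos += 1
        if r == q:
            continue  # identical base (or gap in both rows)
        if r == "-":
            # extra base in the query between reference positions pos and pos + 1
            changes.append(f"{pos}_{pos + 1}ins{q}")
        elif q == "-":
            changes.append(f"{r}{pos}del")
        else:
            changes.append(f"{r}{pos}{q}")
    return changes


# Toy aligned pair (hypothetical, not taken from HXB2)
print(" + ".join(base_changes("ACGU-ACC", "ACAUGA-C")))
```

An insertion of several bases is reported one base at a time in this sketch; merging such entries into a single record would be a natural refinement.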
Results and discussion. PolyA hairpin. We analysed the structure of the elements of the HIV-1 core poly(A) site for 1679 HIV-1 pre-mRNAs of group M (isolated from 997 patients) and found 244 different sequences for the polyA hairpin. These sequences contained up to eleven base changes in comparison with RefSec. The polyA hairpins with combinations of base changes occurring with a frequency ≥ 5 % (the main variants, pA1-pA7) are shown in Fig. 2. Their distribution among subtypes A-C and CRF01_AE, which comprise large subpools, is given in Table 1. The pooled data for other subtypes and CRFs, which comprise small subpools, are listed in the last column.
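The selection of main variants by the ≥ 5 % frequency criterion can be illustrated with a short sketch. It is only an illustration under stated assumptions: each isolate is represented by its hairpin sequence as a string, the function name `main_variants` is hypothetical, and the toy sequences are not real hairpins.

```python
from collections import Counter


def main_variants(hairpins, threshold=0.05):
    """Return {sequence: frequency} for variants at or above the threshold.

    `hairpins` holds one hairpin sequence per isolate; frequencies are
    fractions of the total number of isolates.
    """
    counts = Counter(hairpins)
    total = len(hairpins)
    return {
        seq: n / total
        for seq, n in counts.most_common()
        if n / total >= threshold
    }


# Toy pool of 25 isolates with four distinct sequences (hypothetical data)
pool = ["AAA"] * 14 + ["AAC"] * 9 + ["AAG"] + ["AAU"]
variants = main_variants(pool)
coverage = sum(variants.values())  # share of isolates carrying a main variant
print(variants, round(coverage, 2))
```

The `coverage` value corresponds to the kind of summary reported in the text, namely the share of isolates that carry one of the main variants.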
All main variants of the polyA hairpin have an identical upper part with a partly occluded AAUAAA hexamer, while their stems contain different small defects (bulges and internal loops). As seen in Table 1, some polyA hairpin variants, for example pA1 (without mutations) and pA2 (with the C39U base substitution), occur in HIV-1 isolates of several subtypes with different frequencies, while other variants occur mainly in HIV-1 isolates of certain subtypes. In HIV-1 isolates of subtypes D, F and G, the variants pA1 and pA2 are the most frequent. It is of interest that none of the base changes occurring in the main variants of the polyA hairpin of subtype C isolates (pA4 and pA7) was found in the main variants of CRF01_AE (pA5 and pA6).
Fig. 1. The 3' core poly(A) site of HIV-1 HXB2 pre-mRNA (RefSec). Nucleotide positions are numbered as in the HXB2 genome. The AAUAAA hexamer and the U/GU-rich signals are shaded. Motifs that are functional at the 5' end of the HIV-1 genome: 1, involved in the long-distance interaction with the matrix coding region [8]; 2, the 5' strand of the U5-AUG duplex [9]; 3, a primer activation signal (PAS) regulating the initiation of reverse transcription [10].
Depending on the subtype, 52-84 % of HIV-1 isolates contain one of the main variants of the polyA hairpin (Table 1). However, within a certain subtype their occurrence may be country specific. For example, the polyA hairpin with the double mutation U38C + C39U (pA4) occurs often in HIV-1 isolates of subtype C from the African countries Mozambique (35 %), South Africa (47 %), Tanzania (54 %) and Zambia (72 %), and rarely in Indian HIV-1 isolates (3 %). In contrast, the variant with four base changes (pA7) occurs frequently (55 %) in subtype C isolates from India and is rare or absent in the African countries. Some polyA hairpins occurring with a frequency < 5 % in the HIV-1 isolates studied, though not presented in Table 1, are rather frequent in certain countries; for example, the hairpin with the base changes C36A + U37C + U43C (pA8, Fig. 2) was found in 10 % of subtype A isolates from Tanzania. In general, 70 % of all HIV-1 isolates studied have one of the 7 main variants of the polyA hairpin. What are the peculiarities of the polyA hairpin formed in the rest of the HIV-1 isolates? Table 2 illustrates this issue. It shows the base changes in the polyA hairpin for 26 HIV-1 isolates of subtype C from Mozambique, which comprise half of the whole pool of these isolates. As many as 54 % of the HIV-1 isolates presented in Table 2 have one of the main polyA hairpin variants: pA4 (42 %), pA2 (8 %) or pA7 (4 %). The polyA hairpins in the rest of the isolates can be considered as these three main variants with one or two additional base changes, which occur with different frequencies in the HIV-1 isolates studied.
For example, A32G (GenBank acc. no. AM076852) occurs with a frequency below the error level of sequencing and submission of HIV-1 genomic sequences to GenBank (0.5 % [12]), while A44G (AM076846) is very frequent in HIV-1 isolates of subtype CRF01_AE but is random in isolates of subtype C. It is precisely the rare and random base changes that lead to the great variety of polyA hairpin sequences. The rare and random mutations presented in Table 2 did not greatly affect the overall structure of the polyA hairpin, except for G31A (AM076874). This base change occurs singly or in combination with different mutations with a frequency of 2.1 % in the HIV-1 isolates studied. In most cases, it results in two alternative conformations of the hairpin with the same free energy and complete exposure of the AAUAAA hexamer in one of these conformations (for example, see pA9 in Fig. 2). As seen in Table 2, the double mutation U38C + C39U (both as the pA4 variant and in combination with other base changes) prevails among the HIV-1 isolates from Mozambique (77 %). The presence of a certain motif containing one or several frequent base changes in combination with rare and/or random mutations is also characteristic of the polyA hairpins in HIV-1 isolates of other subtypes. It is also a peculiarity of other structural elements in the HIV-1 polyA region.

Table 2 Base changes in polyA hairpin of HIV-1 subtype C isolates from Mozambique
In general, base changes that led to significant alterations in the polyA hairpin structure occurred in only 4 % of all HIV-1 genomes studied. The structures of the polyA region in the HIV-1 isolates studied are presented in our database CESSHIV-1, which is currently available online at http://www.cesshiv1.org. The base change frequency at each position of the polyA hairpin is presented in Supplementary information (Table S1). Rather frequent mutations in this element are an insertion between positions 10 and 11 and base changes at positions 37-40, 43 and 44.
HIV-1 gRNA contains several strong binding sites for the multifunctional virion infectivity factor (Vif), including the polyA hairpin [13]. In particular, the Vif-RNA interaction is important for viral particle assembly. Vif specifically binds to the stem of the polyA hairpin; however, the structural and sequence determinants of this binding have not been defined. Various defects in the stem of the polyA hairpin (Fig. 2) can modulate its binding to Vif. Base changes in the AAUAAA hexamer and the neighbouring GCUUGCC tract occur with a frequency below the error level, except for position 28 in the polyA hairpin, at which mutations occur with a frequency of 0.7 %. We have also shown high conservation of the sequence GGCAAGC in the matrix region of HIV-1 gRNA for a large pool of HIV-1 isolates. Thus, our data support the proposed long-distance interaction between the GCUUGCC and GGCAAGC sequences, which is important for the tertiary structure of the 5' untranslated region of HIV-1 gRNA [8].
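That the two conserved tracts can pair with each other over their full length is easy to verify: GGCAAGC is the exact Watson-Crick reverse complement of GCUUGCC. A minimal check, in which the helper name `reverse_complement` is hypothetical and only canonical A-U and G-C pairs are considered (G-U wobble pairs are ignored):

```python
# Translation table for canonical RNA base pairing (A-U, G-C)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


polya_tract = "GCUUGCC"   # tract neighbouring the AAUAAA hexamer
matrix_tract = "GGCAAGC"  # conserved tract in the matrix region

print(reverse_complement(polya_tract) == matrix_tract)
```

Because all seven positions are complementary, a base change in either tract would disrupt one base pair of the proposed interaction, which is consistent with the conservation of both tracts.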
As mentioned in the Introduction, a balanced stability of the polyA hairpin is crucial for HIV-1 replication [6,14]. Both destabilization and stabilization of the wild-type hairpin inhibit polyadenylation of HIV-1 pre-mRNA at the 3' end of the transcript. Stabilization of this hairpin by 10.4 kcal/mol blocked the access of polyadenylation factors to the hexamer and led to complete loss of polyadenylation at the 3' end of HIV-1 pre-mRNA [6,14]. Destabilization by 8.5 kcal/mol increased the efficiency of premature polyadenylation at the 5' end of the HIV-1 transcript from 5-10 % to 30-40 % [14], which affects polyadenylation at the 3' end less severely than stabilization of the polyA hairpin does. Virus evolution experiments showed that HIV-1 mutants with a stabilized or destabilized polyA hairpin improved their replication capacity by drifting towards a hairpin with a thermodynamic stability close to that of the wild-type hairpin [15]. The free energy distribution of the HIV-1 polyA hairpins is presented in Fig. 3, A. Here we analysed the HIV-1 genomes that contain either the complete poly(A) region (see Materials and methods) or incomplete regions encompassing a polyA hairpin sequence (see the CESSHIV-1 database). In total, we analysed 1863 genomes (from 1072 patients), which contain 286 different sequences of the polyA hairpin. For two patients with multiple identical sequences of the polyA hairpin, only one intrapatient sequence was included in the analysis.
The major peak in Fig. 3, A, corresponds to the HIV-1 isolates containing polyA hairpins with a free energy (dG) of -18.5 kcal/mol. The main variants of the polyA hairpin pA1, pA2 and pA4 make the major contribution to this peak (92 %), while the other main variants contribute to minor peaks at -16.5, -18.1 and -20.6 kcal/mol. The minor peak at -12.8 kcal/mol corresponds mostly to the polyA hairpins with the base change G31A and the double mutation G31A + C39U.
The free energy distribution of the 286 HIV-1 polyA hairpins with different sequences is shown in Fig. 3, B. In general, both distributions are similar. The main peak in Fig. 3, B, also corresponds to -18.5 kcal/mol. The largest number of different polyA hairpin sequences (29) contributes to this peak. Both distributions are skewed; in particular, the number of HIV-1 isolates with polyA hairpins with dG > -18.5 kcal/mol (26.9 %) significantly exceeds that with dG < -18.5 kcal/mol (6.5 %).
The free energy distributions of the HIV-1 polyA hairpins for different subtypes (A, B, C, D, G, CRF01_AE and CRF02_AG) are similar to that presented in Fig. 3, A. All distributions are skewed and have a maximum at -18.5 kcal/mol, except that the plot for CRF01_AE has a maximum at -18.1 kcal/mol. Thus, the free energy of the polyA hairpin lies within the range of -10.9 to -20.9 kcal/mol in most HIV-1 isolates (98.5 %), and a polyA hairpin with a dG of -18.5 kcal/mol is found in 66.6 % of isolates.
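The skew of the distribution is summarized above by three shares: isolates at the modal free energy (66.6 %), less stable than the mode (26.9 %) and more stable than the mode (6.5 %), which together account for 100 % of isolates. A minimal sketch of how such a summary could be computed from a list of predicted dG values follows; the function name `summarize_dg` and the toy values are hypothetical and are not the data behind Fig. 3.

```python
from collections import Counter


def summarize_dg(dg_values, ndigits=1):
    """Return (mode, share at mode, share above mode, share below mode).

    dG values are rounded to `ndigits` decimals before counting, so that
    hairpins with the same reported free energy fall into the same bin.
    """
    rounded = [round(v, ndigits) for v in dg_values]
    mode, n_mode = Counter(rounded).most_common(1)[0]
    total = len(rounded)
    above = sum(v > mode for v in rounded) / total  # less stable hairpins
    below = sum(v < mode for v in rounded) / total  # more stable hairpins
    return mode, n_mode / total, above, below


# Toy dG values in kcal/mol, one per isolate (hypothetical data)
dg = [-18.5] * 6 + [-16.5] * 2 + [-12.8] + [-20.6]
print(summarize_dg(dg))
```

A larger share above the mode than below it, as in the toy data, is the pattern described in the text: deviations from the modal stability are mostly towards less stable hairpins.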
To gain insight into the evolution of frequent mutations in the HIV-1 polyA hairpin, we studied the structure of the core poly(A) site in 18 isolates of SIVcpzPtt, which is an ancestor of HIV-1 group M [16]. The similarity of the polyA hairpin in SIVcpzPtt and HIV-1 was first reported by Berkhout et al. [5]. The SIV polyA hairpin presented in their article possesses an upper part identical to that in the HIV-1 HXB2 genome and a stem with two bulges, 0 × 2 and 0 × 1. In our study we found 14 different sequences of the SIV polyA hairpin, including sequences identical to the HIV-1 polyA hairpin variants pA1, pA2 and pA4. All SIV isolates have a polyA hairpin with the common upper part, except for three isolates in which the apical loop is elongated by 2 nucleotides (nt), thus exposing 5 nt instead of 4 nt of the AAUAAA hexamer.
The combination of base changes 35_36insA + C39del + A40G + U43C can be considered a frequent motif of the SIV polyA hairpin (Supplementary information, Table S2). Three SIV isolates have a polyA hairpin containing exclusively this motif (pA10, Fig. 2). The double mutation 35_36insA + C39del was found in three HIV-1 polyA hairpins (two isolates of subtype C and one of subtype B), and A40G + U43C is a constituent of the combination of base changes in the HIV-1 polyA variant pA7, which is specific for subtype C isolates.
DSE hairpin. The variants of the HIV-1 DSE hairpin with combinations of base changes occurring with a frequency ≥ 5 % (the main variants, DSE1-DSE7) are shown in Fig. 4. Their distribution by subtype is given in Table 3. The variants DSE1-DSE3 are found in HIV-1 isolates of different subtypes, while the variants DSE4-DSE7 occur predominantly in HIV-1 isolates of a certain subtype. HIV-1 isolates of subtypes D, F and G commonly have the DSE hairpin without mutations (DSE1), except that the variant with the base changes G6A + U16del + G17del (DSE8, Fig. 4) occurs rather frequently in HIV-1 isolates of subtype D (19 %). About 80 % of the HIV-1 isolates studied have one of the main DSE hairpin variants.
The apical loop is the most variable region of the DSE hairpin (Supplementary information, Table S3). In the U/GU-rich tract, frequent base changes occur at positions 12, 13, 15 and 16. The combination of these four base changes, which is specific for CRF01_AE, does not impair the DSE signal (DSE6, Fig. 4). According to our description of the mammalian DSE region [17], it is composed of certain U/GU-rich pentamers (in particular the GUUGU, UGUGU or GUGUU tracts), which are located at different distances from each other. All these pentamers are found in the apical loop of the main variants of the DSE hairpin as single or overlapping tracts. In DSE5, the GUUGU pentamer overlaps the UGUUU tract, which is a U-rich element (URE) DSE signal of the type "a four out of five base URE" [18]. In some HIV-1 isolates of CRF01_AE, the mutations at positions 12 and/or 13 impair the DSE signal by preventing complete exposure of the U/GU-rich tract, which can decrease the efficiency of the polyadenylation process. This concerns mainly a group of 46 HIV-1 isolates from 3 patients, which possess the DSE hairpin with the combination of base changes G6A + C8U + U12del + G15C + U16A + 17G18.
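Locating single and overlapping U/GU-rich pentamers in a loop sequence is a simple string search, provided that overlapping matches are reported. The sketch below uses a zero-width lookahead for that purpose; the function name `find_pentamers` is hypothetical, and the test sequence is the putative DSE GUUGUGU mentioned in the Introduction rather than a complete apical loop.

```python
import re

# U/GU-rich pentamers named in the text
PENTAMERS = ("GUUGU", "UGUGU", "GUGUU")


def find_pentamers(rna, motifs=PENTAMERS):
    """Return sorted (1-based start, motif) pairs, including overlapping hits."""
    seq = rna.upper()
    hits = []
    for motif in motifs:
        # The lookahead matches at every start position without consuming
        # characters, so overlapping occurrences are all reported.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start() + 1, motif))
    return sorted(hits)


print(find_pentamers("GUUGUGU"))
```

In GUUGUGU the GUUGU and UGUGU pentamers overlap by three nucleotides, which is the kind of overlapping arrangement described above for the apical loop.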
The sequences of both strands of the DSE hairpin stem are well conserved. The rather frequent base change G6A destabilizes the hairpin and leads to partial occlusion of the U/GU-rich tract in the optimal structure; however, this tract is completely exposed in a suboptimal structure with a close free energy (DSE2, Fig. 4). Rare mutations (4 %) occur at position 17 (deletion and the G17A base change) and between positions 17 and 18 (insertion). The deletion occurs in 39 % of HIV-1 isolates of subtype D and does not affect DSE signal exposure (DSE8, Fig. 4). The insertion 17G18 in combination with some base changes impairs DSE signal exposure, mainly in the above-mentioned group of 46 isolates of CRF01_AE.
The 5' strand of the DSE hairpin together with two neighbouring residues (tract 2 in Fig. 1) corresponds to the 5' strand of the U5-AUG duplex formed at the 5' end of HIV-1 gRNA. The mutation G6A in the DSE hairpin corresponds to G9651A in the HXB2 genome, resulting in a G-U to A-U base pair substitution in the duplex. We have shown that the 3' strand of this duplex is also well conserved in a large pool of HIV-1 isolates, which supports the formation of the U5-AUG duplex. The conservation of tract 3, encompassing the PAS signal, supports its functional importance, including the tRNA(Lys3)-PAS interaction. In the SIV isolates studied, we did not find a hairpin similar to the HIV-1 DSE hairpin, in contrast to the polyA hairpin, which occurs in both HIV-1 and SIV. However, a hairpin with a bottom duplex like that in the HIV-1 DSE hairpin (frequently with an additional G-U base pair) was found in almost all SIV isolates (DSE SIV 1-DSE SIV 3, Fig. 4). The main difference between the HIV-1 DSE hairpin and its SIV analogue is the absence of the U/GU-rich tract in the DSE SIV apical loop. Only one SIV hairpin has such a tract, and it is not completely exposed (DSE SIV 2, Fig. 4). SIV isolates probably use the UGUGU signal just downstream of the polyA hairpin (Fig. 1), which is partly occluded.
The strains of HIV-1 are classified into four groups, M, N, O and P, which are of chimpanzee or gorilla origin [16]. We did not find any DSE hairpin exposing the U/GU-rich tract either in HIV-1 groups N, O and P (36 isolates) or in SIV from gorilla (5 isolates). We hypothesize that the U/GU-rich tract exposed in the apical loop of the DSE hairpin was acquired by HIV-1 group M in the course of evolution. At present it is poorly understood why only group M, and not the other HIV-1 groups, resulted in a global pandemic [16]. An effective DSE may be one of the features that make group M much more prevalent than groups N, O and P.

Table 1 Occurrence of polyA hairpin variants in HIV-1 isolates of different subtypes (%)
Fig. 2. Optimal structures of polyA hairpin variants in HIV-1 and SIVcpzPtt pre-mRNAs. The base changes as compared to RefSec are squared; insertions and deletions are indicated by a triangle and a spot, respectively. The AAUAAA hexamer is shaded. pA1-pA7, main variants of the polyA hairpin in HIV-1 isolates of group M; pA8, rather frequent variant in subtype A isolates from Tanzania; pA9, variant with the G31A base change; pA10, polyA hairpin variant in SIVcpzPtt isolates.