Auxiliary elements of mammalian pre-mRNA polyadenylation signals

Polyadenylation is one of the levels at which eukaryotic gene expression is regulated. The auxiliary elements of mammalian polyadenylation (poly(A)) signals are known to influence the efficiency of 3'-end processing positively or negatively. In this paper, we survey the literature on auxiliary elements of mammalian poly(A) signals. We also compiled a database of human poly(A) signals and searched it for auxiliary elements. This database contains 244 pre-mRNA sequences covering the cleavage region. The literature data and our database screen demonstrate that auxiliary elements, particularly sequences that bind the U1 snRNP-specific U1A protein, are widely present in cellular pre-mRNAs. In addition, we analyzed our database for polynucleotide sequences with the potential to form G-quadruplexes or i-motifs, which we consider possible auxiliary elements. Based on published experimental findings on quadruplex formation, schematic structures of possible quadruplexes are proposed for several human pre-mRNAs. The structures of putative quadruplexes are also presented for the G-rich auxiliary downstream element of the SV40 L poly(A) signal.

Introduction. Poly(A) tails of mammalian mRNAs are important for the regulation of mRNA stability, mRNA export and translation initiation [1-3]. Moreover, the polyadenylation process is essential for the regulation of gene expression [4]. For example, numerous pre-mRNAs contain several poly(A) signals which are located in the 3' untranslated regions (UTRs) of different exons or in the same region of a single exon. In the first case, the use of alternative poly(A) sites results in the synthesis of different proteins. In the second case, pre-mRNAs with different 3' UTRs are translated into the same protein, which may be synthesized with different efficiency when elements regulating mRNA stability or translatability lie between the tandemly arranged poly(A) signals. The polyadenylation process is known to be tightly coupled with transcription termination and splicing, and poly(A) signals are important for the efficiency of both processes [1-3]. The organization of poly(A) signals is not yet completely clear, so its study is of great importance for a better understanding of cell functioning.
The polyadenylation reaction of mammalian pre-mRNAs proceeds in two stages: endonucleolytic cleavage of the pre-mRNA and subsequent addition of a poly(A) sequence to the newly formed 3'-end [1-3]. The poly(A) signal directing this reaction consists of two core elements which are located upstream and downstream of the cleavage site. Additionally, a poly(A) signal can contain auxiliary elements. In our reviews [3,5], we concluded that the core downstream element of the poly(A) signal is not a degenerate U/GU-rich sequence, as was assumed earlier [1,2], but rather consists of a varying number of simple elements (at least 5 nt long) located at different distances from each other. The shortest elements are U-rich elements (UREs) with four U residues out of five bases and tracts consisting of two GU dimers and one U residue. To further examine the structure of mammalian poly(A) signals, we developed a database of human poly(A) signals. This database includes 244 DNA sequences from the cleavage site region which were randomly selected from GenBank at the National Center for Biotechnology Information (NCBI). Our findings on the primary structure of the core elements of poly(A) signals were reported in our review on the downstream elements [5].
In this paper, we discuss the upstream and downstream auxiliary elements of mammalian poly(A) signals, with special emphasis on the possible role of four-stranded RNA structures in the polyadenylation process.
Database of human poly(A) signals. Our database of human poly(A) signals contains 244 DNA sequences corresponding to the pre-mRNA regions within 200 nt upstream of the cleavage site (or up to the 5'-end of the exon) and 200 nt downstream. To determine the downstream region, we compared the sequences of genomic DNAs and the corresponding mRNAs, also extracted from NCBI databases. We selected DNA sequences marked by Evidence Code C (confirmed gene model: a model based on alignment of mRNA, or mRNAs plus ESTs, to the genomic sequence). Only mRNAs of status Reviewed with a specified cleavage/polyadenylation site were examined. Half of our database is shown in Fig. 1, where only 140 nt of each sequence are listed. Core elements and some auxiliary elements of poly(A) signals known or supposed to participate in the polyadenylation process are underlined. The full database is available from the authors upon request.
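The windowing rule described above can be illustrated with a minimal sketch. The function names, the 0-based coordinate convention and the `exon_start` parameter are assumptions made for illustration only; they are not taken from the database itself.

```python
def to_rna(dna: str) -> str:
    """Convert a DNA sequence to the RNA alphabet (T -> U), upper-cased."""
    return dna.upper().replace("T", "U")


def cleavage_window(genomic: str, cleavage_index: int,
                    exon_start: int = 0, flank: int = 200) -> tuple[str, str]:
    """Return (upstream, downstream) flanks around a cleavage site.

    Assumed convention: `cleavage_index` is the 0-based index of the first
    nucleotide downstream of the cleavage site, and `exon_start` is the
    0-based index of the 5'-end of the terminal exon.
    """
    # The upstream window is at most `flank` nt long and never crosses
    # the 5'-end of the exon.
    up_start = max(exon_start, cleavage_index - flank)
    upstream = genomic[up_start:cleavage_index]
    # The downstream window is taken from the genomic sequence, because the
    # mature mRNA ends at the cleavage site.
    downstream = genomic[cleavage_index:cleavage_index + flank]
    return to_rna(upstream), to_rna(downstream)
```

With a 600 nt genomic fragment and a cleavage site at index 300, the sketch returns two 200 nt flanks; setting `exon_start=250` shortens the upstream flank to 50 nt.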
The polyadenylation reaction is performed by a large set of proteins [1-3]. Among them are the cleavage and polyadenylation specificity factor (CPSF), which binds to the upstream core element of the poly(A) signal, and the cleavage stimulation factor (CstF), which interacts with the downstream core element. In addition, the two factors interact with each other. Previously, it was assumed that the upstream element of the poly(A) signal is represented by the AAUAAA hexanucleotide in most animal pre-mRNAs [1,2]. More recent studies have shown that the actual occurrence of AAUAAA is much lower, perhaps as low as 50-60 % [6-8]. We found that 69 % of pre-mRNAs in our database possess this element [5]. Accordingly, we grouped the studied pre-mRNAs by the type of core upstream element ([5] and Fig. 1 herein).
The pre-mRNAs having the canonical AAUAAA hexamer belong to the largest group, group I. The pre-mRNAs containing AAUAAA with single-base substitutions are divided into groups II-IV. Single-base substitutions in the canonical hexamer were shown to significantly reduce the efficiency of pre-mRNA cleavage and polyadenylation, with the only exception of the AUUAAA hexamer, which directs cleavage with relatively high efficiency (66 % of the wild type) [9]. Group II comprises pre-mRNAs with this hexanucleotide.
Statistical analysis of mammalian DNA databases has revealed that the number of 3' ESTs containing AAUAAA with single-base substitutions correlates with the processing efficiencies of these variants in vitro [7]. The most effectively processed variants were found to be natural functional elements [6]. These variants are described by the NNUANA consensus, where N is any nucleotide.
We assigned the pre-mRNAs with NNUANA elements to group III. The pre-mRNAs containing AAUAAA with other single-base substitutions make up a small group IV. These elements may be either functional or deleterious depending on the pre-mRNA source (refs. in [6]). The pre-mRNAs containing the canonical upstream element with substitutions of any two bases form group V. Little is known about such elements.
We placed the remaining pre-mRNAs in group VI. Though these transcripts do not contain hexamers of types I-V in the vicinity of the cleavage site, most of them contain elements of types I-III (evident elements of a poly(A) signal) elsewhere within the upstream region under examination. Some of these hexamers may signal cleavage at a distant site if they and the core downstream elements are brought together by formation of a stem-loop structure between them [10]. In Fig. 1 we underlined the hexamers of groups I-III located not only in the vicinity of the cleavage site but everywhere, since some of these hexamers may be parts of alternative poly(A) signals. Indeed, about 60 % of pre-mRNAs from our database contain more than one hexamer of groups I-III, and about 70 % of these pre-mRNAs have putative downstream elements located not far from the hexamers.
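The grouping of upstream hexamers described above can be sketched as a small classifier. This is an illustrative reading of the rules in the text, not the procedure actually used to build Fig. 1: the function names are hypothetical, and the value "none" stands for any hexamer that does not belong to types I-V (group VI itself is defined at the level of the whole pre-mRNA, not of a single hexamer).

```python
CANONICAL = "AAUAAA"


def substitutions(hexamer: str, reference: str = CANONICAL) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(hexamer, reference))


def matches_nnuana(hexamer: str) -> bool:
    """True if the hexamer fits the NNUANA consensus (N = any nucleotide)."""
    return hexamer[2] == "U" and hexamer[3] == "A" and hexamer[5] == "A"


def classify_hexamer(hexamer: str) -> str:
    """Assign a hexamer to types I-V as described in the text, else 'none'."""
    h = hexamer.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(h) != 6:
        raise ValueError("expected a hexamer")
    n = substitutions(h)
    if n == 0:
        return "I"      # canonical AAUAAA
    if h == "AUUAAA":
        return "II"     # the efficiently processed single-base variant
    if n == 1:
        # Single-base variants that still fit NNUANA go to type III,
        # all other single-base variants to type IV.
        return "III" if matches_nnuana(h) else "IV"
    if n == 2:
        return "V"      # any two-base substitution
    return "none"
```

For example, `classify_hexamer("UAUAAA")` gives type III because the single substitution falls on an N position of NNUANA, while `classify_hexamer("AAGAAA")` gives type IV because the conserved U is lost.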
A schematic picture of a mammalian poly(A) signal is shown in Fig. 2. Using our database, we determined the distances between the cleavage site and the core upstream and downstream elements. The figure displays the most frequent distances, which are consistent with the reported data [9,11]. The most frequent distance between two adjacent core elements of mammalian poly(A) signals in our database is in the range of 25-50 nt.
An analysis of the published data [3] and our database screening showed that poly(A) signals having only core elements may direct the polyadenylation reaction with varied efficiency. The efficiency may depend on the upstream element type, the number of downstream elements, the primary structure of the region around the core elements, the secondary structure of the core poly(A) signal, and other parameters. In addition, many pre-mRNAs contain auxiliary elements which influence (positively or negatively) the efficiency of the core poly(A) signal [2,3]. In the next section, we discuss the published data on these elements and analyze their occurrence in our database.
Auxiliary elements of mammalian poly(A) signals. We summarized the available information on the auxiliary elements in Tables 1 and 2, showing the sequences of the elements and the proteins binding specifically to them. The spliceosomal proteins U1A and U1 70K (Table 1, lines 1-3) belong to the group of proteins with poly(A) polymerase (PAP) regulatory domains (PRDs) [12-14]. PAP carries out the second stage of the polyadenylation reaction, addition of the poly(A) tail, and in most cases also participates in the first stage, cleavage of the transcript [1-3]. Both U1A and U1 70K (as a part of U1 snRNP) specifically bind to the substrate RNA and inhibit poly(A) addition via their PRDs. This inhibitory mechanism is used in U1A autoregulation [12,15], in the control of expression of immunoglobulin M secretory mRNA [16], and in the repression of bovine papilloma virus (BPV) late gene expression [13]. In the case of U1A, two molecules of the protein are required to inhibit PAP [15]. U1A binds directly to pre-mRNAs via its N-terminal RNA binding domain (RBD), and its binding to U1A pre-mRNA is stronger than to the IgM transcript. Phillips et al. [16] found U1A-binding motifs in pre-mRNAs of other immunoglobulin isotypes and suggested that the expression of these mRNAs might be regulated similarly to the expression of IgM mRNA. We discovered that about 9 % of pre-mRNAs in our database contain two or more putative binding sites for the N-terminal RBD of U1A. In Fig. 1 these sites, represented by the AUUGC/UAC [15] or AU/GGCN 23 C [16] sequences, are underlined.
U1 70K inhibits PAP when bound to U1 snRNA. In this case, the auxiliary upstream element (AUX USE) which participates in the regulation of the polyadenylation reaction is a pseudo 5' splicing site (5' ss) to which U1 snRNP binds (Table 1, line 3). Screening our database, we did not find pre-mRNAs with the consensus 5' ss (C/AAGGUA/GAGU) located in the upstream region. However, we did not search for 5' ss homologs, which can also act as inhibitory elements ([17] and refs. therein).
U1A can influence the polyadenylation reaction not only negatively. Its interaction with PAP results in inhibition of poly(A) addition. In contrast, the interaction of U1A with another basal factor of the polyadenylation machinery, CPSF, stimulates 3'-end processing by stabilizing CPSF binding to the substrate RNA [18]. While U1A interacts both with its own pre-mRNA and with IgM pre-mRNA via the N-terminal RBD, it binds to the SV40 L pre-mRNA (Table 1, line 4) via the C-terminal RBD [19]. Preliminary results obtained by Lutz et al. [18] show that U1A enhances CPSF-dependent polyadenylation of the precleaved SV40 L pre-mRNA in vitro without binding to the RNA substrate. Nevertheless, the authors believe that this binding is important for the reaction in vivo. In our opinion, the interaction of U1A with RNA can at least increase the local concentration of both U1A and CPSF (bound to U1A) in the vicinity of the core poly(A) signal, thereby facilitating the 3'-end processing reaction.
Table 2. Downstream elements which influence the efficiency of core poly(A) signals
A motif similar to the U1A-binding elements in SV40 L pre-mRNA was also found in the pre-mRNA of ground squirrel hepatitis B virus (GSHV), in the region required for increasing the efficiency of the polyadenylation reaction directed by the noncanonical UAUAAA hexamer [20]. It should be noted that multiple upstream auxiliary elements are present in GSHV pre-mRNA, but the mechanisms of their functioning remain unclear [20,21]. When the present article was ready for publication, Natalizio et al. reported that elements resembling the AUX USEs in SV40 L pre-mRNA can also be found in many cellular pre-mRNAs [22]. The authors showed that these elements stimulate 3'-end processing of three human collagen pre-mRNAs. One of them is shown in Table 1 (line 5). In our database search, we found that 8 pre-mRNAs have putative binding sites for the C-terminal RBD of U1A described by the UAU(2-5)GUNA consensus [22]; these sites are located within 70 nt upstream of the cleavage site. In fact, the ten U1A binding sites of collagen and SV40 L pre-mRNAs examined [19,22] may be described by the U/AU(2-4)GU/ANA/U consensus. As many as 18 pre-mRNAs in our database were found to contain these elements located close to the cleavage site, while another 24 transcripts contain such elements in the remote upstream region. Taking into account the frequency of binding sites for both the N- and C-terminal RBDs of U1A in the poly(A) signal region of pre-mRNAs (the literature data analysis and our database screening), it is possible to suggest that regulation of the polyadenylation reaction via the U1A protein is a rather frequent event.
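A search for the U1A consensus motifs quoted above can be sketched with regular expressions. The translation of each consensus into a pattern is an assumption: a slash is read as alternative bases at a single position (as in the 5' ss consensus), and the numbers after a U are read as a repeat range. The pattern names and the `find_sites` helper are hypothetical and do not reproduce the authors' screening procedure.

```python
import re

# AUUGC/UAC [15]: site for the N-terminal RBD (C or U at the fifth position).
U1A_N_TERMINAL = r"AUUG[CU]AC"
# UAU(2-5)GUNA [22]: site for the C-terminal RBD, read as U, A, 2-5 U, G, U, N, A.
U1A_C_TERMINAL = r"UAU{2,5}GU[ACGU]A"
# U/AU(2-4)GU/ANA/U: the broader consensus covering the ten examined sites.
U1A_C_TERMINAL_BROAD = r"[UA]U{2,4}G[UA][ACGU][AU]"


def find_sites(rna: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets the scan report overlapping sites.
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", rna)]
```

Restricting the search to the last 70 nt of an upstream flank (for example `find_sites(upstream[-70:], U1A_C_TERMINAL)`) would mimic the distance criterion used in the text.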
Table 1 (lines 6-9) shows pre-mRNAs which use the same mechanism to enhance polyadenylation [23-28]. The stimulation observed results from stabilizing CPSF binding to the substrate pre-mRNA without participation of auxiliary proteins like U1A. It occurs via direct binding of CPSF to U-rich auxiliary upstream elements [23,24]. Evidently, CPSF has two RNA binding sites, but it is still unknown whether CPSF binds to AAUAAA and the U-rich sequence simultaneously, or whether the U-rich element acts only as an entry site for CPSF, which subsequently slides along the RNA towards the core poly(A) signal [25]. Interestingly, CPSF is UV cross-linked to the U-rich element only in the context of AAUAAA [23,24]. The consensus for the U-rich element to which CPSF binds is presently undetermined. Possibly, oligo(U) tracts are the special feature of these elements. Their occurrence is rather high in the pre-mRNAs of our database: approximately 13 % of the transcripts studied contain several oligo(U) tracts in the vicinity of the upstream core element of the poly(A) signal.
The auxiliary upstream element found in C2 complement pre-mRNA includes a fragment which serves as a binding site for the heterogeneous nuclear protein hnRNP I (Table 1, line 10), also called the polypyrimidine-tract (PPT) binding protein (PTB) [29]. PTB binding to the C2 pre-mRNA AUX USE enhances the cleavage reaction by an unknown mechanism. This auxiliary element is also required for activating poly(A) addition via the general polyadenylation factor CstF [29]. This is the first and so far the only demonstration that CstF enhances the second stage of the polyadenylation reaction and exerts its effect by binding to an AUX USE. In our database, 9 pre-mRNAs were found to contain PPTs with the consensus YYYYUCUUY (refs. in [30]) located within 70 nt upstream of the cleavage site.
Huang et al. showed that the transport elements from three naturally intronless pre-mRNAs (histone H2a, herpes simplex virus thymidine kinase (HSV Tk), and hepatitis B virus (HBV)), in addition to promoting mRNA nuclear export, act as polyadenylation enhancers and as splicing inhibitors [31]. The activating elements of these three pre-mRNAs are different, and the mechanisms by which they stimulate the polyadenylation reaction are still unknown. Though some proteins binding to the transport elements were found, their role in polyadenylation has not yet been clarified. In Table 1, we therefore show only one of these elements (from histone H2a pre-mRNA, line 11). The 22 nt long fragment of this element was shown to be required for all three activities [32]. Two members of the serine/arginine-rich (SR) protein family (SRp20 and 9G8) specifically bind to this fragment. It is also worth noting that PTB specifically interacts with a fragment of the transport element from HBV [30]. PTB [29] and SRp20 (see text below) are known to participate in the polyadenylation process. Screening our database, we found that 5 pre-mRNAs have the binding site for SRp20 (C/UA/UCUUCAU [32]) in the upstream region.
Several pre-mRNAs are known to have auxiliary upstream elements about which little information exists. Among them are the adenovirus L1, Epstein-Barr virus DNA polymerase (EBV pol) and 2'-5' oligo A synthetase enzyme (OASE) transcripts. In L1 pre-mRNA, unidentified auxiliary elements located 50-113 nt upstream and 52-170 nt downstream of the cleavage site enhance the stability of CPSF binding to the substrate pre-mRNA [33]. CPSF is likely to interact directly with the Ad L1 AUX USE. In EBV pol pre-mRNA, an auxiliary element (UUUGUA) is located 8 nt upstream of the noncanonical UAUAAA hexamer; in addition, elements located farther upstream may also function as AUX USEs [34]. The authors suppose that the EBV SM early protein may participate in the enhancement of the 3'-end processing reaction [35]. The auxiliary element identified in OASE pre-mRNA [36] is presented in Table 1 (line 12); the mechanism of its functioning is unknown.
Several papillomaviruses besides BPV (Table 1, line 3) contain inhibitory elements controlling production of the late RNAs ([37] and refs. therein). The negative regulatory element (NRE) of human papillomavirus type 16 (HPV-16) appears to regulate polyadenylation (unpublished data of McGuire and Graham reported in [37]), nuclear export and mRNA stability (refs. in [37]). The NRE was shown to interact specifically with three proteins: U2AF65, CstF and the Elav-like HuR [38]. The fragment of the NRE required for U2AF65 binding [17] is given in Table 1 (line 13). It contains GU/U-rich tracts which are also potential CstF-binding sites. However, the whole NRE (a 79 nt long sequence) is required for CstF binding. All these proteins (U2AF65, CstF and HuR) bind to the complex negative regulatory element of HPV-31 [37]. Cumming et al. [37] think that the CstF-binding sites of the HPV-31 element could compete with the downstream core element of the poly(A) signal for CstF binding, thereby reducing the rate of the polyadenylation reaction. They also suppose that U2AF65, which has a PAP regulatory domain homologous to those in U1A and U1 70K [14,39], inhibits poly(A) addition. HuR may act by stabilizing late mRNAs in the cytoplasm.
The AUX USEs listed in lines 14-16 of Table 1 exert an unusual effect on polyadenylation. They are responsible for the occurrence of pre-mRNAs with short, discrete poly(A) tails (< 20 nt long) [40,41]. The spacing between these USEs, called poly(A)-limiting elements (PLEs), and the poly(A) signal plays no detectable functional role and may be more than 1000 nt (Table 1, line 16). However, the presence of these elements in the terminal exon is essential for their functioning [42]. PLEs are likely to inhibit the later phase of poly(A) synthesis, the addition of the poly(A) tail to the short oligo(A) tract which is probably formed by the cleavage complex [43]. In a database search, Das Gupta et al. [42] identified several hundred genes with PLE-like elements and showed that the mRNAs of those genes that they examined have < 20 nt poly(A) tails. They also identified a ~62 kDa protein which specifically binds to PLEs. Screening our database, we failed to find pre-mRNAs with PLE elements (AGUUCCUUYRGCURNRNRRR [42]).
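Consensus sequences written with degenerate symbols, such as the PPT consensus YYYYUCUUY or the PLE consensus AGUUCCUUYRGCURNRNRRR quoted above, can be turned into search patterns mechanically. The sketch below uses the standard IUPAC meanings of R (purine), Y (pyrimidine) and N (any base) in the RNA alphabet; the helper names are hypothetical, and the code is an illustration rather than the screening tool used for the database.

```python
import re

# Standard IUPAC degenerate symbols that occur in the consensuses cited here,
# expressed in the RNA alphabet.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any nucleotide
}


def iupac_to_regex(consensus: str) -> str:
    """Translate a degenerate consensus into a regular expression string."""
    return "".join(IUPAC_RNA[symbol] for symbol in consensus.upper())


def has_element(rna: str, consensus: str) -> bool:
    """True if the RNA contains at least one instance of the consensus."""
    return re.search(iupac_to_regex(consensus), rna) is not None
```

A symbol outside the table raises a `KeyError`, which makes an unsupported consensus fail loudly instead of silently matching nothing.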
The splicing-associated factor SRm160 (SR-related nuclear matrix protein) is given in Table 1 (bottom line) in view of the coupling of polyadenylation and splicing. Vagner et al. consider poly(A) signals to act as 3'-terminal exon splicing enhancers in addition to their role in 3'-end processing [39]. They suppose that PAP, as a part of the cleavage complex, can interact with U2AF65 and help to tether this factor to the pyrimidine tract of the adjacent 3'-splice site, thereby stimulating splicing. Correspondingly, the splicing signals can be considered to act as polyadenylation enhancers. The mechanism by which splicing can stimulate polyadenylation was clarified in recent work by McCracken et al. [44], who found that SRm160 specifically binds to the general polyadenylation factor CPSF and promotes the cleavage of pre-mRNAs. SRm160 forms a part of a splicing-dependent complex deposited by the spliceosome 20-24 nt upstream of exon-exon junctions [45]. The enhancement of polyadenylation via SRm160 is not pre-mRNA specific and probably takes place with different splicing substrates.
The last upstream auxiliary element discussed here is a part of the poly(A) signal of the adenovirus-2 (Ad 2) L4 pre-mRNA. Sittler et al. [46] believe that the more effective part of this positively acting element (USEa) plays a structural role. It forms the ascending arm of a hairpin structure with AAUAAA in the 14-nt loop, where the hexamer is well exposed to cleavage and polyadenylation factors. However, the authors do not formally exclude the possibility that USEa (UCUCUGUGCUGA) is also a recognition site for a regulatory protein, possibly as a part of the helix.
The downstream auxiliary elements known so far are less numerous than the upstream ones. Some of them are shown in Table 2. The AUX DSE of the SV40 L pre-mRNA is a G-rich region (Table 2, line 1) which specifically interacts with hnRNP H/H' [47,48]. The mechanism by which it stimulates 3'-end processing is undetermined to date. It was recently shown that many pre-mRNAs (about 34 % of the surveyed transcripts) contain short G-tracts in the downstream region of poly(A) signals. All tested elements were found to bind the H/H' protein and stimulate the polyadenylation process [49]. In that work, the consensus for the H/H' binding site was not identified, but other studies demonstrated that this protein recognizes GGGA/GGGGGC [50] and probably GGGU-containing tracts [51]. Interestingly, the GGGA sequence is recognized not only by H/H' but by all proteins of the hnRNP H family [50], which also includes the F and 2H9 proteins. The relevance of the 2H9 protein to the polyadenylation process is currently unclear, while hnRNP F was shown to diminish 3'-end processing [52]. Most pre-mRNAs in our database contain GGGA/U or GGGGGC tracts. About 40 % of the transcripts contain such elements located immediately downstream of the cleavage site and about 20 % immediately upstream of it.
The 5' splicing site can influence the polyadenylation process negatively when positioned not only upstream of the cleavage site (BPV-1, Table 1, line 3), but downstream as well. This is the case for human immunodeficiency virus type 1 (HIV-1) (Table 2, line 2). However, in HIV-1, in contrast to BPV-1, the first stage of the polyadenylation process, but not the second one, is inhibited [53]. Ashe et al. [54] showed that the binding site for U1 70K in stem-loop 1 of the U1 snRNP is associated with the cleavage inhibition. Regulation of poly(A) signal use in the 5' LTR of HIV-1 pre-mRNA has two peculiarities. First, replacement of the relatively weak wild-type poly(A) signal with a strong one abolishes the inhibitory effect of the 5' ss [53]. Second, the sequence between the cap structure and AAUAAA is required for the regulation [55]. The mechanism by which the HIV-1 AUX DSE functions is currently unknown. Searching our database, we found two transcripts with a 5' ss in the remote downstream region.
The bottom line of Table 2 presents the element which positively influences polyadenylation of the calcitonin (CT) pre-mRNA [56,57]. Interestingly, the core sequence of this enhancer, composed of a pyrimidine tract and a 5' splicing site, could be modeled on a pseudomicroexon with 0 nt within the exon [57]. The splicing factors U1 snRNA, SRp20, PTB, and ASF/SF2 specifically bind to the enhancer core. The mechanism by which the AUX DSE stimulates CT pre-mRNA 3'-end processing is presently unclear. According to the first model, PTB prevents U2AF from binding to the enhancer pyrimidine tract, a binding that inhibits the enhancer activity [57]. By the second model, PTB or SRp20 can interact directly with polyadenylation factors [56,57]. The third model is based on the fact that PTB can bind to a pyrimidine tract located between AAUAAA and the cleavage site in CT pre-mRNA. Dimerization of PTB bound to both the poly(A) signal and the enhancer core brings the other enhancer core-binding factors into proximity with polyadenylation factors. Screening our database, we found 9 pre-mRNAs having SRp20-binding sites in the downstream region, and 17 transcripts containing PTB-binding sites in the downstream region beyond the core poly(A) signal.
Downstream auxiliary elements are also found in such pre-mRNAs as IgM [58], adenovirus-5 L3 [59], adeno-associated virus [59], and HPV-31 [37]. However, little information is available on these AUX DSEs. An AUX DSE is possibly present in human glycinamide ribonucleotide formyltransferase pre-mRNA (GART) [60]. Kan and Moran showed that it contains a 24 nt long poly(U) tract in the intronic region located immediately downstream of the GU tract supposed to be the downstream element of the poly(A) signal [60]. A poly(U) tract of the same length is also present shortly downstream of the mouse core poly(A) signal. The authors think that the conservation of the poly(U) region is not fortuitous and that this tract may be functional. Possibly it plays a structural role. Some pre-mRNAs in our database also contain poly(U) tracts (~20 nt long) in the downstream region.
Thus, upstream and downstream auxiliary elements are widely dispersed in viral and cellular pre-mRNAs and exert their effect on the polyadenylation process in different ways. First, AUX USEs and AUX DSEs can serve as binding sites for auxiliary proteins which influence the polyadenylation reaction positively, as for example hnRNP H/H' [49], or negatively, as U1A (N-terminal RBD) [15].
Second, they can be binding sites for general polyadenylation factors. Two kinds of these sites are known at present. Some of them are the same as the core elements of the poly(A) signal, while the others are distinct sequences to which a factor binds through a second RNA binding domain. In the first case, the auxiliary element may be a trap for a general polyadenylation factor (for example, for CstF [37]). In the second case, the general factor (for example, CPSF [25]) may interact simultaneously with both binding sites, which stabilizes its interaction with the poly(A) signal. In addition, the distinct sequence may be an entry site for the general factor [25].
Third, AUX USEs and DSEs could play a structural role, for example, by providing a favorable folding of the pre-mRNA region covering the poly(A) signal, in which the core elements are well exposed to the general polyadenylation factors [46]. In addition, contacts between general polyadenylation factors and auxiliary proteins may be provided through dimerization of PTB molecules bound to pyrimidine tracts located in the vicinity of the binding sites of these proteins [57].
Chen and Wilusz [59] suggested another possible function of auxiliary elements. They showed that a pseudoknot from viral RNA could functionally substitute for the unidentified Ad L3 AUX DSE. The authors supposed that stable secondary structures formed by auxiliary elements (like the pseudoknot) may prevent sliding of CstF (which is relatively weakly bound to RNA) along the pre-mRNA, thereby limiting the region of interaction between the factor and the transcript mainly to the core poly(A) signal.
We suppose that some functions of the auxiliary elements of poly(A) signals can be performed by four-stranded RNA structures, such as G-quadruplexes or i-motifs. Below we discuss whether such structures may be formed in the region surrounding mammalian poly(A) signals and how they may influence the polyadenylation process.
Four-stranded RNA structures as possible auxiliary elements of poly(A) signals. Biologists have taken an interest in four-stranded polynucleotide structures over the last 15 years, since it was discovered that G-quadruplexes can be involved in various important cellular events such as meiotic synapsis and recombination [61,62], immunoglobulin class switch recombination [63], transcription regulation [64] and some others [65]. Most research has focused on G-quadruplexes formed by DNA sequences, while very little information has been reported on four-stranded RNA structures. In particular, it was shown that interstrand G-quadruplexes mediate the dimerization of HIV-1 RNA in vitro [66,67]. Oliver et al. supposed that the gene 5 protein of filamentous bacteriophage fd may repress gene 2 mRNA translation by stabilizing the G-quadruplex formed by the operator sites of four mRNA molecules [68]. As for cellular RNAs, Christiansen et al. [69] showed that a G-quadruplex is formed in the 3' untranslated region (UTR) of insulin-like growth factor II (IGF-II) mRNA. This four-stranded structure was supposed to play a structural role, ensuring that the site of endonucleolytic cleavage of IGF-II mRNA is accessible to interacting macromolecules and not sequestered in stable structures. A possible role of G-quadruplexes in RNA turnover was proposed by Bashkirov et al. [70], who purified the mouse protein mXRN1p (a homolog of the Saccharomyces cerevisiae Xrn1p exoribonuclease), which exhibits a preference for RNA substrates containing G-quadruplexes. These findings substantially support the suggestion that four-stranded RNA structures occur in vivo.
We were the first to suppose that G-quadruplexes can play a role in the polyadenylation process. We found that many mammalian and eukaryotic viral pre-mRNAs contain clusters (n > 3) of G-repeats in the region immediately downstream of the core poly(A) signal and proposed that some of these G-rich sequences (GRSs) may form four-stranded structures [3]. Screening of our database of human poly(A) signals confirmed the frequent occurrence of GRSs (with a potential to form G-quadruplexes) in the region downstream of the core poly(A) signal [5]. In addition, we found that quadruplex-forming sequences also occur in the upstream region.
G-quadruplexes are composed of stacked G-tetrads, each consisting of four hydrogen-bonded coplanar guanines [71-74]. These structures can be formed by association of one, two or four polynucleotide molecules. Here, we examine mainly unimolecular quadruplexes. Quadruplexes with three or more G-tetrads are considerably more stable than those with two tetrads. For example, the G-quadruplex formed by the thrombin-binding aptamer GGTTGGTGTGGTTGG has Tm = 46.4 °C in the presence of 25 mM KCl [75]; its folding pattern is shown in Fig. 3 [75]. Elongation of this loop to TTTT also decreases the aptamer stability, but less effectively (Tm = 34.0 °C), while the substitution of the central loop by the highly stable tetraloop GTAA leads only to a slight change in melting temperature (Tm = 45.4 °C). Marathias and Bolton [76] showed that the analog of the thrombin aptamer GGTGGTGTGGTGG is not folded into a unimolecular quadruplex, indicating that in this case one residue is not enough to make the external loops.
The central loop of a unimolecular quadruplex was shown to connect guanine tracts either diagonally or edgewise [71-74]. It is of the edgewise type in the thrombin aptamer (Fig. 3, A), while an increase in the number of G-tetrads or elongation of the loops leads to formation of a quadruplex with a diagonal central loop [76]. Two other types of G-quadruplex loops have been identified ([78] and refs. therein), but we do not consider them in this article.
Thus, certain trends in the formation of quadruplexes with G-tetrads can be summarized from the reported experimental data. Though these trends were determined mainly for quadruplexes formed by G-rich DNA sequences, one may suppose that they are generally true for quadruplexes formed by G-rich RNA sequences too. Quadruplexes formed by one, two or four RNA molecules have been reported in the literature [66-69]. Moreover, in some cases the G-quadruplex formed by an RNA sequence is more stable than the equivalent DNA structure [89]. Based on the above information, we searched our database for pre-mRNA sequences with a potential to form quadruplexes. Approximately 27 % of the pre-mRNAs studied appeared to contain such sequences. As many as 16 pre-mRNAs contain sequences which may form quadruplexes with 3 or 4 tetrads. As an example, a model of a secretin pre-mRNA fragment folding into a G-quadruplex is presented in Fig. 3, E. Four sequences found in our database have the potential to involve A-A-A-A or U-U-U-U tetrads in the core quadruplex structure, but there is still no information on the formation of such tetrads in unimolecular quadruplexes. These tetrads were shown to be involved in parallel-stranded quadruplexes formed by four polynucleotide molecules [83,86].
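A sequence search of the kind described above can be approximated with a simple motif of four G-runs separated by three loops. The sketch below is a commonly used heuristic of this general form, not the procedure applied to the database: the function names, the default loop range of 2-7 nt and the restriction to guanine-only tetrads are assumptions. The 2 nt minimum loop follows the observation cited above that single-residue external loops did not support unimolecular folding of the thrombin aptamer analog.

```python
import re


def quadruplex_pattern(tetrads: int = 2, loop_min: int = 2,
                       loop_max: int = 7) -> re.Pattern:
    """Build a pattern for four G-runs (>= `tetrads` long) joined by three loops."""
    g_run = f"G{{{tetrads},}}"
    loop = f"[ACGU]{{{loop_min},{loop_max}}}"
    # The lookahead allows overlapping candidates to be reported;
    # group 1 holds the candidate sequence itself.
    return re.compile(f"(?=({g_run}(?:{loop}{g_run}){{3}}))")


def quadruplex_candidates(rna: str, tetrads: int = 2, loop_min: int = 2,
                          loop_max: int = 7) -> list[tuple[int, str]]:
    """Return (start, sequence) for each candidate found in an RNA sequence."""
    pattern = quadruplex_pattern(tetrads, loop_min, loop_max)
    rna = rna.upper().replace("T", "U")  # accept DNA spelling as well
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]
```

Applied to the thrombin-binding aptamer sequence, the default settings report a candidate starting at position 0, whereas the analog with single-residue external loops gives no candidate unless `loop_min` is lowered to 1. Tetrads involving A or U residues, which are also discussed in the text, are outside the scope of this pattern.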
In our database search, 9 pre-mRNAs appeared to contain sequences which could be folded into quadruplexes with A-G-A-G tetrads. However, by means of an ab initio quantum chemical study, Gu and Leszczynski found that, compared to two separated A-G pairs, the A-G-A-G tetrad has only 3.5 kcal/mol of stabilizing energy, and suggested that this tetrad may not be important in quadruplex structures [90]. Results of studies on quadruplex formation by the telomeric repeat sequences (T2AG3)4 [91] and (T4AG3)4 [92] are in agreement with this suggestion. The authors proposed two possible unimolecular quadruplex structures for these sequences: a quadruplex with two G-tetrads flanked by two A-G-A-G tetrads, and a quadruplex with three G-tetrads (possibly flanked by one A-T-A-T tetrad [91]). Only the second model was supported by experimental observations, including chemical probing data [91,92]. However, these results do not exclude the possibility of A-G-A-G tetrad formation in quadruplexes, because the intramolecular quadruplex with three highly stable G-tetrads in the case of (T2AG3)4 and (T4AG3)4 is very likely to be more energetically favorable than that with two G-tetrads capped by two weakly stable A-G-A-G tetrads. The stabilization energy relative to the isolated bases is 66.5, 29.4 and 32.7/30.4 kcal/mol for G-G-G-G, A-G-A-G and A-T-A-T (two forms), respectively [90]. Even if A and G residues in the external loops of G-quadruplexes do not form A-G-A-G tetrads, these quadruplexes could be stabilized by formation of G-A or A-A mismatches.
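The energetic comparison of the two telomeric models can be made concrete with simple arithmetic on the tetrad energies quoted above from [90]. The sketch below is a deliberately crude estimate and rests on an assumption that is ours, not that of [90]: it treats tetrad stabilization energies as additive and ignores stacking, loop conformation, cation coordination and solvation. The names TETRAD_ENERGY and additive_stabilization are hypothetical.

```python
# Stabilization energies relative to isolated bases (kcal/mol), as quoted
# in the text from [90]; for A-T-A-T the larger of the two forms is used.
TETRAD_ENERGY = {"G-G-G-G": 66.5, "A-G-A-G": 29.4, "A-T-A-T": 32.7}

def additive_stabilization(tetrad_counts):
    """Sum tetrad energies, assuming (crudely) that they are additive."""
    return sum(TETRAD_ENERGY[name] * n for name, n in tetrad_counts.items())

# Model 1: two G-tetrads flanked by two A-G-A-G tetrads.
two_g_model = {"G-G-G-G": 2, "A-G-A-G": 2}
# Model 2: three G-tetrads, without and with one flanking A-T-A-T tetrad.
three_g_model = {"G-G-G-G": 3}
three_g_capped = {"G-G-G-G": 3, "A-T-A-T": 1}

for label, model in [("2 G + 2 A-G-A-G", two_g_model),
                     ("3 G", three_g_model),
                     ("3 G + 1 A-T-A-T", three_g_capped)]:
    print(f"{label}: {additive_stabilization(model):.1f} kcal/mol")
# → 2 G + 2 A-G-A-G: 191.8 kcal/mol
# → 3 G: 199.5 kcal/mol
# → 3 G + 1 A-T-A-T: 232.2 kcal/mol
```

Under this additive assumption the three-G-tetrad model comes out ahead even without the A-T-A-T cap, which is consistent with the experimental preference for the second model but does not by itself rule out A-G-A-G tetrads in other sequence contexts.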
The G-rich sequence (GRS) of SV40 L pre-mRNA can also be folded into quadruplexes with GA external loops. Two variants of such a structure are presented in Fig. 3, F and G. It should be noted that the putative quadruplex of the SV40 L pre-mRNA GRS (Fig. 3, G) differs from that of the thrombin-binding aptamer (Fig. 3, A) exclusively in the sequence of the external loops, possessing the same stable UGU central loop. Thus, we suppose that four-stranded structures, particularly G-quadruplexes, may participate not only in such important cellular events as recombination, transcription and others [65], but in the polyadenylation process as well.

Fig. 1. Database of human poly(A) signals. The distribution of pre-mRNAs (DNAs) into groups is explained in the text. Only every second sequence from each group is shown here. The sequences marked by an asterisk are included in the groups on condition that the distance between the appropriate hexamer and the core downstream element does not exceed 54 nt. The sequence before the blank space corresponds to the mRNA; the sequence after the blank space corresponds to the 3' fragment of the cleaved pre-mRNA. If the cleavage site was not indicated in the mRNA sequence from the NCBI database, we took the last non-A nucleotide before the poly(A) tail as the 3' terminus. If several cleavage sites were indicated, the blank space was positioned after the last site. Hexamers I-III are underlined throughout the Figure (they are printed in bold in the -10/-70 region). Hexamers IV and V are underlined only within groups IV and V in the -10/-35 region. UREs and 2GU/U elements are underlined and printed in bold throughout the downstream region. The following auxiliary elements are underlined: binding sites of U1A (N- and C-terminal RBDs, the latter also printed in italic), polypyrimidine and oligo(U) tracts. Also, the G- or C-repeats in the sequences with a potential to form G-quadruplexes or i-motifs are underlined. The binding sites of SRp20 are printed in italic and not underlined.
Fig. 3. G-quadruplex of the thrombin-binding aptamer [75] (A); the patterns of pre-mRNA fragments folding into four-stranded structures: secretin (B), CD97 antigen (C), stromal interaction molecule 1 (D), dual specificity phosphatase (E), SV40 L (F and G), growth factor, augmenter of liver regeneration (H).

The thrombin aptamer modified only in the number of tetrads (three tetrads instead of two) melts at 64.5 °C. The thermodynamic stability of G-quadruplexes with two G-tetrads depends greatly on the sequence and the size of the loops which connect the G-tetrads [75-77]. In the case of the thrombin aptamer, shortening of the central loop from TGT to TT significantly decreases the quadruplex stability (Tm = 21.0 °C).

In Fig. 3, B, the guanine tetrads in this structure are flanked by an A-U base pair. We found 13 pre-mRNAs with a potential to involve A-U-A-U, G-C-G-C or U·G·U·G tetrads in the core quadruplex structure. The possible schematic structures of quadruplexes with the G-C-G-C tetrads or the A-U-A-U tetrad are depicted in Fig. 3, C and D, respectively. Some pre-mRNAs from our database are supposed to form quadruplexes capped by triads; the schematic drawing of a G-quadruplex flanked by a U·(A-A) triad is shown in Fig. 3, E.
In view of the fact that the interaction between GA loops can stabilize the SV40 L pre-mRNA GRS quadruplex, we consider the formation of a four-stranded structure in the G-rich region of the SV40 L poly(A) signal to be very likely. However, the results of Hans and Alwine [93], who examined the secondary structure of SV40 L pre-mRNA in the region of the poly(A) signal by nuclease sensitivity analysis, are not in agreement with the formation of G-quadruplexes, since some guanine residues which we suppose to form G-tetrads are highly sensitive to T1 nuclease attack. This discrepancy could be due to the following reason. The nuclease sensitivity analysis was performed in the presence of 10 mM MgCl2 [93]. Although divalent cations are known to stabilize G-quadruplexes at millimolar concentrations, at concentrations of about 10 mM they can exert a destabilizing effect [74]. So, further investigations are needed to clarify whether or not G-quadruplexes are formed by the SV40 L pre-mRNA GRS.

Taking into account the fact that hnRNP proteins (H, H' and F) specifically bind to poly(G) at 2 M NaCl [94], when poly(G) is a completely four-stranded macromolecule under equilibrium conditions [95], we recently supposed [5] that these proteins can specifically bind to quadruplexes formed by G-rich sequences of different pre-mRNAs. This binding may directly influence the cleavage complex assembly. Besides, G-quadruplexes may play a structural role, better exposing poly(A) signal elements to the cleavage factors. Also, G-quadruplexes may prevent sliding of general polyadenylation factors along the RNA, thereby tethering them better to their binding sites.

In conclusion, we briefly discuss the four-stranded structures formed by C-rich polynucleotide sequences, the i-motifs, in which two cytidine stretches form a parallel-stranded duplex and two such duplexes are associated head-to-tail by base-pair intercalation into a quadruplex [96]. To form this structure, the C-stretches must be protonated. Poly(dC)
protonation occurs at physiological pH, while poly(C) is ionized at acidic pH ([97] and refs. therein). However, protonation of C-rich regions of RNAs inside the cell may be performed by proteins (see the short review in [98]). Pre-mRNAs containing sequences with a potential to form intercalated structures were found in our database ~1.4 times less often than those capable of forming G-quadruplexes. The sequences with the potential to form G-quadruplexes were shown to occur approximately twice as frequently in the downstream region as in the upstream region. This is in complete contrast to the occurrence of i-motifs. Analyzing the -70/+70 region, we found that 26 pre-mRNA sequences with the potential to form G-quadruplexes and 4 sequences with i-motifs are located downstream of the cleavage site, while 5 sequences with G-quadruplexes and 6 sequences with i-motifs are located upstream. The possible schematic structure of the i-motif in GFER pre-mRNA is shown in Fig. 3, H. Based on the fact that i-motifs of RNA sequences are much less stable than those of their DNA equivalents [99], we suppose that the intercalated structures are very likely to participate in the polyadenylation process indirectly (at the DNA level). RNA polymerase II (Pol II) is known to be an essential cleavage factor in the polyadenylation reaction [1-3]. Yonaha and Proudfoot [100] showed that G-rich sequences ...TGGCCTTGGGGGAGGGGGAGGC... (which are binding sites for the transcription factor MAZ) located downstream of the polyadenylation signal in a synthetic DNA template pause Pol II. This pausing leads to stimulation of 3'-end processing of the Pol II transcript. On the other hand, the pausing induced by a mutant form of EcoRI protein (which is defective in cleavage function but retains high affinity for the wild-type recognition sequence) does not activate polyadenylation when its binding site is inserted in the DNA template. In fact, as seen from Fig.
5 of [100], a slight activation does take place in this case. In view of the fact that the GRS of the synthetic template may be folded into a G-quadruplex, we propose the following mechanism of polyadenylation activation by both G-quadruplexes and i-motifs. Any four-stranded structure will pause Pol II and somewhat stimulate 3'-end processing. Specific proteins may stabilize the quadruplexes, thereby increasing the pausing, as probably occurs in the case of MAZ. They may also facilitate Pol II function in the cleavage reaction.