Bioinformatic analysis of inverted repeats of coronaviruses genome

Aim. To design maps of matched and mismatched potential hairpin structures in the genomes of human and animal coronaviruses. Methods. Bioinformatic analysis of coronavirus nucleotide sequences, atomic force microscopy. Results. Thermodynamically stable matched and mismatched inverted repeats that form hairpin structures and can appear in the genomic RNA of human and animal coronaviruses (severe acute respiratory syndrome virus, murine hepatitis virus, porcine epidemic diarrhea virus, transmissible gastroenteritis virus, and bovine coronavirus) were determined. Maps of hairpin localization (the hairpins being a part of the genome signalling mechanisms) were obtained for the coronavirus genomes. Conclusions. The genes encoding the replicase and spike glycoproteins of coronaviruses are the main sites of localization of potential conserved structural motifs. The hairpins are shown to be conserved structural elements within the set of coronavirus isolates of one species.

Introduction. Similar to other non-canonical formations, hairpin-loop structures, which can be formed in nucleic acids by inverted repeats, are significant genome elements that play a specific biological role. It is believed that they are involved in the regulation of DNA replication and transcription [1,2].
Fluorescent flow cytometry made it possible to detect ~10^5 hairpin-loop structures in the nucleus [3].
Besides the specific role of matched and mismatched inverted repeats in mutagenesis, they are associated with a series of human genetic diseases (hereditary angioneurotic edema, antithrombin deficiency, and deficiency of human serum cholinesterase) [4].
Despite rather intense study of palindrome localization in the genomes of different organisms, the role and distribution of hairpin structures in the genomes of viruses and bacteria are still to be determined. Therefore, we have searched for potential hairpin structures in coronavirus genomes. The family of coronaviruses (CoV) includes porcine epidemic diarrhea virus, infectious bronchitis virus, murine hepatitis virus, and transmissible gastroenteritis virus. There are also coronaviruses of humans, cattle, horses, and cats. Sequencing showed that severe acute respiratory syndrome virus (SARS-CoV) may also be assigned to the order Nidovirales, family Coronaviridae, genus Coronavirus.
Coronaviruses are divided into three serogroups, each having cross serological reactivity and similar genome organization. All known human coronaviruses belong to groups I and II. SARS-CoV forms a new group IV, since genetic and antigenic studies demonstrated that it is distant from all the known groups of coronaviruses [5].
The current work presents the distribution of matched and mismatched inverted repeats in the genomes of certain coronaviruses that are highly dangerous for humans, the severe acute respiratory syndrome virus in particular. An analysis of the maps of potential hairpin structures showed that the distribution of inverted repeats is the same within the set of coronavirus isolates of one species.
Therefore, the comparison of the distribution of hairpin structures may serve as another instrument (along with phylogenetic analysis) in the study of evolutionary relations and genome organization of both coronaviruses and representatives of other species.
Oligo (version 3.4) [6] and the RNA2 program of the GeneBee package [7] were used to search for matched and mismatched inverted repeats and to determine their thermodynamic parameters.
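The idea of scanning a sequence for inverted repeats can be illustrated with a minimal Python sketch. It is not the algorithm of Oligo or RNA2, whose internals are not described here; it only finds perfectly matched (Watson-Crick) repeats and computes no free energy. The names `Hairpin` and `find_matched_hairpins` and the default limits `min_stem`, `min_loop`, and `max_loop` are assumptions introduced for the illustration.

```python
from typing import List, NamedTuple

# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


class Hairpin(NamedTuple):
    start: int  # 0-based index of the first nucleotide of the left arm
    stem: int   # number of base pairs in the stem
    loop: int   # number of unpaired nucleotides in the loop


def find_matched_hairpins(seq: str, min_stem: int = 6,
                          min_loop: int = 3, max_loop: int = 6) -> List[Hairpin]:
    """Return perfectly matched inverted repeats able to fold into a hairpin.

    Illustrative only: mismatches, bulges and thermodynamics are ignored.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    hits = []
    for i in range(n):                   # i: last nucleotide of the left arm
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1             # j: first nucleotide of the right arm
            k = 0
            # Extend the stem outwards while the bases stay complementary.
            while (i - k >= 0 and j + k < n
                   and COMPLEMENT.get(seq[i - k]) == seq[j + k]):
                k += 1
            if k >= min_stem:
                hits.append(Hairpin(start=i - k + 1, stem=k, loop=loop))
    return hits
```

Each hit reports the maximal stem for a given loop position, so a real pipeline would still have to score the candidates thermodynamically and merge overlapping ones.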
An atomic force microscope (AFM) Nanoscope III with a D-scanner (Veeco Instruments Inc., USA) was used. AFM images of the sample of supercoiled pUC8 plasmid DNA (2665 bp), applied onto standard amino mica, were captured in air in "height" mode using the tapping variant of AFM and unsharpened probes of KTEK International company (Russian Federation) with a resonance frequency of 300-360 kHz. The sample was prepared according to the method described previously in [8].
Results and Discussion. It has been shown that hairpin-loop structures may be a part of promoters and transcription termination sites, as the presence of cruciforms is a signal for RNA polymerase to stop and for the synthesis of RNA transcripts to terminate, with subsequent dissociation of the complex formed by RNA polymerase and the DNA-RNA transcripts. One such transcription terminator for T7 RNA polymerase is the transcription termination site of pGEMEX plasmid, an internal transcription terminator ~90 bp long with an efficiency of 70-80% [9]. The analysis of the transcription termination site of pGEMEX DNA for the presence of thermodynamically stable inverted repeats allowed us to find a mismatched inverted repeat 28 bp long, with the free energy -ΔG = 11.2 kcal/mol. Termination of T7 RNA polymerase transcription during elongation on a pGEMEX DNA template containing this inverted repeat at the terminator site has been demonstrated previously in vitro [10]. Therefore, taking into consideration the parameters of the pGEMEX DNA hairpin and literature data on the parameters of hairpins in hairpin-loop structures observed in the course of in vivo [11] and in vitro [12] experiments, hairpins with a loop length of 5 nucleotides and a minimum energy -ΔG of ~9 kcal/mol were selected for further analysis.
The diagram of their distribution on the physical map of the SARS genome (Fig. 1) was built on the basis of the potential (i.e. thermodynamically stable) hairpins determined in the SARS virus genome (Table). It is noteworthy that the hairpins determined are conserved structural motifs for SARS virus. A comparison of their localization in the genomes of several SARS isolates showed that the location is the same for the majority of hairpins. In our opinion, this may serve as evidence of a specific role of hairpin structures in the chain of signalling mechanisms of SARS virus functioning.
Table. Matched and mismatched thermodynamically stable hairpin-like structures that may be formed by inverted repeats in the genomic RNA of severe acute respiratory syndrome virus (GenBank number AY291451).
All the repeats analyzed were divided into two types, matched and mismatched ones (the stem of the latter contains non-complementary nucleotides or deletions of nucleotides in one of the chains of the hairpin stem). In addition, the repeats were differentiated into three groups according to their energy level. The first, second, and third groups consisted of repeats with the energy (-ΔG) of 10-15, 15-20, and over 20 kcal/mol, respectively.
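The grouping by energy level can be written as a small Python helper. This is only a sketch: the function name `energy_group` is hypothetical, and assigning boundary values (exactly 15 or 20 kcal/mol) to the higher group is an assumption, since the text does not say how ties were handled.

```python
from typing import Optional


def energy_group(neg_dg: float) -> Optional[int]:
    """Map a -dG value (kcal/mol) to energy group 1, 2 or 3.

    Returns None for values below 10 kcal/mol, which fall outside
    the three groups. Boundary handling is an assumption.
    """
    if neg_dg < 10:
        return None
    if neg_dg < 15:
        return 1  # 10-15 kcal/mol
    if neg_dg < 20:
        return 2  # 15-20 kcal/mol
    return 3      # 20 kcal/mol and above
```

For example, the pGEMEX repeat with -ΔG = 11.2 kcal/mol would fall into the first group, and the pUC8 hairpin with ΔG = -17.8 kcal/mol into the second.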
The sequence and secondary structure of two typical mismatched inverted repeats, the energy (-ΔG) of which exceeded 15 kcal/mol, are presented in Fig. 2. We used the same method to obtain the diagrams of distribution and the parameters of hairpin-like structures for bovine coronavirus (Fig. 3), murine coronavirus (Fig. 4, a), porcine epidemic diarrhea virus (Fig. 4, b), and transmissible gastroenteritis virus (Fig. 4, c).
It should be mentioned that approximately two thirds of the coronavirus genome is a template for the synthesis of replicases 1A and 1B, while one third of the genome encodes structural proteins (nucleoprotein, spike glycoproteins S, M, and E) as well as a series of non-structural proteins (Fig. 1).
Among the analyzed sequences of animal coronavirus isolates and SARS virus, the highest susceptibility to forming hairpin-like structures was revealed for SARS virus, and the possibility of forming up to 26 structures for one isolate was demonstrated (Table). The majority of hairpin-like structures (20 out of 26) can be formed in the genes encoding replicase (Fig. 1).
The authors of [13] used computer modelling (a program predicting RNA secondary structure) to investigate the secondary and tertiary structures of the 3'-untranslated region (UTR) of the genomic RNA of SARS-CoV, structural elements of which play a significant role in virus replication, as compared to [...] It should be mentioned that the structure of a hairpin determined by computer analysis depends on both the search algorithm used and the hairpin parameters. Therefore, contrary to the authors of [13], we concentrated our efforts on the search for thermodynamically stable matched and mismatched repeats, i.e. the ones actually observed in previous experiments. We did not consider mismatched repeats with more than two pairs of non-paired nucleotides or with a loop size exceeding six nucleotides. An example of such a thermodynamically stable hairpin is the hairpin-loop structure formed by two hairpin structures in supercoiled pUC8 plasmid (Fig. 5). The pUC8 plasmid contains several inverted repeats which can form hairpin-loop structures. The free energy ΔG of the most stable structure (indicated with arrows in Fig. 5) is -17.8 kcal/mol; 11 bp form the stem of the hairpin, and the loop contains four nucleotides. However, the G-T pair is not considered non-complementary in the RNA2 program of the GeneBee software, which we used to predict the secondary structure of coronavirus RNA. The formation of a G-T Watson-Crick pair is possible due to the formation of rare enol and imino tautomeric forms of nucleotides [14].
The presented AFM image of the hairpin-loop structure of pUC8 DNA demonstrates that, among the several palindromes that are a part of pUC8 DNA, only one hairpin-loop structure, the most thermodynamically stable one, is formed in vitro.
The possibility of formation of 23 thermodynamically stable conserved structural motifs was shown for the genomic RNA of bovine coronavirus (Fig. 3). The location of the majority of these motifs coincides with the location of hairpins in another isolate of bovine coronavirus (indicated in Fig. 3), as was observed for the isolates of SARS virus.
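A comparison of hairpin locations between two isolates can be sketched in Python as shown below. How coincidence was established is not specified in the text, so the `tolerance` parameter (allowed shift in nucleotides) and the function name `coinciding_positions` are assumptions, and the positions in any usage are purely illustrative.

```python
from bisect import bisect_left
from typing import List, Sequence


def coinciding_positions(positions_a: Sequence[int],
                         positions_b: Sequence[int],
                         tolerance: int = 0) -> List[int]:
    """Return positions from isolate A that have a counterpart in isolate B.

    Two hairpins are treated as coinciding when their start positions
    differ by at most `tolerance` nucleotides (an assumed criterion).
    """
    sorted_b = sorted(positions_b)
    shared = []
    for pos in positions_a:
        # First position in B that is not below the lower bound.
        idx = bisect_left(sorted_b, pos - tolerance)
        if idx < len(sorted_b) and sorted_b[idx] <= pos + tolerance:
            shared.append(pos)
    return shared
```

The share of coinciding hairpins is then `len(shared) / len(positions_a)`; a non-zero tolerance would matter when isolates differ by small insertions or deletions upstream of a hairpin.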
The analysis of the genomic RNA of murine hepatitis virus (Fig. 4, a) revealed 12 hairpin-like structures in this sequence. The investigation of the genomic RNA of porcine epidemic diarrhea virus showed the possibility of forming 18 hairpins (Fig. 4, b), 10 of which are located in the gene encoding replicase.
The investigation of the complete sequence of the genomic RNA of another porcine coronavirus, transmissible gastroenteritis virus, points to the possible existence of 11 hairpin-like structures, nine of which are also located in the gene encoding replicase (Fig. 4, c).
The compilation of localization maps of inverted repeats brings up several questions. Firstly, two hairpins at the 5'- and 3'-ends of the DNA template strand are sufficient for initiation and termination of transcription. At the same time, the sequence of the gene encoding replicase A of bovine coronavirus contains 14 hairpins. Therefore, the biological function of the majority of the hairpins revealed is yet to be defined. Secondly, the absence of hairpins at the 5'-end of the replicase A gene of SARS virus (Fig. 1) and bovine coronavirus (Fig. 3) suggests that, similarly to hairpins, other non-canonical DNA structures (triplexes, in particular) may serve as signals for enzyme binding. Thirdly, SARS virus differs from other coronaviruses both in the qualitative character of the distribution of hairpins and in their quantitative parameters. For instance, the number of highly stable mismatched inverted repeats with the energy -ΔG over 15 kcal/mol is seven for SARS virus, while only two repeats with the free energy -ΔG over 20 kcal/mol and one repeat with the energy over 15 kcal/mol were found for bovine coronavirus. This points to the possibility of using the distribution of thermodynamically stable inverted repeats for the structural differentiation of viruses.

Fig. 1. Physical map of severe acute respiratory syndrome virus (number AY291451) with indicated locations of known genes. Arrows indicate the location of determined thermodynamically stable matched and mismatched hairpin structures; asterisks indicate hairpin structures, the location of which coincides with that of similar structures of another isolate of SARS virus (number AY278488); b, c: mismatched hairpins with the energy -ΔG over 10 and 15 kcal/mol, respectively; ¯: matched hairpins with the energy over 10 kcal/mol.

Fig. 2. Secondary structure formed by inverted repeats in a fragment of severe acute respiratory syndrome virus (number AY291451) (complementary sequences are shown in bold, flanking sequences are shown in italics): (a) two mismatched inverted repeats with lengths of 32 and 34 nucleotides (vertical lines show their symmetry centres, arrows indicate their orientation); (b) hairpin structures corresponding to inverted repeats No. 16 and No. 17, the parameters of which are presented in the Table.

Fig. 3. Physical map of bovine coronavirus (number NC_003405) with indicated locations of known genes. Arrows indicate the location of hairpin structures; asterisks indicate hairpin structures, the location of which coincides with that of similar structures of another isolate of bovine coronavirus (number AF220295); a question mark corresponds to hairpin structures, the location of which does not coincide for the two mentioned isolates of bovine coronavirus; b, c, Ý: mismatched hairpins with the energy (-ΔG) over 10, 15, and 20 kcal/mol, respectively; ¯: matched hairpins with the energy over 10 kcal/mol.

Fig. 4. Physical maps of murine hepatitis virus (number NC_001846) (a), porcine epidemic diarrhea virus (number NC_003436) (b), and transmissible gastroenteritis virus (number NC_002306) (c) with indicated locations of known genes. Arrows indicate the location of hairpin structures; b, c: mismatched hairpins with the energy over 10 and 15 kcal/mol, respectively; ¯: matched hairpins with the energy over 10 kcal/mol.

Fig. 5. AFM image of supercoiled pUC8 plasmid in air. Image size: 372 nm x 372 nm. Arrows indicate two hairpins forming a hairpin-loop structure. The size of the insert with the zoomed image of the hairpins is 55 nm x 62 nm.