The search for microRNA genes in the regions of two very late genes of Bombyx mori nuclear polyhedrosis virus

Aim. B. mori nuclear polyhedrosis virus (NPV) encodes two very late genes, polyhedrin (ph) and p10. The search for miR genes in these regions is of interest because the polyhedra, formed at the very late stage of virus development, include small RNAs of 50–60 nt. The present work was aimed at searching for potential miR precursors transcribed from the late promoter element RTAAG and from the TATA promoter elements located in the ph and p10 gene regions. Methods. The search was performed using bioinformatic programmes for miR prediction: MiPred, miRNA SVM, Microprocessor SVM, and RNAfold. Results. The region of the ph gene is predicted to encode two miRs (bmoNPV-miR-1ph, bmoNPV-miR-2ph) and one potential candidate (C) precursor, bmoNPV-pre-miR-1Cph, which is not a Dicer substrate. The region containing the p10 gene encodes one predicted miR, bmoNPV-miR-3p10. Conclusions. The predicted miRs may regulate the expression of the genes orf 1629 and p74, which are located in the same regions on the complementary chain.

Introduction. MicroRNAs (miRs) are one of the three most prevalent classes of small non-coding RNAs of 20-30 nucleotides (miRNAs, siRNAs, piRNAs) that initiate RNA interference. miRs are bioregulators of gene expression in eukaryotic cells. The biogenesis and functioning of miRs, the biochemical and bioinformatic approaches to their study, their participation in the regulation of various cell processes, and their relation to some pathologies have been described previously [1]. siRNAs, piRNAs and other small non-coding RNAs are described in [2]. Besides eukaryotes, miRs have also been found in viruses, in particular in large DNA-containing ones [3]. Among RNA-containing viruses, miRs were found in human immunodeficiency virus [4,5]. However, little is known about the role of miRs in virus-cell interactions. A few experimental articles and reviews on this problem have been published [6][7][8].
Baculoviruses belong to the class of large DNA-containing viruses. Nuclear polyhedrosis viruses (NPV) are an independent serological group of baculoviruses whose virions are incorporated into inclusion bodies, the polyhedra, at the very late stages of virus development. The polyhedra-forming protein (polyhedrin) is the product of one of two very late genes. The second gene, p10, encodes protein p10. The expression of both genes is initiated from the late promoter element (A/G/T)TAAG [9].
The search for miR precursors (pre-miRs) in RNAs transcribed from the two very late promoters of the B. mori NPV genome is of interest because the polyhedra of B. mori NPV, formed at the very late stage of virus development, include not only virions but also small RNAs of 50-60 nucleotides [10]. This allowed us to suggest that they are pre-miRs, as these molecules are known to be 50-100 nucleotides long. The predicted pre-miRs included in polyhedra are most likely processed from the very late transcripts and captured by polyhedrin in the process of polyhedra formation. The mRNAs of both polyhedrin and p10 belong to these very late transcripts [9]. It is also possible that polyhedra include either pre-miRs processed from other late transcripts or the host pre-miR-let7. A rise in miR-let7 synthesis at the stage of larva-to-pupa transformation was observed by the authors of [11]. We used exactly this stage of insect development to isolate polyhedra for investigation (cocoons containing dead larvae). Further biochemical investigation of RNA from polyhedra would help to clarify which small RNAs are included in polyhedra.
The current work presents the results of a bioinformatic search for pre-miRs and miRs not only in the transcripts synthesized from the TAAG promoter element for the two very late proteins, but also in alternative transcripts (alts) synthesized from the predicted TATA promoter elements located in the ph and p10 gene regions of the B. mori NPV genome.
Materials and Methods. Among the existing programmes for microRNA prediction we selected those whose algorithms do not use conservation as a criterion, since viral microRNAs, in contrast to eukaryotic microRNAs, are not conserved. The secondary structure of alts (hypothetical primary transcripts, h-pri-miR) was investigated using the RNAfold programme (http://rna.tdi.univie.ac.at/cgibin/RNAfold.cgi) [12]. The programme for predicting pri-miR processing was used to search the alternative transcripts for stem-loop structures (sls) of 48-150 nt that are Drosha and Dicer substrates (https://demol.interagon.com/miRNA/). Predicted pre-miRs and mature miRs were considered to be substrates if their score exceeded the intersection of the sensitivity (Se) and specificity (Sp) curves (> -0.55) [13]. Hairpins processed by Drosha but not by Dicer were considered to be candidates (C).
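The substrate rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the prediction programme itself: the function and argument names are hypothetical, a missing processing site is represented here by None, and the "rejected" outcome for hairpins that are not Drosha substrates is an assumption. Only the threshold of -0.55 and the distinction between processed and candidate (C) hairpins come from the text.

```python
from typing import Optional

# Intersection of the Se and Sp curves quoted in the text [13].
SCORE_THRESHOLD = -0.55


def is_substrate(score: Optional[float]) -> bool:
    """True if a processing site was predicted and its score exceeds the threshold."""
    return score is not None and score > SCORE_THRESHOLD


def classify_hairpin(drosha_score: Optional[float],
                     dicer_score: Optional[float]) -> str:
    """Label a hairpin from its (hypothetical) Drosha and Dicer scores."""
    drosha = is_substrate(drosha_score)
    dicer = is_substrate(dicer_score)
    if drosha and dicer:
        return "pre-miR"          # processed by both enzymes
    if drosha:
        return "candidate (C)"    # processed by Drosha only
    return "rejected"             # assumption: not considered further
```

A score of None stands for the "no" entries of the tables, that is, the absence of a processing centre in the hairpin.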
The nucleotide sequences of the hairpin structures revealed in the alternative transcripts were also studied using the RNAfold programme. The processed hairpins were considered to be sls with a free energy of folding of -23.0 kcal/mol [12] or, in terms of kilojoules, -96.6 kJ/mol. Searching for miRs in the B. mori genome using the RNAfold programme, Tong et al. [14] selected a free energy value "exceeding 105 kilojoules/mol" as a "filter". The value accepted by us was 100 kJ/mol. The search for real and pseudo pre-miRs was performed using the miPred programme (http://www.bioinf.seu.edu.cn/miRNA/index.html) [15].
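The unit conversion and the energy filter can be illustrated with a short Python sketch. The function names are hypothetical, and the sign convention is an assumption: the text gives the cutoff as a magnitude (100 kJ/mol), and the sketch treats it as -100 kJ/mol, so that a hairpin passes when its folding energy is at least that negative. With the thermochemical factor of 4.184 kJ/kcal, -23.0 kcal/mol corresponds to about -96.2 kJ/mol; the figure of -96.6 kJ/mol quoted above matches a rounded factor of 4.2.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(energy_kcal_per_mol: float) -> float:
    """Convert a folding free energy from kcal/mol to kJ/mol."""
    return energy_kcal_per_mol * KJ_PER_KCAL


def passes_energy_filter(energy_kcal_per_mol: float,
                         cutoff_kj_per_mol: float = -100.0) -> bool:
    """True if folding is at least as favourable as the cutoff.

    Assumption: the cutoff is negative and more negative energies pass.
    """
    return kcal_to_kj(energy_kcal_per_mol) <= cutoff_kj_per_mol
```

RNAfold reports energies in kcal/mol, so a helper of this kind would be applied to its output before comparison with a cutoff expressed in kilojoules.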
The search for mature miRs in the predicted pre-miRs was performed using the miRscan programme (http://genes.mit.edu/miRscan/) [16]. The nucleotide sequence of the investigated pre-miR was entered into miRscan as both the first and the second sequence.
Results and Discussion. Late transcription in baculoviruses is initiated at the TAAG promoter element and terminated at a polyT sequence [9]. It is known that three polyhedrin transcripts of 1.16, 3.4 and 4.9 thousand b.p. [17] and two p10 transcripts of 0.75 and 2.5 thousand b.p. [18] are synthesized in Autographa californica NPV. There are no similar data for B. mori NPV. Since B. mori NPV is a genotypic variant of A. californica NPV, the same situation may be assumed for the former. This was our basis for determining the boundaries of the B. mori NPV genome regions in which to search for miRs. The regions containing only two predicted polyhedrin transcripts (1.16 and 3.4 thousand b.p.) and both p10 transcripts were selected for investigation. This selection was made because the transcripts of 1.16 and 3.4 thousand b.p. cover the gene orf 1629, and the transcript of 2.5 thousand b.p. covers the gene p74, both located on the complementary chain. The polyhedrin transcript of 4.9 thousand b.p. was not investigated since it extends beyond the selected region. The location of the defined polyhedrin region in the B. mori NPV genome is 128298-3404, and that of the p10 region is 108411-110961. If the A of the AUG codon is taken as the reference point, these regions are -116-3404 (hereinafter ph) and -86-2565 (hereinafter p10), respectively. The transcripts of 1.16 and 3.4 thousand b.p. correspond to transcripts -51-1129ph and -51-3404ph; the two p10 transcripts correspond to transcripts -71-630p10 and -71-2565p10.
Our analysis shows that the secondary structure of the -51-1129ph transcript contains two stem-loop structures, one of which (sls1ph) is processed into a mature miR, while the other (sls2ph) does not pass the filters of the programmes used. The secondary structure of the second transcript, -51-3404ph, contains 12 hairpins, sls1ph among them. Of the remaining stem-loop structures, three do not pass the programme filters and eight are processed only into pre-miRs. Since the figures of these secondary structures are too large, they are not presented in the current work; the sls characteristics are considered below in the discussion of alternative transcripts. Transcript -51-1129ph is likely to be translated into polyhedrin [17].
Therefore, according to our prediction, the transcript -51-3404ph may be an h-pri-miR. A similar situation is observed for the two transcripts of the -86-2565p10 region. Our data demonstrate that the secondary structure of the smaller transcript (0.75 thousand b.p.) contains a single stem-loop structure, sls1, which is processed into a miR. Besides sls1, the larger transcript (2.5 thousand b.p.) contains six hairpins, which are processed into pre-miRs but do not pass the filters of the other programmes. Like transcript -51-3404ph, transcript -71-2565p10 may act as an h-pri-miR.
The logic of our approach to the search for miRs in alternative transcripts is as follows. All existing programmes for predicting candidate pre-miRs are based on a search for stem-loop structures that meet specific requirements. In reality, pre-miR hairpins are processed from primary transcripts, the pri-miRs. The search for h-pri-miRs among alts is complicated because the promoters from which pri-miR transcription is initiated are not exactly determined, though pri-miRs are known to be transcribed by RNA polymerase II, mainly from TATA promoters [19]. The transcription of pri-miRs may be initiated from other sequences as well [20]. We decided to start the search for miRs with the prediction of h-pri-miRs among various alts. The boundaries of the alternative transcripts were set from the predicted TATA promoters to polyT sequences (at least four T).
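The way such boundaries give rise to a set of alternative transcripts can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure actually used: promoters are located here by a plain search for the motif TATA, whereas the text speaks of predicted promoters without naming the prediction tool; the transcript is taken to start at the motif itself; and the function name and parameters are hypothetical. Each promoter is paired with every polyT run (four or more T) that lies downstream of it.

```python
import re
from typing import List, Tuple


def enumerate_alts(seq: str, promoter: str = "TATA",
                   min_t: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based with end exclusive.

    start: first base of a promoter motif (assumed transcript start).
    end:   last base of a downstream run of at least min_t T's, plus one.
    """
    seq = seq.upper()
    # A lookahead reports overlapping motif occurrences as well.
    starts = [m.start() for m in
              re.finditer(f"(?={re.escape(promoter)})", seq)]
    # Each maximal run of >= min_t T's is treated as one terminator.
    ends = [m.end() for m in re.finditer(f"T{{{min_t},}}", seq)]
    # Keep only terminators located downstream of the promoter motif.
    return [(s, e) for s in starts for e in ends if e > s + len(promoter)]


if __name__ == "__main__":
    demo = "GGTATAAACCTTTTGGTATACCTTTTTAA"
    for start, end in enumerate_alts(demo):
        print(start, end, demo[start:end])
```

For a DNA sequence given as a string, each returned pair delimits one candidate transcript whose sequence could then be submitted to a folding programme.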
The region -116-3404ph contains six predicted promoters and 33 polyT terminating sequences, and the region -86-2565p10 contains nine promoters and 19 polyT sequences. These data were the basis for our investigation of 148 alts-ph and 114 alts-p10. The 148 alts-ph contain 19 unique sls-ph. Unique alternative transcripts were then selected according to the following principle: besides the required hairpin, an alt of minimal size should contain a minimal number of other hairpins. Only 11 unique transcripts out of the 148 alts-ph contained all 19 sls-ph, and 16 unique transcripts out of the 114 alts-p10 contained 21 sls-p10. All the results of the investigation of sls characteristics are presented in Tables 1 and 2.
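The selection principle can be illustrated by the following Python sketch. It rests on several assumptions that are not stated in the text: hairpins are treated as fixed intervals, although in practice the predicted structure depends on the transcript being folded; transcript size is minimized first and the number of other hairpins second; and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), end exclusive


def contains(outer: Interval, inner: Interval) -> bool:
    """True if the inner interval lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def pick_representative_alts(alts: List[Interval],
                             hairpins: Dict[str, Interval]
                             ) -> Dict[str, Interval]:
    """For each hairpin, choose one transcript that contains it."""
    chosen: Dict[str, Interval] = {}
    for name, hairpin in hairpins.items():
        candidates = [a for a in alts if contains(a, hairpin)]
        if not candidates:
            continue  # no transcript covers this hairpin

        def cost(alt: Interval) -> Tuple[int, int]:
            others = sum(1 for other, h in hairpins.items()
                         if other != name and contains(alt, h))
            # Assumed priority: shortest transcript, then fewest other hairpins.
            return (alt[1] - alt[0], others)

        chosen[name] = min(candidates, key=cost)
    return chosen
```

Several hairpins may end up with the same representative transcript, which is one way in which the number of unique transcripts can be smaller than the number of hairpins they contain.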
Fig. 1 presents the secondary structures of three h-pri-miRs. All three h-pri-miRs contain two sls each. Although sls2-ph is a Drosha and Dicer substrate, it is not real and does not pass the filter of free energy of folding (Table 1). As for sls9-ph and sls3-p10, they are Drosha substrates, but they do not pass the filters of the other programmes. Therefore, we designated the h-pri-miRs presented in Fig. 1 as h-pri-miR-1ph, h-pri-miR-2ph and h-pri-miR-3p10. Fig. 2 presents the secondary structure of h-pri-miR-1Cph. Sls11 is processed to mature miR-2ph, and sls12 to candidate pre-miR-1Cph. Fig. 3 demonstrates three sls processed to mature miRs and one sls processed to candidate pre-miR-C.

Table 1. The characteristics of stem-loop structures (sls) in alternative transcripts (alt), synthesized from region 128298-3404 of the B. mori NPV genome encoding mRNA of polyhedrin

Note. Here and in Table 2, column 1 presents the numbers of sls in order of increasing distance from the reference point. Columns 2 and 3 show the localization of the alt containing the corresponding sls with regard to the reference point and the position of the start nucleotide of the sls, respectively. Columns 5 and 8 show the estimation of sls as Drosha and Dicer substrates, respectively, according to the programme for predicting and processing pri-miR [13]; "no" means the absence of processing centres in the corresponding sls. In column 6, "+" denotes a "real" hairpin and "-" a hairpin that "cannot be real", according to the miPred programme [15]. Column 7 shows the values of free energy of folding of sls according to the RNAfold programme [12]. The characteristics of sls that passed the filters of the corresponding programmes are shown in bold.

Using their own programme for predicting viral miRs (Vir-Mirdb), the authors of [21] revealed 11 pre-miRs in the plus strand of the B. mori NPV genome, from which 22 miRs are cut out (one miR from each arm of a pre-miR). It is hard to agree with these data, as a mature miR is usually cut out from a single arm, the 5' arm. Besides, all miRs predicted by the authors of [21] contain 26 nucleotides each, whereas miRscan apparently assumes a miR length of 21 nucleotides, which is closer to the length of miRs in vivo. The same source of the genome nucleotide sequence was used by these authors and by us; nevertheless, they did not reveal pre-miR-1ph, pre-miR-2ph and pre-miR-3p10, predicted by us. However, they found a hairpin that corresponds to pre-miR-1Cph, predicted by us. Contrary to their data, this pre-miR-1Cph is not processed by Dicer, as Table 1 (sls12) demonstrates.

It is noteworthy that, while miR-1ph, miR-2ph and miR-3p10 are the only representatives among the h-pri-miRs predicted by us, they are also found in all investigated alts containing the regions of their localization. It is possible that the predicted mature miRs (miR-1ph, miR-2ph and miR-3p10) will also be present, and therefore processed, in other, as yet unknown, real alternative transcripts synthesized in the cell (not only from the TATA and TAAG promoters to a polyT sequence). The miRs-ph predicted by us are completely complementary to the mRNA of orf 1629, and miR-3p10 to the mRNA of p74. Therefore, if these miRs exist, they should function similarly to siRNA. In that case the mRNA should be cleaved in the regions complementary to the predicted miRs. Since miR-1ph and miR-3p10 are complementary to the 3'-UTR of the mRNAs of orf 1629 and p74, respectively, the participation of these miRs in the regulation of expression of the genes orf 1629 and p74 may be assumed.

A similar situation is possible for A. californica NPV. As shown in [17], a transcript of 3.2 thousand b.p. is synthesized from the complementary chain in the region of A. californica NPV containing the polyhedrin gene. It contains two open reading frames (orf 1629 and orf 603) and covers the polyhedrin gene. The synthesis of this transcript starts prior to that of polyhedrin mRNA, and the transcript vanishes with the appearance of polyhedrin, though its fragments are still observed.

In 1990 the authors explained this phenomenon by three possible reasons: 1) destruction of promoter complexes from the 3'-end of the polyhedrin gene by the RNA polymerase transcribing polyhedrin; 2) formation of double-stranded RNA with polyhedrin mRNA; 3) negative regulation of the orf 1629 promoter by polyhedrin. At that time they could not assume the participation of microRNA in this process, as microRNAs were discovered only in the beginning of this century. We assume the participation of a miR, encoded in the polyhedrin gene, in the regulation of synthesis of the transcript containing orf 1629 and orf 603. In this case the transcript of 3.2 thousand b.p. would be cleaved. We plan further investigation on the detection of miRs in the region of the A. californica NPV genome containing the polyhedrin gene, orf 1629 and orf 603.

Table 2. The characteristics of stem-loop structures (sls) in alternative transcripts (alt), synthesized from region 108281-110842 of the B. mori NPV genome encoding mRNA of p10