The start of systems biology in Ukraine

The first laboratory of Systems Biology in Ukraine (IMBIG NASU) presents an overview of its scientific results. They include the development of a pioneering web-based tool for genome-wide surveys of eukaryotic promoters for the presence of transcription factor binding sites (COTRASIF); the elucidation of the mechanisms underlying the fine-tuned and balanced response of primary hepatocytes to interferon alpha at the levels recorded after partial hepatectomy; the elaboration of a novel method of gene regulatory network inference compatible with the GRID environment; and the development of a stoichiometric model of folate-related one-carbon metabolism in human placenta and its application to characterize the behavior of the system as a whole in different human pathologies.

The beginning of the 21st century was marked by the emergence of a new branch of the biological sciences, Systems Biology, which is defined as the quantitative analysis of the dynamic interactions between the components of a biological system and aims at understanding the behaviour of the system as a whole, as opposed to the behaviour of its individual constituents [1]. Systems Biology recognizes that the overall features of a system differ from the collective features of its constituents; that each system has subsystems while being itself a subsystem of a higher-order system; that the system has a hierarchic structure; and that it is a dynamic formation, which changes during its functioning and evolution. Systems Biology uses the high-throughput technologies of molecular biology together with mathematics, physics, informatics and engineering. It handles massive amounts of data, and its measurements must be as quantitative as possible, since the ultimate integration and modeling of this information is a quantitative process; it combines discovery-driven and hypothesis-driven approaches to the study of systems. Modeling is the ultimate objective of systems biology: it helps to understand the mechanistic underpinnings of biological phenomena that emerge from the integrated operation of the information components (emergent properties) [2]. Systems Biology promotes close cooperation among scientists from different areas of science and technology. Its promises are great, from the creation of virtual organs to a virtual man. It will affect many areas of the biomedical sciences, from drug design to medical treatment, shifting the focus from the statistically average patient to the individual with his or her specific genotype and phenotype.

THE START OF SYSTEMS BIOLOGY IN UKRAINE
In 2009 the first laboratory of Systems Biology in Ukraine (Prof. Maria Obolenskaya) was organized at the Institute of Molecular Biology and Genetics of the National Academy of Sciences of Ukraine. Herein we present the initial results obtained by our small group.
COTRASIF: conservation-aided transcription factor binding site finder. The transcriptional regulation of gene expression relies on the effects of transcription factors (TFs) bound to specific regulatory elements, the transcription factor binding sites (TFBS), in gene promoters, where they activate or repress the corresponding gene. While experimental identification of TFBS within single gene promoters is common, without prior information this process is both effort- and time-consuming. Computational prediction of TFBS marks potential targets for further experimental verification and provides self-sufficient data on the gene regulation patterns associated with each specific TF.
We have developed an easy-to-use web-based tool for genome-wide surveys of eukaryotic promoters for the presence of TFBS [3]. The most common way of representing conserved sequences such as TFBS is to use consensus strings built with the IUPAC (International Union of Pure and Applied Chemistry) nomenclature, but these strings retain only a small portion of the information available from the initial set of sequences. A position frequency matrix (PFM), by contrast, records the count of each of the four nucleotides at each position of the identified binding site. A position weight matrix (PWM) represents the nucleotide occurrence probabilities at each position, typically as weights relative to a background distribution, and allows the information content of each matrix position to be evaluated [4-7]. It also allows quantification of the similarity between the weight matrix and a potential TFBS detected in the target sequence [6, 8-10]. However, even the matrix representation loses information: it is well suited to recording single-nucleotide frequencies, but higher-order groupings (such as pairs or triplets of nucleotides) are not accounted for. Combining hidden Markov models (HMMs) with PFMs and PWMs helps to preserve this kind of information as well, yielding a method that stores the full characteristics of a TFBS.
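To make the PFM/PWM distinction concrete, here is a minimal Python sketch; it is not COTRASIF's code. It converts a toy PFM into a log2-odds PWM, assuming a uniform 0.25 background and a pseudocount of 0.25 per base (both illustrative choices), and scans a sequence with a relative score (raw score minus the lowest possible score, divided by the score range), a convention used by several TFBS tools. The matrix, the sequence and the 0.8 threshold are hypothetical, only the forward strand is scanned, and the HMM extension is not covered.

```python
from math import log2
from typing import Dict, List, Tuple

BASES = "ACGT"
Matrix = Dict[str, List[float]]


def pfm_to_pwm(pfm: Dict[str, List[int]], pseudocount: float = 0.25) -> Matrix:
    """Convert per-position nucleotide counts (PFM) into log2-odds weights (PWM).

    Assumes a uniform background of 0.25 per base; the pseudocount keeps
    zero counts from producing log(0).
    """
    n_positions = len(pfm["A"])
    pwm: Matrix = {b: [] for b in BASES}
    for i in range(n_positions):
        column_total = sum(pfm[b][i] for b in BASES) + 4 * pseudocount
        for b in BASES:
            probability = (pfm[b][i] + pseudocount) / column_total
            pwm[b].append(log2(probability / 0.25))
    return pwm


def relative_score(pwm: Matrix, site: str) -> float:
    """Scale the raw site score to [0, 1] between the worst and best possible sites."""
    length = len(pwm["A"])
    raw = sum(pwm[b][i] for i, b in enumerate(site))
    best = sum(max(pwm[b][i] for b in BASES) for i in range(length))
    worst = sum(min(pwm[b][i] for b in BASES) for i in range(length))
    return (raw - worst) / (best - worst)


def scan(pwm: Matrix, sequence: str, threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Slide the matrix along the forward strand and report windows above threshold."""
    length = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - length + 1):
        site = sequence[start:start + length]
        if set(site) <= set(BASES):  # skip windows containing N or other symbols
            score = relative_score(pwm, site)
            if score >= threshold:
                hits.append((start, site, round(score, 3)))
    return hits


# Hypothetical PFM built from 10 aligned sites; consensus AGCT.
toy_pfm = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 0],
    "G": [1, 10, 1, 0],
    "T": [1, 0, 0, 9],
}
toy_pwm = pfm_to_pwm(toy_pfm)
print(scan(toy_pwm, "TTAGCTAAGCTT"))  # [(2, 'AGCT', 1.0), (7, 'AGCT', 1.0)]
```

Because log-odds weights are additive across positions, a site's score is a simple sum, which is what makes whole-genome scanning cheap; the relative score then lets one threshold be applied to matrices of different lengths.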
Given the length of the matrices used for TFBS prediction (usually less than 15 nucleotides), the search may yield numerous false-positive results occurring by chance. To reduce their number without losing sensitivity, the results were additionally processed using gene orthology information [11,12].
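The scale of the false-positive problem can be estimated with simple arithmetic. The sketch below assumes, for illustration only, an exact 10-nt match, 1000-bp promoters scanned on both strands, a uniform base composition and the 17725 rat protein-coding genes cited in this article; it then estimates how many chance hits would survive if a match were also required in an orthologous promoter. Treating orthologous promoters as independent overstates the filtering power (they share ancestry), so the second figure is optimistic.

```python
def prob_hit_in_promoter(motif_len: int, promoter_len: int, both_strands: bool = True) -> float:
    """Probability that a random promoter contains at least one exact match."""
    q = 0.25 ** motif_len  # chance that one window matches exactly (uniform bases)
    windows = (promoter_len - motif_len + 1) * (2 if both_strands else 1)
    return 1 - (1 - q) ** windows


def expected_random_hits(motif_len: int, promoter_len: int, n_promoters: int,
                         both_strands: bool = True) -> float:
    """Expected number of chance matches summed over all promoters."""
    q = 0.25 ** motif_len
    windows = (promoter_len - motif_len + 1) * (2 if both_strands else 1)
    return q * windows * n_promoters


def expected_conserved_random_hits(motif_len: int, promoter_len: int,
                                   n_orthologous_pairs: int) -> float:
    """Expected ortholog pairs in which BOTH promoters carry a chance match,
    under the simplifying assumption that the two promoters are independent."""
    p = prob_hit_in_promoter(motif_len, promoter_len)
    return n_orthologous_pairs * p ** 2


print(round(expected_random_hits(10, 1000, 17725), 1))            # 33.5
print(round(expected_conserved_random_hits(10, 1000, 17725), 2))  # 0.06
```

Even under these crude assumptions, a short exact motif yields dozens of genome-wide chance hits, and requiring the site in the ortholog as well cuts the expected number of spurious genes by orders of magnitude.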
Accordingly, a new web-based tool, COTRASIF, allows genome-wide searches for user-specified TFBS or sequence sets and provides an option to filter the list of results based on evolutionary conservation [13]. COTRASIF accepts either a PFM or a set of sequences as input; when a set of sequences is provided, the hybrid HMM-PWM search method is used instead of the PWM-only search method. COTRASIF contains its own regularly updated database of eukaryotic promoters (automatically imported from Ensembl genome annotations), the integrated JASPAR CORE database of high-quality TFBS matrices and an evolutionary conservation filter. COTRASIF is therefore a fully integrated solution, freely available at http://biomed.org.ua/COTRASIF/, with extensive online help.
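The idea of a conservation filter can be sketched as follows. This is a hypothetical illustration, not COTRASIF's implementation: gene identifiers are invented, orthology is assumed to be one-to-one (for example, as it might be taken from genome annotations), and a gene is kept only if at least `min_species` orthologous promoters also contain a predicted site.

```python
from typing import Dict, List, Set


def conserved_hits(
    hits: Dict[str, Set[str]],
    orthologs: Dict[str, Dict[str, str]],
    reference: str,
    min_species: int = 1,
) -> List[str]:
    """Keep reference-species genes whose predicted site is supported in orthologs.

    hits:      species -> IDs of genes whose promoter has at least one predicted site
    orthologs: reference gene ID -> {other species: orthologous gene ID}
    A gene is kept when at least `min_species` orthologous promoters also have a hit.
    """
    kept = []
    for gene in sorted(hits[reference]):
        support = sum(
            1
            for species, ortholog in orthologs.get(gene, {}).items()
            if ortholog in hits.get(species, set())
        )
        if support >= min_species:
            kept.append(gene)
    return kept


# Hypothetical IDs; r4 has no annotated ortholog, so it is always filtered out.
hits = {
    "rat": {"r1", "r2", "r3", "r4"},
    "mouse": {"m1", "m3"},
    "human": {"h1", "h2"},
}
orthologs = {
    "r1": {"mouse": "m1", "human": "h1"},
    "r2": {"mouse": "m2", "human": "h2"},
    "r3": {"mouse": "m3", "human": "h3"},
}
print(conserved_hits(hits, orthologs, "rat"))                 # ['r1', 'r2', 'r3']
print(conserved_hits(hits, orthologs, "rat", min_species=2))  # ['r1']
```

Raising `min_species` trades sensitivity for specificity: fewer chance hits survive, but genuine species-specific sites are lost as well.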
COTRASIF application. COTRASIF has become a popular tool worldwide, as Table 1 clearly demonstrates.
Using COTRASIF, we identified and analyzed the rat genes of the potential primary response to interferon alpha (IFNa). The dominant IFNa transduction pathway is realized via the transcription factor ISGF3 (interferon-stimulated gene factor 3), which binds specifically to the ISRE (interferon-stimulated response element) [14]. Searching for the ISRE revealed 162 genes with putative conserved ISREs among the 17725 protein-coding genes of Rattus norvegicus. Using the web-based tool FatiGO [15], we predicted previously unknown potential targets of IFNa belonging to the complement system and to components of the synapse. Experimental verification of this prediction is currently in progress.
Gene expression profiling. Gene expression profiling is the simultaneous measurement of the expression of thousands of genes, which creates a global picture of the transcriptome and provides the system-level information needed to infer gene regulatory networks. There are two principal approaches to gene expression profiling: hybridization of an RNA sample to previously identified target genes attached to a solid surface (microarrays) and sequencing-based techniques (e.g. RNA-seq).
We analyzed the gene expression profile of primary rat hepatocytes isolated from intact liver and treated with IFNa (250 U/ml) for 3 h and 6 h [16]. Early up-regulated expression of IFNa, specific antiviral activity and activated intracellular IFNa signaling were recently discovered in regenerating rat liver [17-20], against a background of various other manifestations of the innate immune response [21-23]. To investigate this, we mimicked in vitro the quantitative and temporal local levels of IFNa after partial hepatectomy (PH) and delineated the response of primary hepatocytes to such treatment.
The gene expression profile was analyzed with the GeneChip® Rat Genome 230 2.0 Array (Affymetrix), which provides comprehensive coverage of the transcribed rat genome on a single array, comprising more than 31000 probe sets that analyze over 30000 transcripts and variants from over 28000 well-substantiated rat genes.
Incubation of hepatocytes with IFNa for 3 h and 6 h revealed 24 and 128 up-regulated differentially expressed genes, respectively. These are fewer genes than are induced by longer treatment with higher doses of IFNa, and they only partly overlap with them. Overall, the response differs in being smaller in magnitude, in the simultaneous expression of activators and inhibitors of transcription and intracellular signaling, in the involvement of co-regulators of transcription factors, and in the participation of several specific protein modifiers. As such, the response is cell- and situation-specific [24,25] and temporally regulated.
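The statistical pipeline of the study is described in [16] and is not reproduced here. As a simplified illustration of how lists of genes up-regulated at 3 h and at 6 h can partly overlap, the sketch below applies only a log2 fold-change cut-off (an assumed two-fold threshold) to hypothetical normalized signals; real microarray analyses also use replicates, a statistical test and multiple-testing correction.

```python
from typing import Dict, Set


def up_regulated(control: Dict[str, float], treated: Dict[str, float],
                 min_log2_fc: float = 1.0) -> Set[str]:
    """Genes whose log2 signal rises by at least `min_log2_fc` (1.0 = two-fold) over control."""
    return {g for g, base in control.items()
            if g in treated and treated[g] - base >= min_log2_fc}


# Hypothetical normalized log2 signals (one value per gene, no replicates).
control = {"g1": 5.0, "g2": 7.0, "g3": 6.0, "g4": 8.0}
ifn_3h = {"g1": 6.5, "g2": 7.2, "g3": 6.1, "g4": 8.0}
ifn_6h = {"g1": 7.0, "g2": 8.5, "g3": 7.2, "g4": 8.1}

up_3h = up_regulated(control, ifn_3h)
up_6h = up_regulated(control, ifn_6h)
print(sorted(up_3h), sorted(up_6h), sorted(up_3h & up_6h))  # ['g1'] ['g1', 'g2', 'g3'] ['g1']
```

Set operations on the resulting lists (intersection, difference) are what statements such as "only partly overlapping" summarize.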
The order in which the differentially expressed genes enter the process is of principal biological importance. At the early stage, the cells exploit the strategic advantage of an energy-saving principle, using available resources and changes in existing proteins to obtain the fastest response: autophagy to eliminate unneeded cellular material; reversible ISGylation of preexisting proteins, exceeding other types of protein modification; partial inhibition of energy-consuming translation; co-activation of preexisting nuclear receptors, which changes gene expression faster than other transcription factors that are targets of slower signaling cascades; switching of mitochondrial dNTP synthesis to the salvage pathway; and, lastly, an «insistent appeal» to other hepatic cells to become involved in the process.
Further development of the response is associated with more intensive nuclear control of gene expression via up-regulated STAT1, STAT2 and IRF9, against the background of activated preexisting Jak/STAT, PI3K/Akt and p38/MAPK pathways; with enlargement of the list of differentially expressed genes and, in most cases, an increased response of previously expressed genes; and with the involvement of restraining factors. The simultaneous expression of proinflammatory and restraining factors seems to be a distinguishing feature of the hepatocyte response to this specific IFNa impact. The propagation of IFNa signaling via activated transmission from MAVS (mitochondrial antiviral signaling protein) to TBK1 (TANK-binding kinase 1) may be opposed by inhibition of RIG1 (retinoic acid inducible gene 1 protein) and its pathway; T cell activation by CXC ligands and MHC complexes, by T cell inhibition via the negative co-receptor PD-1; and the initiator caspase 8, by XIAP, the most potent cellular inhibitor of apoptosis [16].
Thus, the protein activities encoded by the differentially expressed genes would affect all regulatory levels of gene expression and provide a time-dependent, strictly balanced and energy-sparing response. We hypothesize that the elevated expression of IFNa at the beginning of the regeneration process may induce similar processes.
Several facts either confirm the involvement in liver restoration of genes that respond to IFNa in vitro or argue in favor of their potential participation in it (Table 2). The formal resemblance between the expression of a limited number of genes in our experiment and the processes activated during the transition period after PH is not sufficient to consider IFNa a candidate regulator at this stage of liver restoration, but it invites a more in-depth study and offers a potential clue to its place in the context of liver restoration.
Inferring gene regulatory networks. Essentially all biological functions of a living cell are carried out through the interplay between many protein-coding and non-coding genes. Identifying these gene networks and their functions is a main challenge in systems biology. High-throughput biotechnologies enable the collection of large-scale genomic data on gene expression, genome-wide locations of transcription factor binding sites, protein-protein interactions, genetic variation and many other types of data. These data provide valuable system-level information on different aspects of complex biological processes and make it possible to infer the underlying networks.
Various models have been proposed in the literature to represent and simulate the behavior of gene regulatory networks (GRNs); Boolean networks, Bayesian networks, differential equations, information theory and supervised learning methods are some of the most prominent. Boolean networks use a binary variable (a gene is «on» or «off») to represent the state of gene activity and a directed graph to represent connections between genes; the state of a particular gene at time t + 1 is described as a Boolean function of the states of other genes at time t. Bayesian models make use of Bayes' rule and treat gene expression levels as random variables: each node is associated with a probability function that takes as input a particular set of values of the node's parent variables and gives the probability of the variable represented by the node. Difference and differential equations allow more detailed descriptions of network dynamics by explicitly modeling changes in molecular concentrations over time. Information theory models relate two genes by means of their mutual information and a threshold.
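The Boolean update rule described above can be sketched in a few lines of Python. The three-gene network is hypothetical (C represses A, A activates B, B activates C) and all genes are updated synchronously. Because the state space is finite, every trajectory eventually revisits a state, and the repeating part is an attractor: a steady state or a cycle.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[bool, ...]
GENES = ("A", "B", "C")

# Hypothetical rules: C represses A, A activates B, B activates C.
RULES: Dict[str, Callable[[Dict[str, bool]], bool]] = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State) -> State:
    """Synchronous update: every gene at t + 1 is computed from the states at t."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[g](current)) for g in GENES)


def attractor(start: State) -> List[State]:
    """Iterate until a state repeats and return the repeating cycle."""
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]


cycle = attractor((True, False, False))
print(len(cycle))  # 6
```

This negative feedback loop has no steady state: starting from (True, False, False) it oscillates through a six-state cycle, and the two remaining states, (True, False, True) and (False, True, False), form a second, two-state cycle.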
Each of these models has its strengths and weaknesses, and each can be implemented with different methods, which in turn may be realized by different algorithms. Despite substantial research on this topic, elucidating the complete network remains a great challenge because of the complexity of transcriptional processes and the noisy nature of high-throughput raw data [26-28].
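For the information theory models, the basic "relevance network" idea can be sketched as follows: discretize each expression profile, estimate the mutual information of every gene pair, and keep pairs above a threshold. This is a minimal illustration under assumed choices (equal-width binning into two levels, plug-in estimator, hypothetical data); methods such as Aracne and CLR refine this scheme, for example Aracne prunes likely indirect edges.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Dict, List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int = 2) -> List[int]:
    """Equal-width binning of one gene's expression profile."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile -> everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def relevance_network(expr: Dict[str, Sequence[float]], threshold: float,
                      bins: int = 2) -> List[Tuple[str, str, float]]:
    """Undirected edges between gene pairs whose mutual information reaches the threshold."""
    levels = {gene: discretize(profile, bins) for gene, profile in expr.items()}
    edges = []
    for g1, g2 in combinations(sorted(levels), 2):
        mi = mutual_information(levels[g1], levels[g2])
        if mi >= threshold:
            edges.append((g1, g2, round(mi, 3)))
    return edges


# Hypothetical profiles over six samples.
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],
    "geneC": [1, 6, 2, 5, 3, 4],
}
print(relevance_network(expr, threshold=0.5))  # [('geneA', 'geneB', 1.0)]
```

Mutual information also captures non-linear dependencies that a correlation coefficient would miss, but with few samples the estimate is noisy, which is one reason the threshold choice matters.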
One of the significant problems that researchers encounter is the NP-hard (non-deterministic polynomial-time hard) character of GRN inference: the time required to infer a GRN with any currently known algorithm increases very quickly as the size of the data grows [29]. To overcome this problem, modern reconstruction algorithms use heuristics that reduce the search space at the expense of the accuracy of the reconstruction and its biological relevance.
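NP-hardness is a formal complexity result; the sketch below only illustrates why exhaustive search is hopeless. It counts candidate network structures: directed graphs without self-loops (each of the n(n - 1) ordered gene pairs either is or is not an edge) and directed acyclic graphs, the structures searched by Bayesian network methods, using Robinson's recurrence.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of directed acyclic graphs on n labelled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


def count_digraphs(n: int) -> int:
    """Directed graphs without self-loops: each ordered pair is an edge or not."""
    return 2 ** (n * (n - 1))


for genes in range(1, 7):
    print(genes, count_dags(genes), count_digraphs(genes))
```

Already for six genes there are 3,781,503 possible DAGs and 2^30 directed graphs, and both counts grow super-exponentially with the number of genes, which is why practical algorithms restrict the search with heuristics.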
Table 2. Characteristics of regenerating liver related to the in vitro hepatocyte response to IFNa [16]
We proposed a novel integrated approach to GRN inference: an ensemble method that combines the networks obtained by other methods, selects their best traits and uses the GRID environment, which enables inference in reasonable time with minimal heuristics (if needed) owing to the high level of parallelization of calculations [30]. The six sequential steps used to test the applicability of specific combinations of models, methods and algorithms for task parallelization in the GRID environment are shown in Fig. 1. The GRN reconstruction technique was validated and evaluated against the «gold standard» of the Dialogue for Reverse Engineering Assessments and Methods (DREAM) program (http://www.the-dream-project.org/) and against synthetic networks.
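The ensemble method itself is described in [30]. As a stand-in illustration of how networks produced by several inference methods can be combined, the sketch below uses simple rank averaging: each method's edges are ranked by confidence, an edge a method did not report receives the worst rank, and edges are re-ordered by their mean rank. This is a swapped-in technique, not necessarily the authors' procedure, and the method outputs are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def edge_ranks(scores: Dict[Edge, float]) -> Dict[Edge, int]:
    """Rank 1 = most confident edge; ties broken alphabetically for reproducibility."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def ensemble(predictions: List[Dict[Edge, float]]) -> List[Edge]:
    """Order all predicted edges by their average rank across methods."""
    edges = set().union(*predictions)
    ranks = [edge_ranks(p) for p in predictions]
    missing = len(edges) + 1  # an edge a method did not report gets the worst rank
    average = {e: sum(r.get(e, missing) for r in ranks) / len(ranks) for e in edges}
    return sorted(edges, key=lambda e: (average[e], e))


# Hypothetical confidence scores from three inference methods.
method_1 = {("g1", "g2"): 0.9, ("g2", "g3"): 0.5, ("g1", "g3"): 0.1}
method_2 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.2}
method_3 = {("g2", "g3"): 0.8, ("g1", "g2"): 0.4}
print(ensemble([method_1, method_2, method_3]))
```

Working with ranks rather than raw scores lets methods whose scores live on different scales (mutual information, posterior probabilities, regression weights) be combined directly. In this example the average ranks are 4/3 for g1 to g2, 2 for g2 to g3 and 3 for g1 to g3.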
DREAM provides expression data together with the corresponding GRNs and encourages researchers to develop new efficient computational methods and to evaluate them by comparison with the «gold standard». Synthetic GRNs of different sizes were generated by us on the basis of the DREAM4 In Silico Challenge (http://gnw.sourceforge.net/dreamchallenge.html#dream4challenge) with the GeneNetWeaver tool (http://gnw.sourceforge.net) [30].
The conventional AUROC (area under the receiver operating characteristic curve) and AUPR (area under the precision/recall curve) evaluation measures were used to compare the obtained networks with the above-mentioned standards*. The best methods (Aracne, Banjo, CLR, MrNet) were chosen by these criteria and compared with the ensemble method, which combined these methods using their best features. The characteristics of the ensemble method appeared substantially better than those of each individual method; moreover, its obvious advantage is that it can be used in the GRID environment. The pilot inference of a GRN in rat hepatocytes treated with different pharmacological and xenobiotic compounds (7700 genes, 5288 animal studies; Gene Expression Omnibus, GEO; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8858) took 15 days of parallel computations on the GRID, whereas serial computation on an average modern computer would take up to 580 days (unpublished). The ensemble method combined with GRID technologies thus has a definite advantage and is suitable for inferring GRNs. Our virtual organization Sysbio is registered in the Ukrainian National Grid.
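Both measures can be computed from a list of predicted edges ranked by confidence and labelled against the gold standard. In the sketch below, AUROC is computed as the probability that a randomly chosen true edge outranks a randomly chosen false edge (ties count one half), which equals the area under the ROC curve; AUPR is approximated by average precision, the mean precision at the rank of each true edge. Official challenge scoring scripts may interpolate the precision/recall curve differently, so values can differ slightly; the scores and labels are hypothetical.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random true edge outranks a random false edge (ties count 1/2).

    This equals the area under the ROC curve (TP rate against FP rate).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPR estimate: mean precision at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precisions.append(true_positives / rank)
    return sum(precisions) / len(precisions)


# Hypothetical edge confidences; 1 = edge present in the gold-standard network.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
print(round(auroc(scores, labels), 3), round(average_precision(scores, labels), 3))
```

Here 7 of the 9 true/false edge pairs are ordered correctly (AUROC = 7/9), and the precisions at the three true edges are 1, 2/3 and 3/4 (average precision = 29/36). AUPR is usually the more informative of the two for GRNs, because true edges are a small minority of all possible edges.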
The boom in computational biology, which was merely a theoretician's fantasy twenty years ago, has become a reality. Various GRNs have been successfully reconstructed for different species, tissues and cells [31-33]. But challenging problems in this area remain. To name just a few: the number of variables (genes) in a system under investigation is considerably larger than the number of measurements (observations) of those variables, which leaves the inference problem underdetermined; and there is no reference GRN for humans or other mammalian model organisms. Several proposals have been made to overcome these problems [34,35], but the elaboration of new methods and the optimization of existing ones are greatly encouraged.
Fig. 1. The scheme of step-by-step evaluation of the ensemble method.
*AUROC is the area under the curve of the true positive (TP) rate plotted against the false positive (FP) rate. AUPR is the area under the curve of precision (i.e. TP/(TP + FP)) plotted against recall (i.e. TP/(TP + FN)), with FN denoting the number of false negatives.
Modeling and simulation of folate-mediated one-carbon metabolism in human placenta. Folate-mediated one-carbon metabolism (FOCM) in human placenta is one of our research interests [37-39]. FOCM is a network of interconnected metabolic pathways necessary for the synthesis of purine nucleotides and thymidylate, the remethylation of homocysteine (Hcy) to methionine (Met), the vast majority of methylation reactions and the synthesis of glutathione (Fig. 2). Thus, FOCM is involved in fundamental molecular functions and the biological processes connected with them: proliferation, transcription, translation, maintenance of cellular redox state and detoxification [40,41]. Because of the specificity of its biochemical reactions, the FOCM network is highly sensitive to the nutritional status of amino acids and several vitamins (folate and vitamins B12, B6 and B2) that are cofactors in numerous FOCM reactions, and to numerous penetrant gene variants, as most of its enzymes are highly polymorphic [42]. At the same time, it has to be robust and steady, as it is involved in fundamental biological processes. How this balance is regulated is an intriguing question. Numerous pathological states of unknown etiology (cancer, cardiovascular diseases, etc.) are associated with perturbations in this network, particularly during pregnancy, affecting the development of the fetus, the health of the child post partum and maternal health [43,44].
Although considerable research has elucidated the biochemical details of FOCM, most studies have focused primarily on single reactions or pathways in isolation, failing to capture the overall functioning of the system. Even large-scale international trials would not be able to follow experimentally the variability of FOCM in its endless combinations. Mathematical modeling has proven to be a powerful tool for filling this gap.
The mathematical representations of cellular metabolism have many facets, ranging from topological and stoichiometric descriptions to kinetic models of metabolic pathways. We applied stoichiometric modeling because it goes beyond merely topological arguments: it takes into account the specific relative quantities of reactants and the reversibility or irreversibility of reactions, and it allows constraints to be placed on the functional capabilities of metabolic networks. At the same time, it does not require knowledge of the kinetic characteristics of enzymes, metabolite concentrations and other parameters that are difficult to obtain but necessary for kinetic modeling. On the other hand, stoichiometric analysis relies largely on the steady-state assumption and cannot account for allosteric regulation.
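In standard notation (not taken from the original models), with S the m × n stoichiometric matrix (rows are metabolites, columns are reactions) and v the vector of reaction fluxes, the steady-state assumption and the reversibility constraints read as follows. The three-reaction chain is a hypothetical example showing that the steady state alone fixes flux ratios without any kinetic parameters.

```latex
\frac{d\mathbf{x}}{dt} = S\,\mathbf{v} = \mathbf{0},
\qquad v_j \ge 0 \quad \text{for every irreversible reaction } j

% Toy chain: R_1: \varnothing \to A,\; R_2: A \to B,\; R_3: B \to \varnothing
S = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix},
\qquad
S\,\mathbf{v} = \begin{pmatrix} v_1 - v_2 \\ v_2 - v_3 \end{pmatrix} = \mathbf{0}
\;\Longrightarrow\; v_1 = v_2 = v_3
```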
We addressed several questions: the impact of an elevated concentration of Hcy, of down-regulated activities of methyleneTHF reductase (MTHFR; EC 1.5.1.20), cystathionine β-synthase (CBS; EC 4.2.1.22) and methionine synthase (MS; EC 2.1.1.13), of nutritional Met deficiency and of several combinations of these factors on the FOCM network in human placenta [45,46]. An elevated concentration of Hcy is a well-known marker of perturbations in placental FOCM [37]; the three enzymes are polymorphic, their mutated forms have lower catalytic activities than the wild-type forms, and their prevalence among Caucasians is substantial [42]. MTHFR catalyzes the reduction of 5,10-methyleneTHF to 5-methylTHF, which donates a methyl group to the methionine cycle; CBS is responsible for irreversible Hcy elimination and MS for the synthesis of Met; methionine deficiency is frequent among vegetarians, vegans and vulnerable social groups.
We mined the literature for placenta-specific gene expression of FOCM enzymes, modeled placenta-specific FOCM for the first time and simulated situations close to real-life ones. Two well-known variants of stoichiometric analysis, namely elementary flux modes (EFM) and flux balance analysis (FBA), were applied [45].
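For reference, FBA is usually posed as a linear program over the same steady-state constraints. Here c is an objective vector chosen by the modeler (for example, a product of interest), and the bounds encode reversibility (a lower bound of 0 for irreversible reactions) and uptake limits. A perturbation such as a less active enzyme variant could, for instance, be represented by lowering the upper bound of the corresponding flux; this is an illustration, and the exact constraints used in [45] are not reproduced here.

```latex
\max_{\mathbf{v}} \; \mathbf{c}^{\top}\mathbf{v}
\quad \text{subject to} \quad
S\,\mathbf{v} = \mathbf{0}, \qquad
\mathbf{v}^{\min} \le \mathbf{v} \le \mathbf{v}^{\max}
```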
Both approaches start from the network stoichiometry; EFM analysis, in particular, computes the complete set of EFMs of the network. An EFM is a minimal operational unit of a metabolic network at steady state that satisfies the thermodynamic constraint on the reversibility of each reaction [46]. In other words, an EFM is a minimal biochemical pathway that, at steady state, catalyzes a set of reactions between input and output metabolites. Each metabolic network is characterized by a unique set of EFMs.
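The definition above can be turned into a brute-force enumerator for very small networks. The sketch relies on a standard characterization: a steady-state flux vector is elementary exactly when the stoichiometric submatrix restricted to its active reactions has a one-dimensional null space. It tests every subset of reactions with exact rational arithmetic and keeps sign-feasible solutions. The toy network is hypothetical, the approach is exponential in the number of reactions (dedicated EFM tools use far more efficient algorithms), and Python 3.9+ is assumed for multi-argument `gcd`/`lcm`.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, lcm
from typing import List, Sequence


def null_space(rows: List[List[Fraction]], n_cols: int) -> List[List[Fraction]]:
    """Exact basis of {x : A x = 0} via reduced row echelon form."""
    a = [row[:] for row in rows]
    pivots: List[int] = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        lead = a[r][c]
        a[r] = [x / lead for x in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == len(a):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for i, p in enumerate(pivots):
            x[p] = -a[i][free]
        basis.append(x)
    return basis


def to_integers(vec: Sequence[Fraction]) -> List[int]:
    """Scale a rational vector to the smallest integer vector with the same direction."""
    denominator = lcm(*(x.denominator for x in vec))
    ints = [int(x * denominator) for x in vec]
    divisor = gcd(*ints)
    return [x // divisor for x in ints]


def elementary_modes(S: List[List[int]], reversible: Sequence[bool]) -> List[List[int]]:
    """Brute-force EFMs: supports whose submatrix has a 1-D null space with full support."""
    m, n = len(S), len(S[0])
    modes = []
    for size in range(1, n + 1):
        for support in combinations(range(n), size):
            sub = [[Fraction(S[i][j]) for j in support] for i in range(m)]
            basis = null_space(sub, size)
            if len(basis) != 1 or any(x == 0 for x in basis[0]):
                continue  # not elementary: no steady state, or several independent directions
            for sign in (1, -1):
                candidate = [sign * x for x in basis[0]]
                # irreversible reactions may only carry positive flux
                if all(x > 0 or reversible[j] for x, j in zip(candidate, support)):
                    full = [Fraction(0)] * n
                    for x, j in zip(candidate, support):
                        full[j] = x
                    modes.append(to_integers(full))
                    break  # report one orientation of a fully reversible mode
    return modes


# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> B (alternative route), R4: B ->
S = [
    [1, -1, -1, 0],  # metabolite A
    [0, 1, 1, -1],   # metabolite B
]
print(elementary_modes(S, reversible=[False] * 4))  # [[1, 1, 0, 1], [1, 0, 1, 1]]
```

The two modes are the two alternative routes from input to output; if R2 is declared reversible, a third mode appears (R3 forward, R2 backward), showing how the reversibility constraint changes the EFM set.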
We arranged all biochemical reactions in descending order of their frequency in the set of EFMs. Serine hydroxymethyltransferase (SHMT; EC 2.1.2.1) and serine (Ser) input are the most frequent, while the reactions of taurine synthesis are the least frequent. This points to the importance of the former as the main suppliers of one-carbon units in FOCM and to the partial independence of the latter from folate metabolism. This conclusion agrees with experimental data [40,41].
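The ranking step itself is straightforward once the EFM set is available: count how many EFMs use each reaction and sort. The sketch below represents each EFM by the set of its active reactions; the reaction names and modes are hypothetical toy values, not the placental model.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_by_efm_frequency(efms: Iterable[Set[str]]) -> List[Tuple[str, int, float]]:
    """Return (reaction, number of EFMs using it, fraction of EFMs), most frequent first."""
    modes = list(efms)
    counts = Counter(reaction for mode in modes for reaction in mode)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(reaction, n, n / len(modes)) for reaction, n in ordered]


# Hypothetical EFMs given as sets of active reactions.
toy_efms = [{"R1", "R2", "R4"}, {"R1", "R3", "R4"}, {"R1", "R2", "R5"}]
for reaction, n, share in rank_by_efm_frequency(toy_efms):
    print(reaction, n, round(share, 2))
```

A reaction used by every mode (R1 here) is structurally indispensable for the network's steady-state operation, whereas a rarely used one contributes to only a few pathways; this is the reasoning behind reading SHMT and Ser input as main one-carbon suppliers.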
The simulation of the above-mentioned situations revealed that they affect most reactions in the network in quite specific ways [45,46]. For example, a twofold increase in Hcy concentration is associated with down-regulation of the fluxes supporting the synthesis of nucleic acid precursors and up-regulation of cysteine production, which eliminates the Hcy excess; this correlates with experimental data obtained on placental explants cultivated with Hcy [39]. C677T heterozygosity of MTHFR leads to pronounced accumulation of Hcy, down-regulation of Hcy remethylation and an increasing requirement for 5-MTHF. In this situation folate therapy is justified, whereas in MS deficiency the rapidly increasing concentration of Hcy and the down-regulated reactions of the methionine cycle are associated with inhibited flux through MTHFR. In this case prescribing folates will not improve the situation, since the flux through MTHFR is inhibited. Consistent with these results, embryos of homozygous MS knockout mice survive through implantation but die soon thereafter, and nutritional supplementation during pregnancy was unable to rescue embryos completely deficient in methionine synthase [47].
As can be seen from the above, on the one hand, the simulation of close-to-real-life situations in the FOCM model induces changes similar to those obtained experimentally, which underscores the validity of the model. On the other hand, the results of the analysis draw attention to several previously unknown interrelations in FOCM that have not yet been explored experimentally and have to be taken into account. More detailed analysis of FOCM in patients is required to justify therapy.
Looking ahead. Thus, Systems Biology has officially started in Ukraine: we have nearly four years of experience and have obtained our first results. In the near future we intend to verify experimentally the predictions made with COTRASIF, gene expression profiling and FOCM simulations, and to master the reconstruction and analysis of GRNs.
Our general considerations concern the prospects for the development of Systems Biology in Ukraine. Understanding the function of complex biological systems is one of the greatest challenges facing modern science. The world scientific community is moving in this direction, and Ukraine has every possibility to move in the same direction. We have highly qualified specialists in different areas of fundamental science who could master the new knowledge and spread it in the scientific community. Courses on Systems Biology should be included in the curricula of higher education institutions. Special grants should be established for the development of Systems Biology. Systems Biology may start from pure computational biology, using the wealth of available data and software and broad partnership with our foreign colleagues, but investment in facilities for high-throughput technologies is indispensable. In conclusion: a journey of a thousand miles begins with a single step. The first steps have already been taken at the Institute of Molecular Biology and Genetics of the NAS of Ukraine. Let it be a start of great promise.