Human genome mutation and rearrangement studies – the way to investigate monogenic and complex disease pathogenesis

The summarized results of 25 years of research at the Department of Human Genomics of IMBG NASU are presented. The investigations focused on identifying the molecular genetic nature, spectrum, and origin of mutations (genetic polymorphisms) and rearrangements in coding and non-coding regions of the human genome in the Ukrainian population. The role of genome heterogeneity in the pathogenesis of some severe monogenic and complex disorders has been shown. Correlations between certain determinator gene mutations and the phenotypic manifestation of the monogenic diseases most common in Ukraine have been demonstrated. Moreover, the role of modifying genes in specific clinical phenotype variations has been shown. The origin of particular mutant alleles and the main mechanisms maintaining their frequency in the Ukrainian population have been investigated. Data on the association of some polymorphic variants with infertility, cardiovascular diseases (ischemic stroke), and the outcome and standard therapy efficiency of mass infectious diseases (hepatitis C, AIDS) are presented. The first results and prospects of the search for candidate genes of neurodegenerative disorders and intellectual disability using whole-genome CNV screening are shown.

Introduction. Despite the official end of the «Human Genome» program, the studies initiated in this project are continuing. Structural and functional genome organization and DNA sequence polymorphism are still of great research interest. According to the project results, open reading frame DNA regions cover only 1.5 % of the genome. It is estimated that the human genome comprises 20-25 thousand genes. Information about 14200 mapped human genes is provided by the Online Mendelian Inheritance in Man® (OMIM®) database. Nearly 98 % of human genome nucleotide sequences, which are not expressed (regulatory elements, satellite DNA), fall into non-coding DNA. ENCODE (Encyclopedia of DNA Elements, started in 2003) is one of the projects focused on elucidating the function of separate genes. Within this project, researchers attempt to identify and classify functional elements in the human genome.
The data concerning the variability of certain genome regions have become an important result with theoretical and practical implications. Despite the high level of genomic DNA conservation and robust mechanisms maintaining its stability, some mutations have spread in the population, leading to genetic polymorphism (allelic variation).
Theoretically, genome variability creates the basis of population genetics for the analysis of selection and migration, the most important factors influencing mutagenesis and mutation expansion.
According to the OMIM® database, which is being constantly updated, the molecular genetic nature has been determined (determinator genes have been identified) for 3,739 hereditary diseases. The largest-scale projects devoted to the study of human genome polymorphisms are «HapMap» (2002-2006) and «1000 Genomes» (2007). The data obtained within these projects have become a basis for the search for candidate genes of new monogenic hereditary disorders and of hereditary susceptibility to complex disorders. All these studies have become the subject of a modern science, genomics. Apart from fundamental knowledge, human genomics has practical implications for biomedicine in terms of working out new strategies for diagnostics, prevention, and treatment.
Studies focused on identifying the molecular genetic nature, spectrum, and origin of mutations (genetic polymorphisms) and rearrangements in coding and non-coding regions of the human genome in the Ukrainian population have been conducted in the Department of Human Genomics of IMBG of NAS of Ukraine for 25 years.
Origin and gene geography of mutations associated with monogenic disorders. It has been shown that a mutation can arise from a single genetic event followed by expansion in the population (single origin), or the same mutation can occur repeatedly in different populations (recurrent origin). Thus, a specific spectrum and frequency of mutations are characteristic of every gene, including those associated with pathologies.
We have conducted studies on the mutation origin and gene geography of the CFTR, PAH, IT15, and TGFBI genes, which cause the most common monogenic disorders: cystic fibrosis (CF), phenylketonuria (PKU), Huntington's disease (HD), and stromal corneal dystrophy (SCD), respectively, in order to obtain profound and integrated knowledge about the regularities of mutagenesis and mutation maintenance in the Ukrainian population.
A deletion of 3 nucleotides in exon 10, delF508, has been determined by many studies to be the major mutation of the CFTR gene. The frequency of this mutation in CF patients from Ukraine amounted to 47.8 % (Fig. 1). Statistically significant linkage disequilibrium between the delF508 mutation and certain alleles of polymorphic markers was shown in the Ukrainian population as well as in many world populations. This supports the hypothesis of a single origin of the delF508 mutation [1,2] in European populations.
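Linkage disequilibrium between a mutation and a marker allele is commonly quantified from haplotype counts. The following is a minimal sketch of the standard D, D' and r² measures with the associated chi-square statistic (1 degree of freedom); the function name, argument names, and the counts in the usage line are hypothetical illustrations and are not data or methods taken from the cited studies.

```python
def linkage_disequilibrium(n_mut_marker, n_mut_other, n_norm_marker, n_norm_other):
    """Return (D, D', r^2, chi-square) from four haplotype counts.

    n_mut_marker  : chromosomes carrying the mutation and the marker allele
    n_mut_other   : chromosomes carrying the mutation and another marker allele
    n_norm_marker : normal chromosomes carrying the marker allele
    n_norm_other  : normal chromosomes carrying another marker allele
    """
    n = n_mut_marker + n_mut_other + n_norm_marker + n_norm_other
    if n == 0:
        raise ValueError("no haplotypes counted")
    p_mut = (n_mut_marker + n_mut_other) / n
    p_marker = (n_mut_marker + n_norm_marker) / n
    if p_mut in (0.0, 1.0) or p_marker in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic in the sample")

    # D: observed haplotype frequency minus the product of allele frequencies
    d = n_mut_marker / n - p_mut * p_marker

    # D' rescales D by its maximum possible value for these allele frequencies
    if d >= 0:
        d_max = min(p_mut * (1 - p_marker), (1 - p_mut) * p_marker)
    else:
        d_max = min(p_mut * p_marker, (1 - p_mut) * (1 - p_marker))
    d_prime = d / d_max

    r2 = d * d / (p_mut * (1 - p_mut) * p_marker * (1 - p_marker))
    chi2 = n * r2  # compare against the chi-square distribution with 1 df
    return d, d_prime, r2, chi2


# Hypothetical counts, for illustration only
d, d_prime, r2, chi2 = linkage_disequilibrium(40, 10, 10, 40)
```

A chi-square value above 3.84 corresponds to p < 0.05 for a 2 × 2 haplotype table.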
Two distinct approaches were used to estimate the approximate time of occurrence of this mutation within the territory of Ukraine, which was estimated at 2440 to 4060 years [2]. The first is based on the frequencies of CFTR gene crossover minihaplotypes, and the second on the delF508 frequency in the modern Ukrainian population and the hypothesis of a selective advantage of this mutation.
Within a collaborative study, the so-called «Slavic» mutation CFTRdele2,3(21 kb) in the 5'-region of the CFTR gene was identified. The deletion spans 21 kb, involves introns 1-3 and exons 2 and 3, and is spread only in populations of Slavic origin, with a frequency of 1.2 % in the Ukrainian population [3].
Molecular genetic analysis of the PAH (phenylalanine hydroxylase) gene has established that R408W is the major mutation among PKU patients in Ukraine, with a frequency of 57 % [4]. The results of segregation analysis between this mutation and PAH intragenic polymorphic minihaplotypes have shown two European centres of R408W origin (Fig. 2). Analysis of the association between the R408W mutation and the VNTR03/STR238 minihaplotype in PKU patients from Ukraine identified a single Balto-Slavic origin of this mutation in our population [5].
Studies on the frequency, paternal origin of inheritance, and dynamic mutation in the CAG repeat of the IT15 gene, which is the main cause of HD, have been conducted. IT15 gene minihaplotype segregation analysis has proven the recurrent origin of CAG-repeat expanded allelic variants (alleles with more than 40 repeats) in the HD group from Ukraine [6].
Fig. 1. CFTR gene mutation delF508 frequency in patients and healthy individuals from different regions of Ukraine; delF508 carrier frequencies in regional populations of Ukraine.
The molecular study of hereditary stromal corneal dystrophies, an important problem of ophthalmogenomics, has become an essential focus area of our investigation. The spectrum and frequency of mutations in the causative gene TGFBI in the Ukrainian population were determined [7]. A novel mutation in exon 12 of the TGFBI gene (Leu558Pro) has been identified, and a new atypical clinical form of corneal dystrophy in patients with this mutation has been described (Fig. 3). Based on similar TGFBI gene minihaplotypes revealed in patients with the Leu558Pro mutation, the origin of this mutation from one ancestral founder was proven [8,9].
It is established that genetic polymorphism of different loci in the human genome is the result of the mutational process. In order to evaluate the frequencies of de novo inherited mutations in some micro- and minisatellite loci of nontranscribed genome regions, the allelic variants in nuclear family members (father, mother, children) have been analysed. The level of inherited mutations for the 9 studied autosomal microsatellite loci was evaluated as 3.1 × 10⁻⁴ and 3.6 × 10⁻³ for maternal and paternal origin, respectively [10]. The average level for 9 Y-chromosome STRs was 1.5 × 10⁻³ (The Y Chromosome Haplotype Reference Database, YHRD). The frequency of inherited mutations analysed in 7 minisatellite loci ranged from 1.4 × 10⁻¹ to 67 × 10⁻³. It is interesting to note the unusually high mutability rate (14 %) revealed for the minisatellite locus CEB1. It has been established that the majority of inherited mutations in the studied minisatellite loci are gains that originated de novo in male germ cells [11].
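A per-locus germline mutation rate of this kind is usually the number of de novo allele changes divided by the number of parent-to-child allele transmissions (meioses) examined. The sketch below shows that arithmetic together with a Wilson score confidence interval; the function name and the counts in the usage line are hypothetical, and the source does not state which interval, if any, the authors used.

```python
import math


def mutation_rate(n_mutations, n_meioses, z=1.96):
    """Return (rate, lower, upper): point estimate and Wilson score interval.

    n_mutations : de novo allele changes observed in children
    n_meioses   : parent-to-child allele transmissions examined
    z           : normal quantile (1.96 gives an approximate 95 % interval)
    """
    if n_meioses <= 0:
        raise ValueError("n_meioses must be positive")
    if not 0 <= n_mutations <= n_meioses:
        raise ValueError("n_mutations must lie between 0 and n_meioses")

    p = n_mutations / n_meioses
    denom = 1 + z * z / n_meioses
    center = (p + z * z / (2 * n_meioses)) / denom
    half = z * math.sqrt(p * (1 - p) / n_meioses
                         + z * z / (4 * n_meioses ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 3 mutations observed in 2000 meioses
rate, low, high = mutation_rate(3, 2000)
```

The Wilson interval is chosen here because the normal approximation behaves poorly when the number of observed mutations is small, which is typical for microsatellite loci.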
Comparative analysis of inherited mutations at CEB loci in the children of Chernobyl accident liquidators and in a control group has shown that the mutation level in the studied group was 1.5 times higher. However, such differences were observed only in the group of children conceived within 30 days after their parents' exposure [11]. It was proposed that the excess of inherited mutations in children conceived during or immediately after the end of their parents' work at Chernobyl is a consequence of the mutagenic effect of ionizing radiation on germ cells during meiosis and sperm maturation rather than on stem germ cells [11].
It was also determined that the high frequency (nearly 100 %) of dynamic mutation in the CAG-repeat region of the IT15 gene upon inheritance from the father can be explained by structural changes (mutations) in the region flanking the CAG repeats. On the other hand, IT15 mutant alleles are more stable in female gametogenesis, and when a dynamic mutation originates during oogenesis, a decrease in the number of expanded repeats is observed more often. The obtained data explain the genomic imprinting and anticipation phenomena (more severe manifestation in a descendant who inherits the mutation from the father) observed in HD (Fig. 4) [6].
Genotype-phenotype association study in patients with monogenic hereditary pathologies. Genotype-phenotype correlation analysis characterizes the association between specific mutant variants in a patient's genome and the clinical manifestations of pathology. This analysis is essential for studying both the mechanism of disease development in general and its different clinical characteristics in particular; moreover, it is very important in the search for the best healthcare strategy for patients and their families. We have studied the variation of the main clinical characteristics in patients with different mutant genotypes and the association of various mutations with the main parameters of the clinical phenotype: age of manifestation, progression, and severity of pathological changes at the level of organs and systems. It is shown that in some cases (in CF, PKU, fragile X syndrome, and spinal muscular atrophy (SMA)) it is possible to establish a precise correlation between certain mutant variants and specific clinical subtypes of the disease.
Fig. 2. Two centers of PAH gene R408W mutation origin according to the R408W-2.3 minihaplotype frequencies in Europe; R408W-2.3 haplotype frequency in PKU patients.
The majority of CF patients with delF508 in the genotype have the highest sweat chloride levels and the most severe form of the disease (lung-gastric form with pancreatic insufficiency) [13].
The rather wide variation in phenotypic features of fragile X syndrome (Martin-Bell syndrome), including peculiarities of mental function, both in patients with an expanded CGG region of the FMR1 gene (full mutation) and in female carriers of a premutation, can be associated with somatic mosaicism (the presence of alleles with different numbers of CGG repeats), which we have observed in practically all these individuals [14,15]. It is supposed that the presence of alleles with various changes of the gene sequence in different tissues, primarily in brain neurons, can cause large stage-specific and tissue-specific variation of gene expression [13]. A negative correlation between the age of HD manifestation and the CAG-repeat number in the expanded allele of the IT15 gene, as well as the presence of a 3-nucleotide deletion in exon 58 of the gene, have been established [6].
Genotype analysis in patients with different clinical forms of PKU showed that the «classic» severe form of the disease is characteristic of homozygotes and compound heterozygous carriers of the PAH gene R408W mutation [4].
A clear association between clinical phenotype manifestation in patients with SCD and different mutations of the TGFBI gene was revealed in a study of a large cohort of patients from Ukraine. The vast majority of patients from 18 analysed families with the Arg124Cys mutation had clinical features of lattice corneal dystrophy type 1. Association analysis between the Arg555Trp mutation and clinical characteristics of SCD showed that all carriers of this mutation have symptoms of Groenouw SCD. The lattice corneal dystrophy type IIIA phenotype is associated with the His626Arg mutation. Carriers of the novel Leu558Pro mutation had clinical features of SCD significantly different from those previously described in the literature (manifestation, course, morphological peculiarities). This gave us grounds to distinguish it as a separate nosological form of corneal dystrophy [7].
Investigation of a group of patients with hereditary polyneuropathy allowed us to establish that a heterozygous deletion of the 17p11.2 region containing the PMP22 gene is associated with recurrent neuropathy with pressure palsies, whereas a duplication of this gene is identified in patients with CMT1A [12]. On the other hand, it was shown that for the majority of monogenic pathologies a phenotypic variety is observed that cannot be explained solely by the patient's genotype for mutant alleles of the determinator gene. This allows us to suppose the existence of other factors, including phenotype-modifying genes. Modifying genes are considered to be mutant variants of genes other than the main mutant gene that determines pathogenesis. We have confirmed the hypothesis that the hemochromatosis gene (HFE) modifies the clinical phenotype of CF. It is established that among patients with the same CFTR mutation genotype, carriage of HFE gene mutations is associated with pathologies of the gastrointestinal system [16]. A study of the genotype-phenotype association in patients with SMA revealed a correlation between severity and the size of the deletion in the 5q13 region. In the most severe cases the deletion involves not only the determinator gene SMN1 but also the NAIP gene, which is a phenotype modifier for this pathology [13,17].
Interesting data demonstrate the association between early HD manifestation and carriage of the C677T substitution in the MTHFR gene, which is involved in homocysteine metabolism [6]. We have also established that the -174C allele of the IL6 gene may be considered a genetic marker of the risk of recurrent erosion development in patients with lattice SCD, whereas the -781TT genotype of the IL8 gene is associated with the absence of recurrent erosion in such patients [18].
For monogenic pathologies, ample evidence of a clear association between the mutant genotype of the determinator gene and the clinical phenotype has been obtained. On the other hand, the large variation of specific clinical features is associated with modifying genes.
Study on genome polymorphism association with risk of complex diseases. Studies showing that variations of phenotypic manifestations in patients with identical mutations in monogenic disease-determining genes can be caused by polymorphic variants of other genes that are common in many populations have led to genome-wide association studies of allelic polymorphisms in many genes and the risk of not only monogenic but also complex pathologies.
Over the last decade, investigations of the genetic factors of complex diseases have been developing rapidly. Molecular genetic analysis of polymorphic variants in genes that individually play a minor role in the pathogenesis of complex diseases, but in combination with other endogenous and exogenous factors greatly increase the risk, plays an important role in studies of the molecular aetiology of cardiovascular disorders (including ischemic stroke), bronchial asthma, etc.
In our department, allelic variants of genes involved in the pathogenesis of diminished ovarian reserve (FMR1, INHα, FSHR, ESR, GSTP1) are analysed. Based on statistically significant odds ratios (OR), our study showed that polymorphic variants of the FSHR gene (Ala307-Ser680), INHα (769G>A), the -397T allelic variant of the ESR1 gene, GSTP1 (313A>G), and FMR1 («grey zone» alleles containing 40-47 CGG repeats) can be used as markers for genetic testing in order to predict a high risk of premature ovarian failure and a poor response to intense exogenous gonadotropin stimulation of superovulation in assisted reproductive technology programs (Fig. 5) [19][20][21].
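For readers less familiar with the odds ratio used in such case-control comparisons, the following minimal sketch computes OR and its 95 % confidence interval by the common log (Woolf) method from a 2 × 2 table of carriers and non-carriers. The function name and the counts in the usage line are hypothetical and do not reproduce data from the cited studies.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 case-control table.

    The interval is exp(ln(OR) +/- z * SE), where
    SE = sqrt(1/a + 1/b + 1/c + 1/d) (Woolf's method).
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this method")

    a, b, c, d = cells
    or_value = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se)
    upper = math.exp(math.log(or_value) + z * se)
    return or_value, lower, upper


# Hypothetical counts: 30 of 100 patients and 15 of 100 controls carry the variant
or_value, lower, upper = odds_ratio(30, 70, 15, 85)
```

An association is conventionally regarded as statistically significant at the 5 % level when the interval does not include 1.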
CAG repeats in exon 1 of the androgen receptor gene (AR) are in many cases the cause of oligospermia and azoospermia. Moreover, our results show that short alleles of the AR gene (fewer than 18 CAG repeats) are associated with azoospermia development, while long alleles (more than 28 CAG repeats) are associated with oligospermia [22].
Our department's activity is also focused on the analysis of genome polymorphic variants in individuals affected by cardiovascular pathologies. It was established that the mutant variant 677T of the MTHFR gene, the D allele (deletion) of the ACE gene, and the polymorphic variant 20210A of the F2 gene are genetic markers of an increased risk of ischemic stroke [23,24].
Although many promising data concerning the involvement of different allelic variants of genes in the development of complex diseases have been published recently, the role of genetic factors in the pathogenesis of these disorders is not fully understood. Large-scale whole-genome studies (e.g. the HapMap project) using advanced technologies of genome analysis are expected to improve the situation.
Determination of genetic factors causing the functional variability of homeostasis maintenance system at the genomic level opens up the prospects for prediction of individual features of the disease progression, treatment effectiveness and development of side effects.These studies, known as pharmacogenomics, are the most rapidly growing in the modern world [25][26][27].Therefore, informative pharmacogenetic markers testing before the start of treatment and its monitoring opens up real prospects for personalized medicine.
In our department, studies on the effectiveness of pharmacogenetic markers and the development of various side effects in the treatment of a severe and socially important disease, hepatitis C, are being conducted. Recently, much attention has been paid to the study of IL28B gene polymorphism, which is a predictor of treatment efficacy with the standard procedure (IFNα and ribavirin). Our results and the results of other researchers show that rs12979860 is a highly informative polymorphic marker for treatment prognosis in 75-80 % of patients with chronic hepatitis C, namely, individuals carrying hepatitis C virus genotype 1 [28]. It was also shown that mutations in the HFE1 gene, which cause hereditary hemochromatosis, are associated with the degree of liver fibrosis in patients with hepatitis C [29].
It was proven that individuals with certain allelic variants of the ITPA gene (the 94C>A and IVS2+21A>C polymorphic variants) have a significantly lower risk of anaemia or thrombocytopenia during antiviral therapy [30].
Considering the abovementioned data, the conclusion can be made that the development of highly informative panel of markers for genetic testing of patients with hepatitis C will allow the selection of effective individual treatment schemes based on the principles of pharmacogenomics.
Also of interest for pharmacogenomics are the results of our studies on the prevalence of the 32-nucleotide deletion of the CCR5 gene in the Ukrainian population. Heterozygous carriers of this mutation are less sensitive to the HIV-1 virus, and homozygous carriers are completely resistant to HIV-1. When carriers of the CCR5 gene mutation are infected with the virus, the latent period lasts much longer and the disease runs a mild course. The frequency of this mutation in the population of Ukraine is 9.9 % [31,32]. Thus, the analysis of the CCR5 gene mutation in HIV-infected individuals may also become an important tool in the treatment of AIDS.
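To relate a mutation frequency to the expected proportions of heterozygous and homozygous carriers, Hardy-Weinberg proportions can be used. The sketch below assumes random mating and, as a labeled assumption, treats the 9.9 % figure as an allele frequency; the text does not specify whether it refers to alleles or to carriers, so the numbers are illustrative only.

```python
def hardy_weinberg(q):
    """Expected genotype proportions for a mutant allele frequency q.

    Assumes Hardy-Weinberg equilibrium (random mating, no selection).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {
        "wild_type": p * p,         # no copy of the deletion
        "heterozygous": 2 * p * q,  # one copy
        "homozygous": q * q,        # two copies
    }


# Assumption: 0.099 is read as the allele frequency of the CCR5 deletion
expected = hardy_weinberg(0.099)
```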
Copy number variation (CNV) studies as a strategy of search for neurodegenerative disease candidate genes. Studies of human genome variability revealed a new type of mutation/polymorphism in the genome, CNV [33]. Most of these genomic variations are neutral polymorphisms, but there are dosage-sensitive genes, in which genomic CNV caused by deletions and duplications of individual genomic loci leads to the development of various pathologies. Owing to the development of advanced methods of whole-genome screening in recent years, a large number of pathological conditions associated with chromosomal rearrangements that change gene copy number have been identified [33,34].
Fig. 5. Polymorphic allelic variants of genes involved in primary ovarian insufficiency.
There is a hypothesis according to which the vast majority of pathogenic CNVs contain genes involved in the formation and functioning of the nervous system [35].This suggestion could be explained by the extraordinary sensitivity of nervous tissues to various endogenous and exogenous factors.The use of advanced methods of genome analysis allows the mapping of genomic loci and identifying the dosage sensitive candidate genes of various neurodegenerative and neuropsychiatric diseases.
Using the qPCR method, efficient copy number analysis assays were developed for genomic loci in which pathogenic CNVs are associated with the development of neurodegenerative diseases: analysis of hereditary polyneuropathies associated with copy number changes of the PMP22 gene in the 17p11.2 chromosomal region, and copy number analysis of the highly homologous genes SMN1 (the SMA-determining gene) and SMN2 (a supposed SMA phenotype-modifying gene) [12,36]. The prevalence and origin of CNVs in the 17p11.2 (including the PMP22 gene) and 5q13.1 (including the SMN1 and SMN2 genes) loci in patients with CMT1 and SMA were investigated using qPCR-based copy number analysis, which allowed us to define their role in the development of these pathologies.
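One widely used way to derive gene copy number from qPCR data is relative quantification by the comparative Ct (2^-ΔΔCt) approach. The sketch below illustrates that calculation under stated assumptions: amplification efficiency close to 100 % for both assays, a reference locus with a stable two copies, and a calibrator sample known to carry two copies of the target. The source does not describe the quantification scheme actually used, and the function name and Ct values are hypothetical.

```python
def copy_number(ct_target_sample, ct_ref_sample,
                ct_target_calibrator, ct_ref_calibrator,
                calibrator_copies=2):
    """Estimate target copy number by the comparative Ct method.

    Assumes ~100 % amplification efficiency (a doubling per cycle) and a
    calibrator sample carrying `calibrator_copies` copies of the target.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies one cycle later than in the
# calibrator, which corresponds to a heterozygous deletion (one copy)
deleted = copy_number(25.0, 24.0, 24.0, 24.0)
```

In practice the estimate is rounded to the nearest integer only after checking replicate variability, since a difference of one cycle or less separates one, two, and three copies.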
Within the international FP7 CHERISH project, identification of candidate genes that cause intellectual disability (ID) was carried out. Using the MLPA (multiplex ligation-dependent probe amplification) method, genomic microdeletions/microduplications and subtelomeric deletions/duplications associated with ID were analysed. During the project, a strategy was developed for confirming every unique CNV identified by array CGH screening and for determining its origin using qPCR. The results of the CHERISH project in Ukraine revealed at least 18 unique novel, probably pathogenic CNVs (deletions and/or duplications) that include potential candidate genes associated with pathogenic phenotypes in patients [37]. It should be noted that the identification of small-sized CNVs containing only several possible candidate genes is a rare event, but such findings make it easy to identify the candidate genes. Most of the identified rearrangements contain from tens to hundreds of genes, which greatly complicates the search. Further study of these rearrangements, especially comparative analysis of genotype-phenotype associations in patients with different-sized CNVs in the same chromosomal loci, will detect the common regions and identify new genes involved in the pathogenesis of ID. Processing of the results is still ongoing.
Conclusions. Summarizing the results of our own research over the past 25 years and assessing the current state of human genomics, it is worth noting the progress in studies of the origin of mutations associated with monogenic disorders, their correlation with phenotype, and the role of modifying genes in the heterogeneity of clinical features of monogenic disorders. Important achievements were made in understanding the involvement of genome polymorphism in susceptibility to complex disorders. It is necessary to join the efforts of many scientists in large-scale whole-genome multidisciplinary studies and to develop a systems biology concept of the human in order to achieve a complete understanding of the genome. This will allow the identification of genes causing rare pathologies and susceptibility to common complex diseases.