Identification of membrane proteins as potential drug targets in Escherichia coli ATCC 25922 using in silico approaches

Aim. The aim of this study was to identify novel potential drug targets of E. coli ATCC 25922 through subtractive genomic analysis. Methods. Identification of proteins non-homologous to the human proteome, the search for E. coli essential genes and the estimation of drug target novelty were performed using BLAST. Unique metabolic pathways were identified using the data and tools from KEGG (Kyoto Encyclopedia of Genes and Genomes). Subcellular localization of proteins was predicted using a combination of the tools PSORTb, CELLO and ngLOC. Homology modeling was performed with the I-TASSER web server, and the models were validated using the MolProbity web server. Binding sites were analyzed using Discovery Studio 2017 and the web servers ProBis and PrankWeb. Results. The proteome of Escherichia coli ATCC 25922, which contains 4808 proteins, was taken as the initial set. Using subtractive genomic analysis we identified 9 membrane proteins that are essential, non-homologous to the human proteome, involved in unique metabolic pathways and not described as drug targets. A study of the spatial structure of these proteins showed that 6 of them have binding sites for ligands. Conclusions. Using classical bioinformatics approaches we identified 6 molecular targets of Escherichia coli ATCC 25922, which can be exploited for further rational drug design in order to find novel therapeutic agents for the treatment of infections caused by E. coli.


Introduction
The uncontrolled use of antibiotics has led to antibiotic resistance, and its spread has accelerated. For example, ceftazidime-avibactam, a drug composed of ceftazidime, a cephalosporin antibiotic, and avibactam, a β-lactamase inhibitor, was approved for medical use in the United States and Europe in 2015. It is used to treat certain multidrug-resistant gram-negative infections. In December 2018 a ceftazidime-avibactam-resistant Klebsiella pneumoniae strain was isolated in Finland [1]. Microorganisms become resistant to antibiotics through mutations in drug target proteins or through the use of alternative metabolic or signaling pathways that avoid the harmful effect [2]. One way to overcome pathogen resistance is to search for new drug targets among bacterial proteins. Gene knockout and RNA interference are widely used to find novel targets. These methods have proven to be extremely effective, as the result can be seen immediately in living cells. Along with these advantages, they have a number of disadvantages, including the high cost and the time needed to set up the experiment. Bioinformatics methods are an alternative way to find novel drug targets. To date, they have been used successfully in the search for new targets and are the fastest and cheapest option [3].
Escherichia coli infections are a major challenge for public health. Diarrheal illness causes high mortality worldwide, particularly in children under the age of 5 and particularly in countries of Africa and South Asia [4]. Even in the United States, a single pathogenic strain of E. coli causes 73 000 illnesses annually [5].
Pathogenic E. coli can cause a broad range of human diseases that span from those of the gastrointestinal tract to the urinary tract, bloodstream, and central nervous system [6]. Over the last decade, there has been a significant increase in the antibiotic resistance of Escherichia coli. In 2013 in Ile-Ife, Nigeria, over 130 isolates of E. coli were tested for sensitivity to the most commonly used antibiotics. Resistance to these antibiotics ranged from 51.1 % to 94.3 % [7].
We used subtractive genomic analysis to identify novel molecular drug targets in E. coli. The Escherichia coli ATCC 25922 strain was used as a reference for pathogenic E. coli strains. This standard test strain is used in studies of antibacterial compounds from plants [8], in food safety studies [9] and as the quality control strain in antibiotic resistance studies [10]. Unlike the pathogenic strains of E. coli, the ATCC 25922 strain has biosafety level 1, which makes in vitro testing much safer and cheaper. In addition, the MICs of many antibacterial compounds are known for the ATCC 25922 strain [11]. E. coli is a gram-negative bacterium, like some of the most dangerous pathogens such as Pseudomonas aeruginosa and Klebsiella pneumoniae, so the identification of potential drug targets could help to discover new ways of fighting infections caused by gram-negative bacteria.

Materials and Methods
The identification and characterization of the potential drug targets of Escherichia coli ATCC 25922 were performed sequentially by the following methods:

Retrieval of complete proteome
The whole proteome of E. coli ATCC 25922 was downloaded in FASTA format from the National Center for Biotechnology Information (NCBI) protein database [12]. This set was created by Los Alamos National Laboratory and contains 4808 proteins.

Paralog sequences identification
The protein set in FASTA format was submitted to CD-HIT Suite. Paralogous proteins were detected with a cut-off of 0.6 (all proteins with more than 60 % similarity were rejected) [13].
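To illustrate what this step produces, the following minimal sketch parses cluster output in the CD-HIT `.clstr` text format and keeps one representative sequence per cluster. The function name and the sample identifiers are hypothetical; the study used the CD-HIT Suite web server, not this script.

```python
import re


def cluster_representatives(clstr_text):
    """Return the ID of the representative sequence of each CD-HIT cluster."""
    representatives = []
    for line in clstr_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">Cluster"):
            continue
        # Member lines look like: "0<TAB>350aa, >prot_A... *"
        # The representative of a cluster is the member line ending in "*".
        match = re.search(r">(\S+?)\.\.\.", line)
        if match and line.endswith("*"):
            representatives.append(match.group(1))
    return representatives


# Hypothetical sample: prot_B falls into the cluster represented by prot_A.
sample = """>Cluster 0
0\t350aa, >prot_A... *
1\t342aa, >prot_B... at 78.50%
>Cluster 1
0\t120aa, >prot_C... *
"""
print(cluster_representatives(sample))
```

Keeping only the representatives is one plausible way to obtain a non-paralogous set at a given cut-off; other choices (for example, discarding whole clusters) are possible.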

Determination of protein sequences that are non-homologous to the human proteome
At this step the non-paralogous sequences were submitted to BLASTp on the NCBI server against the Homo sapiens proteome with a threshold expectation value of E-value = 10^-3 [14]. Other parameters were set by default: BLOSUM62 matrix, non-redundant (nr) database. BLAST (Basic Local Alignment Search Tool) is an algorithm that compares query amino acid or nucleotide sequences with each other or with a database. As a result, the sequences that were similar to those of the human host were removed and only the non-homologous sequences were taken for further analysis.
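The filtering logic of this step can be sketched as follows, assuming the BLAST results are available as tabular output (BLAST+ `-outfmt 6`, where field 1 is the query ID and field 11 is the E-value). The function names, the sample rows and the use of "less than or equal" for the threshold are assumptions made for illustration.

```python
def queries_with_hits(blast_rows, max_evalue=1e-3):
    """Return query IDs that have at least one hit with E-value <= max_evalue.

    Each row is one line of BLAST tabular output (-outfmt 6):
    field 1 is the query ID and field 11 is the E-value.
    """
    hits = set()
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue
        fields = row.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def non_homologous(all_ids, blast_rows, max_evalue=1e-3):
    """Keep the proteins that have no significant hit in the human proteome."""
    homologous = queries_with_hits(blast_rows, max_evalue)
    return [pid for pid in all_ids if pid not in homologous]


# Hypothetical rows: prot_A has a strong human hit, prot_B only a weak one,
# and prot_C has no hit at all.
rows = [
    "prot_A\thuman_1\t45.0\t200\t110\t3\t1\t200\t5\t204\t2e-40\t160",
    "prot_B\thuman_2\t28.0\t60\t43\t1\t10\t69\t30\t89\t0.5\t30.1",
]
print(non_homologous(["prot_A", "prot_B", "prot_C"], rows))
```

The same helper could serve the essentiality step in the opposite direction: there, proteins with a significant hit against the essential gene set are the ones that are kept.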

Identification of essential proteins
The set of non-paralogous, non-homologous sequences was subjected to a similarity search against the Database of Essential Genes (DEG) [15], which contains all the essential genes currently available. Protein BLAST was performed with a threshold expectation value of E-value = 10^-5. Essential proteins were identified by comparing query sequences of the E. coli ATCC 25922 strain with essential proteins of the E. coli MG1655 I and MG1655 II strains.

Prediction of protein localization
It is generally accepted that potential drugs bind more easily when the targets are membrane proteins. To determine protein localization we used a combination of the programs PSORTb [16], CELLO [17] and ngLOC [18]. Essential non-homologous proteins were submitted to each of these programs in turn in order to get a more accurate result. All proteins that were identified as membrane proteins by all three programs were taken for further analysis.
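The consensus rule described above (a protein is kept only if all three programs agree) can be sketched as a set intersection. This is an illustrative sketch only: the function name is hypothetical, the sample predictions are made up, and it assumes that the location labels of the three tools have already been mapped to a shared vocabulary.

```python
def consensus_membrane(predictions, label="membrane"):
    """Return the proteins that every tool assigns to `label`.

    `predictions` maps a tool name to a dict {protein_id: location}.
    Locations are assumed to be normalised to a shared vocabulary beforehand.
    """
    per_tool = [
        {pid for pid, location in calls.items() if location == label}
        for calls in predictions.values()
    ]
    if not per_tool:
        return []
    return sorted(set.intersection(*per_tool))


# Hypothetical predictions for three proteins.
predictions = {
    "PSORTb": {"prot_A": "membrane", "prot_B": "cytoplasm", "prot_C": "membrane"},
    "CELLO": {"prot_A": "membrane", "prot_B": "membrane", "prot_C": "membrane"},
    "ngLOC": {"prot_A": "membrane", "prot_B": "cytoplasm", "prot_C": "periplasm"},
}
print(consensus_membrane(predictions))
```

Requiring unanimous agreement trades sensitivity for specificity: a protein missed by a single predictor is dropped, but the remaining set is less likely to contain false membrane assignments.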

Metabolic pathways analysis
The metabolic pathway analysis of essential membrane proteins was done with the KAAS server at KEGG (Kyoto Encyclopedia of Genes and Genomes) [19]. KAAS predicts the role of proteins in pathways by comparing the query with the KEGG GENES database using BLAST. The output contains the metabolic pathways in which the submitted proteins are involved, along with the KO (KEGG Orthology) list.

Identification of already known drug targets
To identify the known molecular targets we used the DrugBank database [20], a free online resource that contains information about drugs and drug targets. We used DrugBank version 5.1.5 (released 2020-01-03), with information about 13489 known and over 6348 experimental drugs. The set of essential proteins involved in metabolic pathways was submitted through BLASTp against the DrugBank database (E-value 10^-3).
The known molecular targets were excluded from further analysis.

Homology modeling of proteins
We searched for crystal structures or homology models for every novel potential drug target identified in the current study. For that purpose, we used UniProt to check whether reference structures exist for each target. The search was performed in UniProt BLAST with default settings, and the results with the best similarity score were selected. The existence of a structure for each novel drug target was checked in the "Structure" section. Homology modeling with the I-TASSER [21] server was performed for the drug targets that have neither a crystal structure nor a homology model in the SWISS-MODEL repository and for which UniProt contains no information about protein structure. After that, the best models were validated using the MolProbity web server [22].

Binding sites analysis
The 3D structures of the novel potential targets were investigated to find binding sites for small organic molecules and to confirm the potential use of the proteins for further rational drug design. For this purpose, we used the web servers ProBis [23] and PrankWeb [24]. Additionally, Discovery Studio 2017 was used for a more accurate search. Only targets whose pockets were identified by at least two programs were selected as potential drug targets.
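The "at least two programs" criterion amounts to a simple vote count per target. A minimal sketch, with a hypothetical function name and made-up target identifiers, might look like this:

```python
from collections import Counter


def targets_with_support(pockets_by_tool, min_tools=2):
    """Return targets for which at least `min_tools` tools report a pocket.

    `pockets_by_tool` maps a tool name to the targets in which that tool
    found a binding pocket.
    """
    votes = Counter()
    for targets in pockets_by_tool.values():
        votes.update(set(targets))  # count each tool at most once per target
    return sorted(t for t, n in votes.items() if n >= min_tools)


# Hypothetical results: target_4 is supported by one tool only.
pockets_by_tool = {
    "Discovery Studio": ["target_1", "target_2", "target_3"],
    "ProBis": ["target_1", "target_3"],
    "PrankWeb": ["target_1", "target_2", "target_4"],
}
print(targets_with_support(pockets_by_tool))
```

Setting `min_tools=3` would reproduce a stricter, unanimous rule; the two-tool threshold used in the text is more tolerant of a single tool missing a pocket.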

Results and Discussion
The main task of the current study was to identify membrane proteins as potential drug targets in Escherichia coli ATCC 25922. The membrane plays a crucial role in the vital processes of bacterial cells. It provides selective permeability for cellular homeostasis and metabolic energy transduction. Membrane proteins are responsible for the active transport of nutrients and wastes, bacterial respiration, the establishment of the proton motive force in association with respiratory enzymes, ATP generation and cell-cell communication in biofilms. These proteins are crucial and can be considered as potential drug targets.
Another reason for choosing membrane proteins as targets may be the growing danger of quiescent bacteria. These bacteria under stress of, for example, antibiotic therapy, can slow down their own metabolism, reduce growth rate, and modify the cell wall to avoid an immune response. Under such conditions, the bacteria can wait for a long time until the stress factor disappears, after which the infection will be a serious threat to the patient [25]. Because the metabolism of such bacteria slows down and the cell wall is modified, it is difficult for antibacterial compounds to influence the processes inside the bacterial cell. One way to solve the problem of quiescent bacteria may be to choose membrane proteins as potential drug targets.
In addition to the above, compounds targeting membrane proteins can avoid cell efflux pumps. Efflux pumps are transport proteins that are mostly localized in the cytoplasmic membrane of bacterial cells and play an important role in their lives, particularly in providing resistance of the microorganisms to xenobiotics. These pumps are involved in the removal of toxic substrates, including antibiotics, from cells into the environment. This feature of efflux pumps is one of the mechanisms of antibiotic resistance [26]. Therefore, focusing on membrane proteins as drug targets can help to evade this mechanism.
We used a subtractive genomic approach to identify potential drug targets as described previously [27][28][29]. Accordingly, we selected proteins that meet the requirements for good antibacterial targets: in particular, they must be essential for bacterial cells, have no human homologue and be conserved among bacterial species [30].
The whole proteome of E. coli ATCC 25922, which has 4808 proteins, was retrieved from the NCBI database. At the beginning of the analysis, we removed paralogs and duplicates using the CD-HIT server with cut-off = 0.6. According to the results of the CD-HIT server, the number of proteins was reduced to 4544. The resulting set of non-paralogous sequences was submitted to BLASTp against the human proteome. The proteome comparison between Escherichia coli ATCC 25922 and Homo sapiens was necessary in order to eliminate targets that are structurally similar to human proteins, and consequently to avoid possible side effects of an antibiotic designed against the drug targets. All homologues of human proteins were excluded and the number of Escherichia coli proteins was reduced to 2150. The next step of the analysis was the identification of essential proteins. The obtained set of sequences was submitted to BLAST against the Database of Essential Genes with E-value = 10^-5. There is no information about the ATCC 25922 strain in DEG, so this step was done by comparing the query with the essential proteins of the closest E. coli strains, MG1655 I and MG1655 II. The E. coli strains ATCC 25922 and MG1655 share 97.81 % identity. There are 609 essential genes for E. coli MG1655 I and 296 essential genes for E. coli MG1655 II in DEG. The output set included 262 essential proteins.
The next step in the identification of potential drug targets is the prediction of protein localization. In order to identify membrane proteins, the set of 262 essential non-homologous proteins was analyzed using a combination of three programs: PSORTb, CELLO and ngLOC. The resulting set included 56 proteins that were predicted to be membrane proteins by all three programs.
We analyzed the set of 56 sequences by KAAS (KEGG Automated Annotation Server), in order to identify which essential membrane proteins are involved in metabolic pathways. It was found that 20 proteins are involved in metabolic pathways.
The set of 20 essential membrane proteins involved in metabolic pathways was submitted to BLAST against the DrugBank database to exclude the already known drug targets. The sequences with similarities in the DrugBank database were 1-deoxy-D-xylulose 5-phosphate reductoisomerase, ATP synthase F0 B subunit, cell division protein ZipA, penicillin-binding protein 2, aerotaxis receptor, methyl-accepting chemotaxis (MCP) signaling domain protein, methyl-accepting chemotaxis protein I and methyl-accepting chemotaxis protein II. According to an extensive PubMed search, small-molecule inhibitors have already been published for one potential drug target, flagellar motor stator protein MotA [31]. Additionally, we excluded cytochrome O ubiquinol oxidase subunit IV and preprotein translocase, SecE subunit, because they are too small and could not have ligand binding sites. Therefore, the number of novel potential targets was reduced from 20 to 9. The results of the subtractive analysis for Escherichia coli are presented in the Table 1.
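The counts reported so far can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers and protein names stated in the text; the stage labels are paraphrased, and it assumes that the three exclusion lists do not overlap and that every excluded protein belongs to the set of 20.

```python
# Number of proteins remaining after each step, as reported in the text.
stages = [
    ("Complete proteome", 4808),
    ("Non-paralogous (CD-HIT)", 4544),
    ("Non-homologous to human (BLAST)", 2150),
    ("Essential (DEG)", 262),
    ("Membrane (PSORTb, CELLO, ngLOC)", 56),
    ("Involved in metabolic pathways (KAAS)", 20),
]

# Proteins excluded from the set of 20, grouped by the reason given.
drugbank_hits = [
    "1-deoxy-D-xylulose 5-phosphate reductoisomerase",
    "ATP synthase F0 B subunit",
    "cell division protein ZipA",
    "penicillin-binding protein 2",
    "aerotaxis receptor",
    "methyl-accepting chemotaxis (MCP) signaling domain protein",
    "methyl-accepting chemotaxis protein I",
    "methyl-accepting chemotaxis protein II",
]
published_inhibitors = ["flagellar motor stator protein MotA"]
too_small = [
    "cytochrome O ubiquinol oxidase subunit IV",
    "preprotein translocase, SecE subunit",
]

excluded = drugbank_hits + published_inhibitors + too_small
novel_targets = stages[-1][1] - len(excluded)

# Report each step relative to the previous one.
for (name, count), (_, previous) in zip(stages[1:], stages):
    print(f"{name}: {count} ({count / previous:.1%} of previous step)")
print("Excluded from the final set:", len(excluded))
print("Novel potential targets:", novel_targets)
```

With 8 DrugBank matches, 1 target with published inhibitors and 2 proteins considered too small, 11 of the 20 proteins are removed, which agrees with the 9 novel potential targets reported.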
To evaluate the druggability of the identified targets, their 3D structures should be examined for the existence of small-molecule binding sites. According to UniProt, three-dimensional structures are known for 7 targets. For four proteins the structures were obtained by X-ray crystallography; for two proteins, LPS export ABC transporter permease LptF and LPS export ABC transporter permease LptG, the structures were obtained by electron microscopy; and for one protein, PTS system mannitol-specific EIICBA component, a homology model is available in the SWISS-MODEL repository.
For two proteins, inner membrane protein ydcZ and putative protein permease FtsX, the 3D structures were not known. We built homology models of these proteins using the I-TASSER server. To construct a model, the server used LOMETS (Local Meta-Threading Server) to collect 10 protein templates of similar folds from the PDB database. To estimate the quality of the models predicted by I-TASSER we used the C-score. Its calculation is based on the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. It lies in the range from -5 to 2, where a higher C-score signifies a model with higher confidence. I-TASSER builds 5 homology models with different C-scores. For further analysis we chose the homology models with the best C-score. These models were validated using the MolProbity score [32].
The MolProbity score combines the clashscore, rotamer and Ramachandran evaluations into a single score. The summary table includes a percentile relative to structures of similar resolution. A good model should be above the 66th percentile. The models of putative protein insertion permease FtsX and inner membrane protein ydcZ have MolProbity scores of 2.17 and 3.2, respectively, with the 38th percentile. This means that the created models could not be used for molecular manipulations such as docking or molecular modeling. Unfortunately, at present there are not enough protein templates to build correct models, but as soon as suitable templates appear, the quality of the models should increase sharply. The identified possible molecular targets involved in metabolic pathways, which would be analyzed further, are presented in the Table 2.

Table 1. Subtractive analysis of the Escherichia coli ATCC 25922 proteome (steps 3 to 8)
No.   Step                                                      Number of proteins
3     Human non-homologous proteins (BLAST)                     2150
4     Essential proteins analysis (DEG)                         262
5     Membrane proteins identification (PSORTb, CELLO, ngLOC)   56
6     Metabolic pathways analysis (KAAS)                        20
7     Binding sites analysis                                    9
8     Potential drug targets                                    6

The small-molecule binding sites were identified using Discovery Studio 2017 and the web servers PrankWeb and ProBis. For six targets the ligand binding sites were recognized by at least two of the tools that we used. For hydrogenase-1 small chain, hydrogenase-2 small chain and Ni/Fe-hydrogenase, b-type cytochrome subunit, the pockets were identified by all three tools. For PTS system, Fru family, IIB component domain protein, the pockets were determined by Discovery Studio 2017 and the web server PrankWeb. For LPS export ABC transporter permease LptG and LPS export ABC transporter permease LptF, the pockets were identified by Discovery Studio 2017 and the web server ProBis. No suitable pocket was identified for PTS system mannitol-specific EIICBA component. The result of this step confirms the potential druggability of six proteins. The results of the binding site analysis are shown in the Table 3.
Table 2 (fragment). Potential drug targets with KEGG pathway and KO identifiers
Protein                                         KEGG pathway                 KO
LPS export ABC transporter permease LptG                                     K11720
Ni/Fe-hydrogenase, b-type cytochrome subunit    02020 Two-component system   K03620

Conclusion
Using subtractive genome analysis, we have identified 6 membrane proteins as novel potential drug targets in Escherichia coli ATCC 25922. These proteins are essential, non-homologous to the human proteome and involved in unique metabolic pathways. Three-dimensional structural analysis shows that these targets have ligand binding sites, so they can be modulated with small molecules and are therefore suitable for further rational drug design. Presumably, the identified drug targets will help to speed up the future discovery of new therapeutic agents against infections caused by E. coli and other gram-negative bacteria.