Identification and characterization of potential membrane-bound molecular drug targets of methicillin-resistant Staphylococcus aureus using in silico approaches

Aim. To identify novel putative drug targets of methicillin-resistant S. aureus (MRSA) through subtractive proteome analysis. Methods. Identification of proteins non-homologous to the human proteome, the search for MRSA essential genes, and the evaluation of drug target novelty were performed using a protein BLAST server. Unique metabolic pathways were identified using data and tools from KEGG (Kyoto Encyclopedia of Genes and Genomes). Sub-cellular localization of proteins was predicted using a combination of PSORT v. 3.0.2, CELLO v. 2.5, iLoc-Gpos, and Pred-Lipo. Homology modeling was performed using the SWISS-MODEL, Phyre2, and I-TASSER web servers and the MODELLER software. Results. Proteomes of six annotated methicillin-resistant strains (MRSA ATCC BAA-1680, H-EMRSA-15, LA MRSA ST398, MRSA 252, MRSA ST772, and UTSW MRSA 55) were initially analyzed. Stepwise analysis of these proteomes identified two molecular targets, diadenylate cyclase and D-alanyl-lipoteichoic acid biosynthesis protein (DltB), which meet the requirements of being essential, membrane-bound, non-homologous to the human proteome, involved in unique metabolic pathways, and novel in the sense of having no approved drugs. Using homology modeling, we built three-dimensional structures of these proteins and predicted their ligand-binding sites. Conclusions. We used classical bioinformatics approaches to identify two molecular targets of MRSA, diadenylate cyclase and DltB, which can be used for further rational drug design aimed at finding novel therapeutic agents for the treatment of multidrug-resistant staphylococcal infection.

Despite the large number of proposed molecular targets, only nine investigational antibiotics against S. aureus, including MRSA, are currently undergoing clinical trials, and they target only four sites: DNA gyrase, topoisomerase IV, enoyl-acyl-carrier protein (ACP) reductase (FabI), and the P site of the bacterial 50S ribosomal subunit [78].
Because resistant strains are spreading rapidly, the identification of unique drug targets in resistant pathogens is very important. A number of methods are currently available to identify potential drug targets. Among them, bioinformatics approaches are the fastest and most cost-effective. For example, subtractive genome analysis has already been used to identify putative molecular targets for different pathogenic strains of Staphylococcus aureus, such as Staphylococcus aureus subsp. aureus MW2 (CA-MRSA) [79], Staphylococcus aureus N315 [80], Staphylococcus aureus ST398 and S. aureus 252 [81,82], and vancomycin-resistant Staphylococcus aureus [83]. The aim of this study was to identify novel putative drug targets of MRSA through subtractive proteome analysis.

Methods
The procedure for identifying potential drug targets in our work included formation of the initial set, protein sequence analysis, and structural analysis. The workflow is presented in Figure 1.

Proteome retrieval of MRSA strains
Complete proteomes of six methicillin-resistant strains of S. aureus were downloaded in FASTA format on April 10, 2018 from the NCBI Protein database, which contains sequences from GenBank, TPA, RefSeq, PRF, PIR, SwissProt, and PDB. Besides the sequences themselves, NCBI also provided general information on the genome annotation of the selected strains.

Comparison of the proteomes
To form the initial set of proteins for the analysis, the NCBI accession numbers (ACs) of the proteins from the different strains were compared manually. Only proteins with ACs common to all strains were included in the initial set; the others were not considered, because they were presumed to differ between the strains in ways that might affect the effectiveness of a potential drug.
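The comparison of accession numbers was done manually in this work; as an illustration only, the same selection can be sketched as a set intersection. The function names and the toy FASTA records below are hypothetical, and the sketch assumes that the accession is the first whitespace-delimited token of each FASTA header.

```python
def read_accessions(fasta_text):
    """Collect accession numbers from the header lines of a FASTA-formatted string."""
    accessions = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the accession is the first token after '>'.
            tokens = line[1:].split()
            if tokens:
                accessions.add(tokens[0])
    return accessions


def common_accessions(proteomes):
    """Return accessions present in every strain (proteomes: strain name -> FASTA text)."""
    sets = [read_accessions(text) for text in proteomes.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy input with made-up records, not real MRSA data.
toy_proteomes = {
    "strain_A": ">WP_000000001.1 protein one\nMKT\n>WP_000000002.1 protein two\nMLL\n",
    "strain_B": ">WP_000000001.1 protein one\nMKT\n>WP_000000003.1 protein three\nMAA\n",
}
shared = common_accessions(toy_proteomes)
print(sorted(shared))
```

Matching on accession numbers alone is strict: identical sequences stored under different accessions are missed, which is why the resulting set is a lower bound on the shared proteome.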

Identification of non-homologous proteins to the human proteome
The representative set was subjected to Protein BLAST against the human proteome with an expectation-value (E-value) cutoff of 10⁻³. BLOSUM62 was chosen as the scoring matrix for the BLASTP algorithm, and the non-redundant protein sequences were used as the search database. As a result, we obtained homologous sequences, with significant similarity to the human proteome, and non-homologous sequences, for which no hits with significant similarity were found. Proteins homologous to the human proteome were excluded from the set and were not considered in further analysis.
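The subtraction step can be sketched in a few lines, assuming the BLAST results are available in the standard 12-column tabular format (query ID in column 1, E-value in column 11). The function name and the toy hits are hypothetical, and treating a hit at exactly the cutoff as significant is an assumption.

```python
def split_by_human_homology(query_ids, blast_tabular, evalue_cutoff=1e-3):
    """Split queries into those with and without a significant human hit.

    blast_tabular is text in 12-column tabular BLAST format:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    homologous = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:  # assumption: hits at the cutoff count as significant
            homologous.add(qseqid)
    non_homologous = [q for q in query_ids if q not in homologous]
    return sorted(homologous), non_homologous


# Toy hits with made-up identifiers and values.
toy_hits = "\n".join([
    "protA\thuman1\t45.0\t200\t100\t2\t1\t200\t5\t204\t2e-30\t120",
    "protB\thuman2\t25.0\t60\t40\t1\t3\t62\t10\t69\t0.5\t28.1",
])
homologous, non_homologous = split_by_human_homology(["protA", "protB", "protC"], toy_hits)
print(homologous, non_homologous)
```

Queries with no hit at all, such as protC in the toy data, fall into the non-homologous set, mirroring the "no hits with significant similarity" criterion.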

MRSA essential genes identification
The set of non-homologous proteins was then aligned against the Database of Essential Genes (DEG) [84,85]. The current version of this database contains essential genes of two S. aureus strains, NCTC 8325 and N315. The sequences were filtered using the following settings: BLASTP as the algorithm, BLOSUM62 as the substitution matrix, an expectation value cutoff of 10⁻⁵, and a minimal score of 100. The proteins which met the E-value cutoff and the minimal score were considered to be essential for pathogen survival and propagation.
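The two-criterion essentiality filter can be illustrated as follows. This is a sketch under assumptions: the Hit record and the toy values are hypothetical, and the "minimal score" is taken to be the BLAST bit score.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # query protein identifier
    subject: str     # matched DEG entry (placeholder names below)
    evalue: float    # BLAST expectation value
    bitscore: float  # assumption: the "minimal score" refers to the bit score


def essential_queries(hits, evalue_cutoff=1e-5, min_score=100.0):
    """Return queries with at least one DEG hit passing both thresholds."""
    return {
        h.query
        for h in hits
        if h.evalue <= evalue_cutoff and h.bitscore >= min_score
    }


# Toy hits with made-up values.
toy_deg_hits = [
    Hit("protB", "DEG_entry_1", 1e-40, 180.0),  # passes both thresholds
    Hit("protC", "DEG_entry_2", 1e-6, 55.0),    # score too low
    Hit("protD", "DEG_entry_3", 0.01, 110.0),   # E-value too high
]
essential = essential_queries(toy_deg_hits)
print(sorted(essential))
```

Requiring both thresholds on the same hit is stricter than requiring each threshold to be met by some hit, which is the reading assumed here.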

Unique metabolic pathways identification
Using the data and tools from KEGG (Kyoto Encyclopedia of Genes and Genomes), metabolic pathway analysis was carried out to determine the unique metabolic pathways of the pathogen. KEGG is an integrated database that contains systems, genomic, chemical, and health information, allowing biological interpretation of genome sequences and other high-throughput data [86,87]. All the essential non-homologous proteins were analyzed with the KEGG Automatic Annotation Server (KAAS), which provides functional annotation of genes or proteins by BLAST or GHOST comparisons against the manually curated KEGG databases [88]. In this study we used BLASTP as the algorithm for this comparison. As a result, we obtained a KO assignment list, on the basis of which proteins involved in unique metabolic pathways were selected manually.
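The selection from the KO assignment list was manual; the sketch below only illustrates the bookkeeping. It assumes a list with one query per line, followed by a KO identifier when one was assigned, and it takes the set of pathogen-specific KO identifiers as a hand-curated input. All identifiers shown are placeholders, not real KO numbers from this study.

```python
def parse_ko_assignments(ko_list_text):
    """Map each query ID to its KO identifier, or to None when no KO was assigned."""
    assignments = {}
    for line in ko_list_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        assignments[fields[0]] = fields[1] if len(fields) > 1 else None
    return assignments


def select_by_ko(assignments, pathogen_specific_kos):
    """Keep queries whose KO belongs to a hand-curated set of pathogen-specific KOs."""
    return sorted(q for q, ko in assignments.items() if ko in pathogen_specific_kos)


# Toy KO list with placeholder identifiers.
toy_ko_list = "protA\tK99991\nprotB\nprotC\tK99992\n"
assignments = parse_ko_assignments(toy_ko_list)
selected = select_by_ko(assignments, pathogen_specific_kos={"K99992"})
print(selected)
```

Queries without a KO assignment are kept in the mapping with a value of None, so they can be reviewed separately rather than silently dropped.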

Evaluation of drug target novelty
To separate already-known drug targets from novel ones, protein BLAST against the DrugBank database was performed. The latest version of DrugBank (5.1.0) contains 11 143 drug entries, including 2 555 approved small molecule drugs, 965 approved biotech drugs, 121 nutraceuticals, and over 5 145 experimental drugs. These entries are linked to 5 121 non-redundant protein sequences [89]. This allowed us to exclude from further analysis the proteins for which ligands are already known.

Prediction of sub-cellular localization of proteins
An important question when choosing a target for further drug development is the localization of that target inside the cell. Compartment localization determines the methods for protein extraction and purification, which makes subsequent investigational steps easier or harder. To predict the sub-cellular localization of the druggable non-homologous essential proteins, a combination of tools was used: PSORT v. 3.0.2 [90], CELLO v. 2.5 [91], iLoc-Gpos [92], and Pred-Lipo [93]. A localization was assigned to a protein only when all four tools gave the same result.
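The unanimity rule can be expressed compactly. The sketch below assumes that the outputs of the four tools have already been mapped onto a shared label vocabulary, which is a simplification because each tool reports localization in its own terms; the function name and the toy predictions are hypothetical.

```python
def consensus_localization(predictions):
    """Return the shared label if every tool agrees, otherwise None.

    predictions maps a tool name to its predicted localization label.
    Assumption: labels were already normalized to a common vocabulary.
    """
    labels = {label.strip().lower() for label in predictions.values()}
    return labels.pop() if len(labels) == 1 else None


# Toy predictions with made-up labels.
agreeing = {
    "PSORT": "membrane",
    "CELLO": "Membrane",
    "iLoc-Gpos": "membrane",
    "Pred-Lipo": "membrane",
}
conflicting = dict(agreeing, CELLO="cytoplasm")

print(consensus_localization(agreeing))
print(consensus_localization(conflicting))
```

Requiring full agreement trades coverage for confidence: proteins with a single dissenting prediction are left unassigned instead of being resolved by majority vote.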

Results and Discussion
The objective of this study was to identify novel putative drug targets of methicillin-resistant S. aureus (MRSA) through subtractive genomic analysis. The combination of subtractive genomic analysis and comparative genomics/proteomics is a powerful method for identifying unique sequences with certain metabolic functions.
The total number of proteins for each strain, which have been retrieved from NCBI Protein database, is indicated in Table 1.
Following the procedure indicated in Fig. 1, we analyzed the proteome of MRSA in several steps; the results are given in Table 2.
The comparative analysis of protein accession numbers showed that only 326 sequences are common to the above-mentioned MRSA strains; the corresponding accession numbers and definitions are given in Supplementary table 1. The resulting set does not exhaust the pool of common proteins, because accession numbers originating from different sources may be inconsistent, but it can still be considered adequate for the search for common molecular targets.
The existence of homologous proteins in bacteria and humans is believed to have emerged in the course of evolution [121-123]. A number of studies assume a "similarity hypothesis", which regards homology as an evolutionary adaptation that helps pathogens avoid recognition by the host's immune system [124,125]. Selecting such homologous pairs as potential drug targets against MRSA might lead to cross-reactivity in human hosts. For this reason, in the next step the sequences from the representative set were subjected to Protein BLAST against the human proteome with a threshold E-value of 10⁻³. The proteins with significant similarity were excluded from further analysis to prevent cross-reactions between human and pathogen during pharmaceutical treatment. This step reduced the number of sequences in the representative set to 172.
We determined the essentiality of the non-homologous proteins of methicillin-resistant S. aureus by bioinformatics prediction, through a homology search in DEG against known essential genes of two S. aureus strains, NCTC 8325 and N315, identified by an antisense RNA technique. This approach has a crucial useful feature. Essential genes of an organism constitute the minimal set of genes required for a living cell in given growth conditions.
KAAS was used to determine whether the proteins retained after the DEG step were involved in essential metabolic pathways. More importantly, the analysis with KAAS enabled us to exclude non-homologous essential proteins that potentially have enzymatic activity for reactions of human metabolism, or a similar function in genetic information processing or in signaling and cellular processes. The comparison between the S. aureus and human metabolic networks revealed that 28 sequences out of 45 had significant similarity to S. aureus enzymes and none to human enzymes. After manual revision of the KO numbers, all sequences were classified according to their orthology groups; the distribution of the analyzed proteins across the KEGG metabolic networks is shown in Figure 2.
Since the objective of our study was to identify novel putative drug targets, we evaluated the druggability of the protein set predicted in the previous steps by Protein BLAST analysis against the DrugBank database (DBD) v. 5.1.0. DrugBank is a freely available web resource containing detailed drug, drug target, drug action, and drug interaction information about FDA-approved drugs and experimental drugs going through the FDA approval process. The results, shown in Table 3, indicate that 6 proteins of the analyzed set interact with certain drugs.
Further, the sub-cellular localization of each protein has been predicted using PSORT v. 3.0.2, CELLO v. 2.5, iLoc-Gpos, and Pred-Lipo. The results are presented in Supplementary table 3. Together, the obtained results provide us with a set of novel putative drug targets of MRSA, including two membrane-bound proteins, namely TIGR00159 family protein (diadenylate cyclase) and WP_000613541.1 (D-alanyl-lipoteichoic acid biosynthesis protein DltB).
Diadenylate cyclase is an essential bacterial enzyme that uses two molecules of adenosine triphosphate (ATP) to synthesize the important second messenger cyclic diadenylate monophosphate (c-di-AMP), which has been shown to regulate processes such as virulence, cell wall formation, cell size, and ion transport. Diadenylate cyclase is therefore a potential target for the development of novel antibiotics. However, only a few low-molecular-weight inhibitors of bacterial diadenylate cyclase have been reported so far. Recently, several polyphenols were shown to inhibit Bacillus subtilis diadenylate cyclase [126]. Suramin, a known antiparasitic drug, was also found to be a potent inhibitor of diadenylate cyclase [127]. To the best of our knowledge, no small-molecule inhibitor of S. aureus diadenylate cyclase has been reported.
DltB is a multi-membrane-spanning protein required for D-alanylation of teichoic acids, which is important for cell wall synthesis. Recently, Pasquina et al. [128], using a synthetic lethal approach, identified a compound that inhibits S. aureus DltB. This inhibitor sensitizes S. aureus to several antibiotics and is lethal in combination with a wall teichoic acid inhibitor. DltB can therefore also be considered an important antibiotic target.
We have generated 3D models of diadenylate cyclase and D-alanyl-lipoteichoic acid biosynthesis protein DltB of S. aureus, which can be used for further structure-based drug design. In order to identify template proteins for homology modeling of diadenylate cyclase and DltB [...] Figure 3a. The RMSD value between diadenylate cyclase of S. aureus and the template structure is 0.754493. An ATP molecule was chosen as the ligand for modeling. The resulting superimposed structures were further analyzed to locate the binding site residues of the modeled diadenylate cyclase within a 7 Å radius of the ligand (Figure 3b). The superposition of the active sites of the homology model (carbon atoms shown in green) and the template structure (carbon atoms shown in white) demonstrates that the structures of the two enzymes are very similar. The RMSD value for the amino acid residues in the active sites of these enzymes is 0.885616. Using BLAST, we did not identify any homologous protein for DltB; therefore, its 3D model had to be generated ab initio.
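The two structural measures used here, selection of residues around the ligand and RMSD between matched atoms, can be sketched in plain Python. The sketch assumes that the binding site is defined as residues with any atom within 7 Å of any ligand atom and that the two structures are already superimposed; the function names, residue labels, and coordinates are hypothetical.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, radius=7.0):
    """Return residue IDs having at least one atom within `radius` (in Å) of any ligand atom.

    protein_atoms is an iterable of (residue_id, (x, y, z)) pairs.
    """
    near = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_coords):
            near.add(residue_id)
    return sorted(near)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy coordinates, not taken from the modeled structures.
toy_atoms = [
    ("ASP10", (3.0, 0.0, 0.0)),
    ("GLY50", (0.0, 6.5, 0.0)),
    ("LYS99", (10.0, 0.0, 0.0)),
]
site = residues_near_ligand(toy_atoms, ligand_coords=[(0.0, 0.0, 0.0)])
deviation = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
print(site, deviation)
```

The RMSD function performs no fitting of its own, so it is only meaningful after the model and the template have been superimposed with a structural alignment tool.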
We built a model of the S. aureus DltB protein using the I-TASSER web server. The confidence score (C-score) of the best model is -1.29. The C-score estimates the quality of models predicted by I-TASSER; it is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score typically ranges from -5 to 2, with higher values indicating greater confidence. The MolProbity score of this model is 3.63.
We tried to optimize the I-TASSER model with GROMACS using different force fields, but encountered problems with atom types. Therefore, we rebuilt the model of the S. aureus DltB protein with the SWISS-MODEL server, using the 3D model generated with I-TASSER as the template. This model was energy-minimized with GROMACS using the steepest descent algorithm (1000 steps). After minimization, the MolProbity score improved markedly in comparison with the input model: the MolProbity score of the optimized model is 2.07.
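For orientation, a steepest descent minimization in GROMACS is controlled by a parameter (.mdp) file. The sketch below generates a minimal one; only the algorithm and the 1000-step limit come from the text, while the force tolerance (emtol) and initial step size (emstep) are assumed values, and the function name is hypothetical.

```python
def steepest_descent_mdp(nsteps=1000, emtol=1000.0, emstep=0.01):
    """Render a minimal GROMACS .mdp text for steepest descent energy minimization.

    emtol (kJ mol^-1 nm^-1) and emstep (nm) are assumed values, not taken from the study.
    """
    params = {
        "integrator": "steep",  # steepest descent minimizer
        "nsteps": nsteps,       # maximum number of minimization steps
        "emtol": emtol,         # stop when the maximum force drops below this value
        "emstep": emstep,       # initial step size
    }
    return "\n".join(f"{key:<12}= {value}" for key, value in params.items()) + "\n"


mdp_text = steepest_descent_mdp()
print(mdp_text)
```

A complete run also needs a topology and settings for non-bonded interactions, which depend on the chosen force field and are omitted here.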

Conclusion
In this study we used classical bioinformatics approaches to assess whether there are potential drug targets among the proteins of methicillin-resistant Staphylococcus aureus. Using subtractive genomic analysis, we identified two molecular targets of MRSA, diadenylate cyclase and D-alanyl-lipoteichoic acid biosynthesis protein, which can be used for further rational drug design aimed at identifying novel therapeutic agents for the treatment of multidrug-resistant staphylococcal infection.