Comparative analysis of nuclear localization signal (NLS) prediction methods

O. M. Lisitsyna, V. B. Seplyarskiy, E. V. Sheval

© 2017 O. M. Lisitsyna et al.; Published by the Institute of Molecular Biology and Genetics, NAS of Ukraine on behalf of Biopolymers and Cell. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

UDC 577.112


Introduction
The nuclear envelope separates the nucleus from the cytoplasm and provides bi-directional traffic via nuclear pore complexes [1, 2]. Small proteins (up to ~40 kDa) can freely permeate the nuclear envelope [3, 4], whereas the traffic of larger proteins is an active process that depends on the binding of short stretches of amino acids, referred to as nuclear localization signals (NLSs), to special adaptor proteins, the karyopherins [5].
The identification of novel NLSs remains a complicated and time-consuming task for experimental biology. Computational methods that predict candidate NLSs can therefore significantly contribute to progress in this field. Several predictor programs based on different algorithms are available to identify putative NLSs (Table). Recently, it has been demonstrated that protein localization predicted with bioinformatic approaches using data from protein databases, such as Protein Atlas, UniProt, LocDB and Gene Ontology, does not fully concur with nuclear proteome data [15]. Moreover, NLS prediction cannot completely guarantee the accurate identification of novel NLSs [16], which indicates that the precision of prediction may be a major factor limiting the effectiveness and speed of experimental NLS research. Here, we analyzed six state-of-the-art NLS prediction programs to determine the limitations of NLS prediction methods and to find the most effective one.
In total, 155 random transmembrane proteins from the Protein Data Bank of Transmembrane proteins (http://pdbtm.enzim.hu/) were selected for the control dataset. Two extra datasets of transmembrane proteins (alpha type and beta type), each of the same size, were created to validate the results obtained for the first transmembrane protein dataset.

Prediction performance evaluation
To measure prediction performance, we used the following criteria. To be able to calculate a false positive rate, we considered no more than one NLS per transmembrane protein and ignored any NLS outside the experimentally identified ones in our positive cohort of proteins. We counted a prediction as correct when it overlapped an experimental NLS by more than three amino acid residues.
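The overlap criterion above can be sketched as a simple interval check. This is an illustrative reconstruction, not the authors' code; the three-residue threshold follows the text, while the function name and 1-based inclusive coordinates are assumptions.

```python
def is_correct_prediction(pred, exp, min_overlap=3):
    """Return True if a predicted NLS overlaps an experimental NLS
    by more than `min_overlap` amino acid residues.

    pred, exp: (start, end) residue positions, 1-based and inclusive.
    """
    start = max(pred[0], exp[0])
    end = min(pred[1], exp[1])
    overlap = max(0, end - start + 1)  # number of shared residues
    return overlap > min_overlap
```

For example, a prediction spanning residues 10-20 against an experimental NLS at 15-30 shares six residues and counts as correct, whereas a prediction sharing only two or three residues does not.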
N_false positive is the number of transmembrane proteins with a predicted NLS and N_TMP is the total number of transmembrane proteins in the dataset; thus N_TMP = N_true negative + N_false positive. The Matthews Correlation Coefficient (MCC) [23] was also computed to measure the correlation between prediction and observation.
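Given the four confusion counts defined above, the two rates and the MCC can be computed as follows. This is a minimal sketch; the function name and argument order are illustrative, and the MCC formula is the standard one from Matthews [23].

```python
import math

def prediction_metrics(tp, fn, fp, tn):
    """Compute True Positive Rate, False Positive Rate and the
    Matthews Correlation Coefficient from confusion counts.

    tp + fn = proteins with an experimental NLS (N_expNLS);
    fp + tn = transmembrane proteins in the negative set (N_TMP).
    """
    tpr = tp / (tp + fn)  # True positive rate
    fpr = fp / (fp + tn)  # False positive rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is 0 when any marginal sum is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return tpr, fpr, mcc
```

A perfect predictor (no false positives or false negatives) yields TPR = 1, FPR = 0 and MCC = 1; a predictor no better than chance yields an MCC near 0.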

Statistical analysis
The statistical analysis was performed using the R statistical computing environment.

Approach
We compared the prediction performance of the following six programs: PSORT II [24], NucPred [25], cNLS Mapper [20], NLStradamus [26], NucImport [27] and seqNLS [28] (Table). The number of correct predictions and the rate of false negative results were evaluated using the dataset of proteins with experimental NLSs. The numbers of false positive predictions and true negative values were calculated based on a transmembrane protein dataset (155 proteins), under the assumption that transmembrane proteins do not contain any NLSs. To balance true positive and false positive results, we considered the prediction of multiple NLSs within one transmembrane protein as a single predicted NLS. Validation with the two extra datasets of alpha- and beta-type transmembrane proteins demonstrated similar results for all predictors (data not shown); thus, the first dataset of 155 random transmembrane proteins could be used as a negative control.

Search for optimal program operation modes
The seqNLS, cNLS Mapper and NLStradamus algorithms offer a cut-off score option for their prediction results. Using this option, we obtained ROC curves to evaluate the True Positive Rate and False Positive Rate at different prediction cut-off scores (Fig. 1). NLStradamus has not only a cut-off score option but also three different prediction algorithms: simple two-state static or dynamic Hidden Markov Model (HMM) algorithms and a four-state static HMM algorithm. ROC curves were evaluated for each of these algorithms. For the other predictors (NucPred, PSORT II and NucImport), only a single ratio of true positive to false positive results was obtained (Fig. 1). The output of NucPred colors the query sequence from blue (low probability of nuclear localization) to red (high probability of nuclear localization). Under strict conditions (colored from orange to red), only 18 % of experimental NLSs were correctly predicted (data not shown). For this reason, the prediction performance of NucPred was evaluated under less strict conditions (colored from green to red), which increased the number of correct predictions (43 %). NucImport has six training models as well as a "name of species" parameter (mouse or yeast) that can be used for predictions. We tested NucImport with each of the six models, but only with "mouse" as the species parameter, because it is more closely related to our dataset of human proteins.
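The ROC evaluation described above can be sketched by sweeping the cut-off score and recomputing both rates at each threshold. This is an illustrative reconstruction with made-up scores, not the study's data; it also simplifies by treating any score at or above the cut-off as a positive call, whereas in the study a correct prediction additionally had to satisfy the overlap criterion.

```python
def roc_points(scores_pos, scores_neg, cutoffs):
    """Build (cutoff, TPR, FPR) points for a ROC curve.

    scores_pos: best prediction score per protein with an experimental NLS
    scores_neg: best prediction score per transmembrane (negative) protein
    A protein counts as 'NLS predicted' when its score reaches the cut-off.
    """
    points = []
    for c in cutoffs:
        tpr = sum(s >= c for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= c for s in scores_neg) / len(scores_neg)
        points.append((c, tpr, fpr))
    return points

# Hypothetical scores: lowering the cut-off raises both TPR and FPR,
# tracing the curve from the origin toward the upper-right corner.
pts = roc_points([0.9, 0.6, 0.4], [0.3, 0.1], [0.5, 0.2])
```

This mirrors the trade-off reported below: stringent cut-offs suppress false positives at the cost of missed experimental NLSs, while permissive cut-offs inflate both rates together.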

Comparison of the predictor programs
Figure 1 shows the prediction results for the six computational approaches. Comparison of the ROC curves revealed that a lower cut-off score yielded the maximum number of false positive results as well as of correct predictions of experimental NLSs. At the points with lower cut-off scores, the number of correct predictions was approximately equal to the number of false predictions. However, higher cut-off scores allowed a more than 4-fold ratio of correct predictions to false positives in the best cases for NLStradamus. Among the six evaluated programs, NucPred, NLStradamus (at cut-off scores of 0.5-1) and seqNLS (at cut-off scores of 0.8-0.86) showed the best prediction performance. Additionally, the evaluation of prediction performance for each of the NLStradamus HMMs did not show significant differences between them at cut-off scores from 0.5 to 1 (Fig. 1). PSORT II is comparable with NLStradamus at a cut-off score of 0.2 (Fig. 1). Over the whole range of cut-off scores, cNLS Mapper provided fewer true positive and more false positive predictions than NLStradamus and seqNLS. Only at the most stringent cut-off score (7.0) was the performance of cNLS Mapper similar to that of NLStradamus (Fig. 1). In the case of NucImport, the rate of correct predictions was the same for all six models, but the minimum of false positive results was obtained with model 6 (Fig. 1). Nevertheless, even the best NucImport model (model 6) yielded an equal ratio of correct and incorrect predictions, which was the worst performance among the estimated programs.
To evaluate the correlation between prediction and observation, the Matthews Correlation Coefficient (MCC) [23] was calculated for each predictor at its best settings (cut-off score, prediction model). A coefficient of +1 represents a perfect prediction, 0 indicates a result no better than random, and -1 indicates total disagreement between prediction and observation. The highest MCC (~0.3) was obtained for NucPred, seqNLS (cut-off score 0.8-0.86) and NLStradamus (cut-off score 0.5), while the best values of cNLS Mapper and PSORT II were also close (0.28 and 0.2, respectively). According to the MCC, the best prediction model of NucImport performed no better than random (Fig. 2). Variation of the predictors' cut-off scores also influenced the MCC; decreasing the cut-off score led to random results (MCC near 0) (Fig. 3).

Conclusion
In this study, we estimated the prediction performance of six NLS predictors using two types of datasets: human proteins with experimentally identified NLSs and transmembrane proteins. The best True Positive Rate and False Positive Rate and the highest MCC were obtained for NucPred, NLStradamus (at cut-off scores of 0.5-1) and seqNLS (at cut-off scores of 0.8-0.86). The prediction performance of cNLS Mapper and PSORT II was slightly worse. Our data agree with Lin & Hu [28], who demonstrated that seqNLS was a better predictor than cNLS Mapper. However, our results indicate that NLStradamus showed the same or even better results than seqNLS on our dataset of human proteins. It should be stressed that even at the highest True Positive Rate and minimum False Positive Rate, the best programs (NucPred, NLStradamus, seqNLS) correctly identified only ~45 % of the experimental NLSs. Therefore, the identification of novel NLSs by predictors still requires experimental verification.

True positive rate = N_true positive / N_expNLS   (1)

False positive rate = N_false positive / N_TMP   (2)

where N_true positive is the number of correct predictions in the protein dataset with experimental NLSs (N_expNLS); thus N_expNLS = N_true positive + N_false negative.

Fig. 1. Evaluation of the prediction performance of different NLS predictors (True Positive Rate versus False Positive Rate). Different cut-off scores are labeled for seqNLS, cNLS Mapper and NLStradamus, as well as six types of training models for NucImport.