Help  

The PROSPER webserver, available at https://prosper.erc.monash.edu.au, predicts protease substrates and their corresponding cleavage sites online from the primary amino acid sequence alone. At present, PROSPER can predict substrate cleavage sites for 24 different proteases from the aspartic (A), cysteine (C), metallo (M) and serine (S) protease superfamilies. Unlike other general tools, PROSPER uses a machine learning approach based on support vector regression (SVR) to provide a real-valued prediction of substrate cleavage probability. In particular, it uses a novel bi-profile Bayesian approach to extract local sequence and structural profiles, including the binary amino acid sequence profile, predicted secondary structure, solvent accessibility and native disorder features. This strategy has been shown to significantly improve the predictive performance of both PROSPER and Cascleave, our previously established tool for predicting caspase substrate cleavage sites. The rationale behind this approach is that peptide sequences that can be cleaved by proteases should exhibit different features from those that cannot. Representing each positive/negative sample in a bi-profile manner should therefore, in principle, provide more informative features than the conventional binary amino acid sequence encoding scheme.
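To make the difference between the two encoding schemes concrete, the following Python sketch contrasts a conventional binary encoding with a simplified bi-profile encoding of a P4-P4' window. It is only an illustration under stated assumptions: the training peptides are made up, the function names are hypothetical, plain per-position frequencies stand in for the Bayesian posterior estimates, and the structural features that PROSPER also uses are left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def binary_encode(peptide):
    """Conventional encoding: 20 binary values (one-hot) per position."""
    vec = []
    for aa in peptide:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec

def position_frequencies(peptides):
    """Per-position amino acid frequencies from aligned, equal-length peptides."""
    length = len(peptides[0])
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        freqs.append({aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS})
    return freqs

def bi_profile_encode(peptide, pos_freqs, neg_freqs):
    """Encode a peptide by the frequency of each of its residues in the
    positive (cleaved) set, followed by the frequency in the negative set."""
    positive_part = [pos_freqs[i][aa] for i, aa in enumerate(peptide)]
    negative_part = [neg_freqs[i][aa] for i, aa in enumerate(peptide)]
    return positive_part + negative_part

# Made-up 8-residue windows for illustration only (not real training data).
cleaved = ["DEVDGSAL", "DEVDSGTA", "DQTDGLAK", "DEVDAPSR"]
uncleaved = ["AKLMGSTV", "GSTVAKLM", "PLKAGTES", "TSGAKLPV"]

pos_freqs = position_frequencies(cleaved)
neg_freqs = position_frequencies(uncleaved)

query = "DEVDGKLA"
binary_vector = binary_encode(query)
bi_profile_vector = bi_profile_encode(query, pos_freqs, neg_freqs)
print(len(binary_vector), len(bi_profile_vector))
print(bi_profile_vector)
```

In this toy setting the binary vector has 160 entries that say nothing about the training data, whereas the bi-profile vector has 16 entries, each reflecting how typical the residue is at that position among cleaved and uncleaved examples.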

 

 

Usage

The web interface is straightforward to use: the user only needs to enter the query sequence in one-letter FASTA format. A typical task for a query sequence of ~500 residues takes roughly 8-12 minutes. Once the prediction task is completed, the results are displayed on the screen.

 

Step 1:

First, enter the query sequence in FASTA format, for example:
>Q07955
MSGGGVIRGPAGNNDCRIYVGNLPPDIRTKDIEDVFYKYGAIRDIDLKNRRGGPPFA...
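
Before submitting, it can be useful to check locally that the input is a well-formed single-record FASTA entry. The Python sketch below is one way to do this; it is not part of the PROSPER server. The function name is hypothetical, and the restriction to the 20 standard one-letter codes is an assumption about what the server accepts. The example record uses only the visible prefix of the truncated sequence shown above.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:]
    # Sequence lines may be wrapped, so join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the header line")
    # Assumption: only the 20 standard one-letter residue codes are allowed.
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"unexpected residue codes: {invalid}")
    return header, sequence

# Only the prefix of Q07955 shown in this help page (the full sequence is longer).
record = """>Q07955
MSGGGVIRGPAGNNDCRIYVGNLPPDIRTKDIEDVFYKYGAIRDIDLKNRRGGPPFA
"""
header, sequence = parse_fasta(record)
print(header, len(sequence))
```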

After inputting the sequence, click the 'submit' button to submit the job:

 

Step 2:

While the job is being processed, a progress bar indicates the progress of the submitted job:

 

Step 3:

As soon as the submitted job is completed, the webpage will be redirected to the result page:

 

The first part of the result page shows the input sequence, with each predicted cleavage site colored according to the corresponding protease family (an example is shown in Figure 4). When the mouse pointer passes over a colored site, a hint bubble appears showing the predicted probability and the relevant protease for that site.

 

 

Figure 4. Sample output from the PROSPER server for the submitted sequence Q07955 (UniProt ID).

The second part of the result page gives a tabbed view of the predicted results, categorized by protease type. Each tab contains a sortable table listing the predicted cleavage site position, the P4-P4' segment, the N-fragment and C-fragment sizes, and the predicted cleavage probability. Below the table, a picture gives an overview of the entire sequence, in which the disordered regions predicted by the DISOPRED2 program are also highlighted.
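
The table columns described above can be derived from the sequence and the cleavage position alone. The Python sketch below shows one way to do so on a made-up sequence. It assumes that the reported position is the 1-based index of the P1 residue and that the bond between P1 and P1' is cut; the server's own numbering convention may differ, and the function name is hypothetical.

```python
def describe_cleavage(sequence, p1_position):
    """Summarise a cut between residue p1_position (P1) and the next residue (P1').

    p1_position is 1-based. Returns the P4-P4' segment and the sizes of the
    N-terminal and C-terminal fragments produced by the cut.
    """
    if not 1 <= p1_position < len(sequence):
        raise ValueError("cleavage must fall between two residues")
    # P4..P1 are the four residues ending at P1; P1'..P4' are the next four.
    # Near the termini the segment is simply shorter than eight residues.
    start = max(0, p1_position - 4)
    end = min(len(sequence), p1_position + 4)
    return {
        "position": p1_position,
        "segment": sequence[start:end],
        "n_fragment_size": p1_position,
        "c_fragment_size": len(sequence) - p1_position,
    }

# Made-up 16-residue sequence for illustration only.
toy_sequence = "MKTAYDEVDGSALLKR"
site = describe_cleavage(toy_sequence, 9)
print(site)
```

Here the cut after residue 9 gives the segment DEVDGSAL, an N-fragment of 9 residues and a C-fragment of 7 residues.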

Figure 4 shows the example output for a submitted sequence (UniProt ID: Q07955). This sequence is predicted to be cleaved by Cathepsin K and Caspase-1, 3, 7, 6 and 8 (only proteases that are predicted to cleave the submitted substrate are shown in the result). In addition to the cleavage site position and P4-P4' segment, PROSPER also provides a quantitative cleavage probability for each cleavage site and highlights the natively unstructured regions for further investigation. The value of the predicted cleavage probability itself indicates the confidence of the prediction. The cutoff threshold can be loosened to include more potential cleavage sites; however, this inevitably increases the number of false positives (i.e., non-cleavage sites predicted as cleavage sites). In this example, cleavage sites with a predicted cleavage probability score greater than 0.8 are ranked and highlighted in the result page.
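
The effect of the cutoff can be illustrated with a short Python sketch. The scores below are made up and do not come from PROSPER, and the record layout and function name are hypothetical; the point is only that lowering the cutoff returns more candidate sites, some of which are likely to be false positives.

```python
def select_sites(predictions, cutoff=0.8):
    """Keep predictions scoring above the cutoff, highest score first."""
    kept = [p for p in predictions if p["probability"] > cutoff]
    return sorted(kept, key=lambda p: p["probability"], reverse=True)

# Made-up scores for illustration; not real PROSPER output.
predictions = [
    {"protease": "Caspase-3", "position": 120, "probability": 0.93},
    {"protease": "Caspase-3", "position": 45, "probability": 0.62},
    {"protease": "Cathepsin K", "position": 78, "probability": 0.85},
    {"protease": "Caspase-1", "position": 201, "probability": 0.71},
]

strict = select_sites(predictions, cutoff=0.8)
lenient = select_sites(predictions, cutoff=0.6)
print([p["position"] for p in strict])
print([p["position"] for p in lenient])
```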

 

 

Computational efficiency

Although the calculation time depends on the length of the submitted sequence, a typical task for a query sequence of ~500 residues normally takes 8-12 minutes. As soon as the prediction task is completed, a webpage detailing the prediction results is displayed.

 

 

Predictive performance of PROSPER

The predictive performance of PROSPER was evaluated using the Accuracy, Sensitivity, Specificity, F-score and MCC (Matthews correlation coefficient) measures. To evaluate the predictive performance objectively, we employed 5-fold cross-validation and independent tests (see Tables 1 and 2, respectively, for more details).
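
For readers unfamiliar with these measures, the Python sketch below computes them from the counts of true positives (tp), false positives (fp), true negatives (tn) and false negatives (fn). It uses the standard textbook definitions, with the F-score taken as the harmonic mean of precision and sensitivity; this is an assumption about how the measures were computed here. The counts are made up and do not correspond to any row of the tables.

```python
import math

def evaluate(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    at least one site is predicted as positive).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }

# Made-up counts: 100 cleavage sites and 300 non-cleavage sites.
metrics = evaluate(tp=80, fp=30, tn=270, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four counts, it remains informative when non-cleavage sites greatly outnumber cleavage sites, a situation in which accuracy alone can look favourable.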

 

Table 1. Predictive performance of PROSPER for predicting substrate cleavage sites of 24 individual proteases using sequence encoding scheme "ALL" that combines all the relevant sequence and structural features. The results were obtained by 5-fold cross-validation.

Superfamily | Protease | MEROPS ID | Accuracy (%) | Sensitivity (%) | Specificity (%) | F-score (%) | MCC
Aspartic protease | HIV-1 retropepsin | A02.001 | 85.5 | 75.0 | 89.0 | 72.1 | 0.678
Cysteine protease | Cathepsin K | C01.036 | 79.6 | 47.1 | 90.6 | 53.7 | 0.527
Cysteine protease | Calpain-1 | C02.001 | 80.2 | 38.3 | 94.2 | 49.2 | 0.496
Cysteine protease | Caspase-1 | C14.001 | 87.5 | 52.0 | 99.3 | 67.5 | 0.658
Cysteine protease | Caspase-3 | C14.003 | 94.6 | 82.8 | 98.5 | 88.5 | 0.858
Cysteine protease | Caspase-7 | C14.004 | 89.6 | 60.7 | 99.3 | 74.5 | 0.720
Cysteine protease | Caspase-6 | C14.005 | 93.7 | 65.5 | 97.7 | 76.0 | 0.729
Cysteine protease | Caspase-8 | C14.009 | 89.7 | 65.5 | 97.7 | 76.0 | 0.729
Metalloprotease | Matrix metallopeptidase-2 | M10.003 | 87.0 | 77.4 | 90.2 | 74.8 | 0.704
Metalloprotease | Matrix metallopeptidase-9 | M10.004 | 81.2 | 28.9 | 98.6 | 43.4 | 0.463
Metalloprotease | Matrix metallopeptidase-3 | M10.005 | 79.9 | 33.6 | 95.4 | 45.5 | 0.470
Metalloprotease | Matrix metallopeptidase-7 | M10.008 | 81.6 | 31.6 | 98.2 | 46.2 | 0.483
Serine protease | Chymotrypsin A (cattle-type) | S01.001 | 88.5 | 79.5 | 91.5 | 74.5 | 0.733
Serine protease | Granzyme B (Homo sapiens-type) | S01.010 | 97.1 | 96.4 | 97.3 | 94.3 | 0.926
Serine protease | Elastase-2 | S01.131 | 82.9 | 37.8 | 98.0 | 52.5 | 0.530
Serine protease | Cathepsin G | S01.133 | 81.0 | 71.6 | 84.1 | 65.3 | 0.613
Serine protease | Granzyme B (rodent-type) | S01.136 | 93.2 | 80.5 | 97.4 | 85.5 | 0.824
Serine protease | Thrombin | S01.217 | 90.2 | 64.9 | 98.6 | 76.7 | 0.738
Serine protease | Plasmin | S01.233 | 87.8 | 64.6 | 95.5 | 72.5 | 0.691
Serine protease | Glutamyl peptidase I | S01.269 | 91.4 | 84.5 | 93.7 | 83.1 | 0.793
Serine protease | Furin | S08.071 | 93.0 | 72.0 | 100 | 83.7 | 0.811
Serine protease | Signal peptidase I | S26.001 | 94.6 | 82.5 | 98.6 | 88.4 | 0.858
Serine protease | Thylakoidal processing peptidase | S26.008 | 89.5 | 69.8 | 96.1 | 76.9 | 0.738
Serine protease | Signalase (animal) | S26.010 | 85.8 | 50.5 | 97.6 | 64.0 | 0.622

 

PROSPER provides competitive predictive performance by integrating primary sequence features with the predicted solvent accessibility/secondary structures/native disorder features, which serve as a supplement to the primary sequence. By integrating these features, PROSPER is capable of distinguishing more difficult and challenging cleavage sites that cannot be readily detected by methods based only on primary sequence information.

 

 

References

Backes, C. et al. (2005) GraBCas: a bioinformatics tool for score-based prediction of Caspase- and Granzyme B-cleavage sites in protein sequences. Nucleic Acids Res., 33, W208-W213.
Barkan, D.T., Hostetter, D.R., Mahrus, S., Pieper, U., Wells, J.A., Craik, C.S., and Sali, A. (2010) Prediction of protease substrates using sequence and structure features. Bioinformatics 26, 1714-1722
Boyd, S.E., Pike, R.N., Rudy, G.B., Whisstock, J.C., and Garcia de la Banda, M. (2005) PoPS: a computational tool for modeling and predicting protease specificity. J. Bioinform. Comput. Biol. 3, 551-585
Cheng, J. et al. (2005) SCRATCH: a Protein Structure and Structural Feature Prediction Server. Nucleic Acids Res., 33, W72-76.
Dix, M.M. et al. (2008) Global mapping of the topography and magnitude of proteolytic events in apoptosis. Cell, 134, 679-691.
Enoksson, M. et al. (2007) Identification of proteolytic cleavage sites by quantitative proteomics. J Proteome Res, 6, 2850-2858.
Enoksson, M. and Salvesen, G.S. (2008) Proteolytic needles in the cellular haystack. Nat Chem Biol, 4, 651-652.
Fischer, U. et al. (2003) Many cuts to ruin: a comprehensive update of caspase substrates. Cell Death Differ., 10, 76-100.
Garay-Malpartida, H.M. et al. (2005) CaSPredictor: a new computer-based tool for caspase substrate prediction. Bioinformatics, 21, i169-i176.
Gasteiger, E. et al. (2005) Protein Identification and Analysis Tools on the ExPASy Server. In The Proteomics Protocols Handbook Edited by: Walker JM. Humana Press; 571-607.
Joachims, T. (1999) Making large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning. Edited by: Schölkopf, B., Burges, C. and Smola, A., Cambridge, MA: MIT Press.
Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.
Ju, W. et al. (2007) Proteome-wide identification of family member-specific natural substrate repertoire of caspases. Proc. Natl. Acad. Sci. USA, 104, 14294-14299.
Kleifeld, O., Doucet, A., auf dem Keller, U., Prudova, A., Schilling, O., Kainthan, R.K., Starr, A.E., Foster, L.J., Kizhakkedathu, J.N., and Overall, C.M. (2010) Isotopic labeling of terminal amines in complex samples identifies protein N-termini and protease cleavage products. Nat. Biotechnol. 28, 281-288
Lohmuller, T. et al. (2003) Toward computer-based cleavage site prediction of cysteine endopeptidases. Biol. Chem., 384, 899-909.
Luthi, A.U. and Martin, S.J. (2007) The CASBAH: a searchable database of caspase substrates. Cell Death Differ., 14, 641-650.
Mahrus, S. et al. (2008) Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell, 134, 866-876.
Nicholson, D.W. (1999) Caspase structure, proteolytic substrates, and function during apoptotic cell death. Cell Death Differ., 6, 1028-1042.
Pop, C. and Salvesen, G.S. (2009) Human caspases: Activation, specificity and regulation. J Biol Chem, 284, 21777-21781.
Rawlings, N.D. et al. (2008) MEROPS: the peptidase database. Nucleic Acids Res., 36, D320-D325.
Shao, J. et al. (2009) Computational identification of protein methylation sites through bi-profile Bayes feature extraction. PLoS ONE, 4, e4920.
Schilling, O. and Overall, C.M. (2008) Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nature Biotechnol., 26, 685-694.
Schneider, T.D. and Stephens, R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097-6100.
Song, J., Tan, H., Shen, H., Mahmood, K., Boyd, S.E., Webb, G.I., Akutsu, T., and Whisstock, J.C. (2010) Cascleave: towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics 26, 752-760
Song, J., Tan, H., Boyd, S.E., Shen, H., Mahmood, K., Webb, G.I, Akutsu, T., Whisstock, J.C. and Pike, R.N. (2011) Bioinformatic approaches for predicting substrates of proteases. J. Bioinform. Comput. Biol. 9, 149-178
Song, J., Tan, H., Perry, A.J., Akutsu, T., Webb, G.I., Whisstock, J.C. and Pike, R.N. (2012) PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS ONE, 7(11), e50300
Schechter, I., and Berger, A. (1967). On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 27, 157-162
Timmer, J.C. and Salvesen, G.S. (2007) Caspase substrates. Cell Death Differ., 14, 66-72.
Timmer, J.C. et al. (2009) Structural and kinetic determinants of protease substrates. Nat Struct Mol Biol, 16, 1101-1108.
Vapnik, V. (2000) The nature of statistical learning theory. Springer, New York.
Ward, J.J. et al. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635-645.
Wee, L.J. et al. (2006) SVM-based prediction of caspase substrate cleavage sites. BMC Bioinformatics, 7 (Suppl 5), S14-S15.
Wee, L.J. et al. (2007) CASVM: web server for SVM-based prediction of caspase substrates cleavage sites. Bioinformatics, 23, 3241-3243.
Yang, J.Y. and Widmann, C. (2001) Antiapoptotic Signaling Generated by Caspase-Induced Cleavage of RasGAP. Mol. Cell. Biol., 21, 5346-5358.
Yang, Z.R. (2005) Prediction of caspase cleavage sites using Bayesian bio-basis function neural networks. Bioinformatics, 21, 1831-1837.

 

 

Citation

Song J, Tan H, Perry AJ, Akutsu T, Webb GI, Whisstock JC and Pike RN. PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS ONE, 2012, 7(11), e50300.

 

 

Contact

Dr. Jiangning Song
NHMRC Peter Doherty Fellow
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia
Email:

 

Prof. Robert Pike
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia


Prof. James Whisstock
ARC Federation Fellow
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia


 

 


Copyright © 2012-2018. Monash Bioinformatics Platform, School of Biomedical Sciences, Faculty of Medicine, Monash University, Australia