Help
The PROSPER webserver is available at https://prosper.erc.monash.edu.au for the online prediction of protease substrates and their cleavage sites from primary amino acid sequences alone. At present, PROSPER can predict substrate cleavage sites for 24 different proteases from the aspartic (A), cysteine (C), metallo (M) and serine (S) protease superfamilies. Unlike other general tools, PROSPER uses a machine learning approach based on support vector regression (SVR) to provide a real-valued prediction of substrate cleavage probability. In particular, it uses a novel bi-profile Bayesian approach to extract local sequence and structural profiles, including the binary amino acid sequence profile, predicted secondary structure, solvent accessibility and native disorder features. This strategy has been shown to significantly improve the predictive performance of PROSPER and of Cascleave, our previously established tool for predicting caspase substrate cleavage sites. The rationale is that peptide sequences that can be cleaved by proteases should exhibit different features from those that cannot. Representing each positive/negative sample in a bi-profile manner could therefore, in principle, provide more informative features than the conventional binary amino acid sequence encoding scheme.
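The bi-profile idea can be illustrated with a minimal Python sketch. It is not PROSPER's actual implementation: it assumes equal-length peptide windows made only of the 20 standard amino acids, estimates position-specific residue frequencies separately from positive (cleaved) and negative (non-cleaved) training peptides, and encodes a query peptide as the concatenation of both frequency vectors. The function names, the pseudocount and the toy peptides are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: standard residues only


def position_frequencies(peptides, pseudocount=1.0):
    """Estimate per-position residue frequencies from equal-length peptides."""
    length = len(peptides[0])
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def bi_profile_encode(peptide, pos_profile, neg_profile):
    """Concatenate the peptide's residue frequencies under both profiles."""
    pos_part = [pos_profile[i][aa] for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i][aa] for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Toy, made-up 8-residue windows (not real training data).
positives = ["DEVDGAAA", "DEVDSAAA"]
negatives = ["AAAAAAAA", "GGGGGGGG"]

pos_profile = position_frequencies(positives)
neg_profile = position_frequencies(negatives)
features = bi_profile_encode("DEVDGAAA", pos_profile, neg_profile)
print(len(features))  # 16 values: 8 from each profile
```

A vector of this kind could then be passed, together with structural features, to a regression model such as SVR; that step is omitted here.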
Usage
The web interface is straightforward to use: the user only needs to input the query sequence in FASTA format, using one-letter amino acid codes. A typical task for a query sequence of ~500 residues takes roughly 8-12 minutes. Once the prediction task is completed, the results are displayed on screen.
Step 1:
First, input the query sequence in the FASTA format, such as:
>Q07955
MSGGGVIRGPAGNNDCRIYVGNLPPDIRTKDIEDVFYKYGAIRDIDLKNRRGGPPFA...
After inputting the sequence, click the 'submit' button to submit the job.
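Because a job can take several minutes, it may be worth checking a query locally before submitting it. The following Python sketch parses a single-record FASTA string and rejects characters outside the 20 standard one-letter amino acid codes. The function name and the strictness of the check are assumptions; the server's own validation rules are not documented here and may differ.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard codes only


def parse_fasta(text):
    """Parse one FASTA record and return (identifier, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:].strip()
    identifier = header.split()[0] if header else ""
    # Sequence lines may be wrapped, so join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("FASTA record contains no sequence")
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError("Unexpected characters in sequence: " + "".join(invalid))
    return identifier, sequence


# Only the fragment of Q07955 shown in the example above, not the full protein.
query = ">Q07955\nMSGGGVIRGPAGNNDCRIYVGNLPPDIRTKDIEDVFYKYGAIRDIDLKNRRGGPPFA\n"
identifier, sequence = parse_fasta(query)
print(identifier, len(sequence))
```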
Step 2:
While the job is being processed, a progress bar indicates the progress of the submitted job.
Step 3:
As soon as the submitted job is completed, the webpage will be redirected to the result page:
The first part of the result page shows the input sequence, with each predicted cleavage site colored according to the corresponding protease family (an example is shown in Figure 4). When the mouse hovers over a colored site, a hint bubble appears showing the predicted probability and the relevant protease at that site.
Figure 4. The sample output from the PROSPER server for the submitted sequence Q07955 (Uniprot ID).
The second part of the result page gives a tabbed view of the predicted results, categorized by protease. Each tab contains a sortable table listing the predicted cleavage site position, the P4-P4' segment, the N-fragment and C-fragment sizes, and the predicted cleavage probability. Below the table, a diagram gives an overview of the entire sequence, in which the disordered regions predicted by the DISOPRED2 program are also highlighted.
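The table columns can be related to the sequence with a short Python sketch. It follows the Schechter and Berger nomenclature, in which the scissile bond lies between P1 and P1', and it assumes that the reported position is the 1-based index of the P1 residue and that fragment sizes are counted in residues. Both assumptions, and the function name, are illustrative; the server may use different conventions or units.

```python
def describe_cleavage(sequence, p1_position):
    """Return the P4-P4' segment and fragment sizes for one cleavage site.

    p1_position is the 1-based index of the P1 residue (an assumption);
    cleavage occurs between P1 and P1'.
    """
    if not 1 <= p1_position < len(sequence):
        raise ValueError("P1 position must leave at least one residue on each side")
    # Windows are truncated near the termini instead of padded.
    start = max(0, p1_position - 4)
    end = min(len(sequence), p1_position + 4)
    return {
        "segment": sequence[start:p1_position] + "|" + sequence[p1_position:end],
        "n_fragment_size": p1_position,
        "c_fragment_size": len(sequence) - p1_position,
    }


# Hypothetical site in a short toy sequence, for illustration only.
site = describe_cleavage("MSGGGVIRGPAG", 6)
print(site["segment"], site["n_fragment_size"], site["c_fragment_size"])
```

The "|" character in the segment marks the scissile bond and is added only for readability.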
Figure 4 shows the example output for a submitted sequence (Uniprot ID: Q07955). This sequence is predicted to be cleaved by Cathepsin K and Caspase-1, -3, -6, -7 and -8 (each protease predicted to cleave the submitted substrate is shown in the result). In addition to the cleavage site position and P4-P4' segment, PROSPER provides a quantitative cleavage probability for each cleavage site and highlights natively unstructured regions for further investigation. Note that the predicted cleavage probability itself serves as a measure of confidence in the prediction. The cutoff threshold can be loosened to include more potential cleavage sites; however, this inevitably increases the number of false positives (i.e., non-cleavage sites predicted as cleavage sites). In this example, cleavage sites with a predicted cleavage probability score greater than 0.8 are ranked and highlighted on the result page.
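The effect of the cutoff can be sketched in Python as a simple filter-and-rank step. The record layout, the function name and the scores below are made up for illustration and are not actual PROSPER output; only the 0.8 threshold comes from the example above.

```python
def select_sites(predictions, cutoff=0.8):
    """Keep sites scoring above the cutoff, ranked from highest to lowest."""
    kept = [p for p in predictions if p["probability"] > cutoff]
    return sorted(kept, key=lambda p: p["probability"], reverse=True)


# Hypothetical predictions with invented positions and scores.
predictions = [
    {"protease": "Caspase-3", "position": 120, "probability": 0.93},
    {"protease": "Caspase-3", "position": 45, "probability": 0.62},
    {"protease": "Cathepsin K", "position": 88, "probability": 0.85},
    {"protease": "Caspase-1", "position": 17, "probability": 0.41},
]

strict = select_sites(predictions, cutoff=0.8)
relaxed = select_sites(predictions, cutoff=0.5)
print(len(strict), len(relaxed))  # the relaxed cutoff keeps more candidate sites
```

Lowering the cutoff returns more candidate sites, at the cost of admitting lower-confidence predictions that are more likely to be false positives.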
Computational efficiency
Although the calculation time depends on the length of the submitted sequence, a typical task for a query sequence of ~500 residues normally takes 8-12 minutes. As soon as the prediction task is completed, a webpage detailing the prediction results is displayed.
Predictive performance of PROSPER
The predictive performance of PROSPER was evaluated using the Accuracy, Sensitivity, Specificity, F-score and MCC (Matthews correlation coefficient) measures. To evaluate the predictive performance objectively, we employed 5-fold cross-validation and independent tests (see Tables 1 and 2, respectively, for more details).
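For reference, the five measures can be computed from a confusion matrix as in the Python sketch below. It assumes the standard textbook definitions, with the F-score taken as the harmonic mean of precision and sensitivity; the exact definitions used for PROSPER are not restated on this page. The counts in the usage line are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from raw counts."""
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, not taken from the PROSPER evaluation.
print(classification_metrics(tp=40, fp=10, tn=90, fn=10))
```

Accuracy, Sensitivity, Specificity and F-score are fractions here; multiply by 100 to compare with percentage columns such as those in Table 1.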
Table 1. Predictive performance of PROSPER for predicting substrate cleavage sites of 24 individual proteases using sequence encoding scheme "ALL" that combines all the relevant sequence and structural features. The results were obtained by 5-fold cross-validation.
Superfamily | Protease | Merops ID | Accuracy (%) | Sensitivity (%) | Specificity (%) | F-score (%) | MCC |
Aspartic protease | HIV-1 retropepsin | A02.001 | 85.5 | 75.0 | 89.0 | 72.1 | 0.678 |
Cysteine protease | Cathepsin K | C01.036 | 79.6 | 47.1 | 90.6 | 53.7 | 0.527 |
Cysteine protease | Calpain-1 | C02.001 | 80.2 | 38.3 | 94.2 | 49.2 | 0.496 |
Cysteine protease | Caspase-1 | C14.001 | 87.5 | 52.0 | 99.3 | 67.5 | 0.658 |
Cysteine protease | Caspase-3 | C14.003 | 94.6 | 82.8 | 98.5 | 88.5 | 0.858 |
Cysteine protease | Caspase-7 | C14.004 | 89.6 | 60.7 | 99.3 | 74.5 | 0.720 |
Cysteine protease | Caspase-6 | C14.005 | 93.7 | 65.5 | 97.7 | 76.0 | 0.729 |
Cysteine protease | Caspase-8 | C14.009 | 89.7 | 65.5 | 97.7 | 76.0 | 0.729 |
Metalloprotease | Matrix metallopeptidase-2 | M10.003 | 87.0 | 77.4 | 90.2 | 74.8 | 0.704 |
Metalloprotease | Matrix metallopeptidase-9 | M10.004 | 81.2 | 28.9 | 98.6 | 43.4 | 0.463 |
Metalloprotease | Matrix metallopeptidase-3 | M10.005 | 79.9 | 33.6 | 95.4 | 45.5 | 0.470 |
Metalloprotease | Matrix metallopeptidase-7 | M10.008 | 81.6 | 31.6 | 98.2 | 46.2 | 0.483 |
Serine protease | Chymotrypsin A (cattle-type) | S01.001 | 88.5 | 79.5 | 91.5 | 74.5 | 0.733 |
Serine protease | Granzyme B (Homo sapiens-type) | S01.010 | 97.1 | 96.4 | 97.3 | 94.3 | 0.926 |
Serine protease | Elastase-2 | S01.131 | 82.9 | 37.8 | 98.0 | 52.5 | 0.530 |
Serine protease | Cathepsin G | S01.133 | 81.0 | 71.6 | 84.1 | 65.3 | 0.613 |
Serine protease | Granzyme B (rodent-type) | S01.136 | 93.2 | 80.5 | 97.4 | 85.5 | 0.824 |
Serine protease | Thrombin | S01.217 | 90.2 | 64.9 | 98.6 | 76.7 | 0.738 |
Serine protease | Plasmin | S01.233 | 87.8 | 64.6 | 95.5 | 72.5 | 0.691 |
Serine protease | Glutamyl peptidase I | S01.269 | 91.4 | 84.5 | 93.7 | 83.1 | 0.793 |
Serine protease | Furin | S08.071 | 93.0 | 72.0 | 100 | 83.7 | 0.811 |
Serine protease | Signal peptidase I | S26.001 | 94.6 | 82.5 | 98.6 | 88.4 | 0.858 |
Serine protease | Thylakoidal processing peptidase | S26.008 | 89.5 | 69.8 | 96.1 | 76.9 | 0.738 |
Serine protease | Signalase (animal) | S26.010 | 85.8 | 50.5 | 97.6 | 64.0 | 0.622 |
PROSPER provides competitive predictive performance by integrating primary sequence features with the predicted solvent accessibility/secondary structures/native disorder features, which serve as a supplement to the primary sequence. By integrating these features, PROSPER is capable of distinguishing more difficult and challenging cleavage sites that cannot be readily detected by methods based only on primary sequence information.
References
Backes, C. et al. (2005) GraBCas:
a bioinformatics tool for score-based prediction of Caspase- and Granzyme
B-cleavage sites in protein sequences. Nucleic Acids Res., 33, W208-W213.
Barkan, D.T., Hostetter, D.R., Mahrus, S., Pieper, U., Wells, J.A., Craik,
C.S., and Sali, A. (2010) Prediction of protease substrates using sequence
and structure features. Bioinformatics 26, 1714-1722
Boyd, S.E., Pike, R.N., Rudy, G.B., Whisstock, J.C., and Garcia de la Banda,
M. (2005) PoPS: a computational tool for modeling and predicting protease
specificity. J. Bioinform. Comput. Biol. 3, 551-585
Cheng, J. et al. (2005) SCRATCH: a Protein Structure and Structural Feature
Prediction Server. Nucleic Acids Res., 33, W72-76.
Dix, M.M. et al. (2008) Global mapping of the topography and magnitude of
proteolytic events in apoptosis. Cell, 134, 679-691.
Enoksson, M. et al. (2007) Identification of proteolytic cleavage sites by
quantitative proteomics. J Proteome Res, 6, 2850-2858.
Enoksson, M. and Salvesen, G.S. (2008) Proteolytic needles in the cellular
haystack. Nat Chem Biol, 4, 651-652.
Fischer, U. et al. (2003) Many cuts to ruin: a comprehensive update of caspase
substrates. Cell Death Differ., 10, 76-100.
Garay-Malpartida, H.M. et al. (2005) CaSPredictor: a new computer-based tool
for caspase substrate prediction. Bioinformatics, 21, i169-i176.
Gasteiger, E. et al. (2005) Protein Identification and Analysis Tools on the
ExPASy Server. In The Proteomics Protocols Handbook Edited by: Walker JM.
Humana Press; 571-607.
Joachims, T. (1999) Making large-Scale SVM Learning Practical. In Advances
in Kernel Methods - Support Vector Learning. Edited by: Schölkopf, B., Burges,
C. and Smola, A., Cambridge, MA: MIT Press.
Jones, D.T. (1999) Protein secondary structure prediction based on position-specific
scoring matrices. J. Mol. Biol., 292, 195-202.
Ju, W. et al. (2007) Proteome-wide identification of family member-specific
natural substrate repertoire of caspases. Proc. Natl. Acad. Sci. USA, 104,
14294-14299.
Kleifeld, O., Doucet, A., auf dem Keller, U., Prudova, A., Schilling, O.,
Kainthan, R.K., Starr, A.E., Foster, L.J., Kizhakkedathu, J.N., and Overall,
C.M. (2010) Isotopic labeling of terminal amines in complex samples identifies
protein N-termini and protease cleavage products. Nat. Biotechnol. 28, 281-288
Lohmuller, T. et al. (2003) Toward computer-based cleavage site prediction
of cysteine endopeptidases. Biol. Chem., 384, 899-909.
Luthi, A.U. and Martin, S.J. (2007) The CASBAH: a searchable database of caspase
substrates. Cell Death Differ., 14, 641-650.
Mahrus, S. et al. (2008) Global sequencing of proteolytic cleavage sites in
apoptosis by specific labeling of protein N termini. Cell, 134, 866-876.
Nicholson, D.W. (1999) Caspase structure, proteolytic substrates, and function
during apoptotic cell death. Cell Death Differ., 6, 1028-1042.
Pop, C. and Salvesen, G.S. (2009) Human caspases: Activation, specificity
and regulation. J Biol Chem, 284, 21777-21781.
Rawlings, N.D. et al. (2008) MEROPS: the peptidase database. Nucleic Acids
Res., 36, D320-D325.
Shao, J. et al. (2009) Computational identification of protein methylation
sites through bi-profile Bayes feature extraction. PLoS ONE, 4, e4920.
Schilling, O. and Overall, C.M. (2008) Proteome-derived, database-searchable
peptide libraries for identifying protease cleavage sites. Nature Biotechnol.,
26, 685-694.
Schneider, T.D. and Stephens, R.M. (1990) Sequence logos: a new way to display
consensus sequences. Nucleic Acids Res., 18, 6097-6100.
Song, J., Tan, H., Shen, H., Mahmood, K., Boyd, S.E., Webb, G.I., Akutsu,
T., and Whisstock, J.C. (2010) Cascleave: towards more accurate prediction
of caspase substrate cleavage sites. Bioinformatics 26, 752-760
Song, J., Tan, H., Boyd, S.E., Shen, H., Mahmood, K., Webb, G.I, Akutsu, T.,
Whisstock, J.C. and Pike, R.N. (2011) Bioinformatic approaches for predicting
substrates of proteases. J. Bioinform. Comput. Biol. 9, 149-178
Song, J., Tan, H., Perry, A.J., Akutsu, T., Webb, G.I., Whisstock, J.C. and Pike, R.N. (2012) PROSPER: an integrated
feature-based tool for predicting protease substrate cleavage sites. PLoS ONE, 7(11), e50300
Schechter, I., and Berger, A. (1967). On the size of the active site in proteases.
I. Papain. Biochem. Biophys. Res. Commun. 27, 157-162
Timmer, J.C. and Salvesen, G.S. (2007) Caspase substrates. Cell Death Differ.,
14, 66-72.
Timmer, J.C. et al. (2009). Structural and kinetic determinants of protease
substrates. Nat Struct Mol Biol, 16, 1101-1108.
Vapnik, V. (2000) The nature of statistical learning theory. Springer, New
York.
Ward, J.J. et al. (2004) Prediction and functional analysis of native disorder
in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635-645.
Wee, L.J. et al. (2006) SVM-based prediction of caspase substrate cleavage sites.
BMC Bioinformatics, 7 (Suppl 5), S14-S15.
Wee, L.J. et al. (2007) CASVM: web server for SVM-based prediction of caspase
substrates cleavage sites. Bioinformatics, 23, 3241-3243.
Yang, J.Y. and Widmann, C. (2001) Antiapoptotic Signaling Generated by Caspase-Induced
Cleavage of RasGAP. Mol. Cell. Biol., 21, 5346-5358.
Yang, Z.R. (2005) Prediction of caspase cleavage sites using Bayesian bio-basis
function neural networks. Bioinformatics, 21, 1831-1837.
Citation
Song J, Tan H, Perry AJ, Akutsu T, Webb GI, Whisstock JC and Pike RN (2012) PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS ONE, 7(11), e50300.
Contact
Dr. Jiangning Song
NHMRC Peter Doherty Fellow
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia
Email:
Prof. Robert Pike
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia
Prof. James Whisstock
ARC Federation Fellow
Department of Biochemistry and Molecular Biology
Faculty of Medicine
Monash University
Clayton, Melbourne, VIC 3800, Australia