Download


To construct CKSAAP_UbSite, 203 ubiquitinated substrates, previously compiled by Radivojac et al. (2010), were downloaded from http://www.ubpred.org/sgd_predictions.txt.gz. These 203 proteins contain 272 experimentally validated ubiquitination sites, which are regarded as positive samples. Following a strategy similar to that of Radivojac et al. (2010), we extracted 4642 negative samples from 124 mitochondrial matrix proteins. After filtering at a 40% sequence identity threshold, we obtained a dataset containing 263 positive and 4345 negative samples (i.e. Radivojac_dataset), which was used to train and test CKSAAP_UbSite. Since non-ubiquitination sites in Radivojac_dataset far outnumber ubiquitination sites, we randomly selected ten sets of negative samples to allow a reliable performance estimation of CKSAAP_UbSite.
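The random selection of negative sets described above could be sketched as follows. This is a minimal illustration and not the authors' script: the function name, the fixed seed, the placeholder sample identifiers, and the choice to make each negative set the same size as the positive set are all assumptions, since the text does not state how large each set is.

```python
import random


def sample_negative_sets(negatives, set_size, n_sets=10, seed=0):
    """Draw n_sets random subsets of the negatives.

    Each subset is drawn without replacement, so a sample appears at most
    once within a set, but the same sample may occur in several sets.
    """
    if set_size > len(negatives):
        raise ValueError("set_size exceeds the number of negative samples")
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    return [rng.sample(negatives, set_size) for _ in range(n_sets)]


# Hypothetical placeholder identifiers standing in for the real samples.
positives = [f"pos_{i}" for i in range(263)]
negatives = [f"neg_{i}" for i in range(4345)]

# Assumption: each negative set matches the number of positive samples.
negative_sets = sample_negative_sets(negatives, set_size=len(positives))

# Pair the full positive set with each negative set for training and testing.
balanced_sets = [positives + neg for neg in negative_sets]
```

Evaluating a model on each of the ten balanced sets and averaging the results is one way to reduce the dependence of the estimate on any single random draw of negatives.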

Pre-computed datasets for CKSAAP_UbSite can be downloaded here.

All positive and negative samples (after filtering at a 40% sequence identity threshold)
     
Ten sets of randomly selected negative samples  

 

Currently, the stand-alone program that implements CKSAAP_UbSite and hCKSAAP_UbSite can be downloaded from here (an authorization code is provided upon request).


The dataset of hCKSAAP_UbSite was obtained from two high-throughput proteomic assays (Danielsen et al., 2011; Wagner et al., 2011) and our own literature search. Redundant sequences were removed using the BLASTClust program with a 30% sequence identity cutoff. In the end, the training dataset contains 6118 ubiquitination sites in 2500 proteins, and the independent testing dataset contains 3419 ubiquitination sites in 1352 proteins.

Pre-computed datasets for hCKSAAP_UbSite can be downloaded here.

Pre-computed datasets for hCKSAAP_UbSite  

Help on using CKSAAP_UbSite