SPAR (Self-interacting Protein AnalyzeR) is a predictor of self-interacting proteins from sequence information. Based on a comprehensive analysis of a variety of physicochemical properties of self-interacting proteins, we designed a new encoding scheme named RSI.
Using the random forest algorithm, we benchmarked this encoding scheme against several others commonly used in sequence-based protein-protein interaction (PPI) prediction, such as DPC, CT, LD, AC, and MAC. RSI performed best, reaching an AUC of 83.24%. We then integrated RSI with the other encoding schemes. After feature selection with the mRMR algorithm, we achieved an AUC of 87.88% on the human independent test set.
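As an illustration of what a sequence encoding scheme looks like, the following is a minimal sketch of one of the baseline encodings, assuming DPC denotes dipeptide composition (its usual meaning in sequence-based prediction; the acronym is not expanded in the text). The function name `dipeptide_composition` and the choice to skip pairs containing non-standard residues are illustrative assumptions, not SPAR's actual implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order that defines the feature index.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are
    skipped and do not count toward the normalising total.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = _INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]
```

Each protein sequence, whatever its length, is mapped to a fixed-length vector in this way, which is what allows a classifier such as a random forest to be trained on it.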
To further evaluate performance on other species, we tested the proposed method on an independent test set from yeast, where it reached an AUC of 78.71%, indicating reasonable performance in cross-species prediction.
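The AUC values reported here can be read as the probability that a randomly chosen self-interacting protein receives a higher prediction score than a randomly chosen non-self-interacting one. A minimal sketch of that computation follows; the helper name `roc_auc` and the toy labels and scores are hypothetical and are not data from this work.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (positive) or 0 (negative)
    scores: iterable of classifier scores, higher means more positive
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # prints 0.75
```

The pairwise loop is quadratic in the number of examples and is meant only to make the definition concrete; for large test sets a rank-based computation gives the same value more efficiently.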