The enzyme dataset used in our study was extracted from the Catalytic Site Atlas (CSA). Only entries annotated directly from the literature were retained, and these entries were then mapped to the SCOP database (version 1.75). The final entries used in our work satisfy the following criteria:

  • The sequence identity between any two sequences is less than 30%
  • The sequence length is larger than 100
  • The number of consecutive missing residues is less than 10
  • The PDB structure belongs to one of four SCOP classes (all alpha, all beta, alpha+beta, or alpha/beta)
  • Enough homologous sequences are available to calculate residue conservation scores
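
The per-entry criteria above (length, missing residues, SCOP class) can be checked with a short filter. The following is only a minimal sketch, not the procedure actually used to build the dataset. The Entry record, its field names, and the use of "-" to mark residues missing from the structure are assumptions made for illustration. The pairwise identity cutoff and the homolog requirement depend on external alignment tools and are not covered here.

```python
from dataclasses import dataclass

# SCOP class letters for the four accepted classes:
# a = all alpha, b = all beta, c = alpha/beta, d = alpha+beta
ALLOWED_SCOP_CLASSES = {"a", "b", "c", "d"}


@dataclass
class Entry:
    """Hypothetical record for one enzyme chain (field names are assumptions)."""
    sequence: str    # full amino-acid sequence
    observed: str    # same length as sequence; '-' marks a residue missing from the structure
    scop_class: str  # SCOP class letter, e.g. "c"


def longest_missing_run(observed: str, gap: str = "-") -> int:
    """Return the length of the longest run of consecutive missing residues."""
    longest = current = 0
    for ch in observed:
        current = current + 1 if ch == gap else 0
        longest = max(longest, current)
    return longest


def passes_per_entry_filters(entry: Entry) -> bool:
    """Apply the length, missing-residue, and SCOP-class criteria to one entry."""
    return (
        len(entry.sequence) > 100
        and longest_missing_run(entry.observed) < 10
        and entry.scop_class in ALLOWED_SCOP_CLASSES
    )


if __name__ == "__main__":
    seq = "A" * 120
    ok = Entry(seq, "A" * 50 + "-" * 9 + "A" * 61, "c")
    print(passes_per_entry_filters(ok))
```

The thresholds are strict inequalities, matching the wording of the list: a sequence of exactly 100 residues or a gap of exactly 10 residues is rejected.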

The enzyme dataset can be downloaded here:
     The FASTA sequences and the experimentally verified catalytic residues of 223 enzymes.

The weight coefficient vector WME can be downloaded here:
     The weight coefficient vector WME