Download


This tutorial shows users how to run the stand-alone version of BEAN 2.0 on their local machines. If you wish to use BEAN 2.0 to predict new type-III effectors at the genome level, we strongly recommend that you download the stand-alone version and deploy it on your local machine according to the installation guide.

BEAN 2.0

BEAN 2.0 is an upgraded version of BEAN. It is a machine-learning-based method designed to predict type-III effectors from bacterial proteins. BEAN 2.0 is available at http://systbio.cau.edu.cn/bean/.

Requirements

Operating system

All BEAN 2.0 code is written in Perl, so in principle you can install it on almost any operating system. To avoid unnecessary setup steps, however, we recommend 64-bit Linux: we have tested BEAN 2.0 on a 64-bit Linux system and confirmed that it works well there, and some packages, such as HHsuite, are easier to install on 64-bit Linux. More information about Linux can be found at http://en.wikipedia.org/wiki/Linux.

Perl

Perl is installed by default on most Linux distributions. You can skip this section if typing the command "perl -h" in a Linux terminal prints Perl's usage information.

We recommend Perl v5.10.1 (*), because that is the version we used to debug our Perl program. If no version of Perl is installed, you can get it from Perl's official website http://www.perl.org/ and install it by following its installation guide.
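As a quick way to see which Perl version is installed, the sketch below extracts the version number from the banner printed by "perl -v". The function names and the sample banner string are illustrative assumptions, not part of BEAN 2.0.

```python
import re
import subprocess

def parse_perl_version(text):
    """Return (major, minor, patch) found in `perl -v` output, or None."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

def installed_perl_version():
    """Run `perl -v` and parse its banner; return None if perl is not installed."""
    try:
        result = subprocess.run(["perl", "-v"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_perl_version(result.stdout)

# Illustrative banner only; the real text comes from your own `perl -v`.
sample = "This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi"
print(parse_perl_version(sample))
```

Calling installed_perl_version() on your own machine returns None when Perl is missing, which is the case in which you need to install it first.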

Software

Pfam

The Pfam database is a large collection of protein domain families. Each family is represented by multiple sequence alignments and hidden Markov models (HMMs). To identify the domains of the query proteins, you should download the database from ftp://ftp.sanger.ac.uk/pub/databases/Pfam and deploy it. In addition, please download PfamScan from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/ and HMMER from http://hmmer.janelia.org/.

HHsuite

HHblits is a sequence search algorithm based on hidden Markov models. We use it to improve the sensitivity of BEAN 2.0. It is distributed as part of HHsuite. Please download HHsuite and the related database from the HHblits main page http://toolkit.tuebingen.mpg.de/hhblits/ and deploy them correctly.

BLAST

The BLAST+ suite is a rewrite of BLAST in C++. You can get it and the corresponding NR database from the NCBI website http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download.
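Once Pfam's tools, HHsuite and BLAST+ are installed, a short check like the one below can report which executables are not reachable from your PATH. The tool names in the list are assumptions (this guide does not state which executables BEAN 2.0 calls), so adjust them to your installation; tools that you reference by full path in the settings do not need to be on PATH at all.

```python
import shutil

# Hypothetical list: the exact executables BEAN 2.0 invokes are not given in this guide.
ASSUMED_TOOLS = ["perl", "hmmscan", "hhblits", "blastp"]

def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools(ASSUMED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All assumed tools were found on PATH.")
```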

LIBSVM

LibSVM is a machine learning package that implements the support vector machine algorithm and related tools. You can download it from its website http://www.csie.ntu.edu.tw/~cjlin/libsvm/. To make it output its decision score, you need to add some code to the "svm.cpp" file before compiling the LibSVM source code into executables. Locate the "svm_predict()" function and modify its body as follows:

	double svm_predict(const svm_model *model, const svm_node *x)
	{
		int nr_class = model->nr_class;
		double *dec_values;
		if(model->param.svm_type == ONE_CLASS ||
		   model->param.svm_type == EPSILON_SVR ||
		   model->param.svm_type == NU_SVR)
			dec_values = Malloc(double, 1);
		else
			dec_values = Malloc(double, nr_class*(nr_class-1)/2);
		double pred_result = svm_predict_values(model, x, dec_values);
			
		//-----------------------Add the line below-----------------------------
		printf("%g\n", dec_values[0]*model->label[0]);
		//----------------------------------------------------------------------
		
		free(dec_values);
		return pred_result;
	}
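The added printf multiplies the raw decision value by the first class label stored in the model. For a two-class model, LibSVM's raw value is positive when the sample falls on the side of whichever class is stored first, so this multiplication makes the sign of the printed score independent of the class order. The sketch below illustrates the arithmetic, assuming the two labels are +1 and -1 and that 0 is the decision threshold; both function names are hypothetical.

```python
def oriented_score(dec_value, first_label):
    """Mirror the patched printf: dec_values[0] * model->label[0]."""
    return dec_value * first_label

def is_positive_class(score, threshold=0.0):
    """Assumed rule: a score above the threshold falls on the +1 side."""
    return score > threshold

# The same sample seen by two models whose internal class order differs.
print(oriented_score(0.8, 1))    # model stores label +1 first
print(oriented_score(-0.8, -1))  # model stores label -1 first
```

Both calls print the same positive score, which is the point of the patch: a positive value always points to the +1 class.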

You can also use the pre-compiled version of LibSVM bundled with the BEAN 2.0 package (in the "libsvm-2.9" subdirectory). Make sure these LibSVM binaries are executable before using them; you can use the "chmod" command to grant them execute permission.
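If you prefer to set the permissions from a script, the following sketch does the equivalent of "chmod a+x" for the LibSVM binaries. The function names are illustrative, and the default file names assume the three standard LibSVM programs.

```python
import os
import stat

def make_executable(path):
    """Add execute permission for user, group and others (like `chmod a+x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

def ensure_executables(directory, names=("svm-predict", "svm-scale", "svm-train")):
    """Make each named file in `directory` executable; return the names not found."""
    missing = []
    for name in names:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            make_executable(path)
        else:
            missing.append(name)
    return missing
```

For example, ensure_executables("libsvm-2.9") returns an empty list when all three binaries are present and have been made executable.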

Deploy BEAN 2.0

Download

Click to download the latest version! Latest release: 2.0

Setting path

Decompress the BEAN 2.0 package with the following commands:

	unzip BEAN_2.0.zip
	cd BEAN_2.0

You will find four subdirectories ("libsvm-2.9/", "db/", "domain/" and "model/"), one Perl file (classify.pl) and a test sequence file (seqs_for_test.fasta).

	BEAN_2.0/
	|
	|-- libsvm-2.9/             # LibSVM binary files
	|-- db/                     # BLAST database used in BEAN 2.0
	|-- model/                  # SVM model
	|-- domain/                 # domain database
	|-- classify.pl             # main program of BEAN 2.0
	|-- seqs_for_test.fasta     # test sequences
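To confirm that the package was unpacked completely, you can compare the directory against the layout above. This is a minimal sketch; the function name is hypothetical and the root directory "BEAN_2.0" is the one created by the unzip command.

```python
import os

EXPECTED_DIRS = ("libsvm-2.9", "db", "model", "domain")
EXPECTED_FILES = ("classify.pl", "seqs_for_test.fasta")

def missing_entries(root):
    """Return the expected directories and files that are absent under `root`."""
    missing = [d + "/" for d in EXPECTED_DIRS
               if not os.path.isdir(os.path.join(root, d))]
    missing += [f for f in EXPECTED_FILES
                if not os.path.isfile(os.path.join(root, f))]
    return missing

# Prints the missing entries, or a short message when nothing is missing.
print(missing_entries("BEAN_2.0") or "layout looks complete")
```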

Put the compiled LibSVM binary files ("svm-predict", "svm-scale" and "svm-train") in "libsvm-2.9/". Then open "classify.pl" with a text editor, such as vi or vim, and modify the settings of BEAN 2.0 according to the instructions in it.

    #-------------------------------------------------------------------------------
    #-------------------------------------------------------------------------------
    # BLAST database
    # Example: $blast_nrdb = '/home/pub/blastdb/nr';
    my $blast_nrdb = "/path/to/blast/nr/database/";
    # HHblits database
    # Example: $hhsuite_db = '/home/pub/database/hhsuite_database/nr20_12Aug11';
    my $hhsuite_db = "/path/to/hhsuite/database/";
    # PfamScan database
    # Example: $pfam_db = '/home/pub/database/pfam_database';
    my $pfam_db = "/path/to/pfam/database/";
    # Path to the HHsuite tool script reformat.pl
    # Example: $reformat = '/home/you/local/hhsuite/lib/hh/scripts/';
    my $reformat = '/path/to/reformat.pl';
    # Path to the Pfam tool script pfam_scan.pl
    # Example: $pfamscan = '/var/www/html/bean/PfamScan';
    my $pfamscan = '/path/to/pfam_scan.pl';
    # Path to the LibSVM program svm-predict
    # Example: $svm_pred = '/home/you/bean/libsvm-2.9/svm-predict';
    my $svm_pred = '/path/to/svm-predict';
    #-------------------------------------------------------------------------------
    #-------------------------------------------------------------------------------
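After editing, it is easy to overlook a setting that still holds its "/path/to/..." placeholder. The sketch below reads the "my $name = ..." assignments from the text of classify.pl and lists the ones that were not changed. It is only a simple line-based scan under the assumption that each setting sits on its own line as in the excerpt above; the function names are hypothetical.

```python
import re

# Matches lines such as:  my $blast_nrdb="/some/path";  or  my $svm_pred ='/some/path';
ASSIGNMENT = re.compile(r"""^\s*my\s+\$(\w+)\s*=\s*(['"])(.*?)\2""")

def read_settings(perl_source):
    """Collect `my $name = "value"` assignments, skipping comment lines."""
    settings = {}
    for line in perl_source.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(3)
    return settings

def unset_placeholders(settings):
    """Return the names whose value still starts with the '/path/to/' placeholder."""
    return sorted(name for name, value in settings.items()
                  if value.startswith("/path/to/"))

# Illustrative input; in practice pass the contents of classify.pl.
example = 'my $pfam_db="/path/to/pfam/database/";\nmy $blast_nrdb="/home/pub/blastdb/nr";\n'
print(unset_placeholders(read_settings(example)))
```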

Test

Type the following command in a Linux terminal to test whether BEAN 2.0 works.

perl classify.pl seqs_for_test.fasta

BEAN 2.0 usage

$ perl classify.pl SEQS output_file
SEQS is a protein sequence file in FASTA format; output_file is the file to which the prediction results are written.
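Because the input must be in FASTA format, it can be worth checking a file before a long run. The sketch below is a minimal FASTA reader, not the parser BEAN 2.0 itself uses; it takes the first word after ">" as the identifier, which is an assumption about how you want sequences named.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    identifier, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Identifier = first whitespace-delimited word after '>'.
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)
```

For a real file, pass an open file handle, for example list(read_fasta(open("seqs_for_test.fasta"))), and check that the number of records matches what you expect.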

Example

$ perl classify.pl  seqs_for_test.fasta  prediction_result.txt

If BEAN 2.0 runs successfully, you will get a result file like this:

protein         score(e-value)    methods     effector
YF81_THET2      1e-25             BLAST       no
G8Z8Z9_BRAOL    -0.616122         BEAN_2.0    no
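For downstream analysis, the result file can be loaded into records with a few lines of code. This sketch assumes the four whitespace-separated columns shown above and assumes that predicted effectors are marked "yes" in the last column (the sample only shows "no"). Judging from the header and the sample rows, the second column appears to hold an e-value for BLAST rows and a score for BEAN_2.0 rows, so compare values only within the same method.

```python
from collections import namedtuple

Prediction = namedtuple("Prediction", "protein score method is_effector")

def parse_results(text):
    """Parse result-file text into Prediction records, skipping the header row."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "protein":
            continue  # header, blank or malformed line
        protein, score, method, effector = fields
        # Assumption: the effector column reads "yes" for predicted effectors.
        records.append(Prediction(protein, float(score), method, effector == "yes"))
    return records
```

With the example output above, parse_results(open("prediction_result.txt").read()) would give two records, neither of them flagged as an effector.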