CAIJava program

Calculates codon frequencies and CAI-values of all genes

Authors: Andrei Zinovyev and Alessandra Carbone


Installation procedure:

1. Make sure that JDK 1.3 and the BioJava packages are installed.

2. Copy CAIJava.java and Utils.java files into a directory.

3. Optionally, download the example data package (for Bacillus subtilis) and unzip it into the same directory. It lets you test the program and includes a sample command line (a bat-file for Windows).

4. Compile the Java files (javac needs no options).

5. If you downloaded the example package, try this command:
CAIJava.bat AL009126.gbk
or
java CAIJava AL009126.gbk -f out.dat -t product -i 15 -k 3 -g -ew exwv -a genelist1.txt >out.veo
If everything is OK (the calculation takes several minutes), you will get the out.dat, out.veo and codonusage files.


Command line:

java CAIJava name_of_fasta_or_genbank_file [options]


Options:

-f out_file - the name of the file to which the table will be written

-i number_of_iterations - how many iterations to make

-k type - which formula to use (0 - use CAI instead of gCAI, 3 - use gCAI; other variants are obsolete)

-a additional_info_file - the name of a file containing additional information about genes (tab-delimited; the first column contains the gene names; see the example)

-m maximum_gene_length - maximum gene length to consider

-t feature_name - the name of a feature to extract from the annotation

-s - set this option for fasta-formatted files (without it the program expects genbank-formatted files)

-g - set this option to calculate the GC-content of genes and the GC-content of the 3rd position of codons

-ew external_w_values - the name of a file with a table of w-values calculated elsewhere (see the example; the codons must be listed in the same order as there)
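
For orientation, the classical CAI of a gene (Sharp and Li) is the geometric mean of the w-values of its codons. The sketch below illustrates only that formula; it is not taken from CAIJava.java. The class name CaiSketch, the method cai and the map of w-values are hypothetical, codons without a usable w-value are simply skipped (an assumption, the program may treat them differently), and the code uses generics, so it needs a newer Java than JDK 1.3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of CAIJava.
class CaiSketch {

    // Returns the geometric mean of the w-values of the codons in cds.
    // Codons missing from the table, or with w <= 0, are skipped (assumption).
    static double cai(String cds, Map<String, Double> w) {
        String seq = cds.toUpperCase();
        double logSum = 0.0;
        int counted = 0;
        // Walk over complete codons only; a trailing partial codon is ignored.
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            Double wi = w.get(seq.substring(i, i + 3));
            if (wi == null || wi.doubleValue() <= 0.0) {
                continue;
            }
            logSum += Math.log(wi.doubleValue());
            counted++;
        }
        // Summing logarithms avoids underflow on long genes.
        return counted == 0 ? Double.NaN : Math.exp(logSum / counted);
    }

    public static void main(String[] args) {
        // Made-up w-values for two alanine codons, for demonstration only.
        Map<String, Double> w = new HashMap<String, Double>();
        w.put("GCT", 1.0);
        w.put("GCC", 0.25);
        // Geometric mean of 1.0 and 0.25, i.e. approximately 0.5.
        System.out.println(cai("GCTGCC", w));
    }
}
```

With the -ew option the w-values would come from the external file rather than from a hand-built map as in this sketch.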


Output:

1. On standard output: some information about the progress of the algorithm, and the final table of w-values. This text can be saved as a *.veo file and used to visualize the trajectory of the algorithm with the ViDaExpert tool.

2. In the file specified by the -f option: a table with codon and amino acid frequencies, CAI values, gene names, lengths, comments, etc. The format (see the example output) is suitable for the ViDaExpert visualization tool. The first row gives the number of columns and the number of rows, in that order. Notes on some of the fields:
GC_CONT - field with GC-content
GC_CONT3 - field with GC-content for the 3rd position of codon
CAIEXT - CAI-values calculated elsewhere
CAI - CAI-values calculated on the whole set of genes
CAICLASS - CAI-values calculated with our algorithm
CAI_IT[1-15] - gCAI-values calculated with our algorithm on every iteration
ADD1,ADD2,ADD3 - some additional information (from the file specified with -a option)
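
As an illustration of what the GC_CONT and GC_CONT3 fields measure, the sketch below computes the GC-content of a coding sequence and the GC-content of its 3rd codon positions. It is not taken from CAIJava.java: the class name GcSketch and both method names are hypothetical, the sequence is assumed to contain only the letters A, C, G and T, and the values are returned as fractions between 0 and 1 (the program may report them on a different scale).

```java
// Hypothetical illustration, not part of CAIJava.
class GcSketch {

    private static boolean isGc(char c) {
        char u = Character.toUpperCase(c);
        return u == 'G' || u == 'C';
    }

    // Fraction of G and C over all positions of the sequence.
    static double gcContent(String cds) {
        if (cds.length() == 0) {
            return Double.NaN;
        }
        int gc = 0;
        for (int i = 0; i < cds.length(); i++) {
            if (isGc(cds.charAt(i))) {
                gc++;
            }
        }
        return (double) gc / cds.length();
    }

    // Fraction of G and C over the 3rd position of each complete codon.
    static double gc3Content(String cds) {
        int gc = 0;
        int codons = 0;
        // Index 2, 5, 8, ... is the 3rd base of codon 1, 2, 3, ...
        for (int i = 2; i < cds.length(); i += 3) {
            if (isGc(cds.charAt(i))) {
                gc++;
            }
            codons++;
        }
        return codons == 0 ? Double.NaN : (double) gc / codons;
    }

    public static void main(String[] args) {
        String cds = "ATGGCC"; // two codons: ATG, GCC
        System.out.println(gcContent(cds));  // 4 of 6 bases are G or C
        System.out.println(gc3Content(cds)); // both 3rd positions are G or C
    }
}
```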

3. The codonusage file, with the codon usage at every iteration of the algorithm.


(C) IHES, France, 2002