A quick overview of how all the parts fit together.
MySQL is the database software used to store all the information. There are at least two ways to interact with the MySQL server: the MySQL command-line client, or a Perl API for connecting to MySQL from Perl scripts.
tofu [~] % mysql -u root -p<PASSWORD>
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13 to server version: 3.23.33
Type 'help;' or '\h' for help. Type '\c' to clear the buffer
mysql> CREATE DATABASE doodledev;
Query OK, 1 row affected (0.01 sec)
In the example above you use the MySQL client to connect to the database as the root user and create a new database named doodledev.
tofu [~] % mysql -u root -p<PASSWORD> doodledev < doodle_schema.sql
tofu [~] % mysql -uroot -p<PASSWORD> -e 'grant all privileges on doodledev.* to doodle_user@localhost'
In the example above you use the MySQL client to import the schema and create all the tables needed for the database. The second command grants the necessary privileges to the database user, here called doodle_user.
If you are lucky you can find FASTA files and table files for your genome at the NCBI. Get the FASTA files for the complete genome (*.faa and *.fna) and the genomic table (*.ptt). Make BLAST-searchable databases from the FASTA files and put them in the /usr/local/ncbi/seals/alias_db/blast/ directory. Modify the nucdbs.xml and protdbs.xml files located in the /usr/local/lib/Pise/5.a/Xml/ directory. Remove the blast2.xml file in the same directory and make a new interface for blast2 using Pise.
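As a sketch of the database-building step, assuming the legacy NCBI formatdb tool is what builds the databases (the text above does not name the tool) and using hypothetical file names, the helper below prints one formatdb command per FASTA file, choosing protein or nucleotide mode from the file extension. It only prints the commands, so you can review them before running anything.

```shell
# Print (do not run) a formatdb command for each FASTA file given.
# Assumption: databases are built with legacy NCBI formatdb, where
# -p T marks protein input and -p F marks nucleotide input.
formatdb_cmd() {
    for f in "$@"; do
        case "$f" in
            *.faa) echo "formatdb -i $f -p T" ;;   # protein FASTA
            *.fna) echo "formatdb -i $f -p F" ;;   # nucleotide FASTA
            *)     echo "skipping $f: not .faa or .fna" >&2 ;;
        esac
    done
}
```

For example, `formatdb_cmd paeruginosa.faa paeruginosa.fna` prints two commands; pipe the output to `sh` once it looks right, then copy the resulting database files into the blast directory named above.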
If you examine doodle_schema.sql you will see that several tables need to be populated. First, get a list of GenBank IDs (gi's) for your genome by extracting them with the fasta2gi script from SEALS. Then generate a file that maps each gi to its corresponding SwissProt ID.
tofu [~] % gi2swissprot -i paeruginosa.gi > paeruginosa_gi2sp
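To make the gi extraction step concrete, here is a minimal sketch of what a gi list contains. It is not the SEALS fasta2gi script; it assumes the FASTA headers follow the NCBI `>gi|NUMBER|...` convention, and the function name is made up for illustration.

```shell
# Print one gi number per line from a FASTA file.
# Assumption: header lines look like ">gi|15595198|ref|NP_248690.1| ...".
extract_gis() {
    # Split header lines on "|" and print the second field (the gi number).
    awk -F'|' '/^>gi\|/ { print $2 }' "$1"
}
```

Running `extract_gis paeruginosa.faa > paeruginosa.gi` would give a file in the shape that gi2swissprot takes as its -i argument above, provided the headers match the assumed convention.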
Get the chromosome table for your organism, in text format, from the proteome project at the EBI. Remove the comments from the text file, then use it to build a table that can be uploaded to doodle.
tofu [~] % protein_table2doodle -i ecoli_prot.ptt -g ecoli_gi2sp -e ecoli.ebi > ecoli.ptt.doodle
tofu [~] % genetable2doodle -i paeruginosa.ptt.doodle -d paeruginosa
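The comment-removal step mentioned above can be sketched as follows. This assumes that comment lines in the EBI text file begin with `#`, which the text does not confirm, so check your file and adjust the pattern if it uses a different marker. The function name is hypothetical.

```shell
# Print the file without comment lines and without blank lines.
# Assumption: a comment is any line whose first character is "#".
strip_comments() {
    grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}
```

For instance, `strip_comments ecoli.ebi.raw > ecoli.ebi` (with ecoli.ebi.raw standing in for the downloaded file) would leave only the data rows.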
The gbrowse script (the main script in GMOD) uses a different data structure, so you need to create and populate a separate database for gbrowse.
tofu [~] % mysql -u root -p<PASSWORD>
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13 to server version: 3.23.33
Type 'help;' or '\h' for help. Type '\c' to clear the buffer
mysql> create database PAO1;
mysql> grant select on PAO1.* to nobody@localhost;
mysql> grant file on *.* to PAO1@localhost;
The GMOD database uses files in GFF format. The doodle2gff script below converts *.doodle files into GFF, and the resulting file is then bulk-loaded into the database.
tofu [~] % doodle2gff -i paeruginosa.ptt.doodle -l 6264403 -o PAO1
tofu [~] % ldas_bulk_load.pl --database ecoli --user root --password <PASS> doodle.gff
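For readers who have not seen GFF before, the sketch below shows the layout of a single line: nine tab-separated columns (sequence name, source, feature type, start, end, score, strand, frame, attributes). It is an illustration only, not the output of doodle2gff; the source value "doodle", the feature type "gene", the attribute style and the function name are all assumptions.

```shell
# Emit one GFF line with its nine tab-separated columns.
# Arguments: seqname location strand name, where location is "start..end"
# as written in NCBI *.ptt tables.
gff_line() {
    seqname=$1; location=$2; strand=$3; name=$4
    start=${location%%..*}    # text before ".."
    end=${location##*..}      # text after ".."
    # "." marks columns with no value (score and frame here).
    printf '%s\tdoodle\tgene\t%s\t%s\t.\t%s\t.\tGene "%s"\n' \
        "$seqname" "$start" "$end" "$strand" "$name"
}
```

Calling `gff_line PAO1 483..2027 + dnaA` prints a single line describing a hypothetical gene on the forward strand of the PAO1 sequence.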
Then configure gbrowse by writing a new database configuration file in the /etc/httpd/conf/gbrowse.conf/ directory, using ecoli.conf as a template.
First you need to generate a separate FASTA file for each genomic gi. Run the following script in an otherwise empty directory that contains only the *.gi file. Then create the HTML directories where the public BLAST reports will live. The gi2fas and genome2blast scripts may take a long time to run, so be patient. The PDB database needs to be updated once a month, and the scripts below must be re-run after each update.
tofu [~] % gi2fas -d lambda.aa -i lambda.gi
tofu [~] % mkdir /home/httpd/html/pub/lambda
tofu [~] % mkdir /home/httpd/html/pub/lambda/blast
tofu [~] % genome2blast -d . -o lambda
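Since the scripts above expect a working directory that holds nothing but the *.gi file, a small guard can catch a cluttered directory before a long run starts. This is a minimal sketch; the function name is made up for illustration.

```shell
# Succeed only if directory $1 (default: current directory) contains
# exactly one entry and that entry ends in ".gi".
only_gi_file() {
    dir=${1:-.}
    total=0; gi=0
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || continue          # skip glob patterns that matched nothing
        total=$((total + 1))
        case "$entry" in *.gi) gi=$((gi + 1)) ;; esac
    done
    [ "$total" -eq 1 ] && [ "$gi" -eq 1 ]
}
```

One way to use it is `only_gi_file . && gi2fas -d lambda.aa -i lambda.gi`, so that gi2fas starts only when the directory is clean.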
Then parse the BLAST reports and upload the resulting flat files to doodle.
tofu [~] % parse_organism -d . -o lambda > lambda_genome.br.parsed
tofu [~] % parse_pdb -d . > lambda_pdb.br.parsed
tofu [~] % pdb2feature -i lambda_pdb.br.parsed -d lambda
tofu [~] % genome2feature -i lambda_genome.br.parsed -d lambda
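Because each upload above depends on the parsed file produced by an earlier command, it can help to run the steps through a wrapper that stops at the first failure instead of uploading an incomplete file. The following is one possible sketch; the run_steps name is hypothetical and nothing here is part of doodle itself.

```shell
# Run each argument as a shell command line, in order.
# Stop and return non-zero as soon as one of them fails.
run_steps() {
    for step in "$@"; do
        echo "running: $step" >&2
        sh -c "$step" || { echo "failed: $step" >&2; return 1; }
    done
}
```

For example, `run_steps 'parse_pdb -d . > lambda_pdb.br.parsed' 'pdb2feature -i lambda_pdb.br.parsed -d lambda'` would skip the upload if parsing exits with an error.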
The next step is to generate coiled-coil predictions for the whole genome. Run the coils2pos.pl script in the directory that contains the *.ccp files.
tofu [~] % fasta2ccp -d .
tofu [~] % fasta2coils -d .
tofu [~] % coils2pos.pl -d .
tofu [~] % mv output coiloutput.lambda
tofu [~] % coils2doodle -i coiloutput.lambda -d lambda
The next step is to populate the InterPro table. First download the protein2ipr.dat file from InterPro. This file must be updated periodically. Note: do not run the swiss2ip script in the background. Then you can start the fun part, which is populating the IST table.
tofu [~] % swiss2ip -i nr_lambda.sp > lambda.iphits
tofu [~] % interpro2doodle -i lambda.iphits -d lambda
tofu [~] % ists2genome -i ists -d lambda
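Since protein2ipr.dat needs periodic updates, a quick age check before running swiss2ip can tell you whether to fetch a fresh copy first. This is a sketch only: the 30-day default is an assumption (the text says only "periodic"), and the function name is made up.

```shell
# Succeed if file $1 is missing or was last modified more than $2 days ago.
# Assumption: "periodic" means roughly monthly, hence the default of 30.
is_stale() {
    file=$1; days=${2:-30}
    [ -e "$file" ] || return 0               # a missing file also needs fetching
    # find prints the path only when the file is older than the threshold.
    [ -n "$(find "$file" -mtime +"$days")" ]
}
```

A typical call would be `is_stale protein2ipr.dat && echo "download a fresh protein2ipr.dat first"`.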