3. Doodle pipeline

A quick overview of how all the parts fit together in the overall process.

This section describes how to feed the database with clone information. First, collect the trace files (*.ab1) from the ABI3100 by transferring the chromatograms to a specific directory on the Linux server. To keep the information organized, I recommend storing the traces in a directory named after the plate you are sequencing. The suggested clone naming convention is very simple: in the example in 3.2, the trace EH10002_D01_07.ab1 becomes the clone EH10002D01.

Figure: Clone naming convention

3.1. Information flow

Figure: Doodle pipeline

3.2. Rename clones

tofu [~] % mkdir EH10002F
tofu [~] % cd EH10002F
tofu [~/EH10002F] % rename 's/\.ab1/.Seq/' *.ab1
tofu [~/EH10002F] % rename 's/_\d\d//' *.Seq
tofu [~/EH10002F] % rename 's/_//' *.Seq
tofu [~/EH10002F] % phred -sd . *.Seq
tofu [~/EH10002F] % abi2scf -d .
tofu [~/EH10002F] % mv *.scf /usr/local/apache/htdocs/localhost/pub/ecoli/traces/.
tofu [~/EH10002F] % mkdir fasta
tofu [~/EH10002F] % mkdir blast
tofu [~/EH10002F] % mv *.seq fasta/.
tofu [~/EH10002F] % cd fasta/
tofu [~/EH10002F/fasta] % rename 's/\.Seq//' *.seq
tofu [~/EH10002F/fasta] % cp *.seq ../blast/.

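In the example above, the rename script (Larry Wall's filename fixer) changes EH10002_D01_07.ab1 to EH10002D01.Seq. Next, phred reads the traces and writes their base calls as FASTA files (*.seq) to the current directory, and abi2scf converts the traces to SCF files, which are moved to the web server's trace directory. Finally, the FASTA files are moved to the fasta directory, stripped of the extra .Seq extension, and copied to the blast directory. Repeat the same steps for the plate sequenced with the reverse primer.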
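The three `rename` calls rely on Larry Wall's Perl `rename`; on some Linux distributions `rename` is the util-linux version, which does not accept Perl expressions. The following is a minimal sketch of the same renaming in plain bash and sed, assuming the trace names follow the EH10002_D01_07.ab1 pattern shown above. The function names `clone_name` and `rename_traces` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: a bash/sed equivalent of the three Perl rename calls.

# Map an ABI3100 trace name to the clone name.
# EH10002_D01_07.ab1 -> EH10002D01
clone_name() {
  local base
  base=$(basename "$1" .ab1)           # drop directory and .ab1 suffix
  # Same substitutions as the rename calls: remove the first underscore
  # followed by two digits (the trailing _07 in the example), then the
  # first remaining underscore.
  printf '%s\n' "$base" | sed -E 's/_[0-9]{2}//; s/_//'
}

# Rename every *.ab1 in a directory (default: current) to <clone>.Seq.
rename_traces() {
  local dir=${1:-.} f
  for f in "$dir"/*.ab1; do
    [ -e "$f" ] || continue            # the glob matched nothing
    mv -- "$f" "$dir/$(clone_name "$f").Seq"
  done
}
```

For example, `rename_traces ~/EH10002F` turns EH10002_D01_07.ab1 into EH10002D01.Seq, after which phred can be run as shown.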

3.3. Upload the fasta file to the database

First, concatenate the FASTA files in the fasta directory into a single file and add a sequence version number to each entry.

tofu [~/fasta] % cat *.seq > <PLATE_NUMBER>.nt

Then edit the <PLATE_NUMBER>.nt file and replace every .Seq in the headers with .Seq.001. For this purpose I use emacs and do a Query Replace: .Seq with: .Seq.001. Alternatively, you can use vi and do the replacement by typing

:g/^>/s/\.Seq/.Seq.001/

:wq
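The concatenation and the manual edit can also be done in one non-interactive step. The following is a minimal sketch with `sed`, assuming each FASTA header written by phred contains the clone name ending in `.Seq` (as in `>EH10002D01.Seq`). The function name `build_plate_nt` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate a plate's FASTA files and add version .001.
# This is the same edit as the emacs/vi replacement: header lines only,
# first .Seq on each header.
# Usage: build_plate_nt <PLATE_NUMBER> [fasta_dir]
build_plate_nt() {
  local plate=$1 dir=${2:-.}
  # Only lines starting with ">" are edited; sequence lines pass through.
  cat "$dir"/*.seq | sed '/^>/s/\.Seq/.Seq.001/' > "$dir/$plate.nt"
}
```

Because the .seq files themselves are never modified, rerunning the function regenerates the .nt file instead of appending a second version suffix.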

tofu [~/fasta] % fasta2H37Rv -f <PLATE_NUMBER>.nt

This is a wrapper around other scripts, and you may have to modify it for your database. If you have made it this far, you should be able to work this out. Note: for P. aeruginosa the script is fasta2paeruginosa.
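Because the upload script differs per organism, a small dispatcher can keep the rest of the procedure unchanged. This is only a sketch. The organism keys are assumed to match the `-o` values used in the BLAST step (`mtuberculosis` comes from that example; `paeruginosa` is a guess), and fasta2paeruginosa is assumed to take the same `-f` flag as fasta2H37Rv. The names `upload_script_for` and `upload_plate` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: pick the organism-specific FASTA upload wrapper.
# The keys are assumptions; the script names are the ones given above.
upload_script_for() {
  case $1 in
    mtuberculosis) echo fasta2H37Rv ;;
    paeruginosa)   echo fasta2paeruginosa ;;
    *) echo "no upload script configured for '$1'" >&2; return 1 ;;
  esac
}

# Usage: upload_plate <organism> <PLATE_NUMBER>.nt
# Assumes every upload wrapper accepts -f like fasta2H37Rv.
upload_plate() {
  local script
  script=$(upload_script_for "$1") || return 1
  "$script" -f "$2"
}
```

Add one case line for each wrapper you adapt to your own database.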

3.4. BLAST the sequences in the blast directory and parse them

After uploading the sequences to the database, go into the blast directory and BLAST the FASTA files with dir2blast. For plates sequenced with the reverse primer, use the rdir2blast script instead. Remember to pass your own organism to the -o flag.

tofu [~/fasta] % cd ../blast
tofu [~/blast] % dir2blast -d . -o mtuberculosis
tofu [~/blast] % parse_report -d . > <Plate_ID>.br.parsed

Congratulations, this is the last step: upload the parsed features to the database.

tofu [~/blast] % txt2feature -i <Plate_ID>.br.parsed  -d <database>
tofu [~/blast] % mv feature_upload.log <Plate_ID>.br.parsed_upload.log
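The commands in this step can be wrapped so that forward and reverse plates are handled the same way. The following is a minimal sketch, assuming `rdir2blast` takes the same `-d`/`-o` flags as `dir2blast` and that all the pipeline scripts are on the PATH. The function `process_blast` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for step 3.4: BLAST, parse and upload one plate.
# Usage: process_blast <Plate_ID> <forward|reverse> <organism> <database> [blast_dir]
process_blast() {
  local plate=$1 primer=$2 organism=$3 db=$4 dir=${5:-.} blaster
  case $primer in
    forward) blaster=dir2blast ;;
    reverse) blaster=rdir2blast ;;     # assumed: same flags as dir2blast
    *) echo "primer must be forward or reverse" >&2; return 1 ;;
  esac
  # Run in a subshell so the caller's working directory is unchanged;
  # stop at the first failing step.
  (
    cd "$dir" || exit 1
    "$blaster" -d . -o "$organism" || exit 1
    parse_report -d . > "$plate.br.parsed" || exit 1
    txt2feature -i "$plate.br.parsed" -d "$db" || exit 1
    # Keep the upload log next to the parsed report it belongs to.
    mv feature_upload.log "$plate.br.parsed_upload.log"
  )
}
```

Running the steps in a subshell keeps a failed BLAST from silently producing an empty parsed report that would then be uploaded.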