Annotating a bacterial genome with Prokka

We’re going to use the Prokka software to annotate our newly assembled bacterial genome (from the E. coli O104 reads). You can think of it as an alternative to RAST.

We have to download and install a lot of software first, though; allow roughly 15 to 20 minutes.

First, we need to install BioPerl and NCBI BLAST+; for this we’ll use the Debian Linux package installer, ‘apt-get’:

apt-get update
apt-get -y install bioperl ncbi-blast+

Now download and unpack Prokka:

cd /mnt
curl -O http://www.vicbioinformatics.com/prokka-1.7.tar.gz
tar xzf prokka-1.7.tar.gz

Prokka depends on a lot of other software, too, so we’ll need to install all of that.

Install HMMER:

curl -O ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/hmmer-3.1b1.tar.gz
tar xzf hmmer-3.1b1.tar.gz
cd hmmer-3.1b1/
./configure --prefix=/usr && make && make install

Install Aragorn:

cd /mnt
curl -O http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/aragorn1.2.36.tgz
tar -xvzf aragorn1.2.36.tgz
cd aragorn1.2.36/
gcc -O3 -ffast-math -finline-functions -o aragorn aragorn1.2.36.c
cp aragorn /usr/local/bin

Install Prodigal:

cd /mnt
curl -O http://prodigal.googlecode.com/files/prodigal.v2_60.tar.gz
tar xzf prodigal.v2_60.tar.gz
cd prodigal.v2_60/
make
cp prodigal /usr/local/bin

Install tbl2asn:

cd /mnt
curl -O ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/tbl2asn/linux64.tbl2asn.gz
gunzip linux64.tbl2asn.gz
mv linux64.tbl2asn tbl2asn
chmod +x tbl2asn
cp tbl2asn /usr/local/bin

Install GNU Parallel:

cd /mnt
curl -O http://ftp.gnu.org/gnu/parallel/parallel-20130822.tar.bz2
tar xjvf parallel-20130822.tar.bz2
cd parallel-20130822/
ls
./configure && make && make install

Install Infernal:

cd /mnt
curl -O http://selab.janelia.org/software/infernal/infernal-1.1rc4.tar.gz
tar xzf infernal-1.1rc4.tar.gz
cd infernal-1.1rc4/
ls
./configure && make && make install
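
Before moving on, it can be worth confirming that everything installed above actually landed on your PATH. The following is a minimal sketch, not part of Prokka itself: `check_tools` is a hypothetical helper name, and the binary names passed to it (`blastp`, `hmmscan`, `aragorn`, `prodigal`, `tbl2asn`, `parallel`, `cmscan`) are assumptions about which executables each package provides.

```shell
# check_tools is a hypothetical helper, not something shipped with Prokka.
# It prints one line per command and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed binary names for the packages installed above; adjust if yours differ.
check_tools blastp hmmscan aragorn prodigal tbl2asn parallel cmscan \
    || echo "Some tools are missing; re-run the matching install step."
```

If any line says MISSING, go back to the corresponding install step before running Prokka.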

Download an E. coli assembly (this is the one produced by Velvet for k=41 in Basic (single-genome) assembly):

cd /mnt
mkdir annot
cd annot
curl -O http://athyra.idyll.org/~t/ecoli-v41.fa

And ... finally, run Prokka on the downloaded file!

../prokka-1.7/bin/prokka ecoli-v41.fa --outdir ecoli0104 --prefix ecoli0104 --force

This will produce a bunch of files in a directory named ‘ecoli0104’. The ecoli0104.faa file contains the predicted and annotated proteins, while the ecoli0104.fna file contains the original contigs. This directory also contains all of the files necessary to submit the genome to NCBI.

To look at the .faa, try:

head ecoli0104/ecoli0104.faa
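
For a quick sanity check on the annotation, you can also count how many proteins were predicted. This sketch relies only on the FASTA convention that each record starts with a `>` header line; `count_fasta_records` is a hypothetical helper name, and the path assumes the `--outdir` and `--prefix` values used above.

```shell
# count_fasta_records is a hypothetical helper: it counts FASTA header
# lines (those beginning with '>') in the file given as its first argument.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Path assumes the --outdir/--prefix values from the prokka command above.
faa=ecoli0104/ecoli0104.faa
if [ -f "$faa" ]; then
    echo "predicted proteins: $(count_fasta_records "$faa")"
fi
```

The same function works on any FASTA file, so pointing it at ecoli0104/ecoli0104.fna should report the number of contigs.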