Annotating a bacterial genome with ProkkaΒΆ
We’re going to use the Prokka software to annotate our newly assembled bacterial genome (from the E. coli 0104 reads). (You can think of it as an alternative to RAST.)
We have to download and install a lot of stuff, though – estimated ~15 -20 minutes.
First, we need to install BioPerl and NCBI BLAST+; for this we’ll use the Debian Linux package installer, ‘apt-get’:
apt-get update
apt-get -y install bioperl ncbi-blast+
Now download and unpack Prokka:
cd /mnt
curl -O http://www.vicbioinformatics.com/prokka-1.7.tar.gz
tar xzf prokka-1.7.tar.gz
Prokka depends on a lot of other software, too; so we’ll need to install all of that.
Install HMMER:
curl -O ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/hmmer-3.1b1.tar.gz
tar xzf hmmer-3.1b1.tar.gz
cd hmmer-3.1b1/
./configure --prefix=/usr && make && make install
Install Aragorn:
cd /mnt
curl -O http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/aragorn1.2.36.tgz
tar -xvzf aragorn1.2.36.tgz
cd aragorn1.2.36/
gcc -O3 -ffast-math -finline-functions -o aragorn aragorn1.2.36.c
cp aragorn /usr/local/bin
Install Prodigal:
cd /mnt
curl -O http://prodigal.googlecode.com/files/prodigal.v2_60.tar.gz
tar xzf prodigal.v2_60.tar.gz
cd prodigal.v2_60/
make
cp prodigal /usr/local/bin
Install tbl2asn:
cd /mnt
curl -O ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/tbl2asn/linux64.tbl2asn.gz
gunzip linux64.tbl2asn.gz
mv linux64.tbl2asn tbl2asn
chmod +x tbl2asn
cp tbl2asn /usr/local/bin
Install GNU Parallel:
cd /mnt
curl -O http://ftp.gnu.org/gnu/parallel/parallel-20130822.tar.bz2
tar xjvf parallel-20130822.tar.bz2
cd parallel-20130822/
ls
./configure && make && make install
Install Infernal:
cd /mnt
curl -O http://selab.janelia.org/software/infernal/infernal-1.1rc4.tar.gz
tar xzf infernal-1.1rc4.tar.gz
cd infernal-1.1rc4/
ls
./configure && make && make install
Download an E. coli assembly (this is the one produced by Velvet for k=41 in Basic (single-genome) assembly):
cd /mnt
mkdir annot
cd annot
curl -O http://athyra.idyll.org/~t/ecoli-v41.fa
And ... finally, run Prokka on the downloaded file!
../prokka-1.7/bin/prokka ecoli-v41.fa --outdir ecoli0104 --prefix ecoli0104 --force
This will produce a bunch of files in a directory named ‘ecoli0104’.
The ecoli0104.faa
file will contain the predicted & annotated
proteins, while the ecoli0104.fna
file contains the original
contigs. This directory contains all of the files necessary to submit
the genome to NCBI, too.
To look at the .faa, try:
head ecoli0104/ecoli0104.faa