Assembly Scavenger Hunt

This exercise is meant to bring together knowledge from the whole week, and also just be fun. I’ve taken some plain text, embedded it in DNA with a simple algorithm along with some random sequence, and fragmented it to produce reads. Your job is to assemble the reads and put the resulting contigs back through the script to retrieve the original message.

We’ll need a few things:

cd /mnt
mkdir scavenger-hunt
cd scavenger-hunt

curl -O http://athyra.idyll.org/~cswelcher/assembly-scavenger-hunt/reads/reads.svZjxD/scavenger_reads.fa
curl -O http://athyra.idyll.org/~cswelcher/dna2text.py
curl -O http://athyra.idyll.org/~cswelcher/dnatextutils.py

You should have velvet already, but if not:

cd /root
curl -O http://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
tar xzf velvet_1.2.10.tgz
cd velvet_1.2.10
make MAXKMERLENGTH=51
cp velvet? /usr/local/bin

cd /mnt/scavenger-hunt

Of course, sequencing chemistry is always improving, so you may want to use this read set instead (it has the same filename, so downloading it replaces the earlier file):

curl -O http://athyra.idyll.org/~cswelcher/assembly-scavenger-hunt/reads/reads.Q7XSSZ/scavenger_reads.fa

You’ll want to use velvet to assemble scavenger_reads.fa. The reads are 36-base single-ended reads and, like an actual metagenome, have variable coverage. This means you might need to do some parameter exploration to get the contigs you want out of it; I would recommend looking at the exp_cov parameter of velvetg in particular.
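One way to organize the parameter exploration is to generate the velveth and velvetg command lines for a small grid of values. The following is a minimal sketch, not part of the exercise materials: the helper names (velvet_commands, sweep), the output directory naming, and the specific k and exp_cov values are all assumptions chosen for illustration. It relies only on the standard velveth arguments (output directory, hash length, -fasta, -short) and the velvetg options -exp_cov and -cov_cutoff.

```python
import subprocess


def velvet_commands(reads, k, exp_cov, cov_cutoff=None):
    """Build velveth/velvetg command lines for one parameter combination.

    The directory naming scheme is a hypothetical convention, not Velvet's.
    """
    outdir = f"assembly.k{k}.cov{exp_cov}"
    # velveth <dir> <hash_length> -fasta -short <file>: single-ended short reads
    velveth = ["velveth", outdir, str(k), "-fasta", "-short", reads]
    # velvetg <dir> -exp_cov <value|auto>
    velvetg = ["velvetg", outdir, "-exp_cov", str(exp_cov)]
    if cov_cutoff is not None:
        velvetg += ["-cov_cutoff", str(cov_cutoff)]
    return outdir, [velveth, velvetg]


def sweep(reads, ks, exp_covs, dry_run=True):
    """Print (dry_run=True) or run every k / exp_cov combination.

    Returns the list of output directories, one per combination.
    """
    outdirs = []
    for k in ks:
        for exp_cov in exp_covs:
            outdir, commands = velvet_commands(reads, k, exp_cov)
            for cmd in commands:
                if dry_run:
                    print(" ".join(cmd))
                else:
                    subprocess.run(cmd, check=True)
            outdirs.append(outdir)
    return outdirs
```

For example, sweep("scavenger_reads.fa", [21, 25, 29], [5, 10, "auto"]) prints eighteen commands without running anything; pass dry_run=False to execute them. Keep k odd, shorter than the 36-base reads, and no larger than the MAXKMERLENGTH velvet was built with. Each output directory should then contain its own contigs.fa to feed to dna2text.py. Rerunning velveth for every exp_cov value is wasteful but keeps the sketch simple.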

To decode your results, make use of the dna2text.py script. Its usage is:

python dna2text.py contigs.fa > contigs.text

You can then look at the output with less:

less contigs.text

Further, you might want to make use of an IPython notebook that plots k-mer abundance distributions and could help you choose parameters:

curl http://2013-caltech-workshop.readthedocs.org/en/latest/_static/caltech-2013-scavenger-hunt.ipynb > /usr/local/notebooks/2013-caltech-scavenger-hunt.ipynb

You can then access the notebook by going to https://ec2-???????????.compute-1.amazonaws.com.
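If you would like to see what a k-mer abundance distribution is before opening the notebook, it can be computed directly from the reads. The following is a minimal sketch and not the notebook's actual code: the function names (read_fasta, canonical, count_kmers, abundance_histogram) are hypothetical, and counting a k-mer together with its reverse complement is an assumption made here because the reads can come from either strand.

```python
from collections import Counter

# Translation table for complementing DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield sequences from an iterable of FASTA lines (headers are dropped)."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seqs, k):
    """Count canonical k-mers, skipping any containing non-ACGT characters."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

To use it on the reads, open scavenger_reads.fa, pass the file handle to read_fasta, count k-mers with the same k you give velveth, and print the histogram sorted by abundance. Peaks at higher abundances can suggest starting guesses for exp_cov, which Velvet expresses in k-mer coverage; with variable coverage, expect more than one peak.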

Happy hunting!
