Assembly Scavenger HuntΒΆ
This exercise is meant to bring together knowledge from the whole week, and also just be fun. I’ve taken some pain text, embedded it in DNA with a simple algorithm along with some random sequence, and fragmented it to produce reads. Your job will be to assemble the reads and put the results back through the script to retrieve the original message.
We’ll need a few things:
cd /mnt
mkdir scavenger-hunt
cd scavenger-hunt
curl -O http://athyra.idyll.org/~cswelcher/assembly-scavenger-hunt/reads/reads.svZjxD/scavenger_reads.fa
curl -O http://athyra.idyll.org/~cswelcher/dna2text.py
curl -O http://athyra.idyll.org/~cswelcher/dnatextutils.py
You should have velvet already, but if not:
cd /root
curl -O http://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
tar xzf velvet_1.2.10.tgz
cd velvet_1.2.10
make MAXKMERLENGTH=51
cp velvet? /usr/local/bin
cd /mnt/scavenger-hunt
Of course, sequencing chemistry is always improving, so you may want to use:
curl -O http://athyra.idyll.org/~cswelcher/assembly-scavenger-hunt/reads/reads.Q7XSSZ/scavenger_reads.fa
You’ll want to use velvet to assemble scavenger_reads.fa
. They’re 36-base singled-ended
reads, and like an actual metagenome, have variable coverage. This means that you might need
to do some parameter exploration to get the contigs you want out of it; I would recommend
looking at the exp_cov
parameter of velvetg
in particular.
To decode your results, make use of the dna2text.py script. It’s usage is:
python dna2text.py contigs.fa > contigs.text
Which you can then look at with less
:
less contigs.text
Further, you might want to make use of an ipython notebook which plots k-mer abundance distributions, and could help you with parameters:
curl http://2013-caltech-workshop.readthedocs.org/en/latest/_static/caltech-2013-scavenger-hunt.ipynb > /usr/local/notebooks/2013-caltech-scavenger-hunt.ipynb
Which you can then access by going to https://ec2-???????????.compute-1.amazonaws.com.
Happy hunting!