Some exercises to try
=====================
0. Finish or re-run one of the tutorials that you're particularly interested
in.
1. Download and assemble some reads from the Short Read Archive (SRA)
or elsewhere.
Remember that `the ENA at EBI `__ offers
FASTQ downloads of SRA sequence.
(see these tutorials: :doc:`reads_and_qc` and `assembly-lab`)
2. Browse the `NCBI Bacterial genomes site
`__, download a proteome
of interest (you'll want the '.faa' files), and run the various BLASTs
on 'em (e.g. find reciprocal best hits).
See the instructions here: :doc:`tuesday`.
3. Map raw reads for the E. coli 0104 genome back to your assembly --
e.g. take the assembly at http://athyra.idyll.org/~t/ecoli-v41.fa
and map the QC reads to it (see :doc:`bwa_mapping`).
Bonus: load the resulting read mapping into Tablet.
Bonus x 2: use the Prokka annotation .gff file to define features
in Tablet. (If you didn't get all the way through the Prokka run,
you can download a zip file of the results here:
http://athyra.idyll.org/~t/ecoli0104-prokka.zip)
4. Assemble the provided ecoli reads *without* digital normalization.
5. Install BLAST and the ngs-scripts, and then try comparing an assembly
with a genome by using the following commands::
blastall -i assembly.fa -d genome.fa -p blastn -e 1e-20 -o assembly.x.genome.blastn
blastall -d assembly.fa -i genome.fa -p blastn -e 1e-20 -o genome.x.assembly.blastn
python /usr/local/share/ngs-scripts/blast/calc-blast-cover.py genome.fa assembly.x.genome.blastn 300 assembly.fa
python /usr/local/share/ngs-scripts/blast/calc-blast-cover.py assembly.fa genome.x.assembly.blastn 300 genome.fa
6. Try taking one of the assemblies
(e.g. http://athyra.idyll.org/~t/ecoli-v41.fa) and mapping the QC
reads used to assemble it back to it, to assess assembly completeness.