6  Scenario 3: Annotating a Metagenomic Contig

6.1 Introduction

You are working in a research group specialising in drug discovery. You have been investigating marine microbial communities using metagenomics in search of sources of novel antibiotic candidates. You have carried out shotgun sequencing of an underwater microbial community and assembled a number of interesting contigs that likely contain novel genes. Now you need to annotate them. Today, you will annotate the contig sequence at this link.

Figure 6.1: A seabed much like the one you’ve been investigating. Photo credit: Petr Kratochvil CC0
Be sure to download the metagenome contig to your computer so that you can use it with NCBI BLAST to obtain a preliminary functional annotation.

To annotate your contig, you’re going to use NCBI BLAST’s database of model organisms to obtain a first draft annotation, to see if there might be some interesting functions present.

Note

We would normally use specialised genome annotation software tools to annotate sequences like this, but we can illustrate the principles of genome annotation, and the advanced use of BLAST, with an exercise like this.

Your assignment

To annotate putative genes on this contig, you will BLAST its sequence against reference databases at NCBI. Carry out this search using the guide below (using the hints if you need them) and answer the questions in the formative quiz on MyPlace.

6.2 Analysis Steps

Follow the steps below to carry out the analysis for this scenario. If you need a reminder, refer to the example search in the BLAST introduction chapter. If you need more help, the green boxes below expand to give you a hint for what to do, and the orange boxes expand to give a more direct instruction.

6.2.2 Select the BLAST Tool

Your query sequence is a contig - a nucleotide sequence - and you want to identify protein sequences that have functional annotations and are similar to regions on the contig, so this is a (translated) nucleotide vs protein search.

For a reminder, see Table 3.1.

This is a (translated) nucleotide vs protein search, so use the BLASTX tool.
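To make "translated" concrete: BLASTX translates the nucleotide query in all six reading frames (three on each strand) and compares each translation against the protein database. The sketch below reproduces only that translation step in plain Python, assuming the standard genetic code (NCBI translation table 1). It is an illustration rather than BLAST's own code, and the function names are hypothetical; a real analysis would normally rely on a library or on BLAST itself.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...) to match the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Stop codons appear as `*`, so a long stretch without `*` in one frame suggests an open reading frame, which is where BLASTX hits on a contig tend to fall.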

Figure 6.2: BLASTX image

6.2.3 Upload the query sequence

Click on the Browse… button.

Figure 6.3: The empty BLASTX query field, showing the Browse… button

Navigate to where you saved the metagenome contig file, and select it.

Figure 6.4: The BLASTX query field, with the contig sequence loaded
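If the upload fails or the query box looks wrong, it can help to check the downloaded file locally. This sketch assumes the contig was saved in FASTA format (a `>` header line followed by sequence lines), one of the formats the BLAST query box accepts; `read_fasta` is a hypothetical helper, not part of any NCBI tool.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `with open("contig.fasta") as handle: records = read_fasta(handle)` (the filename is a placeholder) should give a single record whose sequence length matches the query length BLAST reports.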

6.2.4 Set appropriate parameter choices

The NCBI BLAST webserver provides a database of sequences from model organisms.

The model organism database is small and non-redundant (no repeated sequences), and contains proteomes from a wide taxonomic range of organisms, making it ideal for preliminary annotation. The small size of this dataset means that results are returned quickly and concisely.
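To illustrate what "non-redundant" means, the sketch below collapses exact duplicate sequences into a single entry and keeps all of their headers. It only demonstrates the idea; it is not how NCBI actually builds the landmark database, and the function name and header separator are hypothetical choices.

```python
def remove_redundant(records):
    """Keep one copy of each distinct sequence, joining the headers of duplicates."""
    merged = {}  # sequence -> list of headers; dicts preserve insertion order
    for header, seq in records:
        merged.setdefault(seq, []).append(header)
    return [(" | ".join(headers), seq) for seq, headers in merged.items()]
```

A redundant database would return the same protein several times for one region of your query; collapsing duplicates is part of what keeps a small database's report short.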

The NCBI BLAST webserver allows you to query against a small non-redundant database derived from model organisms (the landmark database). You should use this for preliminary annotation with BLASTX.

Select the Model Organisms (landmark) database.

Figure 6.5: NCBI BLAST Model Organisms (landmark) database

6.2.6 Interpret the BLAST report (MyPlace Questions)

Your metagenomic contig likely contains multiple genic or protein-coding regions. The matches to these will all be present in the report at the same time, so you will need to approach the interpretation of this report differently from those in scenarios 1 and 2. You may find it useful to start your interpretation by looking at the Graphic Summary tab.

Important

Please answer the questions below in the formative quiz on MyPlace

Clicking on the green box should give you a hint to the answer, or where to find it.

Check the Graphic Summary tab, and count the number of distinct regions of your query with matches in the database.

Check the Graphic Summary tab, count the number of distinct regions of your query with matches in the database, and note the colour of the annotation for each region in the graphic (the colour key indicates alignment score ranges).
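Counting "distinct regions" in the Graphic Summary is essentially merging overlapping intervals on the query. The sketch below does this for a list of (query start, query end) pairs; the coordinates in the test are made up and are not the answer for your contig. Start and end are normalised first because hits in reverse-strand frames can be reported with the start coordinate larger than the end. The `max_gap` parameter is a hypothetical tolerance for joining nearly touching hits.

```python
def merge_regions(hits, max_gap=0):
    """Merge (start, end) query intervals into distinct regions.

    An interval joins the previous region if its start lies no more than
    max_gap positions past that region's end (max_gap=0 merges only overlaps).
    """
    merged = []
    # Normalise so start <= end, then sweep left to right
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hits):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

Raising `max_gap` can join hits that belong to one gene but are split by a poorly conserved stretch, at the risk of merging neighbouring genes, which is why the graphic itself is the better guide.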

Check the Descriptions tab and identify the top hit.
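The "top hit" is the best-ranked match in the Descriptions table. As a rough sketch of that ranking, the code below picks the hit with the lowest E value and breaks ties (for example several hits with E value 0.0) by the higher bit score. The `Hit` record and the example values are hypothetical placeholders, not real BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    description: str
    bit_score: float
    evalue: float


def top_hit(hits):
    """Return the hit with the lowest E value, breaking ties by higher bit score."""
    return min(hits, key=lambda h: (h.evalue, -h.bit_score))
```

Because E values reach 0.0 for very strong matches, looking at the score as well helps distinguish among several equally "perfect" hits.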

Check the Graphic Summary tab, and click on the topmost alignment at the appropriate region. Click on the Alignment link in the box that appears.
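The alignment view reports Identities and Gaps as counts over the alignment length, with percentages. The sketch below computes the same two figures from a pair of aligned sequences with `-` marking gaps; the sequences in the test are invented. Positives are omitted because they depend on a substitution matrix such as BLOSUM62.

```python
def alignment_stats(query: str, subject: str) -> dict[str, float]:
    """Identity and gap counts (and percentages) over the full alignment length."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    length = len(query)
    if length == 0:
        raise ValueError("empty alignment")
    pairs = list(zip(query, subject))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return {
        "length": length,
        "identities": identities,
        "percent_identity": 100 * identities / length,
        "gaps": gaps,
        "percent_gaps": 100 * gaps / length,
    }
```

Because the denominator includes gapped columns, a short, gap-free alignment can show a higher percent identity than a longer alignment covering more of the protein, so compare the alignment length as well.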

Check the Taxonomy tab, and make a judgement based on the Score and the number of hits in the report.
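The Taxonomy tab groups hits by organism. A minimal sketch of the same judgement, assuming you have noted down (organism, score) pairs from the report, counts hits per organism and records the best score, then ranks organisms by hit count and best score. The organism names in the test are placeholders.

```python
from collections import defaultdict


def summarise_taxa(hits):
    """Return (organism, number_of_hits, best_score) rows, best-supported first."""
    counts = defaultdict(int)
    best = defaultdict(float)
    for organism, score in hits:
        counts[organism] += 1
        best[organism] = max(best[organism], score)
    # Rank by number of hits, then by best score (both descending)
    return sorted(
        ((org, counts[org], best[org]) for org in counts),
        key=lambda row: (-row[1], -row[2]),
    )
```

Keep in mind that a small database like landmark contains only a few organisms, so the best-supported organism here may only be the closest available relative rather than the true source of the contig.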

6.3 Stretch Activities

These activities are not necessary for the assessment, but use more features of the NCBI services to provide more information relevant to interpreting your BLAST results. Open the orange boxes to see the questions, and use the green boxes to find some answers.

6.4 Summary

Well done!

After successfully working through this scenario, you should be able to:

  • use BLAST to obtain preliminary identifications for a section of genome
  • identify the likely taxonomic origin of a genomic sequence using BLAST

If you completed the stretch activities, you should be able to:

  • modify your search parameters for the same query, choosing the most appropriate database with or without taxon filters to answer a specific biological question
  • explain how the choice of database affects the accuracy and comprehensiveness of your results
  • use BLAST results at NCBI to obtain domain and structural information relevant to your protein
  • use the NCBI BLAST service to help visualise structures of homologues to your protein