8  Map Reads to the Genome

Read mapping is a common bioinformatics technique that aligns sequencing reads to a reference genome. It is often used as part of RNA-seq, or transcriptomic, analyses, where the number of sequencing reads aligned to a part of the genome indicates the level to which a gene in that region is transcribed. It is also central to technologies such as ChIP-seq (chromatin immunoprecipitation and sequencing), which enables the locations of regulatory promoter binding sites to be found.

Today, you will use the same approach to compare the ERR531380 isolate that you have sequenced to a reference Pseudomonas aeruginosa PAO1 genome, and then to an assembly of the ERR53413 isolate sequenced as part of this study. To do this you will use the BWA-MEM2 read mapping software (Vasimuddin et al. (2019)) to align the cleaned forward and reverse reads from your isolate to each of the two reference genomes.

Good practice

When reporting read mapping in a manuscript or dissertation, you should always state:

  1. The software tool you used, with its version number and a citation of the paper describing it (if available; provide a URL to the software if there is no paper)
  2. The parameters used when running the tool (if default parameters were used, state this)
  3. The accession numbers of the reference genome and read data (if available)
How BWA-MEM2 works

BWA-MEM2 is named after the Burrows-Wheeler Algorithm, a mathematical technique that was originally used for lossless data compression (similar to a .zip file). In a bioinformatics context, the algorithm helps make sequence alignment much faster because “compressing” the reference sequence into an index means that alignment to redundant short sequences only needs to take place once, rather than many times.

Fast algorithms like BWA-MEM2 for use with large datasets often have two steps: generation of a reference index, and then comparison of the input data against the index.

8.1 Using BWA-MEM2

  1. Navigate to the BWA-MEM2 tool
  2. Select the BWA-MEM2 tool
  3. Choose Use a genome from history and build index under Will you select a reference genome…
  4. Choose the GCF_000006765.1_ASM676v1_genomic.fna sequence under Use the following dataset as the reference sequence
  5. Choose Paired under Single of Paired-end reads
  6. Select the (R1 paired) and (R2 paired) trimmomatic output reads
  7. Click Run Tool
Video: Mapping ERR513380 reads to a reference genome using BWA-MEM2
Caution

BWA-MEM2 can take a few minutes to run to completion.

8.2 BWA-MEM2 Output

Unlike many of the other tools you will use today, BWA-MEM produces binary file output, which is not human-readable. However, Galaxy can read it and generate a human-friendly translation. The output format of BWA-MEM2 is .bam, and this can be used as an annotation track in a visualisation tool such as JBrowse, wchich you will meet next, in Chapter 9.

Video: Visualising the BWA-MEM2 .bam output as text

8.3 Next Steps

We can find SNPs by visual inspection in a tool like JBrowse, and you will try this in Chapter 9.