Mapping and Assembly with Qualities (Maq)

Abstract --

Maq is a software package that builds ‘mapping assemblies’ from short reads generated by next-generation sequencing machines.

It is designed primarily for the Illumina-Solexa™ 1G Genetic Analyzer and has preliminary functions for handling Applied Biosystems (ABI) SOLiD™ System data.

Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, Maq performs ungapped alignment, meaning no insertions or deletions are allowed within a read.

For single-end reads, Maq is able to find all hits with up to 2 or 3 mismatches, depending on a command-line option; for ‘paired-end’ reads, it always finds all ‘paired hits’ with one of the two reads containing up to 1 mismatch.
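To make "all hits with up to k mismatches" concrete, the brute-force sketch below slides a read along one strand of a reference and keeps every ungapped placement whose mismatch count stays within a limit. It illustrates the definition only, not Maq's method: Maq uses an indexing strategy rather than this exhaustive scan. The function name `ungapped_hits` is hypothetical, and reverse-complement hits and base qualities are ignored.

```python
def ungapped_hits(read, reference, max_mismatches=2):
    """Return (offset, mismatches) for every ungapped placement of `read`
    on the given strand of `reference` with at most `max_mismatches` mismatches."""
    hits = []
    n = len(read)
    for start in range(len(reference) - n + 1):
        mismatches = 0
        for a, b in zip(read, reference[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this placement can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

For example, `ungapped_hits("ACGT", "ACGTACGA", max_mismatches=1)` returns `[(0, 0), (4, 1)]`: an exact hit at offset 0 and a one-mismatch hit at offset 4.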

At the assembly stage, Maq calls the consensus based on a statistical model.

At each position along the consensus, it calls the base that maximizes the posterior probability and assigns it a Phred quality (defined below). Heterozygotes are also called in this process.
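The sketch below illustrates calling the base with the highest posterior probability and reporting a Phred quality for that call. It uses a deliberately simplified model that is an assumption, not Maq's own: a haploid site (so no heterozygote calls), a uniform prior over A/C/G/T, independent sequencing errors, and a miscalled base equally likely to be any of the other three. The names `call_base` and `error_prob` and the quality cap of 99 are hypothetical.

```python
import math

BASES = "ACGT"

def error_prob(q):
    """Phred quality -> error probability, capped at 0.75 (a uniformly random base)."""
    return min(10 ** (-q / 10), 0.75)

def call_base(pileup, max_quality=99):
    """Call the base with the highest posterior probability at one position.

    pileup: list of (observed_base, phred_quality) pairs from the reads covering
    the position. Simplified model (assumption, not Maq's): haploid site, uniform
    prior, independent errors, wrong base equally likely to be any other base.
    """
    log_lik = {}
    for b in BASES:
        total = 0.0
        for obs, q in pileup:
            e = error_prob(q)
            total += math.log(1 - e) if obs == b else math.log(e / 3)
        log_lik[b] = total
    # With a uniform prior the posterior is proportional to the likelihood;
    # normalise in log space to avoid underflow at deep coverage.
    top = max(log_lik.values())
    weights = {b: math.exp(v - top) for b, v in log_lik.items()}
    norm = sum(weights.values())
    posterior = {b: w / norm for b, w in weights.items()}
    best = max(posterior, key=posterior.get)
    # Probability the call is wrong = posterior mass on the other three bases.
    err = sum(p for b, p in posterior.items() if b != best)
    if err <= 0:
        return best, max_quality
    return best, min(max_quality, round(-10 * math.log10(err)))
```

Five A reads at quality 30 give `("A", 99)`, i.e. the call hits the cap; a single disagreeing low-quality read among several high-quality ones lowers the quality but does not change the call.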

Phred - Phred is a base-calling program for DNA sequence traces. Phred reads DNA sequence chromatogram files and analyzes the peaks to call bases, assigning quality scores (“Phred scores”) to each base call.
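A Phred score Q is related to the probability P that a base call is wrong by Q = -10 log10 P, so each 10-point increase means a tenfold lower error probability. The two helpers below (hypothetical names) convert in both directions; they are a generic illustration of the Phred scale, not code taken from Phred or Maq.

```python
import math

def phred_from_error(p):
    """Phred quality Q = -10 * log10(P), where P is the probability the call is wrong."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)

def error_from_phred(q):
    """Inverse mapping: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# Print the error probability implied by a few common quality values.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {error_from_phred(q):g}")
```

Q20 thus corresponds to a 1-in-100 chance that the base is wrong, and Q30 to 1 in 1,000.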

With Maq you can:

1) Quickly align Illumina/SOLiD reads to a reference genome. With the default options, one million read pairs can be mapped to the human genome in about 10 CPU hours using less than 1 GB of memory.

2) Accurately measure the ‘error probability’ of the alignment of each individual read.

3) Call the ‘consensus genotypes’, including homozygous and heterozygous polymorphisms, with a Phred-scaled quality assigned to each base.

4) Find short indels (insertions or deletions relative to the reference) using paired-end reads.

5) Accurately find large-scale genomic deletions and translocations using paired-end reads.

6) Discover potential ‘Copy Number Variations’ (CNVs) by checking read depth.

7) Evaluate the accuracy of ‘raw base qualities’ from sequencers and help check for systematic errors.
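Read-depth screening for CNVs (item 6) can be sketched as counting reads in fixed windows and flagging windows whose depth differs markedly from the typical depth. This is a generic illustration, not Maq's procedure: the window size, the 0.5 and 1.5 ratio thresholds and the function name `depth_outliers` are hypothetical, and a real analysis would also need to correct for biases such as GC content and mappability.

```python
from statistics import median

def depth_outliers(read_starts, genome_length, window=1000, low=0.5, high=1.5):
    """Flag windows whose read count departs from the genome-wide median.

    read_starts: 0-based leftmost positions of mapped reads (< genome_length).
    Returns (window_start, count, ratio_to_median, 'loss' | 'gain') tuples.
    """
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    med = median(counts)
    if med == 0:
        return []  # too little coverage to judge
    flagged = []
    for i, count in enumerate(counts):
        ratio = count / med
        if ratio < low:
            flagged.append((i * window, count, ratio, "loss"))
        elif ratio > high:
            flagged.append((i * window, count, ratio, "gain"))
    return flagged
```

With windows holding 10, 10, 30, 2 and 10 reads, the median is 10, so the third window is reported as a possible gain (ratio 3.0) and the fourth as a possible loss (ratio 0.2).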

However, Maq cannot:

1) Do de novo assembly. (Maq can only call the consensus by mapping reads to a known reference.)

2) Map short reads against themselves. (Maq can only find complete overlap between reads.)

3) Align capillary reads or 454 reads to the reference. (Maq cannot align reads longer than 63 bp.)

Maq Viewer (Maqview) --

Maqview is a graphical read-alignment viewer. It is designed specifically for the Maq alignment file and lets you see mismatches, base qualities and mapping qualities.

Maqview is not as feature-rich as Consed (a program for viewing, editing and finishing DNA sequence assemblies) or GAP; it is a simple viewer for inspecting what happens in a particular region.

According to the developers, Maqview is faster than tgap-Maq, the text-based read-alignment viewer written by James Bonfield, and uses much less memory and disk space for indexing.

This is likely because tgap-Maq aims to be a general-purpose viewer, whereas Maqview takes full advantage of the fact that the Maq alignment file is already sorted.

Maqview is also efficient for viewing and provides a command-line tool to quickly retrieve any region of a Maq alignment file.
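Because the alignment file is sorted, a region query does not need to scan every record: a binary search over the sorted positions finds the first and last candidate records directly. The sketch below shows that idea on an in-memory list. It does not read the Maq binary format, and the `Hit` record, its fields, the `SortedAlignments` class and the ordering by reference name then position are all hypothetical simplifications.

```python
from bisect import bisect_left
from typing import List, NamedTuple

class Hit(NamedTuple):
    chrom: str   # reference sequence name
    pos: int     # 0-based leftmost aligned position
    length: int  # read length
    name: str    # read name

class SortedAlignments:
    """In-memory stand-in for a position-sorted alignment file (hypothetical)."""

    def __init__(self, hits: List[Hit]):
        # Sorting here is a safeguard; a pre-sorted file would already be in this order.
        self.hits = sorted(hits, key=lambda h: (h.chrom, h.pos))
        self.keys = [(h.chrom, h.pos) for h in self.hits]
        self.max_len = max((h.length for h in self.hits), default=0)

    def region(self, chrom: str, start: int, end: int) -> List[Hit]:
        """Return hits on `chrom` that overlap the half-open interval [start, end)."""
        # A read starting before `start` can still overlap it, so begin the
        # search up to one maximum read length to the left.
        lo = bisect_left(self.keys, (chrom, start - self.max_len + 1))
        hi = bisect_left(self.keys, (chrom, end))
        return [h for h in self.hits[lo:hi] if h.pos + h.length > start]
```

Each query costs two binary searches plus the size of the result, rather than a pass over the whole file.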

Maqview is based on OpenGL. According to the developers, having to install OpenGL is the trade-off for a better look and feel.

Maqview displays the read alignment in a graphical window. It has two views: the Sequence view and the Box view.

In the Sequence view, read sequences are printed on the screen. Darker bases indicate lower base qualities, and red bases mark differences from the majority-rule consensus (not the Maq consensus).
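A majority-rule consensus simply takes the most frequent base in each column of the alignment and ignores base qualities, which is what distinguishes it from Maq's probabilistic consensus. The sketch below computes one and lists, for each read, the positions that disagree with it (the bases a viewer would highlight). Reads are given as hypothetical (offset, sequence) pairs, and reporting N for tied or uncovered columns is an assumption of this sketch.

```python
from collections import Counter

def majority_consensus(reads, length):
    """reads: (offset, sequence) pairs placed on a region of `length` bases."""
    columns = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < length and base != "N":
                columns[pos][base] += 1
    consensus = []
    for col in columns:
        top = col.most_common(2)
        # No coverage, or a tie between the two most frequent bases -> 'N'.
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            consensus.append("N")
        else:
            consensus.append(top[0][0])
    return "".join(consensus)

def highlight_positions(reads, consensus):
    """For each read, the indices of its bases that differ from the consensus."""
    return [
        [i for i, base in enumerate(seq)
         if 0 <= offset + i < len(consensus) and base != consensus[offset + i]]
        for offset, seq in reads
    ]
```

For reads `[(0, "ACGT"), (0, "ACGA"), (1, "CGT"), (2, "GT")]` over four bases, the consensus is `"ACGT"` and only the last base of the second read is highlighted.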

In the Box view, each nucleotide is drawn as a colored box: green for A, cyan for C, orange for G, red for T and dark gray for N.

Color saturation indicates base quality, and the thickness of the line drawn for each read shows its mapping quality. Zooming in and out is supported only in the Box view.
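The Box-view encoding can be sketched as one function from (base, base quality) to a color whose saturation scales with quality, and another from mapping quality to line thickness. Every number here is an assumption rather than Maqview's actual rendering code: the RGB values chosen for the named colors, the linear saturation scale up to a hypothetical ceiling of Q40, and the 1 to 4 pixel thickness range capped at mapping quality 60.

```python
import colorsys

# Colours named in the text; the exact RGB values (0..1) are assumptions.
BASE_RGB = {
    "A": (0.0, 0.8, 0.0),   # green
    "C": (0.0, 0.8, 0.8),   # cyan
    "G": (1.0, 0.55, 0.0),  # orange
    "T": (0.9, 0.0, 0.0),   # red
    "N": (0.3, 0.3, 0.3),   # dark gray
}

def box_colour(base, base_quality, max_quality=40):
    """Scale the base's colour saturation linearly with base quality (hypothetical scale)."""
    r, g, b = BASE_RGB.get(base.upper(), BASE_RGB["N"])
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    scale = max(0.0, min(1.0, base_quality / max_quality))
    return colorsys.hsv_to_rgb(h, s * scale, v)

def line_thickness(mapping_quality, max_px=4):
    """Hypothetical: 1 px at mapping quality 0, rising to max_px px at 60 or above."""
    return 1 + (max_px - 1) * min(mapping_quality, 60) / 60
```

A quality-0 A box comes out as a neutral gray `(0.8, 0.8, 0.8)`, while a quality-40 A box keeps its full green, so poorly supported bases visibly wash out.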

In both views, the status bar at the bottom of the window shows information about key bindings, along with the name and base quality of the read under the mouse pointer.

Maq documentation --

Maq is documented in three parts: the Maq User's Manual, the Maq Reference Manual and the FAQ page.

The User's Manual introduces the basic functions of Maq, the Reference Manual gives detailed usage for each function, and the FAQ page (a wiki) presents informal but useful tips and notes.

These documents complement one another.

