Molecular Evolutionary Genetics Analysis (MEGA)

Abstract: The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences, either from multi-gene families or from different species, with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution.

In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices.

MEGA is an integrated workbench for biologists for mining data from the web, aligning sequences, conducting phylogenetic analyses, testing evolutionary hypotheses, and generating publication-quality displays and descriptions.

Note: A survey of research papers citing MEGA reveals that this software is used in diverse disciplines, including AIDS/HIV research, virology, bacteriology and general disease, plant biology, conservation biology, systematics, developmental evolution, and population genetics.

MEGA features/capabilities include:

Input Data -- DNA, Protein, Pairwise distance matrix.

Sequence Alignment Construction --

Alignment Editor - Manual editing of DNA and Protein sequences; Motif searching/highlighting; Synchronous alignment editing of original and translated cDNA;

Copy/Paste sequences To/From Clipboard; Save alignment session for future display; Ability to read sequencer, MEGA, NEXUS, FASTA, and other formats;

Apply color/highlight schemes to sequence data; Write alignment to MEGA file for direct analysis in MEGA; and BLAST sequences from alignment directly.

Multiple Sequence Alignment - Complete native implementation of ClustalW; Ability to select all options on the fly; Ability to align any user-selected region; and Ability to align translated cDNA sequences with automatic adjustment.

Sequencer (Trace) File editor/viewer - View ABI (*.abi, *.ab1) and Staden (*.scf) trace files; Edit trace file; Mask vector (or any other region); Launch direct BLAST search for whole or selected sequence; and Send data directly to Alignment Editor.

Integrated Web Browser and Sequence Fetching - Direct "usual" web and GenBank browsing from MEGA; One-click sequence fetching from databanks queries; Send sequence data from BLAST search directly into alignment; and Bookmark favorite sequence databank sites.

Data Handling --

Handling ambiguous states (R,Y,T, etc.); Extended MEGA format to save all data attributes; and Importing Data from other formats (Clustal/Nexus/etc.).

Data Explorers - Sequence and Distance matrix.

Attributes supported - Groups of Sequences/Taxa; Domains; Genes and Mixed Domain attributes; Explicit labels for sites; Automatic codon translation; Selection of codon positions; and Selection of different site categories.

Visual Specification of Domains/Groups; Center Analysis Preferences Dialog; and Unlimited Data size for Analysis.

Genetic Code Table Selection --

Choose a desired table and Ability to add/edit user defined tables.

Computation of statistical attributes of a code table - Degeneracy of codon positions and Numbers of potential synonymous sites.

Inclusion of all known code tables.
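
The degeneracy of a codon position can be illustrated with a short sketch. The code below is not MEGA's implementation; it assumes the standard genetic code (NCBI translation table 1), and the names `CODE` and `degeneracy` are hypothetical. It counts how many of the four bases at a given position leave the encoded amino acid unchanged.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed here: NCBI translation table 1),
# with codons ordered TTT, TTC, TTA, TTG, TCT, ... (last base varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def degeneracy(codon, position, code=CODE):
    """Count the bases at `position` (0, 1, or 2) that keep the amino acid unchanged.

    A result of 1 means the site is non-degenerate (0-fold), 2 means 2-fold,
    and 4 means 4-fold. A result of 3 occurs for isoleucine; some methods
    group such sites with the 2-fold class.
    """
    target = code[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if code[variant] == target:
            count += 1
    return count


print(degeneracy("GGA", 2))  # glycine, third position
print(degeneracy("TTT", 2))  # phenylalanine, third position
print(degeneracy("ATG", 2))  # methionine, third position
```

Passing a different dictionary as `code` would model a user-defined table.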

Real-Time Caption Expert Engine -- Generate Captions for Distance Matrices; Generate Captions for Phylogenies; Generate Captions for Tests; Generate Captions for Alignments; Copy Captions to External Programs; and Save/Print Captions.

Integrated Text File Editor -- Unlimited Text File Size; Multi-file Tabbed Display; Columnar Block selection/Editing; Undo/Redo operations; Line numbers; Utilities to Format Sequences/Reverse complement etc; and Copy Screenshots to EMF/WMF/Bitmap for presentation.

Sequence Data Viewer --

Two dimensional display of molecular sequences; Display with identity symbol; Drag-drop sorting of sequences; Mixing coding and non-coding sequence display; One-click translation; and Display with all or only selected taxa.

Data Export - PAUP3, PHYLIP and PAUP4, PHYLIP Interleaved.

Highlighting - 0,2,4-fold degenerate sites; Variable, parsimony informative sites; and Constant Sites.

Statistical Quantities estimation -

DNA and protein sequence compositions.

Estimation by genes/domains/groups - Codon Usage.

Estimation by genes/domains/groups - Use only highlighted sites.
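
The site categories used for highlighting (constant, variable, parsimony informative) follow simple counting rules. The sketch below is an illustration under simplifying assumptions, not MEGA's code: gaps and ambiguous states are treated as ordinary characters, and the function name `classify_site` is hypothetical. A site is taken to be parsimony informative when at least two states each occur in at least two sequences.

```python
from collections import Counter


def classify_site(column):
    """Return the most specific category for one alignment column."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Parsimony informative: two or more states, each seen in two or more sequences.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "parsimony-informative"
    # Variable, but not parsimony informative.
    return "variable"


alignment = ["ACGTA", "ACGCA", "ATGCA", "ATGTG"]
columns = ["".join(col) for col in zip(*alignment)]
for index, col in enumerate(columns, start=1):
    print(index, col, classify_site(col))
```

Parsimony-informative sites are also variable; the function reports only the narrower label.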

MCL-based Estimation of Nucleotide Substitution Patterns --

4x4 Rate Matrix; Transition/Transversion Rate Ratios (k1, k2); and Transition/Transversion Rate Bias (R).

Substitution Pattern Homogeneity Test -- Composition Distance; Disparity Index; and Monte-Carlo Test.

Distance Estimation Methods --

Nucleotide-by-Nucleotide -

Models - No. of differences, p-distance, Jukes-Cantor, Kimura 2P; Tajima-Nei, Tamura 3-parameter, Tamura-Nei distance; LogDet (Tamura-Kumar); and Maximum Composite Likelihood.

Subcomponents - Transitions (ts), transversions (tv), ts/tv ratio and Number of common sites.

Account for rate variation among sites and Relaxation of the homogeneity assumption.
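
Three of the simpler nucleotide models listed above have closed-form expressions. The following is a minimal sketch, not MEGA's implementation: function names are hypothetical, and sites containing anything other than A, C, G, or T in either sequence are simply skipped. With p the proportion of differing sites, P the proportion of transitions, and Q the proportion of transversions, the Jukes-Cantor distance is -(3/4) ln(1 - 4p/3) and the Kimura 2-parameter distance is -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}


def count_changes(seq1, seq2):
    """Return (sites compared, transitions, transversions) for two aligned sequences."""
    sites = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous states
        sites += 1
        if a != b:
            if (a in PURINES) == (b in PURINES):
                ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
            else:
                tv += 1  # purine <-> pyrimidine
    return sites, ts, tv


def p_distance(seq1, seq2):
    sites, ts, tv = count_changes(seq1, seq2)
    return (ts + tv) / sites


def jukes_cantor(seq1, seq2):
    p = p_distance(seq1, seq2)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1, seq2):
    sites, ts, tv = count_changes(seq1, seq2)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


a = "AAAAAAAAAACCCCCCCCCC"
b = "GGAAAAAAAACCCCCCCCCA"
print(count_changes(a, b))  # (20, 2, 1)
print(p_distance(a, b), jukes_cantor(a, b), kimura_2p(a, b))
```

Both corrected distances exceed the p-distance because they account for multiple substitutions at the same site.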

Synonymous/Non-synonymous (Codon-by-Codon) -

Models - Nei-Gojobori (1986) method; Modified Nei-Gojobori method; Li-Wu-Luo method; Pamilo-Bianchi-Li (PBL) method; and Kumar method.

Subcomponents - Synonymous (s), non-synonymous (n) distances; Numbers of synonymous and non-synonymous sites; Differences and ratios (s-n, n-s, s/n, and n/s); 4-fold degenerate site distances; 0-fold degenerate site distances; and Number of 0-fold and 4-fold degenerate sites.

Protein distance - Number of differences, p-distance, Poisson; Dayhoff and JTT distances; Account for rate variation among sites; and Relaxation of the homogeneity assumption.
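
For protein sequences, the Poisson correction converts the proportion of differing amino acid sites p into a distance d = -ln(1 - p). When rates are assumed to vary among sites according to a gamma distribution with shape parameter a, the corresponding distance is d = a[(1 - p)^(-1/a) - 1]. The sketch below is illustrative only; the function names are hypothetical, and sites with a gap ("-") in either sequence are skipped.

```python
import math


def protein_p_distance(seq1, seq2):
    """Proportion of differing sites among positions without gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq1, seq2):
    return -math.log(1 - protein_p_distance(seq1, seq2))


def gamma_poisson_distance(seq1, seq2, shape):
    """Poisson distance with gamma-distributed rates; `shape` is the gamma parameter a."""
    p = protein_p_distance(seq1, seq2)
    return shape * ((1 - p) ** (-1 / shape) - 1)


x = "MKVLAAGIVG"
y = "MKILAAGLVG"
print(protein_p_distance(x, y))
print(poisson_distance(x, y))
print(gamma_poisson_distance(x, y, shape=2.0))
```

As the shape parameter grows large, the gamma distance approaches the plain Poisson distance.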

Distance Calculations - Pairwise; Between Group Average; Within Group Average; Net between group Average; and Overall average.
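
The group-level averages listed above can be derived from a pairwise distance matrix. A minimal sketch, assuming the matrix is held as a nested dictionary and each group has at least two members; all names are hypothetical. The net between-group average is taken here as the between-group average minus the mean of the two within-group averages.

```python
from itertools import combinations, product


def symmetric(pairs):
    """Build a nested dict dist[a][b] from {(a, b): distance}."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def within_group_average(dist, group):
    pairs = list(combinations(group, 2))  # needs at least two members
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def between_group_average(dist, group_a, group_b):
    pairs = list(product(group_a, group_b))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def net_between_group_average(dist, group_a, group_b):
    within = (within_group_average(dist, group_a) + within_group_average(dist, group_b)) / 2
    return between_group_average(dist, group_a, group_b) - within


dist = symmetric({
    ("a1", "a2"): 0.02, ("b1", "b2"): 0.04,
    ("a1", "b1"): 0.10, ("a1", "b2"): 0.12,
    ("a2", "b1"): 0.11, ("a2", "b2"): 0.13,
})
group_a, group_b = ["a1", "a2"], ["b1", "b2"]
print(between_group_average(dist, group_a, group_b))
print(net_between_group_average(dist, group_a, group_b))
```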

Sequence Diversity Calculations - Mean Diversity within Subpopulations; Mean Diversity for Entire Population; Mean Interpopulational Diversity; and Coefficient of Differentiation.

Variance Calculations - Analytical and Bootstrap.

Handling missing data; Automatic translation; and Automatic pasting of partial codons between exons.
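
The two variance approaches listed above (analytical and bootstrap) can be compared on the simplest case, the p-distance. This is a sketch under stated assumptions rather than MEGA's procedure: the analytical standard error uses the binomial formula sqrt(p(1 - p)/n), the bootstrap resamples alignment columns with replacement, and the names, the number of replicates, and the seed are illustrative choices.

```python
import math
import random
import statistics


def p_distance(seq1, seq2):
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def analytical_se(seq1, seq2):
    """Binomial standard error of the p-distance."""
    p = p_distance(seq1, seq2)
    return math.sqrt(p * (1 - p) / len(seq1))


def bootstrap_se(seq1, seq2, replicates=500, seed=0):
    """Standard deviation of the p-distance over column-resampled replicates."""
    rng = random.Random(seed)
    n = len(seq1)
    estimates = []
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]  # sample columns with replacement
        estimates.append(p_distance([seq1[i] for i in cols], [seq2[i] for i in cols]))
    return statistics.stdev(estimates)


seq_a = "A" * 100
seq_b = "A" * 80 + "C" * 20
print(analytical_se(seq_a, seq_b))
print(bootstrap_se(seq_a, seq_b))
```

For this simple statistic the two estimates should agree closely; the bootstrap becomes more useful for distances without a convenient variance formula.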

Tests of Selection --

Codon-based tests -

Large sample Z-test - Between Sequences; Within groups; and Overall sequences.

Fisher's Exact Test and Tajima's Test of Neutrality.
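
The large-sample Z-test compares non-synonymous (dN) and synonymous (dS) distances using Z = (dN - dS) / sqrt(Var(dN) + Var(dS)). A minimal sketch follows, assuming the distances and their variances have already been estimated elsewhere (for example analytically or by bootstrap); the function name and the input values are hypothetical.

```python
import math


def codon_z_test(d_n, d_s, var_n, var_s):
    """Return (Z, two-sided p-value) for the null hypothesis dN = dS."""
    z = (d_n - d_s) / math.sqrt(var_n + var_s)
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = codon_z_test(d_n=0.05, d_s=0.20, var_n=0.0004, var_s=0.0021)
print(z, p)
```

A clearly negative Z, as in this made-up example, would point toward purifying selection; a one-sided test would halve the p-value.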

Molecular Clock Test -- Tajima's relative rate test.
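
Tajima's relative rate test compares two sequences against an outgroup. It counts the sites where only sequence 1 differs from the other two (m1) and the sites where only sequence 2 differs (m2); under equal rates, (m1 - m2)^2 / (m1 + m2) approximately follows a chi-square distribution with one degree of freedom. The sketch below is illustrative, with a hypothetical function name, and skips sites containing a gap.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi-square, p-value) for Tajima's relative rate test."""
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue
        if a != b and b == c:
            m1 += 1  # change unique to sequence 1
        elif a != b and a == c:
            m2 += 1  # change unique to sequence 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return m1, m2, chi2, p_value


out = "AAAAAAAAAAAA"
s1 = "CCCCCCAAAAAA"
s2 = "AAAAAAAAAAGG"
print(tajima_relative_rate(s1, s2, out))
```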

Tree-making Methods --

Neighbor-Joining - Randomized tie-breaking in bootstrapping.

Minimum Evolution method - Branch-swapping (Close-Neighbor-Interchange; CNI) and Fast OLS computation method.

UPGMA - Randomized tie-breaking in bootstrapping.

Maximum Parsimony - Nucleotide sequences; Protein sequences; Max-mini branch-and-bound and min-mini searches; Branch-swapping (CNI); and Average branch length estimation.

Bootstrap Test of Phylogeny - Neighbor-joining/UPGMA; Minimum Evolution; and Maximum Parsimony.

Confidence Probability Test - Neighbor-joining and Minimum Evolution.

Consensus tree construction and Condensed tree construction.
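
Of the distance-based methods listed above, UPGMA is the simplest to sketch. The code below is a plain textbook version and not MEGA's implementation: it repeatedly joins the closest pair of clusters, places the new node at half their distance, and averages distances weighted by cluster size. Ties are broken deterministically by dictionary order here, whereas the feature list mentions randomized tie-breaking during bootstrapping. The function name and the example matrix are hypothetical.

```python
def upgma(names, matrix):
    """Build a UPGMA tree from a symmetric distance matrix; return a Newick string."""
    # Active clusters: id -> (Newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (nw_i, n_i, h_i), (nw_j, n_j, h_j) = clusters.pop(i), clusters.pop(j)
        height = dist[(i, j)] / 2
        newick = f"({nw_i}:{height - h_i:.3f},{nw_j}:{height - h_j:.3f})"
        # Size-weighted average distance from the merged cluster to each remaining one
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = (newick, n_i + n_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
# (D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);
```

The output is in Newick format, which the Tree Explorer is described as being able to read and save.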

Distance Matrix Viewer --

View pairwise distances; View between group distances; View within group distances; and View distances and standard errors simultaneously.

Sort the distance matrix - Drag-and-drop; Group-wise; and By Sequence names.

Control display precision and Export Data for printing or re-importing.

Tree Explorers --

Phylogeny Display and Graphic printing; On-the-spot taxa name editing; Multiple phylogeny views; Linearized Tree; Estimation of divergence time by calibrating molecular clock; Copy to Clipboard/save to file as an EMF drawing; Save to Newick format; and Read trees from Newick format.

User specified control for - Placement and precision of branch length; Scale bar addition; Collapsing branches or groups; Display only a subtree; and Ability to view multiple trees in different viewers.

Tree Editing - Flipping, re-rooting; Add marker symbols to names; and Multi-color display and printing.

Change Tree Size - Vertical separation between taxa; Horizontal size; and Change Tree shape.

Multiple tree display; Save tree session for future display; What-you-see-is-what-you-get (WYSIWYG) printing; Multi- or single page printing; and Display images on tree for groups and taxa.

MEGA 4 New Features include:

Real-Time Caption Expert Engine -- A unique facility to generate detailed captions for different types of analyses and results. These captions are intended to provide detailed, natural language descriptions of the methods and models used in analysis.

Maximum Composite Likelihood Method -- A method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This method can also be used to estimate transition/transversion biases and nucleotide substitution patterns without requiring a priori knowledge of the phylogenetic tree.

MEGA comes with on-line help outlining the different aspects of its user interface. Extensive details of the statistical and computational methods available in MEGA are presented in the book 'Molecular Evolution and Phylogenetics' (Nei and Kumar, Oxford University Press, 2000).

System Requirements

Windows 95/98, NT, 2000, XP, and Vista.

Linux Version: MEGA will run efficiently in the Linux desktop environment on top of Wine, an open-source compatibility layer for running Windows programs on Unix-like operating systems.


