MUSCLE: Multiple Sequence Alignment With High Accuracy And High Throughput.
Published 2004 · Biology, Medicine
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
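The kmer-counting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the distance is one minus the fraction of kmers two sequences share, normalized by the number of kmers in the shorter sequence. The function names, the default `k = 3`, and the use of the plain amino acid alphabet are illustrative assumptions, not MUSCLE's actual implementation.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Alignment-free distance: 1 - (shared k-mers / k-mers in the shorter sequence).

    Hypothetical helper for illustration; k = 3 is an assumed default.
    """
    shortest = min(len(x), len(y))
    if shortest < k:
        raise ValueError("both sequences must contain at least k residues")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter intersection keeps the minimum count of each k-mer.
    shared = sum((cx & cy).values())
    return 1.0 - shared / (shortest - k + 1)


if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDEXX"))
```

Because no alignment is computed, each pairwise distance costs time linear in the sequence lengths, which is consistent with the abstract's description of this step as fast distance estimation.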