MUSCLE: Multiple Sequence Alignment With High Accuracy And High Throughput.

R. Edgar
Published 2004 · Biology, Medicine

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
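
The abstract names kmer counting as the basis for fast distance estimation but does not give a formula. The following is a minimal sketch of one common way to turn shared kmer counts into a distance between two unaligned sequences; it is an illustration under stated assumptions, not MUSCLE's actual code. The function names (`kmer_counts`, `kmer_distance`), the default `k=3`, and the normalization by the shorter sequence's kmer count are all assumptions chosen for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Distance in [0, 1] based on the fraction of shared kmers.

    Assumption for this sketch: similarity is the number of kmers the
    two sequences have in common (counting multiplicity), divided by
    the number of kmers in the shorter sequence. Distance is 1 minus
    that fraction.
    """
    shortest = min(len(a), len(b))
    if shortest < k:
        raise ValueError("both sequences must have length >= k")
    # Counter intersection keeps the minimum count of each shared kmer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / (shortest - k + 1)
```

Because no alignment is computed, each pairwise distance costs time roughly linear in the sequence lengths, which is what makes this kind of estimate attractive when thousands of sequences must be compared.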