
A Comparison of Scoring Functions for Protein Sequence Profile Alignment

R. Edgar, K. Sjölander
Published 2004 · Computer Science, Medicine

MOTIVATION: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination over sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with ≤30% identity against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles built from both PSI-BLAST and SAM-T99 alignments. Structural alignments are constructed from a consensus between the FSSP database and the CE structural aligner. We compare the results with sequence-sequence and profile-sequence methods, including BLAST and PSI-BLAST.

RESULTS: We find that profile-profile alignment gives an average improvement on our test set of typically 2-3% over profile-sequence alignment and approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments.

AVAILABILITY: Source code, reference alignments and more detailed results are freely available at
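The abstract does not define any of the 23 scoring functions, so what follows is only a rough, hypothetical illustration of where a profile-profile scoring function plugs into an aligner. It is a minimal sketch assuming a toy four-letter alphabet, made-up substitution scores (+4 for identity, -1 otherwise), a linear gap penalty of -2, and one simple column score: the expected substitution score between two residue-frequency columns, Σ_a Σ_b f_x(a) f_y(b) s(a, b), often called a sum-of-pairs score. Columns are aligned with plain Needleman-Wunsch global alignment. The names `sum_of_pairs`, `align_profiles` and `accuracy` are hypothetical, and `accuracy` (fraction of reference column pairs reproduced) is an assumed stand-in for the comparison against structural alignments. None of the paper's tuned parameters, gap model, or PSI-BLAST/SAM-T99 profile construction is reproduced here.

```python
"""Minimal sketch of global profile-profile alignment.

Assumptions (illustrative only, not the paper's setup): a toy 4-letter
alphabet, made-up substitution scores, a linear gap penalty, and a
sum-of-pairs column score.
"""
from typing import Dict, List, Tuple

Column = Dict[str, float]      # residue -> frequency at one profile position
Profile = List[Column]
Pair = Tuple[int, int]         # (column index in x, column index in y)

# Hypothetical substitution scores: +4 for identity, -1 otherwise.
ALPHABET = "ACDE"
SUBST = {(a, b): 4.0 if a == b else -1.0 for a in ALPHABET for b in ALPHABET}


def sum_of_pairs(col_x: Column, col_y: Column) -> float:
    """Expected substitution score: sum over a, b of f_x(a) * f_y(b) * s(a, b)."""
    return sum(fx * fy * SUBST[(a, b)]
               for a, fx in col_x.items()
               for b, fy in col_y.items())


def align_profiles(x: Profile, y: Profile, gap: float = -2.0) -> Tuple[float, List[Pair]]:
    """Needleman-Wunsch over profile columns with a linear gap penalty."""
    n, m = len(x), len(y)
    # dp[i][j] is the best score for aligning x[:i] with y[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sum_of_pairs(x[i - 1], y[j - 1]),
                           dp[i - 1][j] + gap,   # column of x against a gap
                           dp[i][j - 1] + gap)   # column of y against a gap
    # Trace back from the corner to recover the matched column pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + sum_of_pairs(x[i - 1], y[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def accuracy(test: List[Pair], reference: List[Pair]) -> float:
    """Assumed measure: fraction of reference pairs that the test alignment reproduces."""
    ref = set(reference)
    return len(ref & set(test)) / len(ref) if ref else 0.0


if __name__ == "__main__":
    x = [{"A": 1.0}, {"C": 0.5, "D": 0.5}, {"E": 1.0}]
    y = [{"A": 0.8, "C": 0.2}, {"E": 1.0}]
    score, pairs = align_profiles(x, y)
    print(score, pairs)                       # 5.0 [(0, 0), (2, 1)]
    # Hypothetical reference alignment, not real structural data.
    print(accuracy(pairs, [(0, 0), (1, 1)]))  # 0.5
```

In this sketch, trying a different column score (for example a log-average or an information-theoretic score) only means replacing `sum_of_pairs`; the dynamic programming stays the same, which is what makes it practical to compare many scoring functions under one alignment procedure.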
This paper references
Fold predictions for bacterial genomes.
K. Pawłowski (2001)
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
J. Thompson (1994)
Atlas of Protein Sequence and Structure, Vol. 5, Suppl. (M. O. Dayhoff, ed.), National Biomedical Research Foundation, Washington, DC
M. O. Dayhoff (1978)
Combination of threading potentials and sequence profiles improves fold recognition.
A. Panchenko (2000)
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory.
G. Yona (2002)
COACH: profile-profile alignment of protein families using hidden Markov models
R. Edgar (2004)
Protein sequence alignment reliability: prediction and measurement
M. Cline (2000)
Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.
I. Shindyalov (1998)
Atlas of protein sequence and structure
M. A. Chang (1965)
Amino acid substitution matrices from protein blocks.
S. Henikoff (1992)
Basic local alignment search tool.
S. Altschul (1990)
Increased coverage of protein families with the Blocks Database servers
J. Henikoff (2000)
Position-based sequence weights.
S. Henikoff (1994)
Phylogenetic Inference in Protein Superfamilies: Analysis of SH2 Domains
K. Sjölander (1998)
Hidden Markov models in computational biology. Applications to protein modeling.
A. Krogh (1994)
The Helicobacter pylori genome: From sequence analysis to structural and functional predictions
K. Pawlowski (1999)
The Protein Data Bank
H. Berman (2000)
Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.
R. Tatusov (1994)
Finding weak similarities between proteins by sequence profile comparison.
A. Panchenko (2003)
Predicting reliable regions in protein sequence alignments
M. Cline (2002)
Profile scanning for three-dimensional structural patterns in protein sequences
M. Gribskov (1988)
A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins
S. B. Needleman (1970)
SATCHMO: simultaneous alignment and tree construction using hidden Markov models
R C Edgar (2003)
Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction
Niklas von Öhsen (2003)
COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance.
R. Sadreyev (2003)
NCBI Reference Sequence Project: update and current status
K. Pruitt (2003)
The total sequence weight relative to the priors (the estimated number of distinct observed sequences, corrected for over-represented subfamilies) was derived using a method from the SAM package
et al. (1996)
An improved algorithm for matching biological sequences.
O. Gotoh (1982)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
S. F. Altschul (1997)
Searching databases of conserved sequence regions by aligning protein multiple-alignments.
S. Pietrokovski (1996)
Mapping the Protein Universe
L. Holm (1996)
Improving Profile-Profile Alignments via Log Average Scoring
Niklas von Öhsen (2001)
Reconstructing history with amino acid sequences
R. Doolittle (1992)
Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology
K. Sjölander (1996)
Identification of common molecular subsequences.
T. Smith (1981)
Large‐scale comparison of protein sequence alignment algorithms with structure alignments
J. Michael Sauder (2000)
Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.
S. Brenner (1998)
Amino acid substitution matrices.
S. Henikoff (2000)
Metrics and Similarity Measures for Hidden Markov Models
R. B. Lyngsø (1999)
Comparison of sequence profiles. Strategies for structural predictions using sequence information
L. Rychlewski (2000)
A comprehensive comparison of multiple sequence alignment programs
J. Thompson (1999)
Hidden Markov models for sequence analysis: extension and analysis of the basic method
R. Hughey (1996)
Hidden Markov models for detecting remote protein homologies
K. Karplus (1998)
Rapid and sensitive sequence comparison with FASTP and FASTA.
W. Pearson (1990)

This paper is referenced by
Evaluation performance of substitution matrices, based on contacts between residue terminal groups
B. Vishnepolsky (2012)
Comparative Protein Structure Modeling Using MODELLER
B. Webb (2014)
Automated structural annotation of the malaria proteome and identification of candidate proteins for modelling and crystallization studies
Yolandi Joubert (2008)
Homology-extended sequence alignment
V. Simossis (2005)
An alternative model of amino acid replacement
G. Crooks (2005)
Parallelization of the MUSCLE sequence alignment tool for a distributed environment (Paralelização da ferramenta de alinhamento de sequências MUSCLE para um ambiente distribuído)
Evandro A. Marucci (2009)
Variable gap penalty for protein sequence-structure alignment.
M. S. Madhusudhan (2006)
The Identification and Characterisation of Microbes in Complex Environments
Alexander L. B. Leach (2013)
DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment
Erik Wright (2015)
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
R. Edgar (2004)
Units: Universal True SDSA (Structure-Dependent Sequence Alignment)
Scott G. Foy (2014)
Comparative Protein Structure Modeling Using Modeller
N. Eswar (2006)
Comparison of Methods Used for Aligning Protein Sequences
Sangeetha Madangopal (2006)
UniAlign: protein structure alignment meets evolution
Chunyu Zhao (2015)
Machine Learning for Protein Function
Dan Ofer (2016)
SIMPRO: simple protein homology detection method by using indirect signals
I. Jung (2009)
Examining Phylogenetic Reconstruction Algorithms
E. Albright (2014)
Conotoxin protein classification using free scores of words and support vector machines
Nazar Zaki (2010)
CRISPys: Optimal sgRNA Design for Editing Multiple Members of a Gene Family Using the CRISPR System
Gal Hyams (2018)
Incremental window-based protein sequence alignment algorithms
H. Rangwala (2007)
From Sequence to Structure And Back Again: An Alignment Tale
V. Simossis (2005)
Protein sequence alignment with family-specific amino acid similarity matrices
I. B. Kuznetsov (2011)
The many faces of sequence alignment
S. Batzoglou (2005)
DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles
P.D. Yoo (2008)
Mapping dynamic programming algorithms on graphics processing units
M. Hanif (2014)
Calculating the structure-based phylogenetic relationship of distantly related homologous proteins utilizing maximum likelihood structural alignment combinatorics and a novel structural molecular clock hypothesis
Scott G. Foy (2013)
Octopus skin ‘sight’ and the evolution of dispersed, dermal light sensing in Mollusca
M. D. Ramirez (2017)
MUSCLE: a multiple sequence alignment method with reduced time and space complexity
R. Edgar (2004)
Comparative Modeling of Drug Target Proteins
B. Webb (2014)
Multiple sequence alignment for phylogenetic purposes
D. Morrison (2006)
fRMSDPred: predicting local RMSD between structural fragments using sequence information.
H. Rangwala (2007)
Some data provided by Semantic Scholar