
A Comparison Of Scoring Functions For Protein Sequence Profile Alignment

R. Edgar, K. Sjölander
Published 2004 · Computer Science, Medicine

MOTIVATION: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles of alignments from both PSI-BLAST and SAM-T99. Structural alignments are constructed from a consensus between the FSSP database and CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST.

RESULTS: We find that profile-profile alignment gives an average improvement over our test set of typically 2-3% over profile-sequence alignment and approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments.

AVAILABILITY: Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/
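
The abstract describes scoring alignment accuracy by comparing a computed pairwise alignment against a structural reference alignment, but it does not state which accuracy measure was used. The sketch below is one common way such a comparison can be made, offered as an illustration only: it assumes accuracy is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names `aligned_pairs` and `pair_accuracy`, the gap character, and the toy sequences are all hypothetical and do not come from the paper or its source code.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in a pairwise alignment.

    row_a and row_b are the two gapped rows of the alignment (equal length).
    i and j are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))  # both columns hold residues: an aligned pair
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Fraction of reference-aligned residue pairs reproduced by the test alignment.

    This measure is an assumption made for illustration; the abstract does
    not specify the accuracy score used in the study.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(*test)) / len(ref)


# Toy example (made-up sequences, not data from the paper).
reference = ("ACDEF", "AC-EF")   # stand-in for a structural alignment
test = ("ACDEF", "ACE-F")        # stand-in for a computed alignment
accuracy = pair_accuracy(test, reference)
```

Here the reference aligns four residue pairs and the test alignment recovers three of them, so `accuracy` is 0.75. Averaging such a score over many sequence pairs gives a single figure per method that can be compared across methods.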