Improved Sensitivity Of Biological Sequence Database Searches
Published 1990 · Computer Science, Medicine
We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples (words) instead of individual residues to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search, using a DNA query sequence, for regions that would encode a similar amino acid sequence.
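The two kinds of matching matrix described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The similarity scores in `TOY_SIMILARITY` are made-up illustrative values rather than a real PAM matrix, and the rule that two k-tuples match when their summed residue similarity reaches the threshold is an assumption, since the abstract does not state the exact rule. All function names are hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Hypothetical toy similarity scores (NOT a real PAM matrix).
TOY_SIMILARITY = {
    ("I", "I"): 5, ("L", "L"): 5, ("V", "V"): 5, ("D", "D"): 5, ("E", "E"): 5,
    ("I", "L"): 3, ("I", "V"): 3, ("L", "V"): 2, ("D", "E"): 3,
}


def similarity(a, b, default=-2):
    """Symmetric lookup; unlisted pairs get an assumed default penalty."""
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), default))


def ktuples_match(x, y, threshold):
    """Assumed rule: k-tuples match if summed residue similarity >= threshold."""
    if len(x) != len(y):
        return False
    return sum(similarity(a, b) for a, b in zip(x, y)) >= threshold


def build_matching_matrix(alphabet, k, threshold):
    """Map every k-tuple over `alphabet` to the set of k-tuples it matches."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    return {w: {v for v in words if ktuples_match(w, v, threshold)} for w in words}


# Standard genetic code, codons enumerated with bases in T, C, A, G order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def codons_match(c1, c2):
    """DNA 3-tuples match when they encode the same amino acid."""
    return CODON_TABLE[c1] == CODON_TABLE[c2]


if __name__ == "__main__":
    matrix = build_matching_matrix("ILVDE", k=2, threshold=6)
    print(sorted(matrix["IL"]))
    print(codons_match("CTG", "TTA"))  # both encode leucine
```

Precomputing the set of matching words for each k-tuple means that similarity is applied during the initial word lookup, which is the point the abstract makes about using replacement matrices in the first step rather than in a later scoring phase.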