Introduction to Sequence Analysis
Nucleotide Analysis
BLAST algorithm
BLAST stands for Basic Local Alignment Search Tool. It compares a novel sequence with those held in nucleotide and protein databases. The emphasis of the tool is on finding regions of sequence similarity, which can yield clues about the structure and function of the novel sequence, and about its evolutionary history and homology with other sequences in the database.
The fundamental unit of BLAST algorithm output is the High-scoring Segment Pair (HSP). An HSP consists of two sequence fragments of arbitrary but equal length whose alignment is locally maximal (its score cannot be improved by extending or trimming the fragments) and whose alignment score meets or exceeds a threshold or cutoff score. A set of HSPs is defined by two sequences, a scoring system, and a cutoff score; this set may be empty if the cutoff score is sufficiently high. In the programmatic implementations of the BLAST algorithm described here, each HSP consists of a segment from the query sequence and one from a database sequence.
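The HSP definition can be made concrete with a minimal sketch. It is not how the BLAST programs themselves find HSPs: it simply scans every ungapped diagonal between two sequences and keeps the best-scoring segment pair on each diagonal if it reaches the cutoff. The names `HSP` and `diagonal_hsps`, the match/mismatch scores of +5/-4, and the cutoff of 20 are assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    score: int
    query_start: int
    subject_start: int
    length: int  # both fragments share this length


def diagonal_hsps(query: str, subject: str, match: int = 5,
                  mismatch: int = -4, cutoff: int = 20) -> List[HSP]:
    """Return the best ungapped segment pair on each diagonal scoring >= cutoff.

    Illustrative simplification: at most one segment pair per diagonal,
    found by exhaustive scanning rather than by BLAST's word-based search.
    """
    hsps = []
    # A diagonal is fixed by the offset between subject and query positions.
    for offset in range(-(len(query) - 1), len(subject)):
        i = max(0, -offset)
        j = i + offset
        best_score, best_start, best_len = 0, i, 0
        run_score, run_start = 0, i
        while i < len(query) and j < len(subject):
            if run_score <= 0:
                # A non-positive prefix can never help: restart the segment here.
                run_score, run_start = 0, i
            run_score += match if query[i] == subject[j] else mismatch
            if run_score > best_score:
                best_score = run_score
                best_start = run_start
                best_len = i - run_start + 1
            i += 1
            j += 1
        if best_len > 0 and best_score >= cutoff:
            hsps.append(HSP(best_score, best_start, best_start + offset, best_len))
    return hsps


# Hypothetical toy sequences: the query occurs in full inside the subject.
example_hits = diagonal_hsps("ACGTACGTAC", "TTACGTACGTACTT")
top_hit = max(example_hits, key=lambda h: h.score)
```

Raising `cutoff` far enough returns an empty list, which mirrors the remark that the set of HSPs may be empty when the cutoff score is sufficiently high.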
The BLAST programs approach similarity searching in three steps: first they look for similar segments (HSPs) between the query sequence and a database sequence, then they evaluate the statistical significance of any matches found, and finally they report only those matches that satisfy a user-selectable threshold of significance.
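The second and third steps can be sketched as follows, assuming significance is expressed as a Karlin-Altschul expect value, E = K * m * n * exp(-lambda * S), where S is the raw HSP score, m the query length, and n the database length. The function names, the hit list, and the values of `lam`, `k`, and the threshold are hypothetical placeholders; real values of lambda and K depend on the scoring system in use.

```python
import math
from typing import List, Tuple


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Expected number of chance HSPs scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def significant_hits(scored_hits: List[Tuple[str, float]], query_len: int,
                     db_len: int, lam: float, k: float,
                     e_threshold: float = 10.0) -> List[Tuple[str, float, float]]:
    """Keep hits whose E-value is at or below the threshold, best first."""
    report = []
    for name, score in scored_hits:
        e_value = expect_value(score, query_len, db_len, lam, k)
        if e_value <= e_threshold:
            report.append((name, score, e_value))
    # Smaller E-values are more significant, so they are listed first.
    return sorted(report, key=lambda row: row[2])


# Hypothetical hits as (database sequence name, raw HSP score).
example_report = significant_hits(
    [("seq_a", 100), ("seq_b", 50)],
    query_len=100, db_len=1_000_000,
    lam=0.2, k=0.1,  # placeholder statistical parameters
)
```

Lowering `e_threshold` makes the report stricter; raising it admits weaker matches that are more likely to have arisen by chance.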
Two main versions of BLAST are available at the EBI: WU-BLAST2 and NCBI-BLAST2. Although they share a common lineage for some portions of their code, they are distinctly different software packages: they do their work differently, obtain different results, and offer different features.