As of October 2005, the complete sequences of more than 250 microbial
and eukaryote genomes are available, including the human genome.
In addition, more than 1000 other genome sequencing projects are in progress.
The public nucleotide databases, including EMBL and GenBank,
currently hold over 100 000 000 000 base pairs of DNA sequence
information, and they are growing rapidly. The graph below shows
the growth of GenBank over the last 20 years. Note the logarithmic scale.
Since the amount of sequence data is so large, computer tools are
essential in genetic analysis, and bioinformatics is becoming
the bottleneck in the process. The amount of data is now growing
faster than the improvement in computer technology, which is also
Comparing two sequences and searching a database to find similar
sequences are fundamental operations on the path to the understanding
of a gene. A similarity search can be used to identify potential
homologs of a given protein. It can also be used to predict the
function of a gene, or in the modelling of a protein structure.
Other uses are found in diagnostics and in the identification of
potential targets in drug development.
The tools for sequence database searching are some of the most
frequently used in bioinformatics. However, they require large computational
resources, especially when the most accurate methods are needed.
Ever since it was first published in 1981, the Smith-Waterman algorithm
has been considered the optimal method (gold standard) for homology
searches and sequence alignment in gene databases.
However, the long computation times required to complete the pairwise
alignments strongly limited its use for screening the large sequence
databases that have evolved recently.
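
To make the cost concrete, here is a minimal sketch of Smith-Waterman local alignment scoring in Python. It is an illustration only, not Sencel's implementation. The scoring values (match, mismatch, gap), the linear gap penalty and the helper name `search` are assumptions chosen for brevity; production searches typically use substitution matrices and affine gap penalties. Every cell of a table of size len(a) x len(b) must be computed for each database sequence, which is why an exhaustive search is slow.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, with a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Rank database entries (name -> sequence) by score, best first."""
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    toy_database = {"seq1": "GGACGTGG", "seq2": "CCCCCCCC"}  # hypothetical data
    for score, name in search("TTACGTTT", toy_database):
        print(name, score)
```

Because the work per comparison grows with the product of the two sequence lengths, the total search time grows in proportion to the size of the database.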
Instead, heuristic methods have been developed that provide faster
but less rigorous screening, i.e. some significant matches may be missed
by the searches. Sencel Bioinformatics has now developed Smith-Waterman
implementations that run nearly 8 times faster on ordinary hardware
than the original implementation. This means that full Smith-Waterman
searches can now be completed within a reasonable time.
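
The trade-off made by heuristic screening can be illustrated with a small Python sketch of a word-based prefilter. This is a generic illustration under stated assumptions, not the algorithm of any particular tool: the function names `kmers` and `shares_seed` and the word length `k` are hypothetical. The filter only passes a database sequence on to full alignment if it shares at least one exact word of length k with the query, so a similar sequence with no exact shared word is never examined.

```python
def kmers(seq, k):
    """Return the set of all words of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_seed(query, subject, k=4):
    """Return True if query and subject share at least one exact k-letter word."""
    return not kmers(query, k).isdisjoint(kmers(subject, k))


if __name__ == "__main__":
    query = "ACGTACGTAC"
    subject = "ACGAACGAAC"  # identical to the query at 8 of 10 positions
    print(shares_seed(query, subject, k=4))  # no shared 4-letter word
    print(shares_seed(query, subject, k=3))  # the word ACG is shared
```

With k=4 the two sequences share no exact word, so the filter discards the pair even though they are clearly related; a shorter word length catches it at the cost of passing more candidates to the slow alignment step.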
In addition, the ParAlign modification speeds up searches nearly
30-fold with only a negligible sacrifice of sensitivity.
The speed has been achieved using multimedia technology
embedded in modern computers. Representing a breakthrough in performing
the most sensitive and accurate sequence database searches, the
software is now available at a reasonable price from Sencel Bioinformatics.