Sencel's technology

As of October 2005, the complete sequences of more than 250 microbial and eukaryotic genomes are available, including the human genome. In addition, more than 1000 other genome sequencing projects are underway.

The public nucleotide databases EMBL, GenBank and DDBJ currently hold more than 100 billion (100 000 000 000) base pairs of DNA sequence information, and they are growing rapidly. The graph below shows the growth of GenBank over the last 20 years. Note the logarithmic scale.

Because the amount of sequence data is so large, computer tools are essential in genetic analysis, and bioinformatics is becoming the bottleneck in the process. The amount of data is now growing faster than computing power, even though computing power is itself growing exponentially.

Comparing two sequences and searching a database for similar sequences are fundamental steps toward understanding a gene. A similarity search can be used to identify potential homologs of a given protein. It can also be used to predict the function of a gene or to help model a protein's structure. Other uses include diagnostics and the identification of potential targets in drug development.

The tools for sequence database searching are some of the most frequently used in bioinformatics. However, they require large computational resources, especially when the most accurate methods are needed.

Ever since it was first published in 1981, the Smith-Waterman algorithm has been considered the optimal method (the gold standard) for homology searches and sequence alignment in gene databases [1]. However, the long computation times required for the pairwise alignments have strongly limited its use for screening the large sequence databases that have emerged in recent years.
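To show why a full Smith-Waterman search is so demanding, here is a minimal sketch of the algorithm's core recurrence. It assumes a simple match/mismatch score and a linear gap penalty, and it returns only the best local alignment score (no traceback). The function name and default scores are hypothetical choices for illustration; real protein searches typically use a substitution matrix such as BLOSUM62 and affine gap penalties, and this is not Sencel's implementation.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal local alignment score of sequences a and b.

    Illustrative scoring only (hypothetical defaults): +match for identical
    residues, mismatch otherwise, and a linear penalty per gap position.
    """
    best = 0
    # prev holds row i-1 of the dynamic-programming matrix H;
    # only two rows are kept, so memory is O(len(b)).
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            # Clamping at zero lets an alignment start anywhere: this is
            # what makes Smith-Waterman a *local* alignment method.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of an m-by-n matrix must be filled, so one comparison costs time proportional to m times n. A database search repeats this for every sequence in the database, which is why full Smith-Waterman searches have been so slow without optimized implementations.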

Instead, heuristic methods have been developed that provide faster but less rigorous screening; that is, some significant matches may be missed by the searches. Sencel Bioinformatics has now developed Smith-Waterman implementations that run nearly 8 times faster on ordinary hardware than the original implementation. This means that full Smith-Waterman searches can now be completed in a reasonable time [3].
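The following sketch illustrates how a heuristic prefilter can miss a genuinely similar sequence. It is a deliberately simplified, hypothetical exact-word filter, loosely in the spirit of word-based heuristics, not the actual BLAST or FASTA algorithm: only database sequences sharing at least one exact length-k word with the query would go on to full alignment. The function names, the word length k = 3 and the example sequences are all assumptions made for illustration.

```python
def shared_kmers(query: str, subject: str, k: int = 3) -> int:
    """Count distinct length-k words that occur in both sequences."""
    q_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    s_words = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q_words & s_words)


def heuristic_screen(query: str, database: dict[str, str],
                     k: int = 3, min_hits: int = 1) -> list[str]:
    """Return IDs of database sequences that pass the word-hit filter.

    Hypothetical filter: only these candidates would be aligned in full;
    all other sequences are discarded without ever being aligned.
    """
    return [seq_id for seq_id, seq in database.items()
            if shared_kmers(query, seq, k) >= min_hits]


# "s2" matches the query at 7 of 10 positions, but its substitutions fall
# every third residue, so it shares no exact 3-letter word with the query.
database = {"s1": "PAWHEAE", "s2": "HDAGVWGQEE"}
candidates = heuristic_screen("HEAGAWGHEE", database)
```

Here `candidates` contains only `"s1"`: the closely related `"s2"` is dropped before any alignment is attempted, whereas a full Smith-Waterman search would score it against the query like every other sequence.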

In addition, the ParAlign variant makes searches nearly 30 times faster, with only a negligible loss of sensitivity [4]. These speed-ups have been achieved by exploiting the multimedia (SIMD) instructions built into modern processors. These implementations represent a breakthrough in performing the most sensitive and accurate sequence database searches, and the software is now available at a reasonable price from Sencel Bioinformatics.