Homology detection is the process of identifying similarity between two protein or gene sequences in order to infer common ancestry. Sequence similarity searching is computationally intensive because protein sequence databases are very large and a large number of samples share significant similarity with proteins already in those databases. Popular algorithms and tools for homology detection include BLAST, SSEARCH (Smith-Waterman), FASTA, and HMMER. BLAST, FASTA, and SSEARCH perform pairwise sequence alignment, comparing the query against one database sequence at a time, whereas HMMER, which is based on hidden Markov models, searches with statistical models of protein families and can identify far more homologs at little additional computational cost.
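To make the pairwise approach concrete, the following is a minimal sketch of Smith-Waterman local alignment scoring, the algorithm behind SSEARCH. It is illustrative only and is not taken from the paper under review: the function name, the match/mismatch scores, and the linear gap penalty are assumptions, and real tools use substitution matrices such as BLOSUM62 with affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumed scoring (hypothetical, for illustration): a fixed match/mismatch
    score and a linear gap penalty.
    """
    # Only the previous row of the dynamic-programming matrix is needed
    # when we want the score and not the alignment itself.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACTGT"))
```

Each cell depends on its left, upper, and upper-left neighbours, so the cost grows with the product of the two sequence lengths for every query and database pair, which is why searches of this kind are attractive targets for hardware acceleration.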
This paper describes the HMMER3 algorithm and its implementation on field-programmable gate arrays (FPGAs) in detail. The original algorithm contains a feedback loop that limits its inherent parallelism. The authors present a novel design that overcomes this limitation and exposes parallelism, which is then exploited in an FPGA-based implementation. The use of high-level synthesis (HLS) tools for design space exploration is also encouraging because it makes FPGA platforms more accessible to the community. Given that FPGAs are slowly finding their way into data centers, it is just a matter of time until they become available as a platform-as-a-service (PaaS) offering in the cloud for such computationally intensive tasks at scale.