Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Reprint of "A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications"
Flick P., Jain C., Pan T., Aluru S. Parallel Computing70  54-65,2017.Type:Article
Date Reviewed: Apr 13 2018

Ed. note: This paper is a reprint, included in a special issue of Parallel Computing. It was identified by the ACM Artifact Review and Badging Initiative, and received the “Results Replicated” badge. The other papers in this special issue were written by teams that replicated the results presented in this reprinted paper.

--

Increasingly large datasets are becoming common in bioinformatics. The particular area of focus in this paper is a metagenomics set extracted from possibly thousands of species in a cubic foot of soil. Starting with sequence overlap details captured in a de Bruijn graph, the goal is the separation into connected components for further species-specific study. First, the de Bruijn graph is converted to an undirected graph, which is then partitioned into connected components using the proposed algorithm, which relies heavily on parallel distributed sorting.

The approach presented proves quite effective for the metagenomics graph structure composed of a lot of relatively small disconnected components, but less so on a comparison graph studied with one large and many small components. Mapping results back to the de Bruijn graph is inherent in the algorithm so, importantly, further processing at the species-specific level is decoupled from the graph processing.

Algorithm details are carefully developed and neatly presented starting with the sequential version described using helpful pseudocode and related theory. The parallel implementation using a message-passing approach is then described. Experiments for three parallel variants of the algorithm are effectively presented in tables and plots, which clearly demonstrate the communication and computational aspects and that communication reduction through load balancing is key for high efficiency. The impressive solution of the 1.8 billion reads metagenomics soil problem in 22 minutes on 1,280 cores demonstrates a highly effective, scalable tool for this domain and a substantial improvement over previous approaches.

Reviewer:  M. Benson Review #: CR145974 (1806-0325)
Bookmark and Share
 
Parallel Algorithms (G.1.0 ... )
 
 
Life And Medical Sciences (J.3 )
 
Would you recommend this review?
yes
no
Other reviews under "Parallel Algorithms": Date
Parallel algorithms in computational science
Heermann D., Burkitt A., Springer-Verlag New York, Inc., New York, NY, 1991. Type: Book (9780387534183)
Apr 1 1992
A parallel shortest augmenting path algorithm for the assignment problem
Balas E., Miller D., Pekny J., Toth P. (ed) Journal of the ACM 38(4): 985-1004, 1991. Type: Article
Sep 1 1992
An o(n log n) minimal spanning tree algorithmn for n points in the plane
Changm R., Lee R. BIT 26(1): 7-16, 1986. Type: Article
Nov 1 1987
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy