Computing Reviews

Reprint of "A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications"
Flick P., Jain C., Pan T., Aluru S. Parallel Computing70 54-65,2017.Type:Article
Date Reviewed: 04/13/18

Ed. note: This paper is a reprint, included in a special issue of Parallel Computing. It was identified by the ACM Artifact Review and Badging Initiative, and received the “Results Replicated” badge. The other papers in this special issue were written by teams that replicated the results presented in this reprinted paper.

--

Increasingly large datasets are becoming common in bioinformatics. The particular area of focus in this paper is a metagenomics set extracted from possibly thousands of species in a cubic foot of soil. Starting with sequence overlap details captured in a de Bruijn graph, the goal is the separation into connected components for further species-specific study. First, the de Bruijn graph is converted to an undirected graph, which is then partitioned into connected components using the proposed algorithm, which relies heavily on parallel distributed sorting.

The approach presented proves quite effective for the metagenomics graph structure composed of a lot of relatively small disconnected components, but less so on a comparison graph studied with one large and many small components. Mapping results back to the de Bruijn graph is inherent in the algorithm so, importantly, further processing at the species-specific level is decoupled from the graph processing.

Algorithm details are carefully developed and neatly presented starting with the sequential version described using helpful pseudocode and related theory. The parallel implementation using a message-passing approach is then described. Experiments for three parallel variants of the algorithm are effectively presented in tables and plots, which clearly demonstrate the communication and computational aspects and that communication reduction through load balancing is key for high efficiency. The impressive solution of the 1.8 billion reads metagenomics soil problem in 22 minutes on 1,280 cores demonstrates a highly effective, scalable tool for this domain and a substantial improvement over previous approaches.

Reviewer:  M. Benson Review #: CR145974 (1806-0325)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™
Terms of Use
| Privacy Policy