The Message Passing Interface (MPI) is the de facto standard for message-passing programming. MPI supports point-to-point messaging, collective operations, and the management of communication among groups of processes. One fundamental abstraction it provides is “a context in which a group of processes can exchange messages”; this context is known as the “communicator.”
MPI_Comm_split is one of the routines available for carving a new communicator out of an existing group of processes. The authors observe that this routine scales poorly, in both execution time and memory consumption, on the two most popular open-source MPI implementations available in 2011: MPICH and Open MPI.
They then propose three techniques to improve scalability: replacing the sort with a merge, sorting in parallel, and using a distributed table instead of a replicated one. Beyond the complexity analysis of their solution, the authors also provide preliminary simulation results on the potential savings of each improvement across multiple system and communicator sizes. Their results cover central processing unit (CPU) time only and do not account for network latency or contention.
This paper is a good example of the current trend of creative solutions for overcoming the sequential bottlenecks we observe in many of our current algorithms and data structures.