Computing Reviews
The art of high performance computing for computational science, vol. 1: techniques of speedup and parallelization for general purposes
Geshi M., Springer International Publishing, New York, NY, 2019. 219 pp. Type: Book
Date Reviewed: May 12 2020

This is the book I wish I had owned when I started supporting computational scientists a few years ago. It is revised and updated from a Japanese volume based on computational science lectures that were broadcast to a number of campuses over videoconferencing systems. Each chapter includes diagrams, code segments, exercises, and references.

The first chapter (“High-Performance Computer Basics”) outlines some important characteristics of current computer architectures; among these are pipelining, hierarchical memory schemes, and nonuniform memory access (NUMA) arrangements. It notes that a feature of recent architectures is a high degree of parallelism.

The benefits of accessing data elements in the order they are stored are explained, and a simple Fortran matrix multiplication program is presented to illustrate the concept of loop unrolling. I was able to verify the efficiency improvement achieved by the unrolled version of this program, and I observed that an even better result was obtained when I let my compiler do the unrolling. Other concepts illustrated in this chapter include cache blocking, loop transformations, and the removal of IF statements from inside loops.
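
To make the idea concrete, here is a minimal sketch of my own (not taken from the book) of a matrix multiplication kernel with the j loop unrolled by two; the matrix size and loop ordering are illustrative, and it should compile with any modern Fortran compiler (for example, gfortran unroll.f90):

    program matmul_unroll
      implicit none
      integer, parameter :: n = 512        ! illustrative size; must be even for the 2-way unroll
      real(8), allocatable :: a(:,:), b(:,:), c(:,:)
      integer :: i, j, k

      allocate(a(n,n), b(n,n), c(n,n))
      call random_number(a)
      call random_number(b)
      c = 0.0d0

      ! The j loop is unrolled by two: each sweep over k now reuses a(i,k)
      ! for two columns of c, and the innermost i loop has unit stride.
      do j = 1, n, 2
         do k = 1, n
            do i = 1, n
               c(i,j)   = c(i,j)   + a(i,k) * b(k,j)
               c(i,j+1) = c(i,j+1) + a(i,k) * b(k,j+1)
            end do
         end do
      end do

      print *, 'c(1,1) =', c(1,1)
    end program matmul_unroll

Timing this against the plain triple loop is essentially the experiment described above.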

Many scientific programs can be executed more efficiently by using multiple processing elements working in parallel. This can be managed with the message passing interface (MPI), whose basics are covered in chapter 2; an MPI program is included to illustrate the mechanisms involved. MPI programs are especially useful where a number of cluster nodes (each with its own processing elements and memory) can be harnessed to run one program.
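
To give the flavor of such code (again my own sketch, not the book's example), the following Fortran program sums one value per MPI rank; assuming an MPI installation such as Open MPI or MPICH, it would be built with mpif90 and launched with mpirun:

    program mpi_sum
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, nlocal, ntotal

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      ! Each process contributes its own partial value;
      ! MPI_Reduce combines them on rank 0.
      nlocal = rank + 1
      call MPI_Reduce(nlocal, ntotal, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

      if (rank == 0) print *, 'sum of 1..', nprocs, ' = ', ntotal

      call MPI_Finalize(ierr)
    end program mpi_sum

Launched with, say, mpirun -np 4 ./mpi_sum, the processes exchange data only through explicit messages, which is the defining property of the distributed-memory model the chapter describes.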

For thread-level parallelism within a node, one can use OpenMP. Chapter 3 includes some Fortran OpenMP program segments to illustrate the directives that can be used. It is noted that recent extensions to the OpenMP specification enable later versions of OpenMP to be used with general-purpose graphics processing unit (GPGPU) devices.
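
A minimal sketch of the kind of directive involved (my own, not the book's): a loop whose iterations are shared among the threads inside one node, compiled with OpenMP support (for gfortran, the -fopenmp flag):

    program omp_axpy
      use omp_lib
      implicit none
      integer, parameter :: n = 1000000
      real(8), allocatable :: x(:), y(:)
      real(8) :: a
      integer :: i

      allocate(x(n), y(n))
      a = 2.0d0
      x = 1.0d0
      y = 1.0d0

      ! The parallel do directive splits the iterations across the
      ! threads available inside the node (shared memory).
      !$omp parallel do
      do i = 1, n
         y(i) = y(i) + a * x(i)
      end do
      !$omp end parallel do

      print *, 'threads available:', omp_get_max_threads(), '  y(1) =', y(1)
    end program omp_axpy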

In chapter 4 (“Hybrid Parallelization Techniques”), the author observes that there are occasions where MPI can be used for communication between nodes on which OpenMP or GPU threads are executing. The subsequent chapter illustrates some performance-tuning procedures; for some of these, it is suggested that nonblocking MPI constructs be incorporated.
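
To show how the two models combine, here is a sketch of my own (not the book's code) of a hybrid midpoint-rule estimate of pi: OpenMP threads share the work inside each MPI process, and a nonblocking reduction gathers the partial sums. It assumes an MPI-3 implementation and would be built with something like mpif90 -fopenmp:

    program hybrid_pi
      use mpi
      implicit none
      integer, parameter :: n = 10000000
      integer :: ierr, rank, nprocs, provided, req, i
      real(8) :: h, x, s, total

      ! FUNNELED: only the master thread of each process makes MPI calls.
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      h = 1.0d0 / dble(n)
      s = 0.0d0

      ! OpenMP threads share this rank's strided portion of the sum.
      !$omp parallel do private(x) reduction(+:s)
      do i = rank + 1, n, nprocs
         x = (dble(i) - 0.5d0) * h
         s = s + 4.0d0 * h / (1.0d0 + x * x)
      end do
      !$omp end parallel do

      ! Nonblocking reduction (MPI-3); other work could overlap before the wait.
      call MPI_Iallreduce(s, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                          MPI_COMM_WORLD, req, ierr)
      call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)

      if (rank == 0) print *, 'pi estimate =', total
      call MPI_Finalize(ierr)
    end program hybrid_pi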

The Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) libraries are widely used in research environments for matrix operations and for solving systems of linear equations. In chapter 6, readers can see how these can be installed on Ubuntu 16.04 and similar systems and used in some example Octave and C++ programs. The efficiency gains that can be realized by using OpenBLAS, and the importance of choosing an appropriate storage-order format, are explored. A brief discussion of Automatically Tuned Linear Algebra Software (ATLAS) is included.
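
Although the chapter's examples are in Octave and C++, the same routines are callable directly from Fortran. Here is a sketch of my own, with an illustrative matrix size, assuming a BLAS such as OpenBLAS is installed and linked with, for example, gfortran dgemm_demo.f90 -lopenblas:

    program dgemm_demo
      implicit none
      integer, parameter :: n = 1000
      real(8), allocatable :: a(:,:), b(:,:), c(:,:)

      allocate(a(n,n), b(n,n), c(n,n))
      call random_number(a)
      call random_number(b)
      c = 0.0d0

      ! Level-3 BLAS: C := 1.0*A*B + 0.0*C, column-major storage throughout.
      call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

      print *, 'c(1,1) =', c(1,1)
    end program dgemm_demo

Swapping the linked library (reference BLAS, ATLAS, OpenBLAS) without touching the source is the kind of comparison the chapter explores.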

Chapter 7 predicts that supercomputers, in 2020, may have almost a billion cores and a very deep memory hierarchy. In consequence, we can expect that off-chip data transfers will consume significant amounts of power, and this, combined with the massive increase in component count, will result in an alarming increase in failure rates. Some linear algebra algorithms that may be able to accommodate this situation are discussed.

The fast Fourier transform (FFT) is widely used in spectrum analysis, image processing, and communications. Chapter 8 presents some algorithms for its computation on large-scale systems. Libraries like the Fastest Fourier Transform in the West (FFTW) are able to perform automatic tuning on a distributed-memory parallel computer, and a CUDA Fortran implementation of a parallel one-dimensional FFT (using an MPI library) is shown.
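
For readers who have not used FFTW, here is a single-node sketch of its plan-then-execute model (my own, using the library's legacy Fortran interface); it assumes FFTW 3 and its fftw3.f header are installed (on Ubuntu, the libfftw3-dev package) and would be compiled with, for example, gfortran -I/usr/include fftw_demo.f90 -lfftw3:

    program fftw_demo
      implicit none
      include 'fftw3.f'                  ! constants such as FFTW_FORWARD, FFTW_ESTIMATE
      integer, parameter :: n = 8
      double complex :: in(n), out(n)
      integer(8) :: plan                 ! FFTW plans are held in an 8-byte integer
      integer :: i

      do i = 1, n
         in(i) = dcmplx(dble(i), 0.0d0)
      end do

      ! Planning is where FFTW's automatic tuning happens; the plan can then
      ! be executed as many times as needed.
      call dfftw_plan_dft_1d(plan, n, in, out, FFTW_FORWARD, FFTW_ESTIMATE)
      call dfftw_execute_dft(plan, in, out)
      call dfftw_destroy_plan(plan)

      print *, 'first output coefficient:', out(1)
    end program fftw_demo

The distributed-memory algorithms in the chapter build on the same plan-and-execute idea, with the data decomposed across MPI ranks.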

The author of chapter 9 (“Optimization and Related Topics”) observes that one’s time may be better spent debugging a program than improving its level of optimization. He outlines a unit testing scheme where the results of a program without optimization are compared with the results of an optimized version, using the same data. A similar debugging process can be undertaken using a version control system. The use of profilers is also discussed.
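
A sketch of the spirit of that test (my own; the file names are hypothetical): compare, value by value, numerical results written by an unoptimized build and an optimized build of the same program.

    program compare_builds
      implicit none
      ! Hypothetical output files from, say, -O0 and -O3 builds of the same code.
      character(len=*), parameter :: f_ref = 'result_O0.dat'
      character(len=*), parameter :: f_opt = 'result_O3.dat'
      real(8) :: r, o, maxdiff
      integer :: ios1, ios2

      maxdiff = 0.0d0
      open(10, file=f_ref, status='old')
      open(11, file=f_opt, status='old')
      do
         read(10, *, iostat=ios1) r
         read(11, *, iostat=ios2) o
         if (ios1 /= 0 .or. ios2 /= 0) exit
         maxdiff = max(maxdiff, abs(r - o))
      end do
      close(10)
      close(11)

      ! Optimization may legitimately change the last bits, so the check is
      ! against a tolerance rather than exact equality.
      if (maxdiff <= 1.0d-12) then
         print *, 'PASS: max difference', maxdiff
      else
         print *, 'FAIL: max difference', maxdiff
      end if
    end program compare_builds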

The final chapter discusses some techniques relating to computational accuracy. It is shown that bounds on the accuracy of a result can be computed by repeating its computation under different rounding modes for floating-point numbers encoded in accordance with the IEEE 754 standard. The binary128 (quadruple-precision) and double-double number formats are introduced, and the GMP, MPFR, and MPACK libraries are discussed.
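
The rounding-mode technique can be tried with nothing more than the standard ieee_arithmetic module; a sketch of my own follows (note that the compiler must be told not to assume default rounding, for example gfortran -frounding-math):

    program rounding_bounds
      use, intrinsic :: ieee_arithmetic
      implicit none
      integer, parameter :: n = 1000000
      real(8) :: s_down, s_up
      integer :: i

      ! Run the same summation rounding downward and then upward; the exact
      ! sum of the stored values lies between the two computed results.
      call ieee_set_rounding_mode(ieee_down)
      s_down = 0.0d0
      do i = 1, n
         s_down = s_down + 0.1d0
      end do

      call ieee_set_rounding_mode(ieee_up)
      s_up = 0.0d0
      do i = 1, n
         s_up = s_up + 0.1d0
      end do

      call ieee_set_rounding_mode(ieee_nearest)
      print *, 'bounds on the sum of 0.1 added one million times:'
      print *, '  lower:', s_down, '  upper:', s_up
    end program rounding_bounds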

The range of topics included in this book means that the coverage of each must be brief; I had to look on the web for additional usage examples and compilation instructions. However, I learned a lot, and I can recommend the book to anyone who uses, or has to support, high-performance computing facilities in their work.

Reviewer: G. K. Jenkins. Review #: CR146966 (2010-0232)
 
 
Categories: Performance of Systems (C.4); General (G.0); General (I.0); Mathematical Logic And Formal Languages (F.4); Physical Sciences And Engineering (J.2)
 
