Increasing effort is being invested in squeezing every ounce of performance from currently available processors; much of this investment goes into minimizing the effect of very slow memory access times on overall computational speed. This requires that each piece of data brought into a cache be reused as much as possible (temporal locality), and that every piece of data brought into the cache alongside it be used (spatial locality).
This paper examines the practical advantages of two simple methods of increasing temporal and spatial locality in the implementation of a sparse matrix/dense vector multiply. Performance comparisons are made against other commonly used schemes across a variety of sparsity patterns. The storage method showing the greatest advantage for a general unstructured sparse matrix is an array of structures, where each structure holds a pair of integers giving the row and column indices of a nonzero element and a float giving its value.
The work reported is clearly a work in progress and is restricted to a very simple computational use: sparse matrices where there is no requirement to access the elements in any particular order or to select subsets of elements (for example, all elements in a particular row). The paper would be a good, gentle introduction for a student wishing to start working in this area.