Non-uniform memory access (NUMA) machines pose difficult memory performance challenges. Simulation provides one means of measuring and optimizing the performance of NUMA architectures. This work describes the use of the SIMT simulation tool to reduce cache misses and cache invalidations, and to improve page placement and movement strategies.
Based on the Augmint multiprocessor simulator toolkit for the Intel x86 architecture, SIMT is designed for measuring memory performance, and models caches, distributed shared memory, and data transfer between processors. By presenting detailed data about cache misses, cache invalidations, and remote and local memory accesses, SIMT provides information that can be used to improve performance by changing program memory organization and access patterns.
Cache and memory parameters that can be specified and varied with SIMT include number, organization, and size; cache coherency protocol; and local and remote memory access latencies. Five cache coherency protocols, seven data allocation policies, and three data migration policies can be modeled.
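To make the role of these parameters concrete, the following is a minimal sketch of a trace-driven model of a single direct-mapped cache with separate local and remote miss latencies. It is not SIMT's interface: the names `CacheConfig` and `simulate`, the field names, and all default values are assumptions chosen for illustration, and coherency protocols are not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    # Hypothetical parameters and defaults; these are not SIMT options.
    size_bytes: int = 1024
    line_bytes: int = 32
    hit_latency: int = 1       # cycles for a cache hit
    local_latency: int = 10    # cycles for a miss served from local memory
    remote_latency: int = 40   # cycles for a miss served from remote memory


def simulate(addresses, is_remote, cfg):
    """Run an address trace through one direct-mapped cache.

    addresses: iterable of byte addresses.
    is_remote: function returning True if an address lives on a remote node.
    Returns (hits, misses, total_cycles).
    """
    num_lines = cfg.size_bytes // cfg.line_bytes
    tags = [None] * num_lines          # block currently held by each line
    hits = misses = cycles = 0
    for addr in addresses:
        block = addr // cfg.line_bytes
        index = block % num_lines
        if tags[index] == block:
            hits += 1
            cycles += cfg.hit_latency
        else:
            misses += 1
            tags[index] = block        # replace whatever the line held
            cycles += cfg.remote_latency if is_remote(addr) else cfg.local_latency
    return hits, misses, cycles
```

Varying `size_bytes`, `line_bytes`, or the two miss latencies and rerunning the same trace shows, in miniature, the kind of parameter study the paper describes.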
The use of SIMT is illustrated using three programs from the standard SPLASH shared memory benchmark suite. These examples demonstrate how to reduce cache misses, how to find an optimal cache coherency protocol, and how to improve initial data placement and dynamic data migration.
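The effect of initial data placement can be illustrated with a small sketch that counts remote accesses in a trace under two well-known page placement schemes, first-touch and round-robin. These two schemes are used here only as familiar examples; the review does not say whether they are among the policies SIMT models, and the function names and the 4096-byte page size are assumptions.

```python
def first_touch(trace, page_size=4096):
    """Place each page on the node that accesses it first."""
    placement = {}
    for node, addr in trace:
        placement.setdefault(addr // page_size, node)
    return placement


def round_robin(trace, num_nodes, page_size=4096):
    """Distribute pages across nodes in page-number order."""
    pages = sorted({addr // page_size for _, addr in trace})
    return {page: i % num_nodes for i, page in enumerate(pages)}


def count_remote(trace, placement, page_size=4096):
    """Count accesses whose page is homed on a different node."""
    return sum(1 for node, addr in trace
               if placement[addr // page_size] != node)


# Hypothetical trace of (node, byte address) pairs.
trace = [(0, 0), (0, 100), (1, 4096), (1, 5000), (1, 8192), (1, 8200)]
remote_ft = count_remote(trace, first_touch(trace))
remote_rr = count_remote(trace, round_robin(trace, num_nodes=2))
```

For this trace, first-touch homes every page on the node that uses it, while round-robin places the third page on node 0 even though only node 1 touches it, so the two accesses to that page become remote.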
The authors do not verify their simulation results against actual measurements from existing systems, but they do reference other work that they claim shows the accuracy of the simulator. The work is well organized and clearly presented, and should be of interest to those who work with NUMA machines and architectures, especially those interested in performance.