Optimistically synchronized parallel discrete-event simulation is based on the use of communicating sequential processes. When synchronization errors are detected, processes are rolled back to an earlier state that has been checkpointed. The performance of the system depends on various factors, including the checkpoint interval, the checkpoint time, the scheduling algorithm for the processes, and the cancellation policy in error situations.
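To make the trade-off around the checkpoint interval concrete, the following is a minimal sketch of a single process that checkpoints its state every few events and rolls back when a message arrives with a timestamp in its past. It is an illustration under stated assumptions, not the system evaluated in the paper: the class `LogicalProcess`, the method `receive`, the toy integer state, and the helper `replay_cost` are all hypothetical names, timestamps are assumed non-negative, and cancellation of already-sent messages and process scheduling are left out entirely.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Toy optimistic process (hypothetical; not the paper's implementation)."""
    checkpoint_interval: int                          # checkpoint every k processed events
    state: int = 0                                    # toy state: running sum of payloads
    lvt: float = 0.0                                  # local virtual time
    processed: list = field(default_factory=list)     # (timestamp, payload), in order
    checkpoints: list = field(default_factory=list)   # (events_processed, lvt, state)

    def __post_init__(self):
        # Always keep an initial checkpoint so a rollback target exists.
        self.checkpoints.append((0, self.lvt, copy.deepcopy(self.state)))

    def _apply(self, ts, payload):
        self.state += payload
        self.lvt = ts
        self.processed.append((ts, payload))
        if len(self.processed) % self.checkpoint_interval == 0:
            self.checkpoints.append(
                (len(self.processed), self.lvt, copy.deepcopy(self.state)))

    def receive(self, ts, payload):
        """Process an event; return how many earlier events had to be re-executed."""
        if ts >= self.lvt:
            self._apply(ts, payload)                  # in order: no rollback needed
            return 0
        # Synchronization error: discard checkpoints taken after the message time.
        while self.checkpoints[-1][1] > ts:
            self.checkpoints.pop()
        count, lvt, state = self.checkpoints[-1]
        redo = self.processed[count:]                 # events lost by restoring
        self.processed = self.processed[:count]
        self.state, self.lvt = copy.deepcopy(state), lvt
        # Re-execute the lost events plus the late message in timestamp order.
        for event in sorted(redo + [(ts, payload)], key=lambda e: e[0]):
            self._apply(*event)
        return len(redo)


def replay_cost(interval):
    """Re-executed events for one late message, given a checkpoint interval."""
    lp = LogicalProcess(checkpoint_interval=interval)
    for t in (1.0, 2.0, 3.0, 4.0, 5.0):
        lp.receive(t, 10)
    return lp.receive(2.5, 1)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, replay_cost(k))
```

In this toy setting a longer interval stores fewer checkpoints but forces more events to be re-executed after each rollback, which is the time and space tension the factors above control.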
This paper presents empirical results describing the effects of these factors on the time and space requirements of parallel simulation. At first sight, the general results (for example, that systems exhibiting few secondary rollbacks should use longer checkpoint intervals) are not very impressive. A more thorough examination of the detailed descriptions of system performance under varying circumstances, however, gives the reader good insight into the complex behavior of parallel systems. The methodological content of the paper should also be useful to anyone working with rollbacks caused by synchronization problems.
The paper is rather long, and some details could certainly have been left out. On the other hand, the casual reader would have benefitted from an even more detailed analysis of the causal relations between the architectural factors and the system performance. A topic for further study is the sensitivity of the results to process cooperation patterns.