Computing Reviews
Efficient hardware checkpointing: concepts, overhead analysis, and implementation
Koch D., Haubelt C., Teich J. Field programmable gate arrays (Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, Monterey, California, Feb 18-20, 2007), 188-196, 2007. Type: Proceedings
Date Reviewed: Apr 11 2007

Koch, Haubelt, and Teich present a method, together with its application, for realizing reliable system-on-a-chip (SoC) devices implemented in field programmable gate array (FPGA) platforms by exploiting checkpointing.

Traditionally, this mechanism has been adopted in software to deal with the occurrence of errors: the system returns to a previously stored correct state, and computation continues from there. The authors present a framework for applying the same concept to hardware, to cope with soft errors, such as a single-event upset in an SoC implemented on an FPGA.
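The checkpoint-and-rollback idea described above can be illustrated with a minimal software sketch. This is not the authors' hardware implementation; the class `CheckpointedAccumulator`, the function `run_with_recovery`, and the assumption that a fault is detected immediately after it occurs are all hypothetical and serve only to show the save, roll back, and continue cycle.

```python
import copy


class CheckpointedAccumulator:
    """Toy stateful module (hypothetical): sums its inputs and can save/restore state."""

    def __init__(self):
        self.state = {"acc": 0, "steps": 0}
        self._saved = copy.deepcopy(self.state)

    def step(self, value):
        # Normal computation: consume one input.
        self.state["acc"] += value
        self.state["steps"] += 1

    def checkpoint(self):
        # Store a copy of the current (assumed correct) state.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Return to the last stored state.
        self.state = copy.deepcopy(self._saved)


def run_with_recovery(inputs, interval, fault_at=None):
    """Process inputs, checkpointing every `interval` inputs.

    If `fault_at` is given, one bit of the accumulator is flipped right after
    that input is processed. Detection is assumed (not modeled): the module
    rolls back and recomputes from the last checkpoint.
    """
    module = CheckpointedAccumulator()
    fault_pending = fault_at is not None
    ckpt_index = 0
    i = 0
    while i < len(inputs):
        if i % interval == 0:
            module.checkpoint()
            ckpt_index = i
        module.step(inputs[i])
        if fault_pending and i == fault_at:
            module.state["acc"] ^= 1 << 3  # simulated single bit flip
            fault_pending = False
            module.rollback()              # restore last stored state
            i = ckpt_index                 # resume from the checkpointed input
            continue
        i += 1
    return module.state["acc"]
```

In this sketch, a run with an injected fault yields the same sum as a fault-free run, at the cost of recomputing the inputs processed since the last checkpoint; a shorter interval reduces recomputation but saves state more often.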

The paper presents the entire framework, from its motivations to its implementation. This is quite an achievement for a conference paper, where space is limited. The authors provide an overview of the system (and its conceptual foundation), together with the most important details of the various aspects characterizing the method.

The paper begins by introducing the checkpoint concept and its “porting” to the hardware context, and explains how, from the functional point of view, correct states are saved and, when needed, rolled back to. Given the available space, not all aspects are discussed in detail (for example, where the checkpoints are stored is only considered at the network level), yet the reader can get a good idea of the approach.

The second part of the paper is devoted to the implementation mechanisms supporting checkpointing, that is, saving and restoring state, and to the hardware solutions that may be adopted. The modified design flow needed to implement devices with such a fault mitigation mechanism is also presented.

The paper concludes with a presentation of the experimental results and a comparison with other approaches. For the other approaches (the readback technique), only an estimate of the costs is discussed, which does not allow the reader to actually compare the benefits of the proposed work. Nevertheless, the proposal seems interesting and the paper is surely worth reading.

Reviewer:  C. Bolchini Review #: CR134133 (0804-0354)
Reliability And Testing (B.5.3 )