Checkpointing

Introduction

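A program may fail to run to completion for several reasons, such as a hardware failure, a system crash, or a batch job that exceeds its time limit. Checkpointing, which means periodically saving enough of a program's state that the run can be restarted from the last saved point, is an important safeguard for programmers who cannot afford to lose the results of a partial run. In this module, we introduce the basic concepts of checkpointing and offer some suggestions on how to begin implementing it.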
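As a rough illustration of the idea, the following Python sketch saves a loop's progress to a file at regular intervals and resumes from that file on the next run. It is only a minimal sketch under stated assumptions, not the approach this module prescribes: the function names (save_checkpoint, load_checkpoint, run), the JSON file format, the checkpoint interval, and the stop_after parameter used to simulate an interrupted run are all hypothetical choices made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the checkpoint.

    Writing to a temporary file first means an interruption during the
    write cannot leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # replaces the old checkpoint in one step


def load_checkpoint(path):
    """Return the saved state, or a fresh starting state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=10, stop_after=None):
    """Sum the integers 0..n_steps-1, checkpointing as it goes.

    stop_after is a hypothetical knob that simulates an interrupted run:
    the function returns None just before processing that step.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step == stop_after:
            return None  # simulated failure: work since last checkpoint is lost
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling run("ckpt.json", 100, stop_after=55) simulates a run that dies partway through; calling run("ckpt.json", 100) afterwards picks up from the last saved checkpoint instead of starting over. The checkpoint interval is a trade-off: saving more often loses less work after a failure but spends more time writing to disk.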

Prerequisites:

The examples assume you know how to run a batch job, as shown in the Cluster CoNTroller module.

Susan Mehringer