> From: Ross Boylan <ross at biostat.ucsf.edu>
>
> In answer to the other question about using OS checkpointing
> facilities, I haven't tried them since the application will be running
> on a cluster. More precisely, the optimization will be driven from a
> single machine, but the calculation of the objective function will be
> distributed. So checkpointing at the level of the optimization
> function is a good fit to my needs. There are some cluster OS's that
> provide a kind of unified process space across the processors (scyld,
> mosix), but we're not using them and checkpointing them is an unsolved
> problem. At least, it was unsolved a couple of years ago when I
> looked into it.
>
A few years ago, Condor, yet another job queuing tool, had some
checkpointing features. Jun Yan had a presentation about it on his
web site at that time (though not necessarily about testing the
checkpointing feature).
I'd think that checkpointing would be best in system-space, not
user-space; however, for optimization, it should be just a matter of
saving state and possibly history, if you are doing memoization.
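
A rough sketch of what I mean by saving state and history, in Python
only for illustration. The class name CheckpointedObjective, the pickle
file format, and the path argument are all my own assumptions here, not
part of any particular optimizer or cluster tool. Every evaluation of
the objective is memoized and the history is rewritten to disk, so a
restarted run can reuse earlier results instead of recomputing them.

```python
import os
import pickle
import tempfile


class CheckpointedObjective:
    """Hypothetical wrapper: memoize objective evaluations and persist them."""

    def __init__(self, func, path):
        self.func = func      # the (expensive) objective function
        self.path = path      # checkpoint file, an assumed location
        self.history = []     # evaluations in order: (params, value)
        self.cache = {}       # memo table: params tuple -> value
        if os.path.exists(path):
            # Resume: reload earlier evaluations from the checkpoint file.
            with open(path, "rb") as fh:
                self.history = pickle.load(fh)
            self.cache = dict(self.history)

    def __call__(self, params):
        key = tuple(float(p) for p in params)
        if key in self.cache:
            # Already evaluated, possibly in a previous run.
            return self.cache[key]
        value = self.func(key)
        self.cache[key] = value
        self.history.append((key, value))
        self._save()
        return value

    def _save(self):
        # Write to a temporary file, then rename it over the checkpoint,
        # so an interrupted write does not corrupt the previous checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(self.history, fh)
        os.replace(tmp, self.path)
```

You would hand the wrapped object to the optimizer in place of the raw
objective. This only saves the evaluations; restarting the optimizer
itself from its last iterate would also need its internal state, which
depends on the routine you use. Rewriting the whole history on every
call is fine when each evaluation is expensive, as in the distributed
case.
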
best,
-tony
blindglobe at gmail.com
Muttenz, Switzerland.
"Commit early, commit often, and commit in a repository from which we can
easily roll-back your mistakes" (AJR, 4Jan05).