Implement crash recovery mechanism #107

hidmic · 2024-09-29T14:48:50Z

Feature description

Benchmarks with many parameters running against large datasets will invariably take a long time. Right now, if a benchmark is interrupted or crashes, there is no way to resume from where it left off: we have to start over or make do with whatever results we got before the crash. For a benchmark that needs 3 days to run, this is incredibly wasteful.

We need a mechanism to recover from crashes like this. Grabbing core dumps and system logs would also be useful for post-mortem analysis.

Implementation considerations

Perhaps we can take some ideas from filesystem journaling and log the state of benchmark runs.
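
To make the journaling idea concrete, here is a minimal sketch of one way it might look, not a proposal for the final design. It assumes each benchmark run has a stable identifier and can be re-executed from scratch; `RunJournal`, `run_all`, and the JSON-lines journal format are all hypothetical names and choices made for illustration.

```python
import json
import os
from pathlib import Path


class RunJournal:
    """Append-only journal of benchmark run state (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, run_id, state):
        # Append one JSON line per state change and ask the OS to flush it
        # to disk before returning, so the entry should survive a crash.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"run": run_id, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def completed(self):
        # Replay the journal to find runs that finished. A torn final
        # line left behind by a crash fails to parse and is skipped.
        done = set()
        if not self.path.exists():
            return done
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if entry.get("state") == "completed":
                    done.add(entry["run"])
        return done


def run_all(run_ids, execute, journal):
    """Execute every run that is not already journaled as completed."""
    done = journal.completed()
    for run_id in run_ids:
        if run_id in done:
            continue  # finished before the interruption, skip it
        journal.record(run_id, "started")
        execute(run_id)
        journal.record(run_id, "completed")
```

Calling `run_all` again with the same journal after an interruption would skip the completed runs and restart the one that was in flight. A run journaled as "started" but never "completed" also points at where the crash happened, which could help with post-mortem analysis.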

hidmic added the "enhancement" (New feature or request) label Sep 29, 2024

hidmic mentioned this issue Oct 24, 2024

Continue on failure by default #112

Open
