Epoch System #459

Merged
merged 63 commits into from
Sep 18, 2023

@matthew-levan (Contributor) commented Jun 15, 2023

This PR implements a new format for how piers store their event logs on disk.

Resolves #313.

Design

Existing format:

./zod/.urb/log
├── data.mdb
└── lock.mdb

New format:

./zod/.urb/log
├── 0i0             # epoch dirnames specify the last event of the previous epoch
│   ├── data.mdb    # lmdb file containing events 1-132
│   ├── epoc.txt    # disk format version (this PR starts versioning at 1)
│   ├── lock.mdb    # lmdb lock file
│   └── vere.txt    # binary version this set of events was originally run with
└── 0i132
    ├── data.mdb
    ├── epoc.txt
    ├── lock.mdb
    ├── north.bin   #
    ├── south.bin   # snapshot files (state as of event 132), strictly read-only
    └── vere.txt

The new format introduces epochs: contiguous slices of a ship's complete event log. The tree above shows an event log split into two epochs, 0i0 (events 1-132) and 0i132 (events from 133 onward, plus a read-only snapshot as of event 132).
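
To make the naming rule concrete, here is a minimal C sketch. It assumes every dirname is `0i` followed by a decimal event number, as in the tree above, and that events are numbered from 1. The helpers `epoch_name`, `epoch_parse`, and `epoch_find` are hypothetical, not vere's API. Because an epoch named `0iN` holds events N+1 onward, event E lives in the epoch with the largest N strictly below E.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: format a dirname from the last event of the previous
   epoch, e.g. 132 -> "0i132". Returns false if buf is too small. */
bool epoch_name(char *buf, size_t len, uint64_t prev_last) {
  int n = snprintf(buf, len, "0i%" PRIu64, prev_last);
  return n > 0 && (size_t)n < len;
}

/* Hypothetical: parse "0i<digits>" back into a number; rejects
   anything else, including values that overflow uint64_t. */
bool epoch_parse(const char *name, uint64_t *out) {
  if (strncmp(name, "0i", 2) != 0 || name[2] == '\0') return false;
  uint64_t v = 0;
  for (const char *p = name + 2; *p; p++) {
    if (*p < '0' || *p > '9') return false;
    uint64_t d = (uint64_t)(*p - '0');
    if (v > (UINT64_MAX - d) / 10) return false;
    v = v * 10 + d;
  }
  *out = v;
  return true;
}

/* Hypothetical: given epoch numbers sorted ascending, return the index
   of the epoch holding `event` (events start at 1), or -1 if none. */
long epoch_find(const uint64_t *epochs, size_t n, uint64_t event) {
  assert(event >= 1);
  long hit = -1;
  for (size_t i = 0; i < n; i++) {
    if (epochs[i] < event) hit = (long)i;  /* epoch starts before event */
    else break;                            /* later epochs start too late */
  }
  return hit;
}
```

With epochs {0, 132}, events 1 through 132 map to index 0 (0i0) and event 133 maps to index 1 (0i132). A linear scan is enough for the handful of epochs a pier keeps; a binary search over the sorted array would do the same job.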

New ships booted with the code in this PR create their log directories in the new format. Existing piers are automatically migrated on boot.
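
Because a crash can interrupt migration at any point, one way to make it idempotent (one of the TODOs in this PR) is to derive the next step from what is on disk rather than from remembered progress. The sketch below assumes a hypothetical step order: bring the snapshot current, create `0i0/`, move the old `data.mdb`/`lock.mdb` into it, then write `epoc.txt` and `vere.txt`. The real migration in this PR may order or name these steps differently, and `mig_view`/`mig_next` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what a boot-time check might observe. */
typedef struct {
  bool old_log;       /* log/data.mdb exists at the top level */
  bool ep0_dir;       /* log/0i0/ exists */
  bool ep0_data;      /* log/0i0/data.mdb exists */
  bool ep0_meta;      /* log/0i0/epoc.txt and vere.txt both exist */
  bool snap_current;  /* snapshot is as of the last logged event */
} mig_view;

typedef enum {
  MIG_NONE,        /* no old-format log and no 0i0 data: nothing to do */
  MIG_SNAPSHOT,    /* bring the snapshot up to date first */
  MIG_MKDIR,       /* create log/0i0/ */
  MIG_MOVE_LOG,    /* rename data.mdb and lock.mdb into 0i0/ */
  MIG_WRITE_META,  /* write epoc.txt and vere.txt */
  MIG_DONE         /* already migrated */
} mig_step;

/* Choose the next step from on-disk state alone, so re-running after a
   crash at any point resumes rather than redoing or skipping work. */
mig_step mig_next(const mig_view *v) {
  assert(v != NULL);
  if (!v->old_log && !v->ep0_data) return MIG_NONE;
  if (!v->old_log && v->ep0_meta)  return MIG_DONE;
  if (!v->snap_current)            return MIG_SNAPSHOT;
  if (!v->ep0_dir)                 return MIG_MKDIR;
  if (v->old_log)                  return MIG_MOVE_LOG;
  return MIG_WRITE_META;
}
```

A boot routine built this way would loop: observe the directory, call `mig_next`, perform that single step, and repeat until it gets `MIG_NONE` or `MIG_DONE`.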

Epoch "rollovers" (ending the current epoch and starting a new, empty one) occur under three conditions:

  1. The pilot runs the new roll subcommand to roll over manually.
  2. The pilot runs the chop subcommand.
  3. The runtime detects that the running binary version differs from the one pinned in the current epoch.

Both migrations and rollovers make sure a current snapshot exists before they proceed.
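
The three triggers above, together with the empty-epoch case listed in the TODOs (an updated binary and an empty latest epoch should just rewrite `vere.txt`), can be sketched as a pure decision function. Everything here (`epoch_state`, `roll_decide`, `roll_needs_snapshot`, the enum names) is hypothetical. It assumes a new epoch is named after the last event committed to the current one, so the snapshot must be as of that event before rolling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
  ROLL_NONE,       /* keep appending to the current epoch */
  ROLL_REPIN,      /* epoch is empty: just rewrite its vere.txt */
  ROLL_NEW_EPOCH   /* snapshot if needed, then create 0i<last> */
} roll_action;

/* Hypothetical view of the latest epoch. */
typedef struct {
  uint64_t start;      /* number in the dirname (0i<start>) */
  uint64_t last;       /* last event committed to this epoch */
  uint64_t snap;       /* event the current snapshot reflects */
  const char *pinned;  /* contents of this epoch's vere.txt */
} epoch_state;

roll_action roll_decide(const epoch_state *ep, const char *running,
                        bool manual_roll, bool chop) {
  assert(ep != NULL && running != NULL && ep->pinned != NULL);
  assert(ep->last >= ep->start);
  bool mismatch = strcmp(ep->pinned, running) != 0;

  if (ep->last == ep->start) {
    /* No events of its own: rolling would produce another
       epoch named 0i<start>, so at most refresh the pin. */
    return mismatch ? ROLL_REPIN : ROLL_NONE;
  }
  return (manual_roll || chop || mismatch) ? ROLL_NEW_EPOCH : ROLL_NONE;
}

/* A rollover needs a snapshot as of the last committed event. */
bool roll_needs_snapshot(const epoch_state *ep) {
  assert(ep != NULL);
  return ep->snap != ep->last;
}
```

Keeping the decision separate from the filesystem work makes the edge cases (empty epoch, version-only change) easy to test without touching a pier.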

A few TODOs left:

  • Iron out a small kink in migration behavior for previously chopped piers
  • Make sure correct binary version gets pinned to first epoch of migrated piers
  • Rollover to new epoch when a new binary version is detected
  • Make sure manual migration logic is idempotent
  • Update prep command
  • Fix chop so it works when there are 3 epochs starting with 0i0
  • Reproduce and fix partially-deleted epoch 0i0 after chop
  • Pair with someone to run manual GDB testing for migration idempotency and rollover logic
  • Take a look at @joemfb's replay code and compare/find overlaps
  • Document final system design in this PR
  • Correct epoch naming scheme
  • Make chop leave the latest two epochs
  • Better error handling
  • Better cleanup
  • Test migration with real ships running in local-networking mode
  • Test epoch rollover idempotency
  • Test fresh boot
  • Handle the case where the snapshot has been deleted from chk/
  • Ensure u3_disk_epoc_good() is implemented and used how we want
  • Ensure u3_disk_epoc_init() is implemented and used how we want
  • Ensure replay works with urbit play and urbit
  • Ensure replay works in the edge case where only epoch 0 exists and there is no valid snapshot
  • Move new-epoch-on-vere-version-mismatch logic to _pier_wyrd_init()
  • Make subcommands which call u3_disk_init() auto-migrate
    • info
    • cram
    • queu
    • meld
    • pack
    • play
    • chop
    • roll
  • Make replay on boot use u3_mars_play()
  • Test migration from an old pier (again)
  • Test that migration works from an old pier that first needs a full replay (i.e., from the beginning of its event log)
  • Test that running ./urbit roll zod with an updated binary version and an empty latest epoch does not roll, but instead just updates the vere.txt file
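
The chop items (leave the latest two epochs; work when 0i0 is one of three) hinge on ordering epochs by event number. Directory listings come back in no particular order, and sorting names as strings would put `0i132` before `0i20`, so this hypothetical sketch parses and sorts numerically before choosing what to delete. `chop_plan` and `CHOP_KEEP` are illustrative names, not vere's code; actual deletion is left out, and since chop also triggers a rollover, a real implementation would presumably roll first and plan deletions afterward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHOP_KEEP 2  /* epochs chop leaves in place */

static int cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Parse dirnames of the form "0i<digits>" into nums (capacity >= n),
   sort them ascending, and return how many leading entries chop should
   delete. Non-epoch names are skipped; *parsed gets the epoch count. */
size_t chop_plan(const char **names, size_t n, uint64_t *nums,
                 size_t *parsed) {
  assert(names != NULL && nums != NULL && parsed != NULL);
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    const char *s = names[i];
    /* require a digit right after "0i": strtoull alone would accept
       leading whitespace or a sign */
    if (strncmp(s, "0i", 2) != 0 || s[2] < '0' || s[2] > '9') continue;
    char *end;
    unsigned long long v = strtoull(s + 2, &end, 10);
    if (*end != '\0') continue;
    nums[k++] = (uint64_t)v;
  }
  qsort(nums, k, sizeof nums[0], cmp_u64);
  *parsed = k;
  return k > CHOP_KEEP ? k - CHOP_KEEP : 0;
}
```

For a listing of `0i132`, `0i0`, `0i20`, and an unrelated entry, the sketch sorts the epochs to 0, 20, 132 and marks only 0i0 for deletion.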

@matthew-levan requested review from barter-simsum and removed request for barter-simsum July 13, 2023 02:08

@barter-simsum (Member) left a comment

quick pass over. a few nits mostly.

pkg/noun/events.c (resolved)
pkg/vere/disk.c (outdated, resolved)
pkg/vere/disk.c (resolved)
@matthew-levan (Contributor, Author)

This is ready for final review and merge. @joemfb @belisarius222 @barter-simsum

@matthew-levan (Contributor, Author) commented Jul 18, 2023 via email

@barter-simsum (Member) previously approved these changes Aug 10, 2023

PMTS,N

@barter-simsum (Member)

@joemfb to reapprove

@matthew-levan (Contributor, Author)

Ravioli, ravioli, give me the formuoli!

@matthew-levan changed the title from "epoch system" to "Epoch System" Sep 6, 2023

@joemfb (Member) left a comment

this will do

}

return c3n;
}
Member

missing a trailing newline in this file

@pkova merged commit 7ba4bca into develop Sep 18, 2023
5 checks passed
@pkova deleted the msl/replay-with-epochs branch September 18, 2023 15:39
@joemfb mentioned this pull request Oct 4, 2023
pkova added a commit that referenced this pull request Oct 11, 2023
This PR cleans up the new epoch system from #459, fixing some small
bugs, plugging leaks, and simplifying the interface to it.

It still needs a final round of crash recovery testing (killing the
process at every stage of the initialization/migration, confirming that
subsequent restarts proceed as they should).

Resolves #530.
Labels: feature (New feature or feature request), io (Related to the IO drivers)
Projects: Status: Done
Development: linked issue "pier: epoch system"
6 participants