Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OCC] Add async scheduler (experiment) #366

Draft
wants to merge 65 commits into
base: occ-main
Choose a base branch
from

Conversation

stevenlanders
Copy link
Contributor

@stevenlanders stevenlanders commented Nov 28, 2023

Describe your changes and provide context

  • This keeps the old version as ProcessAllSync and replaces ProcessAll with this async version
  • It's a good bit slower than the existing version (way too many re-validates due to fast retry)
  • Every so often (e.g., 1/10000) it seems to hang, still figuring that out

Testing performed to validate your change

  • Scheduler Test and DeliverTxTest applies
  • Queue unit tests

udpatil and others added 25 commits October 17, 2023 14:01
## Describe your changes and provide context
This adds some comments with some useful code pointers for existing
logic and discussing future OCC work

## Testing performed to validate your change
NA
## Describe your changes and provide context
Add multiversion store data structures file, and implement the
multiversioned item

## Testing performed to validate your change
Added unit tests to verify behavior
## Describe your changes and provide context
This adds the incarnation field to the multiversion item data structure.

## Testing performed to validate your change
updated unit tests
## Describe your changes and provide context
This implements the multiversion with basic functionality, but still
needs additional work to implement the iterator functionality and/or
persisting readsets for validation

## Testing performed to validate your change
Added unit tests for basic multiversion store
## Describe your changes and provide context
- `ConcurrencyWorkers` represents the number of workers to use for
concurrent transactions
- since concurrrency-workers is a baseapp-level setting, implementations
(like sei-chain) shouldn't have to pass it (but can)
- it defaults to 10 if not set (via cli default value)
- it defaults to 10 in app.toml only if that file is being created (and
doesn't exist)
- if explicitly set to zero on command line, it will override with the
default (for safety)
- cli takes precedence over the config file
- no one has to do anything to get it to be 10 (no config changes no
sei-chain changes required (aside from new cosmos version))

## Testing performed to validate your change
- Unit Tests for setting the value
- Manually testing scenarios with sei-chain
## Describe your changes and provide context
This adds in functionality to write the latest multiversion values to
another store (to be used for writing to parent after transaction
execution), and also adds in helpers for writeset management such as
setting, invalidating, and setting estimated writesets.

## Testing performed to validate your change
Unit testing for added functionality
## Describe your changes and provide context
- `sei-cosmos` will receive a list of transactions, so that sei-chain
does not need to hold the logic for OCC
- This will make the logic easier to test, as sei-cosmos will be fairly
self-contained
- Types can be extended within a tx and within request/response

Example interaction:
<img
src="https://github.com/sei-protocol/sei-cosmos/assets/6051744/58c9a263-7bc6-4ede-83ab-5e34794510b1"
width=50% height=50%>

## Testing performed to validate your change
- This is a skeleton for a batch interface
## Describe your changes and provide context
This implements an mvkv store that will manage access from a transaction
execution to the underlying multiversion store and underlying parent
store if the multiversion store doesn't have that key. It will first
serve any reads from its own writeset and readset, but if it does have
to fall through to multiversion store or parent store, it will add those
values to the readset.

## Testing performed to validate your change
Unit tests
…ore (#330)

## Describe your changes and provide context
This adds in validation for transaction state to multiversion store, and
implements readset validation for it as well.

## Testing performed to validate your change
Unit Test
## Describe your changes and provide context
- Adds a basic scheduler shell (see TODOs)
- Adds a basic task definition with request/response/index
- Listens to abort channel after an execution to determine conflict

## Testing performed to validate your change
- Compiles (holding off until shape is validated)
- Basic Unit Test for ProcessAll
## Describe your changes and provide context
This implements Iterator and ReverseIterator for mvkv for the KVStore
interface. The memiterator will be composed of versionindexedstore and
multiversionstore, and will yield values in a cascading fashion firstly
from the writeset, and then second from the multiversion store.

This still needs optimization to persisted sorted keys instead of
reconstructing sorted keys each time.

## Testing performed to validate your change
Unit test to verify basic functionality
## Describe your changes and provide context
This fixes a dependency that was refactored, and enables commit push CI
for occ-main

## Testing performed to validate your change
CI
## Describe your changes and provide context
This implements a tracked iterator that is used to keep track of keys
that have been iterated, and to also save metadata about the iteration
for LATER validation. The iterator will be replayed and if there are any
new keys / any keys missing within the iteration range, it will fail
validation. the actual values served by the iterator are covered by
readset validation.

Additionally, the early stop behavior allows the iterateset to ONLY be
sensitive to changes to the keys available WITHIN the iteration range.
In the event that we perform iteration, and THEN write a key within the
range of iteration, this will not fail iteration because we take a
snapshot of the mvkv writeset at the moment of iteration, so when we
replay the iterator, we populate that iterator with the writeset at that
time, so we appropriately replicate the iterator behavior.

In the case that we encounter an ESTIMATE, we have to terminate the
iterator validation and mark it as failed because it is impossible to
know whether that ESTIMATE represents a value change or a delete, since
the latter, will affect the keys available for iteration.

This change also implements handlers that iterators receive for updating
readset and iterateset in the `mvkv`

## Testing performed to validate your change
Unit tests for various iteration scenarios
## Describe your changes and provide context
- This was copied from #332 which became unwieldy due to commit history
(merges/rebases)
- Adds scheduler logic for validation
- In this initial version it completes all executions then performs
validations (which feed retries)
- Once we start benchmarking we can make performance improvements to
this
- Retries tasks that fail validation and have no dependencies

## Testing performed to validate your change
- Scheduler Test verifies multi-worker with conflicts
## Describe your changes and provide context
Some tests from sei-chain don't inject a store, and while I'm not sure
if that's a valid scenario I made the scheduler.go tolerant to the
situation to avoid introducing this assumption to the system.

## Testing performed to validate your change
New unit test confirming lack of crash
## Describe your changes and provide context
- Allows sei-chain to ask isOCCEnabled() so that it can choose to use
the OCC logic
- Sei-chain can set this to true according to desired logic

## Testing performed to validate your change
- unit test that sets flag and verifies value
## Describe your changes and provide context
This adds in the ability to prefill estimates based on metadata passed
along with deliverTxBatch

## Testing performed to validate your change
Unit Test to verify that multiversion store initialization is now
idempotent, and works properly regardless of whether estimate prefill is
enabled
## Describe your changes and provide context
- `CollectIteratorItems` needs to hold an RLock to avoid a concurrent
access panic

## Testing performed to validate your change
- Reproduced through a sei-chain-side test (concurrent instantiates)
## Describe your changes and provide context
This adds the accesscontrol module behavior to add the tx writeset
generation

## Testing performed to validate your change
Unit tests + integration with sei-chain and loadtest cluster testing
## Describe your changes and provide context
- Adds trace span for `SchedulerValidate` 
- Adds trace span for `SchedulerExecute`
- Mild refactor (extracted methods) to make it easier to defer span
ending

## Testing performed to validate your change
Example trace (run locally)

![image](https://github.com/sei-protocol/sei-cosmos/assets/6051744/b8a032f1-71b1-4e95-b12e-357455ebcc6d)

Example attributes of SchedulerExecute operation

![image](https://github.com/sei-protocol/sei-cosmos/assets/6051744/68992e84-4000-44c1-8597-9d4c10583a66)
## Describe your changes and provide context
This fixes the validation to remove a panic for a case that can actually
occur if a transaction writes a key that is later read, and that writing
transaction is reverted and then the readset validation reads from
parent store. In this case, the readset would have a conflict based on
the data available in parent store, so we shouldn't panic. This also
adds in the resource types needed for the new DEX_MEM keys

## Testing performed to validate your change
Tested in loadtest cluster
## Describe your changes and provide context
This makes optimizations to the scheduler and validation

## Testing performed to validate your change

---------

Co-authored-by: Steven Landers <[email protected]>
## Describe your changes and provide context
Add optimizations to reduce mutex lock contention and refactor with sync
Maps. This also removes telemetry that was added liberally, and we can
later add in telemetry more mindfully and feature flagged.

## Testing performed to validate your change
loadtest chain testing
@stevenlanders stevenlanders mentioned this pull request Nov 28, 2023
@stevenlanders stevenlanders changed the title Add async scheduler [OCC] Add async scheduler (experiment) Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants