-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a query engine for incremental compilation #355
Conversation
One thing to consider:
|
experimental/incremental/executor.go
Outdated
// Queries returns a snapshot of the URLs of which queries are present (and | ||
// memoized) in an Executor. | ||
// | ||
// The returned slice is sorted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just for testing? Seems like this could potentially be a big slice to copy. Maybe instead of returning a slice, it should return a iter.Seq[string]
, and you could push making a sorted slice to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of intended as a general debugging aid. I'm not sure we really need an iterator for this, because you really do want to sort them first and that requires buffering regardless.
experimental/incremental/task.go
Outdated
// | ||
// If any of the queries fails, or if the [context.Context] passed to the | ||
// [Run] call that spawned the [Task] is cancelled, this function calls | ||
// [runtime.Goexit]. This is not something callers should be concerned about, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wha?? If any single query fails, the whole goroutine disappears? That seems like a very dangerous API. This would be simpler to use if it just used standard error-return control flow. If a query fails, seems like it should return a correspond failed Result
. And if the whole task is cancelled, it should return the cancellation cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, my gut reaction is "why is this even exported"? I now see it's because it really wants to be a method of Task
(like Run
wants to be a method of Executor
), but it needs to be generic for type safety. Maybe mention something to that effect in the Go docs for each of these, to make their use more clear. In this case, this is expected to be called from Query.Execute
to gather dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you know I thought about it and this is kinda nuts. The new API instead has really scary "hey you need to return this error or something will explode".
I also added the docs you suggested.
experimental/incremental/executor.go
Outdated
return nil, context.Cause(ctx) | ||
} | ||
|
||
// Now, for each non-failed result, we need to walk their dependencies and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "non-failed" mean in this regard? The loop has no conditional that looks like it's filtering out failed items.
Also, why would we exclude the errors from failed results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This no longer makes sense with the new error propagation convention.
experimental/incremental/executor.go
Outdated
// Now, for each non-failed result, we need to walk their dependencies and | ||
// collect their errors. | ||
for i, query := range queries { | ||
task := e.getTask(query.URL()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to introduce a limit on the size of the cached results, this could potentially create a new task instead since the one created above might have been evicted. Maybe we instead need to associate these tasks as deps
of root
, so you can instead iterate root's deps and not worry about concurrent/near-immediate eviction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at the least, pass a flag to getTask
to tell it to not create a task and instead return nil. Then update this to ignore nil return values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we punt automatic eviction to a followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but it would be nice to first drop a TODO right here so we don't forget that there is a potential risk here.
experimental/incremental/task.go
Outdated
// queries which depend on it. | ||
// | ||
// This will not cause the query to fail; see [Task.Fail] for that. | ||
func (t *Task) Error(errs ...error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is borrowing heavily from testing.T
, but I think we could make it more clear, since this is not test code. Maybe instead this signature could be NonFatalError(...error)
. Also, it's a little confusing that it allows zero errors to be provided, so maybe even NonFatalError(error, ...error)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's NonFatal now.
experimental/incremental/task.go
Outdated
|
||
for i, q := range queries { | ||
i := i | ||
deps[i] = caller.exec.getTask(q.URL()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I had to add to the main executor in the root protocompile
package (since the existing compiler is somewhat similar in how it compiles files) is cycle detection. If, for example, there were incorrect cyclic imports between files, we need to be able to detect that in order to prevent deadlock in this execution engine (where a task is indirectly blocked on itself).
The current compiler does not do this very efficiently (it recursively crawls the entire dependency graph of the caller task to see if it finds the newly requested file). I think a better approach would be to add to each task
a "path" of its ancestor tasks, and if we ever see a request to resolve a dependency that already exists in the caller's path (or is equal to the caller), the task should fail fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also clearly document that Query.Execute
should never call Run
, only Resolve
. (Otherwise, we'd lose the trail of callers and not be able to detect cycles.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented cycle tracking. There's even a test!
In the process I found a deadlock due to resource starvation. There's some fussiness in Resolve to drop the hold on exec.sema in the right place to avoid starvation.
experimental/incremental/executor.go
Outdated
|
||
go func() { | ||
results = Resolve(root, queries...) | ||
close(root.result.done) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be in a defer. Otherwise, if a query fails, the goroutine will exit, but since the context is not yet cancelled and this channel not closed, the select below would hang.
(Having said that, I've left other remarks about how I think the use of runtime.Goexit
is dangerous and we should probably revisit how failures are communicated. But this probably needs to be a deferred function no matter how those changes play out.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right regardless, if a panic comes crashing through for whatever reason, it will get swallowed.
experimental/incremental/task.go
Outdated
if r.Value != nil { | ||
// This type assertion will always succeed, unless the user has | ||
// distinct queries with the same URL, which is a sufficiently | ||
// unrecoverable condition that a panic is acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While internal panics are okay, they would be categorically bad if they happened in the BSR. Since this is running in a separate goroutine, the calling application (such as BSR code) would have no way of recovering.
The current compiler has a top-level deferred function (in the equivalent of this package's Run
function, in the go
function that launches these goroutines) that catches any panic and reports it as a fatal error, specifically for scenarios like this. That way compiler bugs don't turn into serious operational issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I also think it would be better to just return an error here. (Sorry to keep harping on that. But it's much more straight-forward control flow.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would generally consider this type of panic to be on the same level as a random nil deref. Avoiding panics is not as simple as "never panic"; in this case, it indicates a fairly catastrophic bug on the part of query writers.
How to handle such errors should generally be done by the people who are panic-intolerant, such as with a recover. We can't prevent query authors from panicking out of Execute, after all (unless we try catching that, too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to handle such errors should generally be done by the people who are panic-intolerant, such as with a recover. We can't prevent query authors from panicking out of Execute, after all (unless we try catching that, too).
The problem with this is that they can only recover from panics on their (callng) goroutine. If the library function they call spawns a goroutine and the panic happens there (like with this), then there is no ability to catch it and prevent it from crashing the app.
So while I understand that we can't eliminate all panics, since nil-deref/index-out-of-range/etc could still occur, we can and should recover from any panic that happens in a goroutine that is created as an internal detail of this package. The existing compiler in the protocompile
package does that here.
experimental/incremental/task.go
Outdated
func Resolve[T any](caller Task, queries ...Query[T]) []Result[T] { | ||
results := make([]Result[T], len(queries)) | ||
deps := make([]*task, len(queries)) | ||
wg := semaphore.NewWeighted(int64(len(queries))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you call it wg
, like "wait group". I guess you are using a semaphore because it is context-aware? Maybe a little comment that we use this instead of sync.WaitGroup
specifically because waiting can be interrupted via context cancellation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually added such a comment before I even noticed your comment here!
Yeah, I am not a fan of FWIW, there are other open-source cache implementations that are decent and handle the size limits with evicting LRU entries. |
…3403) Since removing concurrency from the LSP, performance has degraded substantially. This PR fixes some performance issues by being smart about doing less work. Most of this will be mooted by bufbuild/protocompile#355, but until we have intelligent memoization, we can use some dumb heuristics to improve perf. First, we don't send progress notifications for files that have not been opened by the client's editor. Hammering the Unix socket with notifications is a major source of slowdown, and these notifications are not useful to the user, because they are about files they do not care about. Second, we don't send diagnostics for the same. These files get reparsed when opened in the editor regardless, so this doesn't risk staleness. Third, we were previously reindexing imported files once per cross-file ref. This is clearly an oversight on my part, which I suspect was caused due to nasty merge conflicts on my last PR. I noticed because protovalidate/priv/private.proto was getting hammered in the logs -- all because validate.proto references the priv symbol dozens of times. 🤦 After fixing all of these, the LSP went from sluggish to snappy (in VSCode).
// | ||
// Errors that occur during each query are contained within the returned results. | ||
// Unlike [Resolve], these contain the *transitive* errors for each query! | ||
func Run[T any](ctx context.Context, e *Executor, queries ...Query[T]) (results []Result[T], cancelCause error) { | ||
// | ||
// Implementations of [Query].Execute MUST NOT UNDER ANY CIRCUMSTANCES call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a TODO that we could do a best-effort validation of this by providing a context to Query.Execute
. The package could add a context value to it to indicate we're executing a query, and this function could check for that context value and, if present, fail fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and did the validation :)
experimental/incremental/task.go
Outdated
}) | ||
results[i].NonFatal = r.NonFatal | ||
results[i].Fatal = r.Fatal | ||
}) || async // Need to avoid short-circuiting here! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could instead use |=
up on line 98.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go does not provide |=
for bools. I know, right?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, I've run into this before, too. 🤮
experimental/incremental/task.go
Outdated
url := q.URL() | ||
fmt.Println(url) | ||
for node.Query != nil { | ||
fmt.Println(node.Query.URL(), url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I suspect this was in here (and a couple of lines above) for debugging and accidentally left in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops
experimental/incremental/task.go
Outdated
} | ||
|
||
if wg.Acquire(caller.ctx, 1) != nil { | ||
return | ||
return false | ||
} | ||
|
||
// Complete the rest of the computation asynchronously. | ||
go func() { | ||
defer wg.Release(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is where I think we need to put a recover
call. That way, panics in user-provided code (i.e. Query.Execute
) but also bugs in compiler/library code won't cause the calling application to crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ErrPanic
!
// Note: this function really wants to be a method of [Task], but it isn't | ||
// because it's generic. | ||
func Resolve[T any](caller Task, queries ...Query[T]) (results []Result[T], expired error) { | ||
caller.checkDone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Resolve
now returns an error, I think it would be better to have this function return an error that can be propagated, instead of panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somewhat disagree that this should be an error, but I don't think it deeply matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no it has to be a panic, otherwise e.g. callers need to check for an error in NonFatal too. I think a panic is better, because it should only ever happen if you do something nasty like escape a Task
into a global.
experimental/incremental/task.go
Outdated
node := &caller.path | ||
url := q.URL() | ||
fmt.Println(url) | ||
for node.Query != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a multi-statement for
loop to make this a little more concise:
for node := &caller.path; node.Query != nil; node = node.Prev {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better yet: path.Walk
.
experimental/incremental/task.go
Outdated
if closed(r.done) { | ||
done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a little more intuitively expressed with an unconditional close(r.done)
. I think the only places it gets closed are here and then after q.Execute
is called.
Then, to detect panics, a construct like this is effective and a little less code/easier to follow IMO:
didPanic := true
defer func() { /* ... */ }()
r.Value, r.Fatal = q.Execute(callee)
didPanic = false
So the deferred function just inspect didPanic
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified this significantly another way, please take a look at run and runAsync
experimental/incremental/task.go
Outdated
@@ -204,21 +266,34 @@ func run[T any](caller Task, task *task, q Query[T], wg *semaphore.Weighted, don | |||
} | |||
|
|||
defer func() { | |||
callee.done = true | |||
caller.exec.sema.Release(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to do this before the mutation to the task below (line 277, task.result.Store(nil)
)?
I think after this is called, the calling task may immediately "wake up" and could potentially observe a partial result before the code below clears it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only is callee.done incorrect, I got rid of Task.done altogether.
experimental/incremental/url.go
Outdated
// URLBuilder is a helper for building URLs. | ||
// | ||
// It is a simplified version of the interface of [net/url.URL]. | ||
type URLBuilder struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like a real "builder" pattern. Instead, this seems like a URI
, not a builder, and the Build()
method could just as reasonably be String()
.
Also, not sure this pays its freight -- it's extra surface area to maintain if/when we make it exported/non-experimental, and it doesn't seem hugely valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, but I think this is really close. So LGTM modulo the few remarks.
return anyQuery{ | ||
url: q.URL(), | ||
return &AnyQuery{ | ||
key: q.Key(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to also set actual: q
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oopies
experimental/incremental/task.go
Outdated
|
||
if wg.Acquire(caller.ctx, 1) != nil { | ||
// run executes a query in the context of some task and writes the result to out. | ||
func (t *task) run(caller Task, q *AnyQuery, done func(*result)) (async bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this and the one below are confusingly named. runAny
actually blocks for the result and isn't really async. This is the async version, since it returns immediately and can spawn a goroutine to actually invoke done
.
Maybe rename this one to start
and the other to run
?
This change adds a new
experimental/incremental
package, which provides a generic dependency memoization framework, intended for parallelization of compilation operations. It follows the current design for parallelization in the compiler but has the following new features:This is intended to be used for parallelization and memoization in the LSP and throughout the compiler.