Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying (async) generator behavior #18

Open
legendecas opened this issue Jan 26, 2023 · 31 comments · Fixed by #61
Open

Specifying (async) generator behavior #18

legendecas opened this issue Jan 26, 2023 · 31 comments · Fixed by #61

Comments

@legendecas
Copy link
Member

legendecas commented Jan 26, 2023

Related: nodejs/node#42237, open-telemetry/opentelemetry-js#2951

Should a generator or an async generator capture the async context when being created? Given:

const context = new AsyncContext();

async function* gen(context) {
  await Promise.resolve();
  // ???
  console.log(context.get());
  yield;
  await Promise.resolve();
  // ???
  console.log(context.get());
}

let g;
context.run('init', () => {
  g = gen(context);
});
await context.run('first iter', async () => {
  await g.next();
});
await context.run('second iter', async () => {
  await g.next();
});

The output options could be:

  1. 'first iter'/'second iter', which is the context when the generator's next() is invoked.
  2. 'init'/'init', which is the context when the generator is being constructed.
@jasnell
Copy link

jasnell commented Jan 26, 2023

My initial instinct is option 1, first iter / second iter... specifically, the context whenever next() is invoked but I do see cases where option 2 would be useful also. Unfortunately, with the current AsyncLocalStorage API option 2 is very cumbersome to implement.

const { AsyncLocalStorage, AsyncResource } = require('async_hooks');

const als = new AsyncLocalStorage();

function* gen(resource) {
  yield resource.runInAsyncScope(() => als.getStore());
  yield resource.runInAsyncScope(() => als.getStore());
}

let g = als.run(123, () => gen(new AsyncResource("abc")));

console.log(g.next());

I'd suggest that in the default case, it should be option 1 but we should have something defined that allows for option 2.

@jridgewell
Copy link
Member

I think Option 1 is the correct choice, because it matches the sync calls-stack behavior that AsyncContext is based on.

I'd suggest that in the default case, it should be option 1 but we should have something defined that allows for option 2.

It's not too hard to define a wrapper that does this without you having to change your generator function:

const it = wrapGen(gen());
it.next();
it.next();

function wrapGen(it) {
  const getNext = AsyncContext.wrap(() => it.next());
  if (it[Symbol.iterator]) {
    return wrapSync(it, getNext);
  }
  return wrapAsync(it, getNext);
}

function* wrapSync(getNext) {
  while (true) {
    const next = getNext();
    if (next.done) return next;
    yield next;
  }
}
async function* wrapAsync(getNext) {
  while (true) {
    const next = await getNext();
    if (next.done) return next;
    yield next;
  }
}

@jridgewell
Copy link
Member

Actually, we'll need a bit of spec test to handle parallel calls to an async generator:

const ctx = new AsyncContext();

async function* foo() {
  // Make it so the gen doesn't complete sync
  await sleep(100);

  console.log(ctx.get());

  yield;

  console.log(ctx.get());
}

const it = ctx.run(1, () => foo());

await ctx.run(2, () => it.next());
await ctx.run(3, () => it.next());

Which combination of 1, 2, and 3 should this log?:

  1. 1, 1
  2. 1, 2
  3. 1, 3
  4. 2, 2
  5. 2, 3
  6. 3, 3 Obviously not this one, right?

Well, currently it logs 2, 3. How about this setup:

const it = ctx.run(1, () => foo());

await Promise.all([
  ctx.run(2, () => it.next()),
  ctx.run(3, () => it.next()),
]);

This one logs 2, 2!

The queuing behavior of async generators is a little weird. It maintains its own queue of promises, but doesn't use HostCreateJobCallback. And, if there are multiple parallel calls, it will continue processing the queue.

@jridgewell
Copy link
Member

During today's meeting, we're aligned with having generators record the context at the initialization time, and having every .next() just restore the initialization time's context. So the answer to https://github.com/tc39/proposal-async-context/issues/18#issuecomment-1463304093 is 1, 1 (the first choice).

@mhofman
Copy link
Member

mhofman commented May 2, 2023

To clarify, this would apply to both generators and async generators, or only the latter?

@jridgewell
Copy link
Member

Both

@andreubotella
Copy link
Member

andreubotella commented Jul 21, 2023

While working on the spec text for this, I noticed there are various spec-created iterators which use the generators machinery under the hood through CreateIteratorFromClosure, and would therefore propagate the async snapshot from the time the iterator is created.

This can only be observed if there's some user code running inside the generator, but in the spec as it is right now, that is already the case for array iterators (and I think also regexp iterators) if some property has a getter:

const asyncVar = new AsyncContext.Variable();

const array = [23, 34, 45];
Object.defineProperty(array, 1, {
  get() {
    return asyncVar.get();
  }
});

const iter = asyncVar.run("foo", () => array.values());

array[0] = 12;
asyncVar.run("bar", () => {
    console.log([...iter]);  // [12, "foo", 45]
});

And with the iterator helper proposal, any iterator helper callbacks would run in the generator context:

const asyncVar = new AsyncContext.Variable();

const array = [23, 34, 45];
const iter1 = array.values();
const iter2 = asyncVar.run("foo", () => {
    return iter1.map(v => asyncVar.get());
});
asyncVar.run("bar", () => {
    console.log(iter2.next());  // { value: "foo", done: false }
});

Is this expected? Or should we somehow exclude CreateIteratorFromClosure from this behavior?

@jridgewell
Copy link
Member

Would it be difficult to exclude? I don't think the behavior is problematic, but I would prefer consistency for the language's iterators.

@andreubotella
Copy link
Member

andreubotella commented Jul 21, 2023

For sync iterators, it would need a flag on the generator so it doesn't switch to the snapshot active when the execution was last suspended. For async iterators, I don't know what would be needed yet.

@mhofman
Copy link
Member

mhofman commented Jul 21, 2023

I would prefer consistency for the language's iterators

Wouldn't the alternative be to make language iterators that don't use internal generators to manually carry the async context?

andreubotella added a commit to andreubotella/proposal-async-context that referenced this issue Jul 31, 2023
jridgewell pushed a commit that referenced this issue Aug 8, 2023
* Make sync and async generators restore the initialization context

Closes #18.

* Don't store the snapshots at yield points

* Remove the .vscode folder

* Assert that the snapshot hasn't changed inside the generator

* Fix wrong assertions
@shicks
Copy link
Contributor

shicks commented Mar 20, 2024

Is it reasonable to reopen this issue in light of discussion on #75?

To make this concrete, consider the following snippet:

const v = new AsyncContext.Variable();
async function* foo() {
  for (let i = 0;; i++) {
    await 0;
    console.log(v.get());
    yield;
  }
}
const it = v.run(1, () => foo());
v.run(2, () => it.next());
v.run(3, async () => {
  await it.next();
  it.next();
});

I posit two reasonable options for what this should log (independent of removing any combinations of async/awaits:

  1. 1, 1, 1 - this is the current state of the proposal, treating yield the same as await
  2. 2, 3, 3 - this has the context within the generator inherit from the caller of next(), and is closer to (see note below) the current behavior of AsyncLocalStorage on Node

I understand from discussion on the recent issue (#75) that @littledan and @bakkot were in favor of option 1, and indeed I initially preferred it. On the other hand, @jridgewell and @Qard have argued for option 2. As I see it, some benefits of option 1 include

Benefits of option 2 include

  • more flexible - you can make option 1 from option 2 with a userland wrapper, but not the other way around
  • easier to transpile/polyfill - no need for any special handling for generators
  • more consistent with existing AsyncLocalStorage behavior in Node
  • more consistent with existing built-in iterators (e.g. the array element getter example above)

I wasn't in the earlier discussions last year, but from what I can glean from the various meeting notes and matrix archives, it looks like "Committee prefers init time capture and restore" (from Jun 27 meeting) was in actuality a pretty weak preference? Given the trade-offs above (and feel free to expand on any that I missed), I suspect it might make sense to pivot and go with option 2 here? Or are there considerations I missed?

Note about AsyncLocalStorage: currently the code sample as written actually logs 2, 2, 3 on Node because of task queuing subtleties. I'll argue that if option 2 is picked, we should fix this.

@jridgewell jridgewell reopened this Mar 20, 2024
@jridgewell
Copy link
Member

I hadn't seen @bakkot's example before, but I think it's flawed. Turning this into a "library takes a promise" example, we can see a similar context escape if the library ever needs to call back into user code:

async function libraryTakingAPromise(promise) {
  // Only the promise passed to us was wrapped, not the actual call to libraryTakingAPromise.
  // So, by chaining on the promise (which escaped its context), we've lost that context.
  await promise;
  
  // No way for user code to recover the context id, because this runs outside the `ctx.run()`.
  rerender();
}


libraryTakingAPromise(context.run('id', async () => {
  await 0;
  // Any chains on the async-fn's return promise will not propagate context id.
}));

The appropriate way to write that would work for both promises and generators:

context.run('id', () => {
  libraryTakingAPromise((async () => {
    await 0;
    // The library is capable of chaining on this promise just fine.
  })());
});

context.run('id', () => {
  libraryTakingAnIterator(context.run('id', function* () {
    // The library is capable of capturing context just fine.
    // If it were sync calling `.next()` nothing would be needed.
    // And if it was used `await`, then context would propagate already.
    yield 0;
  }));
});

@shicks
Copy link
Contributor

shicks commented Mar 20, 2024

For more context, I tried to do a GitHub search for code that's using AsyncLocalStorage with generators today. As expected, I'm seeing a lot of middleware, which further suggests that there may be value in propagating context into next() calls, treating each iteration as a separate function invocation by default. We should also clearly document how to make a wrapper (or provide one out of the box).

@bakkot
Copy link

bakkot commented Mar 20, 2024

Turning this into a "library takes a promise" example, we can see a similar context escape if the library ever needs to call back into user code:

I'm confused by this transformation. I don't know what rerender is, but I don't see any reason to expect it to be called in the 'id' context - it has nothing to do with it. But for the generator example, the user is explicitly calling a (generator) function in the 'id' context, and so would expect its entire execution to be in that context.

@dfahlander
Copy link

I'm confused by this transformation. I don't know what rerender is, but I don't see any reason to expect it to be called in the 'id' context - it has nothing to do with it. But for the generator example, the user is explicitly calling a (generator) function in the 'id' context, and so would expect its entire execution to be in that context.

IMHO, as a library vendor, I wouldn't expect the context to be bound through the flow of a generator - the same way I would not expect a variable in a try...finally (a sync context) to do so. Isn't the proposal about async context? Why not keep the rule that simple? Is there any convincing use case for alternative 1 that can't be solved in userland?

@bakkot
Copy link

bakkot commented Mar 21, 2024

I wouldn't expect the context to be bound through the flow of a generator - the same way I would not expect a variable in a try...finally (a sync context) to do so

I don't know what that means. What does try/finally have to do with anything?

@dfahlander
Copy link

dfahlander commented Mar 21, 2024

I don't know what that means. What does try/finally have to do with anything?

With try...finally, I mean a pattern for maintaining a sync context via a variable

export let currentContext;

export function run(v, callback) {
  const parentContext = currentContext;
  try {
    currentContext = v;
    return callback();
  } finally {
    currentContext = parentContext;
  }
}

With this simplified implementation of sync context, a consumer can read the currentContext from anywhere, just like with the purpose of AsyncContext.run().

@jridgewell
Copy link
Member

jridgewell commented Mar 21, 2024

I don't know what rerender is, but I don't see any reason to expect it to be called in the 'id' context - it has nothing to do with it.

Sorry, my example wasn't great. But the point is that the library could reenter into user code, and due to the nesting the run call inside the library fn call, the promise escapes the context. A more concrete example of this is with a tracing function and an event handler:

// This is Library code.
function render(ui: Promise<JSX>) {
  // We wait for the ui to be ready.
  const vdom = await ui;

  // just imagine this does the normal calls to to a rerender a component tree…
}


// - - -

// This is User code
// Imagine we want to capture RAIL metric from click to rerender.
const rail = new AsyncContext.Variable();

function Button() {
  // This measure would probably wouldn't be in Button, but this is just an example.
  const span = rail.get();
  if (span) {
    // If we're rerendering, then we want to end the span now that we've painted
    span.end();
  }

  const onclick = () => {
    // We start the span during the user interation. But notice that `render` isn't wrapped,
    // only the UI generation. That means the `await` in `render` will not inherit our
    // context's span, and the rerender won't have a `span` variable to end.
    render(rail.run(new Span(), async () => {
      // Some work to generate a new UI… maybe a network request or whatever.
      return <Button />;
    }));
  };
  return <button onClick={onclick}>click me</button>;
}

My point is that it's unlikely anyone intends to wrap only an async/generator function when they pass it to a library function. Instead, you'd wrap the entire call to the library function. In both async and generator functions, it'd be easy to propagate the context with that.

But for the generator example, the user is explicitly calling a (generator) function in the 'id' context, and so would expect its entire execution to be in that context.

I don't see this as different. If I call an async function in the 'id' context, then let the promise escape that context before continuing the computation (by awaiting/.then() chaining), then I've intentionally escaped the context.

function* gen() {
  console.log('before yield');
  yield;
  console.log('after yield');
}
const it = ctx.run('id', () => {
  const it = gen();
  it.next();
  return it;
});
ctx.run('other', () => {
  it.next();
});

async function asyncFn() {
  console.log('before yeild');
}
const p = ctx.run('id', asyncFn);
ctx.run('other', async () => {
  p.then(() => {
    // "resuming" the generator in a different context.
    console.log('after yield');
  });
});

If you want a generator to keep the same context, then consume the generator within that context. Eg, call it.next(); it.next(); it.next()… for a sync generator. Or await it.next(); await it.next()… in an async generator. Either way, both will propagate the intended context already.


With try...finally, I mean a pattern for maintaining a sync context via a variable

I'm not sure how this relates to the generator discussion?

@bakkot
Copy link

bakkot commented Mar 21, 2024

I don't see this as different.

I think this is our core disagreement. To me the behavior you've written for async functions is totally expected: it has returned; its execution has finished; there is no reason to expect future work performed using the return value of that function to be related to the context that the function was called in. By contrast, the execution of the generator function is not yet finished; it has not yet returned. So it would be extremely surprising for the body of the function to suddenly be in a different context, since the whole point of this feature is to save and restore the context during the lifetime of a computation which may not complete in a single step without the consumer of that computation needing to manually manage the context.

@dfahlander
Copy link

With try...finally, I mean a pattern for maintaining a sync context via a variable

I'm not sure how this relates to the generator discussion?

Sorry for a bad example. The only thing I am trying to exemplify is that it would be easier to understand the expected behaviour if it was stick to propagating contexts around async calls only and not involve generators at all to have a simpler principle. I'm afraid having special context wrapping around generators will complicate things. I haven't seen a use case where this would be motivated, so if there is such, it would be nice to get an understanding of it.

@dead-claudia
Copy link

I do want to point out something: from an implementation perspective, having async/await but not generators snapshot state means you have to conditionally do snapshots in both initialization and .next(). Also, you'll have to make async generator resume a bit more complicated to avoid the issue in #18 (comment). The end result is option 2 being somewhat complicated to implement.

  • On initialization:
    • If async/await, set snapshot field to a newly created context snapshot
  • On async generator .next():
    • Set a new snapshot field in the resulting async generator request to a newly created snapshot
  • Before resume:
    • If a sync generator, skip the rest of these steps
    • If async/await:
      • Get snapshot from the value of the snapshot field
    • Else:
      • Get snapshot from the async generator request
    • Push the current context
    • Set the current context to snapshot
  • Before suspend:
    • If a sync generator, skip the rest of these steps
    • Get context from the stack-saved context
    • Set the current context to context

Option 1 would be much simpler to implement and wouldn't require any type-specific code (or patching async generator requests to add any fields):

  • On initialization:
    • Set snapshot field to a newly created context snapshot
  • Before resume:
    • Get snapshot from the value of the snapshot field
    • Save the current context to the stack
    • Set the current context to snapshot
  • Before suspend:
    • Get context from the stack-saved context
    • Set the current context to context

I also expect it to be noticeably faster despite the memory load, due to less branching and the fact it's only a few instructions in both suspend and resume.

@dead-claudia
Copy link

Also, nodejs/node#42237 (comment) suggests Node at least still regards their current behavior (option 2 but IIUC with the Zalgo issue in #18 (comment)) as a bug, and that option 1 is the "intended" behavior.

So there's a couple data points to consider.

@shicks
Copy link
Contributor

shicks commented Mar 21, 2024

I suggest that when thinking about the correct behavior, we should consider async functions and sync generators as the two fundamental cases. The behavior of async generators should fall out from merging the above two cases (though it's of course worth validating that the derived behavior is still reasonable). Async functions is clear, so that leaves the question of what makes the most sense for sync generators.

If I write

function* gen() {
  a();
  yield;
  b();
}
const it = v.run(1, gen);
v.run(2, () => it.next());
v.run(3, () => it.next());

which contexts do I expect within the body of the generator, when a() and b() execute? Unfortunately, I think the answer here depends heavily on the situation. I believe @bakkot is correct that in the majority of cases, the intended behavior is for everything to run in context 1. However, in #75 (comment) @Qard brought up a use case where generators are used to model a work queue of independent tasks - in this situation, next() is effectively treated like an ordinary function invocation, which one would expect to snapshot the current context, rather than restore the generator's initial context. The example was also interested in tracing (similar to nodejs/node#42237 (comment)), but in that case it was important to retain the trace surrounding the next(), rather than the initial one.

In considering this, it seems like the question becomes whether it's more important to make option 2 possible at all, or whether it's more important to make option 1 an easy and fast default. If we go with option 2, then option 1 is quite possible with a wrapper around the generator (as demonstrated in the nodejs issue). If we go with option 1, then option 2 becomes completely impossible. So, what are the relevant trade-offs to consider here?

To @dead-claudia's point about performance - it's a little surprising to me (though still understandable) that the performance would be noticeably worse for async generators when it seems to naturally be doing less work. Snapshotting and restoring context for async functions is additional work on top of what would happen without AsyncContext, so being able to skip that extra work for yield seems at first glance like it should be a win. Indeed, when transpiling code to support an AsyncContext polyfill, it's certainly less work to not have to do the extra snapshots/restores. So I would expect at least sync generators to be no worse under option 2, and maybe even better. But I can also understand how the combination could cause extra complexity due to the fact that it's now a different snapshot being restored by the awaits at different points in the body.

Finally, a brief further word about async generators. When I searched for generators declared in the same file as AsyncLocalStorage usages (#18 (comment)), I found that nearly all cases were actually async generators. So for whatever it's worth, these seem more common than sync generators. Most of the results seemed to be middleware, so I could imagine there's a possibility that they have more the characteristics of the "work queue" that @Qard mentioned. But without understanding the relevant frameworks, I can't say conclusively. What unique considerations fall out from making a generator async? In terms of common usage, a for-await-of loop will work the same either way, just like a for-of loop is going to behave identically under both options (assuming the generator is called in the loop initializer). So the most likely scenario where this decision will make a difference is when somebody is interacting with an iterator manually. This is already somewhat of a "power user" case, and given that, I tend to prefer to err on the side of flexibility, rather than worry about making sure the default is correct. This is why I have a slight preference for option 2, assuming the performance cost doesn't make it a non-starter.

@dead-claudia
Copy link

dead-claudia commented Mar 21, 2024

@shicks To be clear, the perf impact, if it does exist, would most impact async functions and sync generators. The existing request queueing overhead would make the slow-down relatively less significant for async generators (it'd exist, but there are already bigger bottlenecks).

Also, this overhead can be easily avoided if you duplicate the code (say, via three different template instantiations), and JITs can of course generate specialized code for each one - this trades some minor CPU cost for a significant growth in binary size. Size-optimized runtimes running on lower-spec devices like low-end phones and (especially) microcontrollers won't be able to specialize away the overhead like this, though, especially if JIT compiling isn't an option.

@jridgewell
Copy link
Member

I looked into nodejs/node#42237, and the linked nice-grpc fix linked in open-telemetry/opentelemetry-js#2951 (comment). It seems people really are going to return the generator's iterator without consuming it themselves. 🫤

It also has code using yield * call.foo(…), which itself could be a middleware generator, which can't be consumed within a context without writing a bunch of manual iteration code. You could certainly wrap call.next(…), but I would probably write that code and miss the bug.


I expect the performance of either choice to be trivial, I don't think it's something that we should consider.


If we were to do init-time, how could we support call-time? Given that the generator may already expect an argument when calling it.next(arg), we probably would need to pass it as an arg to the generator:

function* gen(snapshot) {
  snapshot.run();
}

let snapshot = AsyncContext.Snapshot();
const it = gen({
  run(...args) {
    return snapshot.run(...args);
  }
});

it.next();
ctx.run(1, () => {
  snapshot = AsyncContext.Snapshot();
});
it.next();

It's not great to support this way. If we really want to support it, maybe we could expose another way to start the generator (maybe genFn.noContext(…);) that wouldn't set the internal [[GeneratorAsyncContextSnapshot]] field when starting the generator.

@shicks
Copy link
Contributor

shicks commented Mar 22, 2024

It's not great to support this way. If we really want to support it, maybe we could expose another way to start the generator (maybe genFn.noContext(…);) that wouldn't set the internal [[GeneratorAsyncContextSnapshot]] field when starting the generator.

I think it's important to consider this use case. One big problem with a noContext method is that there's no way for it to work (intuitively) for instance methods: instance.generatorMethod.noContext(...) may want to access instance as this, but such behavior would be very surprising (and a nightmare to polyfill, at that).

I feel like it also moves the decision to the wrong place - generally I would expect the author of the function know what context it expects, rather than the caller - so I'd rather see the decision cemented at definition time, rather than call time (i.e. by wrapping the function somehow - this could be done with a decorator, maybe - though again, I don't see a great way to get the right default).

Probably if the default were to snapshot at init and I wanted a generator that captured the snapshot, I'd end up writing something along the lines of

function wrapNoContext(fn) {
  return function() {
    let snapshot;
    const fauxSnapshot = {run: (...args) => snapshot.run(...args)};
    const gen = fn.apply(this, [fauxSnapshot].concat(arguments));
    return {
      next: arg => (snapshot = new AsyncContext.Snapshot(), gen.next(arg)),
      return: arg => (snapshot = new AsyncContext.Snapshot(), gen.return(arg)),
      throw: arg => (snapshot = new AsyncContext.Snapshot(), gen.throw(arg)),
      [Symbol.iterator]() { return this; },
    };
  };
}

and the usage would look something like this:

const gen = wrapNoContext(function*(snapshot, arg1, ...) {
  ...
  yield;
  ... snapshot.run(var1.get()) ...
})

gen(arg1, ...)

It's basically the same ideas as yours, but you can at least wrap up the usage sites to "just work". Unfortunately there's no way to have it just work inside the generator definition, and you still end up needing the extra parameter, and to bend over backwards to actually extract anything out of the context.

@TerribleIdeaBot said:

I don't suppose there's any way to write the spec such that some sort of "indirection context" would be natively supported, without a huge performance penalty even when it's not used?

@littledan
Copy link
Member

Note this Node.js bug report where people seem to be asking for the restore-after-yield generator behavior: nodejs/node#52317

@littledan
Copy link
Member

Do we have a conclusion for this thread? @jridgewell seems to have written for our April 2024 update slides that we're going to save/restore across yield in JS code, but not in built-in generators, which is a conclusion that I support. We have this all specified, e.g., in #61. What is left to resolve?

@Qard
Copy link

Qard commented Apr 8, 2024

If we do #77 the decision to exclude generators may no longer be relevant as the calling context could be used to get the context where gen.next() is called. May be worth waiting on a decision for #77 before we make a decision here.

@jridgewell
Copy link
Member

Our agreement from last group meeting was that landing #77 was an acceptable compromise so that generators can use init-time by default.

@himself65
Copy link
Member

himself65 commented Apr 10, 2024

Form library developer perspective, I will return string or iterator of tokens for a AI tools. And I use async local storage to track which llm is being called and will do logging. It’s very to trick to handle iterator and I had to wrap every response (greatly there is function decorator in standard)

So I will go with option 2. but another side this behavior looks like v8 inherited, there could have more breaking changes I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet