Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(workflow): Fix multiple context leaks in reuseV8Context executor #1519

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mjameswh
Copy link
Contributor

What was changed

  • Fix multiple context leaks and bugs in the reuseV8Context executor:
    • …by reassigning a new object to an existing shared global variable: globalThis.console = { ...globalThis.console, wfId: workflowInfo().workflowId };
    • …by modifying one of Node's built in global objects: globalThis.Number.a = workflowInfo().workflowId;
    • …by deleting a previously set global variable: globalThis.a = 1; await sleep(1) ; delete globalThis.a ; await sleep(1) ; globalThis.a = (globalThis.a || 0) + 1; /* globalThis.a is 2 rather than 1 */

@mjameswh mjameswh requested a review from a team as a code owner September 12, 2024 20:52
@mjameswh mjameswh force-pushed the fix-reusev8context-reassign-bug branch from 44f477c to 04810fb Compare September 16, 2024 21:22
@mjameswh mjameswh changed the title [WIP] fix(workflow): Fix multiple context leaks in reuseV8Context executor fix(workflow): Fix multiple context leaks in reuseV8Context executor Sep 16, 2024
Comment on lines 158 to 160
map.set = frozenMapSet;
map.delete = frozenMapDelete;
map.clear = frozenMapClear;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to freeze the map itself too so that they don't override .set .delete .clear again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. After replacing these methods, we continue with the normal deepfreeze logic below, which will freeze the set/delete/clear methods, then the map itself.

@antlai-temporal
Copy link
Contributor

I'm a bit confused about what is the ultimate goal and why this is needed. If we want to eliminate all side channels this is not going to work, e.g., you could have timing side-channels, and the v8 VM is not designed to eliminate side-channels. If the goal is that other untrusted code is running in the same reused v8 context and attacking other users by modifying the context, I'm not sure we are protecting all the primordial classes (a String, Number,...) either, and in any case, that's a very hard problem...

Is this just to help prevent some mistakes, nothing to do with security?

@mjameswh
Copy link
Contributor Author

Is this just to help prevent some mistakes, nothing to do with security?

No, that's really not a security thing (that's simply impossible AFAIK with Node's vm). And we don't have a goal of preventing all possible escapes out of the VM.

The reuseV8Context is a performance optimization (roughly 66% reduction on RAM usage for cached workflows, and up to 50% on CPU usage). The trick is that instead of creating a new vm context for each workflow (which is slow and allocates a large number of objects), we use a single vm context, swapping global variables in and out before and after executing each activation.

However, there's been some reports of users doing uncommon but still legitimate and reasonable things in their workflow code, that would result in context from one workflow execution leaking into another workflow execution. Depending on use cases, that may result in non-deterministic behaviors, memory leaks, or some other oddities.

For example, one user recently reported that they were mocking functions in the console object inside their workflow code to redirect messages to the workflow log. They were doing that because their workflow was calling some outside library, not meant specifically for Temporal Workflows, which would print things its own logging through console.log. Agreed, that's not the best way of doing this, but still something I would have expected to be supported by our sandbox. Their code looked something like this:

async function myWorkflow() {
  const workflowId = workflowInfo().workflowId;
  globalThis.console = {
    ...console,
    log: (message) => {
      log(`${workflowId}: message`)
    }
  }
  // ...

The problem here is that previously, when leaving a workflow's execution context, the Reusable VM's would not touch any global variable that existed right after start. In this case, that means that it would leave the console object in place, even though it was no longer the original, pristine console object! That would result in them seeing log messages with an incorrect workflowId.

Once you know it, it's quite easy to fix their code to avoid this issue (they did), but it's also an opportunity for us to make our engine more reliable.

@mjameswh
Copy link
Contributor Author

There's also one test, "Shared global state is frozen", that existed before this PR that was actually not testing what we thought it was testing. That is, the test was asserting attempting to add a custom property to some global object from Node's default vm context would be rejected.

However, it was doing so by adding something to the setTimetout function, but it turns out that's insufficient as the globalThis.setTimeout is overridden by the SDK itself, resulting in it being present in the globals object exposed outside of the vm, where global variables that would not been modified inside the vm would not appear in the globals object (seems to be some kind of internal optimization done by Node/V8).

This PR fixes that bug; the test is now asserting the proper behavior against the Array global function instead.

} catch (err) {
if (typeof value !== 'object' || value.constructor.name !== 'Uint8Array') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are checking that if (value && typeof value === 'object') { already
do you need to check again because deepFreeze could change it?
Also, a comment why Uint8Array is special may help

t.deepEqual(res1[0], res1[1]);
t.deepEqual(res1[0], res1[2]);
t.deepEqual(res2[0], res2[1]);
t.deepEqual(res2[0], res2[2]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check res1[0] !== res2[0] for completeness

);
for (const [k, v] of this.pristine.entries()) {
if (k !== 'globalThis') {
deepFreeze(v.value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to deepFreeze the key k too?

(globals as any).__pristine = this.pristine;
// The V8 context is really composed of two distinct objects: the `globals` object defined above
// on the outside, and another internal object to which we only have access from the inside, which
// defines the built-in global properties. We need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix comment

Comment on lines +130 to +149
for (const [pname, pdesc] of bag.entries()) {
Object.defineProperty(context, pname, pdesc);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused aboutbag initialization, in particular if the first time we call
return vm.runInContext(TEMPORAL.api.${fn}``
the bag would be empty, and whether that matters.
I see the bag is filled inside finally{... after `bag.clear()` but not sure about the first call...

} catch (err) {
if (typeof value !== 'object' || value.constructor.name !== 'Uint8Array') {
console.log(`Failed to freeze ${path}.${name}`, err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this before merging.

export function deepFreeze<T>(object: T): T {
// This implementation is based on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/freeze.
// Note that there are limits to this approach, as traversing using getOwnPropertyXxx doesn't allow
// reaching variables defined using internal scopes. This implementation specifically look for and deal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// reaching variables defined using internal scopes. This implementation specifically look for and deal
// reaching variables defined using internal scopes. This implementation specifically looks for and deals

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is dealing with these types important? Have you seen any shared maps and sets that are exposed to users?

return [middle, final];
}

test('Deep Freeze reaches Map values', async (t) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually freeze the map? We're not supposed to freeze user's modules.

Comment on lines 107 to 131
(setTimeout as any).a = 1;
(Array as any).a = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Comment on lines 160 to 161
env.client.workflow.execute(sharedGlobalReassignment, { taskQueue, workflowId: randomUUID() }),
env.client.workflow.execute(sharedGlobalReassignment, { taskQueue, workflowId: randomUUID() }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be racy in CI and may not actually provide the test coverage you want. There's nothing guaranteeing that the workflows will run concurrently.

*/
readonly contextKeysToPreserve: Set<string>;
private _context?: vm.Context;
private pristine?: Map<string, PropertyDescriptor>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring.

const workflowModule: WorkflowModule = new Proxy(
{},
{
get(_: any, fn: string) {
return (...args: any[]) => {
Object.assign(context, bag);
for (const [pname, pdesc] of bag.entries()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would have liked to prevent constructing the entries array for every time we call into the VM.

// Looks like Node/V8 is not properly syncing deletion of keys on the outter context
// object to the inner globalThis object. Hence, we delete them from inside the context.
context.__TEMPORAL_ARGS__ = keysToCleanup;
vm.runInContext(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add a bit of overhead for every time we enter the context, I would try to find an alternative way to do this.

@mjameswh mjameswh force-pushed the fix-reusev8context-reassign-bug branch from 04810fb to 0bd389b Compare November 22, 2024 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants