Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async code on the same thread corrupts async state causing scripts to restart #65

Open
Arakade opened this issue Oct 25, 2023 · 10 comments

Comments

@Arakade
Copy link

Arakade commented Oct 25, 2023

We have script A with 2 functions A1 and A2. A1 calls something async and awaits it. Before the await completes, something else calls A2. When A1 completes, rather than resuming A1, the whole script (A) runs from the beginning!

Reproduction

I've attached a zip with 2 files. There's a unit test to paste at the bottom of AsyncTests unit test file that attempts to demonstrate the problem. Define SINGLE_THREAD to see the problem.

AsyncTest-repro.zip

Investigation of unit test

Both WattleScript and MoonSharp detect 2 threads trying to enter the same Lua VM and reject. For this reason, the problem doesn't happen for default command-line invocation. This is what you see without SINGLE_THREAD defined.

The reproduction only happens when not using a thread-pool based SynchronizationContext. (Shown with SINGLE_THREAD defined.) Specifically we use WattleScript with async to run Lua "mod" code in our WIP Unity game with the "auto await" facility enabled. Unity has its own SynchronizationContext that runs everything on the same thread.

N.b. The test case contains a hacked-up version of the Unity sync context so I'm not proposing it as a PR to show the problem. Someone who understands SynchronizationContexts better than me could probably write one that does the same job.

@Arakade
Copy link
Author

Arakade commented Oct 25, 2023

Pondering how to fix, I'm wondering if something like this might work. It seems to pass the test but I'll admit to not grokking the WattleScript source base or knowing all the implications :)

Processor.cs.patch

@CallumDev
Copy link
Contributor

This is a tricky one as lua states, and by extension both MoonSharp and WattleScript Script classes, are strictly not re-entrant. The patch provided looks ok in concept, but I'm not sure about the signalling mechanism, more specificially that it adds an extra allocation to entering the VM.

If it's not able to be modified to be non-allocating, I'd still be happy to accept this as a PR with the repro cleaned up, and some comments about it emulating unity's behaviours.

@Arakade
Copy link
Author

Arakade commented Oct 27, 2023

OK. Thanks for reviewing it and replying.
I'll see what I can do to improve this when time. If nothing heard within a couple of months, please feel free to poke me here and on MoonSharp Discord 😉
LMK if you contrive a better answer!

@Arakade
Copy link
Author

Arakade commented Oct 28, 2023

I just took a little look and wish to check my understanding of your concern re. allocation before proceeding.

Potential 1 : extra async state machine allocation due to EnterProcessorAsync()

I think you're concerned about the use of EnterProcessorAsync() on all uses of CallAsync_Internal() right? Async methods in C# are transformed into an inner class that implements IAsyncStateMachine IIUC and it's the extra allocation of that that concerns you, correct?
As such, moving the m_InAsync check to before a call would suffice right?

Potential 2 = unblocked path allocation

To me there are 2 main paths -- unblocked and blocked (delineated by whether m_InAsync is set).

When unblocked, everything proceeds as it would have without the change.
When blocked, we allocate (i) a TaskCompletionSource, (ii) the handler closure and (depending how Action combining is implemented, maybe) (iii) an extra Action?

So you might be asking to avoid allocating in the "blocked" case?

If neither of these, I might need some help understanding the issue and request some clarification.

@CallumDev
Copy link
Contributor

TL;DR: Potential 1 + Potential 2 blocked path.

It is mostly the allocation of the handler closure in the blocked path that concerns me, as a delegate allocation is quite heavy, particularly for those still using boehm GC on unity.

Changing the m_InAsync check in the happy path to go down a non-allocating way would be good as well.

@Arakade
Copy link
Author

Arakade commented Nov 15, 2023

Hello again, @CallumDev

I've been working on this more and think a different tack might work better.
Rather than preventing reentrancy, let's specifically enable it.

I changed to this idea after having implemented the zero-allocation blocking variant you asked for previously. Once I did that, I realised it would cause another problem -- deadlock. The project I'm working on specifically has async Lua code that awaits an async C# call which awaits a Lua call on the same VM! (message bus listeners)
E.g. C# 'A' --> Lua script 'B' --> C# 'C' --> function 'D' in script 'B' (this last part is the re-entering the VM and causes deadlock!). I actually wrote deadlock detection code and hit it with this! Better would be allowing.

So I went back and looked into the original cause of the problem.
I discovered the second call was resetting m_SavedInstructionPtr on exit such that when the original await retVal.Task continued, the next instruction was the start of the script (its chunk-root) -- hence the re-run! I tried storing the instruction pointer in a local variable before awaiting and restoring afterwards and the test passes!

That might be a sufficient solution but I suspect there are scenarios wherein other data could be corrupted (probably involving some interleaving of asyncs and delays in 2 methods). I feel it would be better to separate all async calls to their own Processor. I saw the coroutine code does exactly that -- creates an additional Processor and runs all of its code within it. So I propose we do the same. All async calls create a Processor which runs the async work. We could optimise that slightly by saying the 'first' one can use the non-async Processor (to save allocation when not needed).

What do you think?

@Arakade
Copy link
Author

Arakade commented Nov 15, 2023

An initial implementation is surprisingly simple (see attached).

I'd say it needs some multi-threading protection (since EnterProcessor() which protects this in the existing version checks per-Processor members). One way would be to share it (probably via Interlocked to make it fast and safe). This would error with multiple threads. Given it's likely quite common to run with async on a thread pool, I guess reintroducing the "blocking and waiting" code might be better. I'll look at that next.

What else might need considering?

  • Perhaps some pooling of the Processor instances to prevent creating and throwing away?
    Maybe that's better done later? It'd need pool sizing, retention time consideration etc which all suggests user configurability = API extension (I guess added to ScriptOptions?)
  • Other things?

LMK. I'll work on this some more tomorrow GMT day-time. The single-threaded variant is sufficient for my project's needs (since we're in that single-threaded SynchronizationContext) but I realise we probably can't merge this without a sensible MT solution.

Simple__async_per_Processor__implementation___(no_MT_protection).patch

@Arakade
Copy link
Author

Arakade commented Nov 16, 2023

We agree the VM code is not thread safe. We therefore need to lock/block entry to all Script-affecting code to a single thread at a time. The original code attempted to do this by setting m_OwningThreadId in EnterProcessor() and unsetting in ExitProcessor(). However it did so without a lock and allowed coroutines (which have their own Processor) to use a different instance's version. Both of these allow different threads to run (albeit with different windows of vulnerability).

Additionally running Tasks on thread pools means any Task could (a) start on a different thread (when run from a Task) or (worse) (b) change threads on resume from awaiting. Furthermore we must allow other Tasks to act on this Script (albeit with separate Processors) while we are awaiting. This is especially true if Lua code awaits code that calls the same VM's functions (see previous comment about deadlocks).

All of this suggests we must:

  • Check thread and await (which is effectively a non-thread-blocking lock) if set on:
    • async enter
    • await completion
    • non-async enter!
  • Drop this 'effective lock' and notify awaiters on:
    • exit
    • await start

Does this sound reasonable?

@Arakade
Copy link
Author

Arakade commented Nov 23, 2023

Provisional version that largely works (passes all new async tests but fails 22 existing tests).
I'll resume work on fixing those in early December then work towards preparing a PR.
Thought I'd post this to give an idea of the direction things are heading atm.

Switched from Interlocked and lock to a per-Script SemaphoreSlim using its WaitAsync() method to lock all access to the Script. This logic is mostly encapsulated in the new ScriptThreadSafety class.

LMK any thoughts.

20231122-good.patch

@CallumDev
Copy link
Contributor

CallumDev commented Nov 23, 2023

This latest patch doesn't apply cleanly against 50e7a07, making it a bit tricky to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants