Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OOM bug when building stack trace #219

Open
ashelkovnykov opened this issue Mar 14, 2024 · 2 comments
Open

OOM bug when building stack trace #219

ashelkovnykov opened this issue Mar 14, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@ashelkovnykov
Copy link
Contributor

There was a bug in the guard page implementations of #202 / #211 that would occur when if the NockStack ran out of memory during the copying procedure. @eamsden describes it fully in PR #217. That same PR patches this issue by checking whether or not the NockStack was in copy mode when an error occurred, and immediately popping a frame if so. This abandons the data on the current frame, of which only the stack trace data is valuable at that point. There is no way around this, as elaborated upon here.

However, there remains a bug in the code that can further corrupt the stack traces. The NockStack eagerly moves the alloc / stack pointers on allocation and relies on the guard page to catch OOM errors. As a result, it's possible that the pointers get moved past each other, far beyond the guard page. Therefore, the attempts of the exit() function to stitch together the stack traces into a $tang (which is also allocated on the NockStack) might clobber exiting data.

If the alloc and stack pointers were moved back to their initial positions and we were now allowed to ignore the guard page, then allocating new cells for returning a $tang of traces would be fine since we would have 16 KiB of room for allocating new cells.

@ashelkovnykov
Copy link
Contributor Author

#213 also presumes that flog! is going to be called from within interpret() so that a guard page exists. It's another case where technically our usage of the NockStack is unsafe without an outside safety net.

Should we only explicitly document it, or should we bring back the now-abandoned checks for whether an allocation pushes the alloc / stack pointers past each other? Not sure what the performance cost would be to make that check on every allocation, but obviously non-zero.

@eamsden
Copy link
Collaborator

eamsden commented Mar 20, 2024

Explicitly document for now.

@eamsden eamsden added the bug Something isn't working label May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants