
Segmentation fault sometimes occurs on latest Neovim #603

Closed
yutkat opened this issue Jun 1, 2022 · 6 comments

yutkat commented Jun 1, 2022

I'm not quite sure under what conditions this happens, but it seems that sometimes req->data in the following code is NULL.

luv/src/fs.c, line 38 (at 02d703b):

```c
luv_cleanup_req(L, (luv_req_t*)req->data);
```

As a result, Neovim crashes with a segmentation fault.
This started with commit 02d703b; before that commit, it did not reproduce.

I would like to add a NULL check at the beginning of the following function. What do you think?

luv/src/lreq.c, line 70 (at 02d703b):

```c
static void luv_cleanup_req(lua_State* L, luv_req_t* data) {
  if (data == NULL) {
    return;
  }
```

ref: #600

zhaozg added a commit to zhaozg/luv that referenced this issue Jun 1, 2022
squeek502 (Member) commented:

Thanks for reporting this. If possible, I'd like to find a way to reproduce it so we can verify that the eventual fix is correct.

squeek502 mentioned this issue Jun 1, 2022

squeek502 commented Jun 2, 2022

I've found one way to reproduce this, but it seems unlikely that this is what you're running into, as the segfault happens at a different spot. If the async version of fs_scandir is used and the loop is never run before exit, like so:

```lua
local uv = require('luv')
uv.fs_scandir('.', function() end)
-- no uv.run() call
```

then the following will happen:

```
luv_fs_scandir: 0x4bdf5c0   <-- req created here
luv_fs_gc: 0x4bdf5c0        <-- req cleaned up and req->data set to NULL here
luv_fs_cb: UV_FS_SCANDIR    <-- req->data tries to be used here
==134616== Invalid read of size 8
==134616==    at 0x4FD0A82: luv_fs_cb (fs.c:396)
==134616==    by 0x4FE2954: uv__work_done (threadpool.c:318)
==134616==    by 0x4FE6634: uv__async_io (async.c:163)
==134616==    by 0x4FF9184: uv__io_poll (epoll.c:374)
==134616==    by 0x4FE6E5B: uv_run (core.c:391)
==134616==    by 0x4FD55D1: loop_gc (luv.c:799)
```

That is, the callback is executed during loop_gc as the program is exiting, but the fs_scandir req has already been garbage collected. This is related to #437, but a quick workaround/fix for this would be to check that data is non-NULL here:

luv/src/fs.c, lines 392 to 393 (at 02d703b):

```c
static void luv_fs_cb(uv_fs_t* req) {
  luv_req_t* data = (luv_req_t*)req->data;
```
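
A minimal sketch of what that guard could look like, purely as an illustration of the idea (not necessarily the exact change that landed in luv):

```c
static void luv_fs_cb(uv_fs_t* req) {
  luv_req_t* data = (luv_req_t*)req->data;
  if (data == NULL) {
    /* The Lua-side request has already been cleaned up (e.g. luv_fs_gc ran
       and set req->data to NULL before the loop got to this callback), so
       there is nothing left to call back into; skip the callback entirely.
       Any remaining libuv-side cleanup is left out of this sketch. */
    return;
  }
  /* ... existing callback handling continues below ... */
}
```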


A possible cause of the segfault you're experiencing (during luv_fs_gc) could be from here:

luv/src/fs.c, lines 452 to 454 (at 02d703b):

```c
luv_cleanup_req(L, data); \
req->data = NULL; \
uv_fs_req_cleanup(req); \
```

which is only triggered when using the async version of a function and the initial uv_fs_* function call returns an error, which AFAIK is impossible for fs_scandir, but I'll need to confirm that. To be safe, though, it'd probably be best to wrap it in a

```c
if (req->fs_type != UV_FS_SCANDIR) {
```

check as is done everywhere else that this req cleanup code exists (since the fs_scandir req needs to live on). Again, I'd really like to reproduce it before making that change, though.
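
Spelled out, the guarded error-path cleanup would look roughly like this (a sketch of the idea, not the literal macro text from fs.c):

```c
/* Only tear the request down on the error path when it is not a scandir
   request: the scandir req has to stay alive so uv_fs_scandir_next() can
   keep reading entries out of it later. */
if (req->fs_type != UV_FS_SCANDIR) {
  luv_cleanup_req(L, data);
  req->data = NULL;
  uv_fs_req_cleanup(req);
}
```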

EDIT: It's possible to hit this with an out-of-memory error when libuv tries to copy the path string. Again, that seems unlikely to be what's causing it, but it's definitely worth protecting against.


squeek502 commented Jun 2, 2022

Artificially inducing an error in uv_fs_scandir and then running the following code reproduces the luv_fs_gc segfault:

```lua
local uv = require('luv')
uv.fs_scandir('.', function() end)
uv.run()
```

which produces:

```
luv_fs_scandir: 0x4bdfd30
FS_CALL error path: 0x4bdfd30 UV_FS_SCANDIR    <-- req cleaned up and req->data set to NULL here
luv_fs_gc: (nil)
luv_cleanup_req: (nil)
==138378== Invalid read of size 4
==138378==    at 0x4FCC3F0: luv_cleanup_req (lreq.c:73)
==138378==    by 0x4FD0937: luv_fs_gc (fs.c:39)
```

I can't find another way to trigger this as of now, so I'll create a PR with these fixes and go from there.


yutkat commented Jun 2, 2022

Thanks for your quick action! 😄


yutkat commented Jun 2, 2022

I built Neovim with the latest luv and the segfault did not occur. Thanks for the fix 😉


squeek502 commented Jun 7, 2022

Looked into this a bit more because I still wasn't sure how normal usage was triggering the (what I considered to be) unlikely cases I fixed, and came away with a better understanding and more confidence that the fix was correct:

When I said in a previous comment:

> which is only triggered when using the async version of a function and the initial uv_fs_* function call returns an error

I was wrong. That same branch is also triggered when the sync version of the function fails, so any error returned from the sync version of uv_fs_scandir would cause the segfault (and this is what was happening in one of the affected Neovim plugins).

For example:

```lua
local uv = require('luv')
local req, err = uv.fs_scandir('any-non-existent-path')
print(err)
uv.run()
```

would give (before the fix):

```
ENOENT: no such file or directory: any-non-existent-path
Segmentation fault (core dumped)
```
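
For reference, the NULL guard proposed at the top of the thread is the kind of check that makes this double-cleanup sequence harmless; a minimal sketch of it (not necessarily the exact code that was merged):

```c
static void luv_cleanup_req(lua_State* L, luv_req_t* data) {
  /* When the sync error path has already cleaned the request up and set
     req->data to NULL, luv_fs_gc will still call luv_cleanup_req with NULL;
     return early instead of dereferencing it. */
  if (data == NULL) {
    return;
  }
  /* ... unref the Lua references and free the request data as before ... */
}
```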

squeek502 added a commit to squeek502/luv that referenced this issue Jun 7, 2022:
Follow up to luvit#605 and luvit#603. The sync error test added here was the actual cause of the segfaults in normal usage.

squeek502 added a commit that referenced this issue Jun 8, 2022:
Follow up to #605 and #603. The sync error test added here was the actual cause of the segfaults in normal usage.