Using recv with small buffer can crash Urbit #490

Closed
guaraqe opened this issue Jul 10, 2023 · 12 comments · Fixed by #494
Labels
bug Something isn't working

Comments

@guaraqe

guaraqe commented Jul 10, 2023

It is possible to crash an Urbit ship by talking to its socket (as click does) while reading too few bytes at a time. To test, you can run a command with click, but change the script so that the size of the buffer in nc is something very small, such as 2.

This has affected hosting, when reading the vats of ships with too many apps installed. This is under control already, but it would be nice to solve.

@lukebuehler

@matthew-levan
Contributor

matthew-levan commented Jul 12, 2023

I'm having trouble reproducing this on my Ubuntu machine with the instructions you provided here; would you mind posting a more detailed reproduction? In which environment are you seeing this, and have you changed the ships' states? I will continue tinkering in the meantime.

@ashelkovnykov
Contributor

@matthew-levan Issue should probably be moved to here, since it's not an issue with khan.hoon or conn.c, but with the click script

@guaraqe
Author

guaraqe commented Jul 18, 2023

@ashelkovnykov We observed it in production, without using click at all, so this is not related to the script itself. I am writing a reproducer.

@guaraqe
Author

guaraqe commented Jul 18, 2023

@matthew-levan I added a reproducer to the https://github.com/guaraqe/urbit-benchmark repo, in this commit: guaraqe/urbit-benchmark@499ef77

I couldn't reproduce it with netcat, so I wrote a simple clone in Python (nc.py), and got the correct result. If you run ./code-click-python ../salsyp-samzod you should get the expected result (the Urbit's code), but if you change SIZE in the Python script from 1024 to 1, the Urbit running at ../salsyp-samzod should break with many messages like:

~salsyp_samzod:dojo> newt: write failed broken pipe
conn: moor bail -32 broken pipe
~salsyp_samzod:dojo> newt: write failed broken pipe
conn: moor bail -32 broken pipe
~salsyp_samzod:dojo> newt: write failed broken pipe

@ashelkovnykov
Contributor

> We observed it in production, without using click at all, so this is not related to the script itself. I am writing a reproducer.

@guaraqe Right, my apologies - misread

@joemfb
Member

joemfb commented Jul 18, 2023

@guaraqe the expected behavior on early disconnection is that those errors are printed and the socket is closed, but the ship should stay up and not crash. Is that what you're seeing?

The newt wire-framing includes a 5 byte header: one tag byte (0) and 4 bytes of little-endian length. The right pattern for consuming it is to read 5 bytes, then read the length specified, then read 5 bytes, &c.
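
As a rough illustration of the read pattern described above, here is a minimal Python sketch. It assumes only what the comment states: a 5-byte header made of one tag byte (0) followed by a 4-byte little-endian length. The names `read_exact`, `read_frame`, `write_frame`, and `TrickleStream` are hypothetical helpers invented for this example, not part of Urbit or click, and the payload is treated as opaque bytes. This only shows how a client can consume frames correctly when reads come back short; it does not address the crash on the ship's side.

```python
import io
import struct

HEADER_SIZE = 5  # one tag byte plus four little-endian length bytes


def read_exact(stream, count):
    """Read exactly `count` bytes, looping because a read may return fewer."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream):
    """Read one frame: the 5-byte header, then the payload it announces."""
    header = read_exact(stream, HEADER_SIZE)
    tag, length = struct.unpack("<BI", header)  # "<" = little-endian, no padding
    if tag != 0:
        raise ValueError(f"unexpected tag byte {tag}")
    return read_exact(stream, length)


def write_frame(payload):
    """Build a frame for the demo below (hypothetical helper)."""
    return struct.pack("<BI", 0, len(payload)) + payload


class TrickleStream:
    """Demo stream that returns at most `max_chunk` bytes per read,
    imitating a receiver with a very small buffer."""

    def __init__(self, data, max_chunk=1):
        self._buf = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, count):
        return self._buf.read(min(count, self._max_chunk))


if __name__ == "__main__":
    stream = TrickleStream(write_frame(b"hello") + write_frame(b"world!"))
    print(read_frame(stream))
    print(read_frame(stream))
```

With a real socket, the object returned by `sock.makefile("rb")` exposes a compatible `read(count)` method, so the same two functions should work on it unchanged.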

@guaraqe
Author

guaraqe commented Jul 18, 2023

The unexpected behavior is the ship crashing.

@matthew-levan
Contributor

matthew-levan commented Jul 18, 2023

Hello, I just attempted to reproduce this using your urbit-benchmark repository, but was unable to do so. After changing SIZE to 1 and running ./code-click-python /path/to/fakezod, this is what printed in the dojo:

matt@mbp14 vere % ./urbit zod
~
urbit 2.11-3c8c0b219d
conn: fyrd 0v0
clay: read-at-aeon fail [desk=%base care=%b case=[%da p=~2023.7.18..18.09.11..96df] path=/ted-eval]
clay: no files match /mar/ted-eval/hoon
[%error-building-mark %ted-eval]
[%error-building-dais %ted-eval]
conn: bail 1

[%poke %crud]
bar-stack=[i=[i=//khan/0v5.a91gg/1/0v0 t=~] t=~]
call: failed
/sys/vane/khan/hoon:<[128 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[132 5].[132 41]>
%khan-call-dud
/sys/vane/khan/hoon:<[132 23].[132 40]>
/sys/vane/khan/hoon:<[78 26].[78 28]>
[%mark-invalid %ted-eval]
/sys/vane/khan/hoon:<[78 5].[78 29]>
/sys/vane/khan/hoon:<[77 3].[80 26]>
/sys/vane/khan/hoon:<[75 3].[80 26]>
/sys/vane/khan/hoon:<[74 3].[80 26]>
/sys/vane/khan/hoon:<[143 21].[143 53]>
/sys/vane/khan/hoon:<[143 5].[147 51]>
/sys/vane/khan/hoon:<[142 5].[147 51]>
/sys/vane/khan/hoon:<[141 5].[147 51]>
/sys/vane/khan/hoon:<[140 5].[147 51]>
/sys/vane/khan/hoon:<[133 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[128 3].[148 5]>
call: failed
bar-stack=[i=[i=//khan/0v5.a91gg/1/0v0 t=~] t=~]
[%poke %fyrd]
conn: bail: %exit
conn: bail 2

[%poke %fyrd]
bar-stack=[i=[i=//khan/0v5.a91gg/1/0v0 t=~] t=~]
call: failed
/sys/vane/khan/hoon:<[128 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[133 3].[148 5]>
/sys/vane/khan/hoon:<[140 5].[147 51]>
/sys/vane/khan/hoon:<[141 5].[147 51]>
/sys/vane/khan/hoon:<[142 5].[147 51]>
/sys/vane/khan/hoon:<[143 5].[147 51]>
/sys/vane/khan/hoon:<[143 21].[143 53]>
/sys/vane/khan/hoon:<[74 3].[80 26]>
/sys/vane/khan/hoon:<[75 3].[80 26]>
/sys/vane/khan/hoon:<[77 3].[80 26]>
/sys/vane/khan/hoon:<[78 5].[78 29]>
[%mark-invalid %ted-eval]
/sys/vane/khan/hoon:<[78 26].[78 28]>
conn: bail: %exit
conn: %fyrd event on /khan/0v5.a91gg/1/0v0 failed

conn: fyrd 0v0
clay: read-at-aeon fail [desk=%base care=%b case=[%da p=~2023.7.18..18.10.05..b91d] path=/ted-eval]
clay: no files match /mar/ted-eval/hoon
[%error-building-mark %ted-eval]
[%error-building-dais %ted-eval]
conn: bail 1

[%poke %crud]
bar-stack=[i=[i=//khan/0v5.a91gg/2/0v0 t=~] t=~]
call: failed
/sys/vane/khan/hoon:<[128 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[132 5].[132 41]>
%khan-call-dud
/sys/vane/khan/hoon:<[132 23].[132 40]>
/sys/vane/khan/hoon:<[78 26].[78 28]>
[%mark-invalid %ted-eval]
/sys/vane/khan/hoon:<[78 5].[78 29]>
/sys/vane/khan/hoon:<[77 3].[80 26]>
/sys/vane/khan/hoon:<[75 3].[80 26]>
/sys/vane/khan/hoon:<[74 3].[80 26]>
/sys/vane/khan/hoon:<[143 21].[143 53]>
/sys/vane/khan/hoon:<[143 5].[147 51]>
/sys/vane/khan/hoon:<[142 5].[147 51]>
/sys/vane/khan/hoon:<[141 5].[147 51]>
/sys/vane/khan/hoon:<[140 5].[147 51]>
/sys/vane/khan/hoon:<[133 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[128 3].[148 5]>
call: failed
bar-stack=[i=[i=//khan/0v5.a91gg/2/0v0 t=~] t=~]
[%poke %fyrd]
conn: bail: %exit
conn: bail 2

[%poke %fyrd]
bar-stack=[i=[i=//khan/0v5.a91gg/2/0v0 t=~] t=~]
call: failed
/sys/vane/khan/hoon:<[128 3].[148 5]>
/sys/vane/khan/hoon:<[130 3].[148 5]>
/sys/vane/khan/hoon:<[131 3].[148 5]>
/sys/vane/khan/hoon:<[133 3].[148 5]>
/sys/vane/khan/hoon:<[140 5].[147 51]>
/sys/vane/khan/hoon:<[141 5].[147 51]>
/sys/vane/khan/hoon:<[142 5].[147 51]>
/sys/vane/khan/hoon:<[143 5].[147 51]>
/sys/vane/khan/hoon:<[143 21].[143 53]>
/sys/vane/khan/hoon:<[74 3].[80 26]>
/sys/vane/khan/hoon:<[75 3].[80 26]>
/sys/vane/khan/hoon:<[77 3].[80 26]>
/sys/vane/khan/hoon:<[78 5].[78 29]>
[%mark-invalid %ted-eval]
/sys/vane/khan/hoon:<[78 26].[78 28]>
conn: bail: %exit
conn: %fyrd event on /khan/0v5.a91gg/2/0v0 failed

> 
> 
> 
~zod:dojo> 

It did not crash.

Here is the output from executing the Python script:

matt@mbp14 urbit-benchmark % ./code-click-python ~/src/urbit/vere/zod
loom: mapped 2048MBloom: mapped 2048MB

lite: arvo formula 2a2274c9
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: core 4bb376f0
lite: final state 4bb376f0
eval (cue, newt):
lite: final state 4bb376f0
eval (jam, newt):
cue failed

@matthew-levan
Contributor

In which environment are you testing this? I tried this on my macos-aarch64 machine.

@guaraqe
Author

guaraqe commented Jul 18, 2023

I tried on Linux, and we observed that in GCloud Linux VMs too.

@matthew-levan
Contributor

matthew-levan commented Jul 19, 2023

Thanks; I can confirm reproduction on my linux-x86_64 box:

matthew@domus:~/src/guaraqe/urbit-benchmark$ ./code-click-python ~/ships/dev/zod
loom: mapped 2048MB
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
eval (jam, newt):
lite: core 4bb376f0
lite: final state 4bb376f0
eval (cue, newt):

cue failed
conn: moor bail -32 broken pipe
~zod:dojo> newt: write failed broken pipe
...
conn: moor bail -32 broken pipe
~zod:dojo> newt: write failed broken pipe
Segmentation fault (core dumped)

@matthew-levan
Contributor

See #494.


6 participants