[Bug] Failed to obtain RX buffer #595

alexzeit · 2024-08-15T12:49:25Z

Describe the bug

When the MCU running zenoh-pico is restarted, the error "<err> eth_stm32_hal: Failed to obtain RX buffer" sporadically appears and the subscriber stops receiving messages.
This happens only if the publisher (e.g. Python) is already running, i.e. it was started before the MCU with zenoh-pico. An additional delay (e.g. 10 s) before z_declare_subscriber does not help.
A peer connection is used.

To reproduce

Configure the Python publisher and subscriber in peer mode with interval = 2 ms
Configure the zenoh-pico subscriber and publisher in peer mode with interval = 1 ms
Start the Python publisher and subscriber
Start the zenoh-pico publisher and subscriber
Restart the MCU running zenoh-pico

System info

zenoh-pico v1.0.0.6 running on a Nucleo-144 H753ZI (Cortex-M7, 480 MHz)
Zephyr v3.7.0.0, SDK v0.16.8
zenoh-python v1.0.0.6 running on Ubuntu 22.04 on an NXP RDB3 evaluation board

jean-roland · 2024-08-19T08:24:44Z

Thanks for the report @alexzeit, will try to reproduce on my F767ZI.

jean-roland · 2024-08-27T13:37:22Z

Just to clarify: you have a Python publisher sending a message every 2 ms to a pico subscriber,
and a pico publisher sending a message every 1 ms to a Python subscriber,
all in peer mode?
Do you have two boards, or are the publisher and subscriber on the same Nucleo?

Actually, it would probably be easier if you could send me the project files you used.

alexzeit · 2024-08-27T16:42:30Z

Hi Jean-Roland,
yes, but I have observed the same behaviour with a C++ pub/sub at 1 ms in peer mode. We have one board, where the publisher and subscriber run in separate threads of the Zephyr RTOS.

jean-roland · 2024-08-30T08:27:51Z

Alright, so it seems the error message is produced by Zephyr when it runs out of RX buffers to store incoming messages.

My guess is that this breaks the connection, and since we do not yet have connectivity event support (see issue #333), the only option is to restart the node.

Alternatively, you can try increasing the number of RX buffers, which should reduce how often this occurs. See https://docs.zephyrproject.org/2.7.5/reference/kconfig/CONFIG_NET_BUF_RX_COUNT.html and https://docs.zephyrproject.org/2.7.5/reference/kconfig/CONFIG_NET_PKT_RX_COUNT.html
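
As a rough sketch of that suggestion, assuming a standard Zephyr `prj.conf`, the two options from the links above could be raised like this. The values are illustrative assumptions only; they are not tuned and do not come from this thread.

```
# prj.conf fragment (illustrative values, adjust to available RAM)
# Number of network packet descriptors reserved for reception
CONFIG_NET_PKT_RX_COUNT=32
# Number of data buffers (fragments) reserved for reception
CONFIG_NET_BUF_RX_COUNT=72
```

Both pools are statically allocated, so larger values cost RAM on the MCU.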

That also means pico has a hard time keeping up with this message rate. As we discussed before, we're going to look into performance after the 1.0 release.

alexzeit · 2024-08-31T05:18:30Z

Yes, the message seems to come from Zephyr, but I believe it is caused by the zenoh core. I think the issue is that zenoh starts the Ethernet receiver, but it takes some time before it begins consuming bytes from the Ethernet RX buffers. In the other case, where the Python publisher is not running during zenoh start-up, the issue does not occur. I have tried increasing the number of RX buffers, but this did not solve the problem.
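
The start-up gap described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `simulate_rx_pool` and `rx_sim_result` are hypothetical names, and the code is neither Zephyr driver code nor zenoh-pico code. It models a fixed pool of RX buffers, one packet arriving every `arrival_ms`, and a consumer that starts freeing buffers only after `startup_ms`.

```c
#include <assert.h>

/* Result of one simulated run. */
typedef struct {
    int dropped;      /* packets refused because no buffer was free */
    int peak_in_use;  /* highest number of buffers held at once */
} rx_sim_result;

/*
 * Toy model (hypothetical, not Zephyr or zenoh-pico code):
 * one packet arrives every arrival_ms and takes one buffer from a pool of
 * pool_size; the consumer starts at startup_ms and then frees one buffer
 * every service_ms. Time advances in 1 ms steps.
 */
rx_sim_result simulate_rx_pool(int pool_size, int arrival_ms, int service_ms,
                               int startup_ms, int duration_ms)
{
    assert(pool_size > 0 && arrival_ms > 0 && service_ms > 0);
    assert(startup_ms >= 0);

    rx_sim_result r = {0, 0};
    int in_use = 0;

    for (int t = 0; t < duration_ms; t++) {
        /* Consumer side: release one buffer per service period once started. */
        if (t >= startup_ms && (t - startup_ms) % service_ms == 0 && in_use > 0) {
            in_use--;
        }
        /* Driver side: a packet arrives and needs a free buffer. */
        if (t % arrival_ms == 0) {
            if (in_use < pool_size) {
                in_use++;
                if (in_use > r.peak_in_use) {
                    r.peak_in_use = in_use;
                }
            } else {
                r.dropped++;  /* analogous to "Failed to obtain RX buffer" */
            }
        }
    }
    return r;
}
```

For example, `simulate_rx_pool(36, 2, 1, 200, 1000)` drops 64 packets in this model, while the same call with `startup_ms` set to 0 drops none. The model recovers as soon as the consumer starts, so it only shows how a start-up gap can exhaust the pool; it does not explain why the subscriber stays stalled or why a larger pool did not help.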

jean-roland · 2024-09-03T09:56:11Z

So I tried to reproduce the issue on my board with a pub/sub at a 1 ms interval, without success (or failure?). Could you send me the files you used for the board and the PC?

alexzeit added the bug label · Aug 15, 2024
