Feature request: allow subscription message pointer to be changed between callbacks #186
Sounds very interesting to me. Sure, go ahead.
Passing the queue on subscription declaration would need someone to establish a convention for accessing a queue (a wrapper of platform-agnostic function calls), and then the queue accessor would need to be called whenever the executor takes new data. If the scope is limited to a function call that exchanges a message pointer, the user-side code isn't too bad.
I'm undecided if it should search by message pointer or by rcl_handle. In cases with very large numbers of handles, it would make sense to skip this step and allow the executor to advance the queue, but that starts to look like a whole separate polling API, and I'd actually like to see a reduction in the number of APIs.
Okay, understood. Then go ahead with the option that limits the scope to a function call that exchanges the message pointer.
I haven't personally needed more than four or five handles on my ESP32 executors, but I'm about to start using more guard-conditions to synchronise code. If search performance becomes important, we can allow the user to cache the results:
This will only search more than once if an earlier handle is removed, and will accept NULL as an always-search option for stateless callbacks. Searching based on message pointer should allow multiple context-free subscriptions to use the same callback and still rotate separate message queues; I'll try that first.
Looking at other ways of doing this: I might get better performance by using loaned messages, or almost the same performance by passing messages with a zeroed variable-sized part and a specialised allocator which manages the queues. Only message swapping and custom allocators will allow a borrow concept to be combined with device-specific allocation, and even then only downstream of the rmw (the ESP32 has multiple RAM devices with wildly different speeds).
Regarding guard-conditions: how do you want to use guard-conditions? Is timing critical? Note that a triggered guard-condition does not cause the executor to wake up and execute a callback. @pablogs9, how much effort would it be to change this behavior?
Thanks for the heads up! That will save me at least a day of debugging.
I have a concern about data corruption: how do you solve this? Did I misunderstand your approach? I implemented a similar approach in the "budget-enabled real-time executor", in which every subscription callback is processed in a different thread, and accesses to the middleware are managed in one specific Executor thread. Publication: https://arxiv.org/abs/2105.05590
Your concerns about data corruption are exactly why I chose to build this feature. I'm using FreeRTOS queues to ensure thread safety of the message pool itself: a queue of unused messages and a queue of messages ready for processing. The queues only hold pointers to the messages.

The application starts with one pre-allocated message (M1) held by the executor; handle->data is the only instance of that message pointer anywhere in the application. The only way to access that message is to be passed the message pointer during the subscription callback. During the callback, a new pre-allocated message (M2) is borrowed from the pool and handle->data is replaced by M2; M1, which was passed in as an argument, is given to the message queue to wait for processing. The only copy of the M1 pointer is in the processing queue, and the only copy of the M2 pointer is in the executor.

The next time spin_some is called, the executor passes M2 to rcl_take to write into. No other thread has written to M2 since the callback, because M2 does not exist anywhere else; and the executor has not written to M1, because M1 only exists in the queue or downstream of it.

The only times it is safe to use the message-swap feature are during the callback that uses the message being swapped, while spin_some is not running, or in callbacks that execute later than the handle that holds the message being swapped. This is mentioned in the PR docstrings. I'll have a go at an example, but it'll be a bit simplistic on POSIX.
Great, thanks for the explanation; changing the pointer held in handle->data is clear to me now.

In the paper about the multi-threaded Executor, I intentionally did not use a queue, which is indeed a typical way to organize multi-threading, because a user can define quality of service (QoS) in DDS (e.g. buffer size = 1). If the executor now has its own queue to store new messages, then the QoS parameter will be shadowed. You also need to manually tune the length of the queue depending on the execution time of the "real" subscription callback (the one that actually does something with the data) and the topic/timer frequency. The required queue size might therefore change, e.g. when you port the application to different hardware, when a software update publishes the topic at a higher frequency, or when the algorithm becomes more complex and the execution time of the message callback gets longer. Then messages will be missed, or you need a blocking wait in the subscription callback that returns once a free message in the queue becomes available. This is why, in the general concept of the multi-threaded executor, I decided against an Executor-internal queue and used a feedback mechanism that only fetches new data if a corresponding processing thread is READY to process new messages.

But I agree with you: copying data on a micro-controller should be avoided as much as possible, as it is not necessary and only costs time. Maybe QoS is not so much of an issue for you, so choosing the queue size wisely will work fine for your use-case. If you like, you are very much invited to present this new functionality, and possibly an evaluation of the performance improvement, in an Embedded Working Group meeting later on.
I don't know your application, but what about using multiple executors, each one in its own thread, so that each received subscription message is processed in a different thread? The message is not copied if you process it in the subscription callback. Would that help you?
I'm trying to write an esp-adf (audio development framework) component that takes messages from my ros-gst-bridge package. I'm currently able to stream 16-bit 48 kHz stereo audio through RCL publish, with the publisher and executor in the two lowest-priority threads. I'm introducing queues and threads to make sure downstream elements stay fed and synchronised even if I have to inject a block of zeroes. The high-priority threads are performing time-stamping, so they need to idle as much as possible; any call to RCL (and wifi) has a call duration that is much too variable to be allowed in the fast loop. My latency is dominated by transport jitter, and the process queues are mostly empty unless the wifi chokes.

In ros-gst-bridge, latency is configured for a whole pipeline and messages have a presentation timestamp. In esp-adf, every element in the pipeline has a thread and a ring buffer. With this much glue code, DDS QoS is just one of many dials I need to adjust, but excessive buffers aren't a problem. rclc assumes I can collect the message in a callback in its thread; esp-adf assumes I can return the message contents during a time-sensitive callback in its thread. The two design patterns require a thread-safe queue between them.
Hi @BrettRD, I got a rough understanding of your application use-case:
That is, in your application you want to pass the pointer of the received message from one thread to another thread in the audio-processing pipeline, while the rclc-executor assumes single-threaded processing of the subscription callback; consequently, after the subscription callback has finished, the executor reuses the message for the next take. Essentially, in your proposed approach you manage your own message pool by swapping the message held in handle->data, which makes sense to me and is a reasonable solution given the current implementation of the rclc-Executor. How about a message pool inside the rclc-Executor instead?
@BrettRD, would that be a solution, or does adding to the queue etc. need some additional work? @pablogs9: with rcl_take_loaned_message we could easily implement this feature without any further memory pool directly in the rclc executor. What do you think? How much effort would it be to implement these functions for micro-ROS?
A borrow API from the middleware memory pool would be ideal, especially if I can specify the memory device the whole rmw pool is allocated into. A release-message call, and a way of making the subscription or the whole executor operate in borrow mode, would be the only rclc API changes necessary (and I think the release-message call is already provided by RCL). My pool has a borrow and return method, and an upstream queue for available messages; those would all disappear, leaving me with a vanilla FreeRTOS queue object and no user-side allocations. How would you manage the case of memory exhaustion when all messages are on loan?
That's always the great question :) In this case, I would only take a new message from DDS with rcl_take when a free message is available in the pool.
I'm having a hard time getting micro-ROS to go fast enough for my application. I've implemented a queue of pre-allocated ROS messages for the publisher to send on a separate thread to my data-acquisition loop, and the API supports that fine, especially with the multi-threaded rmw (thanks @pablogs9!).

It would be nice for _rclc_take_new_data() / rcl_take() to write directly into my queue, to avoid copying a large buffer during the callback. Changing handle->data is always safe inside that handle's subscription callback as long as the new message is valid and initialised. This feature involves a new call to the executor that changes handle->data, perhaps leveraging _rclc_executor_find_handle() and sanity-checking handle->type. I can get by with #include "rclc/executor.h" and hacking around directly, but I imagine other people will want a similar performance boost.

I'm happy to implement and test this; is there anything I'm missing? And should something similar be written for the other handle types?