Documentation for the existing logging system #2026

tylerjw · 2021-10-15T16:02:19Z

Background

Right now the existing logging system is not well documented. Doing this will help us better understand how to make improvements to the existing system and inform users how they should use the logging system.

This came out of the discussion in this PR: #2000

Questions this should answer

What are the RCUTILS_ macros, and where should they be used?
What are the RCLCPP_ macros, and where should they be used?
What is the logging system for the launch system and how should it be used?
Why do the RCLCPP_ macros require a logger object to be passed, while the RCUTILS_ and ROS 1 macros did not?
Where do logs go?
How does the per-process logger interact with the launch loggers?
How do I configure the loggers?
What are the environmental variables that can affect the logging system?

How should this be organized?

What are the concepts and patterns a user of ROS 2 needs to understand to use the logging system? (tutorial?)
What are the concepts and details of the logger someone contributing to ROS 2 itself needs to understand? (concepts or contributing?)

Who will do this work?

I've volunteered @JafarAbdi to work on this with me as he created the PRs that started this discussion (that I want to be merged). I can't be assigned this task as I'm not a member of the ros2 github org, but you can consider me assigned. @clalancette has agreed to review this work.

What already exists?

The text was updated successfully, but these errors were encountered:

tylerjw · 2021-10-16T23:55:51Z

Today I started tracing the code trying to figure out how to answer the question How does the per-process logger interact with the launch loggers? and I came up with a few questions that might have more obvious answers to others.

How does the screen/file output work?

How can many different processes write to the screen output (stderr/stdout) without interleaved output?

When a launch file is used does the launch system subscribe to /rosout for each process and then handle formatting/interleaving to the screen and writing to files?

Does that differ from the behavior when you use ros2 run or start the process by just executing it from a shell?

If a node is run with ros2 run how does the formatting of output and sending to the screen and log file work?

I have found https://github.com/ros2/launch/blob/master/launch/launch/logging/__init__.py which seems to create the log file directory and control formatting/writing of logfiles. Is this just for logging from within the launch system, or is this also used by processes run by the launch system?

I've discovered that some of the ros2cli calls like ros2 service call create log files with the process name python3. Is this expected, if so why?

@clalancette you can ignore this for now, these are my notes to remind myself where I left off. That is unless you know the answers to any of these and want to share.

tylerjw · 2021-10-17T22:37:20Z

On the global logging mutex. When you use an RCLCPP logging macro with the default config this is what it does:

lock global logging mutex
write to stderr/stdout (rcutils_logging_console_output_handler)
publish to rosout (rcl_logging_rosout_output_handler)
write to logfile using spdlog (rcl_logging_ext_lib_output_handler)
unlock the mutex

This seems to be a crazy pessimization to me. Why is there not at least a lock per file stream, or worse, per logging output handle? Is preserving the order or writing to logs on separate file streams that important? I understand not wanting to have multiple threads attempting to write to one file stream concurrently. Instead of using locks at this layer would it not be better to just use file locks (https://en.wikipedia.org/wiki/File_locking)?

Even better, why not have an in-memory per-thread queue for writing logs to that would be serviced (where the output handles would be called) from one thread? That way you could define a policy for how to handle this queue that would allow it to drop logs in cases where the determinism of a given thread should not be impacted by the logging system? I imagine the current intra-process communication implementation could even be used for this?

@ivanpauno looking at the commit/PR history, you are the one that added this global logging mutex. Is there something about this I'm not understanding that makes it necessary to implement locking around logging in this way?

One more thing, this mutex is a recursive_mutex. Is there a code path where this is necessary? If so, would it not be better to define the interfaces better so a recursive mutex is not needed? Reference on the origin of recursive mutex: http://www.zaval.org/resources/library/butenhof1.html

tylerjw · 2021-10-18T17:06:49Z

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber. That would eliminate the need for the mutex in the thread that is using the logging macro. Then if people are using a multi-threadded executor the logs would be written out by another thread, and that callback could use locks in various ways to synchronize access to file handles (actual files, stdout, stderr).

Another thing that could be done is there could be a separate subscription for the external library logging like spdlog from the screen logging.

Lastly, is there a reason we have a custom implementation of screen logging? Would it not be better to use an external logging library that can be configured to handle all of this? The log filtering could then be done via the external logging library and would not need to be part of the current implementation.

clalancette · 2021-10-22T14:48:34Z

This seems to be a crazy pessimization to me. Why is there not at least a lock per file stream, or worse, per logging output handle? Is preserving the order or writing to logs on separate file streams that important?

That's not important on separate streams, no. But I honestly don't think it is much of a pessimization in practice; by default, everything goes out to stderr. There are environment variables to control that, but I doubt they are heavily used.

Nevertheless, I would be on board with making this a lock per file stream.

I understand not wanting to have multiple threads attempting to write to one file stream concurrently. Instead of using locks at this layer would it not be better to just use file locks (https://en.wikipedia.org/wiki/File_locking)?

I think file locks would be hard to implement in a cross-platform way (though I'm not sure about that; someone would have to do a deeper investigation of it).

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber.

We could explore it, but it seems like a ton of overhead for something that could be called very often.

Lastly, is there a reason we have a custom implementation of screen logging? Would it not be better to use an external logging library that can be configured to handle all of this? The log filtering could then be done via the external logging library and would not need to be part of the current implementation.

I don't totally understand this part. We are using an external logging library, spdlog. Maybe we aren't using it to its full extent?

tylerjw · 2021-10-22T15:51:01Z

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber.

We could explore it, but it seems like a ton of overhead for something that could be called very often.

The thought was just that you would want to move logging overhead and locking off the thread with the macro to enable users who want to log from threads where waiting on a lock would negatively impact them. As it exists now the thread with the logging macro is already publishing to rosout (in addition to writing to a file with spdout and to stdout).

I don't totally understand this part. We are using an external logging library, spdlog. Maybe we aren't using it to its full extent?

The idea was to use an external library for the stdout output. Here is an example using spdlog for stdout:
https://github.com/gabime/spdlog#asynchronous-logger-with-multi-sinks

clalancette · 2021-10-25T14:20:06Z

The thought was just that you would want to move logging overhead and locking off the thread with the macro to enable users who want to log from threads where waiting on a lock would negatively impact them. As it exists now the thread with the logging macro is already publishing to rosout (in addition to writing to a file with spdout and to stdout).

Yeah, I'm on board with possibly doing a queue here and then having a separate thread drain that queue and actually get the data out. What seems like a lot of overhead is to also subscribe to rosout; that basically means two trips through the RMW layer (though since the publisher and subscriber are in the same process, the RMW could optimize this).

Regardless, I guess we would have to see the overhead of whatever you are proposing.

The idea was to use an external library for the stdout output. Here is an example using spdlog for stdout: https://github.com/gabime/spdlog#asynchronous-logger-with-multi-sinks

That's fair, I'd be onboard with relying more heavily on spdlog.

ivanpauno · 2021-10-25T16:29:03Z

@ivanpauno looking at the commit/PR history, you are the one that added this global logging mutex. Is there something about this I'm not understanding that makes it necessary to implement locking around logging in this way?

AFAIR, there where some crashes caused by the rcl/rcutils logging implementation not being thread safe (I think there are some globals array in rcl logging impl).
It could also be solved by adding some fine grained locking in the places where it's needed in rcl/rcutils, we would need to add some locking primitives in rcutils for that.

One more thing, this mutex is a recursive_mutex. Is there a code path where this is necessary? If so, would it not be better to define the interfaces better so a recursive mutex is not needed? Reference on the origin of recursive mutex: http://www.zaval.org/resources/library/butenhof1.html

IIRC, it was needed.
We were somewhere logging from the logger implementation itself, which caused a deadlock.
That could be solved by avoiding to call the logging macros from the logging implementation (and if we have fine grained locking instead, this probably wouldn't be needed).
I don't recall exactly though, but I remember I first tried with a normal mutex.

cadmus-to · 2022-01-21T03:40:19Z

I'm not sure if this should be a separate issue, but it is within the scope of this one.

Under Logging Usage for C++ at the Concept page (https://docs.ros.org/en/rolling/Concepts/About-Logging.html), there probably should be a C++ implementation demo similar to the Python's one. Probably something like:

https://github.com/ros2/demos/tree/master/logging_demo ?

clalancette · 2022-01-21T21:52:38Z

I'm not sure if this should be a separate issue, but it is within the scope of this one.

Under Logging Usage for C++ at the Concept page (https://docs.ros.org/en/rolling/Concepts/About-Logging.html), there probably should be a C++ implementation demo similar to the Python's one. Probably something like:

While this is a large issue (and thus could conceivably contain what you are talking about), I think it makes sense to have a separate issue for end-user demos. I'll suggest you open a separate issue just to create demos for the logging functionality; it will be easier to follow-up there for a smaller amount of work.

tylerjw · 2022-01-21T21:54:13Z

I'm sorry I haven't followed up on this. I hope to have time to clean this up soon. I agree that some user demos would be nice but are out of the scope of this PR.

tylerjw mentioned this issue Oct 15, 2021

ROS 2 Logging documentation moveit/moveit2_tutorials#151

Closed

jacobperron assigned tylerjw Oct 15, 2021

tylerjw mentioned this issue Oct 17, 2021

Draft: Document internals of logging system #2028

Closed

cadmus-to mentioned this issue Jan 23, 2022

Add example of C++ logging implementation #2239

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Documentation for the existing logging system #2026

Documentation for the existing logging system #2026

tylerjw commented Oct 15, 2021

tylerjw commented Oct 16, 2021 •

edited

Loading

tylerjw commented Oct 17, 2021 •

edited

Loading

tylerjw commented Oct 18, 2021

clalancette commented Oct 22, 2021

tylerjw commented Oct 22, 2021 •

edited

Loading

clalancette commented Oct 25, 2021

ivanpauno commented Oct 25, 2021

cadmus-to commented Jan 21, 2022

clalancette commented Jan 21, 2022

tylerjw commented Jan 21, 2022

Documentation for the existing logging system #2026

Documentation for the existing logging system #2026

Comments

tylerjw commented Oct 15, 2021

Background

Questions this should answer

How should this be organized?

Who will do this work?

What already exists?

tylerjw commented Oct 16, 2021 • edited Loading

tylerjw commented Oct 17, 2021 • edited Loading

tylerjw commented Oct 18, 2021

clalancette commented Oct 22, 2021

tylerjw commented Oct 22, 2021 • edited Loading

clalancette commented Oct 25, 2021

ivanpauno commented Oct 25, 2021

cadmus-to commented Jan 21, 2022

clalancette commented Jan 21, 2022

tylerjw commented Jan 21, 2022

tylerjw commented Oct 16, 2021 •

edited

Loading

tylerjw commented Oct 17, 2021 •

edited

Loading

tylerjw commented Oct 22, 2021 •

edited

Loading