Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation for the existing logging system #2026

Open
tylerjw opened this issue Oct 15, 2021 · 10 comments
Open

Documentation for the existing logging system #2026

tylerjw opened this issue Oct 15, 2021 · 10 comments
Assignees

Comments

@tylerjw
Copy link
Contributor

tylerjw commented Oct 15, 2021

Background

Right now the existing logging system is not well documented. Doing this will help us better understand how to make improvements to the existing system and inform users how they should use the logging system.

This came out of the discussion in this PR: #2000

Questions this should answer

  • What are the RCUTILS_ macros, and where should they be used?
  • What are the RCLCPP_ macros, and where should they be used?
  • What is the logging system for the launch system and how should it be used?
  • Why do the RCLCPP_ macros require a logger object to be passed, while the RCUTILS_ and ROS 1 macros did not?
  • Where do logs go?
  • How does the per-process logger interact with the launch loggers?
  • How do I configure the loggers?
  • What are the environmental variables that can affect the logging system?

How should this be organized?

  1. What are the concepts and patterns a user of ROS 2 needs to understand to use the logging system? (tutorial?)
  2. What are the concepts and details of the logger someone contributing to ROS 2 itself needs to understand? (concepts or contributing?)

Who will do this work?

I've volunteered @JafarAbdi to work on this with me as he created the PRs that started this discussion (that I want to be merged). I can't be assigned this task as I'm not a member of the ros2 github org, but you can consider me assigned. @clalancette has agreed to review this work.

What already exists?

@tylerjw
Copy link
Contributor Author

tylerjw commented Oct 16, 2021

Today I started tracing the code trying to figure out how to answer the question How does the per-process logger interact with the launch loggers? and I came up with a few questions that might have more obvious answers to others.

How does the screen/file output work?

How can many different processes write to the screen output (stderr/stdout) without interleaved output?

When a launch file is used does the launch system subscribe to /rosout for each process and then handle formatting/interleaving to the screen and writing to files?

Does that differ from the behavior when you use ros2 run or start the process by just executing it from a shell?

If a node is run with ros2 run how does the formatting of output and sending to the screen and log file work?

I have found https://github.com/ros2/launch/blob/master/launch/launch/logging/__init__.py which seems to create the log file directory and control formatting/writing of logfiles. Is this just for logging from within the launch system, or is this also used by processes run by the launch system?

I've discovered that some of the ros2cli calls like ros2 service call create log files with the process name python3. Is this expected, if so why?

@clalancette you can ignore this for now, these are my notes to remind myself where I left off. That is unless you know the answers to any of these and want to share.

@tylerjw
Copy link
Contributor Author

tylerjw commented Oct 17, 2021

On the global logging mutex. When you use an RCLCPP logging macro with the default config this is what it does:

  1. lock global logging mutex
  2. write to stderr/stdout (rcutils_logging_console_output_handler)
  3. publish to rosout (rcl_logging_rosout_output_handler)
  4. write to logfile using spdlog (rcl_logging_ext_lib_output_handler)
  5. unlock the mutex

This seems to be a crazy pessimization to me. Why is there not at least a lock per file stream, or worse, per logging output handle? Is preserving the order or writing to logs on separate file streams that important? I understand not wanting to have multiple threads attempting to write to one file stream concurrently. Instead of using locks at this layer would it not be better to just use file locks (https://en.wikipedia.org/wiki/File_locking)?

Even better, why not have an in-memory per-thread queue for writing logs to that would be serviced (where the output handles would be called) from one thread? That way you could define a policy for how to handle this queue that would allow it to drop logs in cases where the determinism of a given thread should not be impacted by the logging system? I imagine the current intra-process communication implementation could even be used for this?

@ivanpauno looking at the commit/PR history, you are the one that added this global logging mutex. Is there something about this I'm not understanding that makes it necessary to implement locking around logging in this way?

One more thing, this mutex is a recursive_mutex. Is there a code path where this is necessary? If so, would it not be better to define the interfaces better so a recursive mutex is not needed? Reference on the origin of recursive mutex: http://www.zaval.org/resources/library/butenhof1.html

@tylerjw
Copy link
Contributor Author

tylerjw commented Oct 18, 2021

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber. That would eliminate the need for the mutex in the thread that is using the logging macro. Then if people are using a multi-threadded executor the logs would be written out by another thread, and that callback could use locks in various ways to synchronize access to file handles (actual files, stdout, stderr).

Another thing that could be done is there could be a separate subscription for the external library logging like spdlog from the screen logging.

Lastly, is there a reason we have a custom implementation of screen logging? Would it not be better to use an external logging library that can be configured to handle all of this? The log filtering could then be done via the external logging library and would not need to be part of the current implementation.

@clalancette
Copy link
Contributor

This seems to be a crazy pessimization to me. Why is there not at least a lock per file stream, or worse, per logging output handle? Is preserving the order or writing to logs on separate file streams that important?

That's not important on separate streams, no. But I honestly don't think it is much of a pessimization in practice; by default, everything goes out to stderr. There are environment variables to control that, but I doubt they are heavily used.

Nevertheless, I would be on board with making this a lock per file stream.

I understand not wanting to have multiple threads attempting to write to one file stream concurrently. Instead of using locks at this layer would it not be better to just use file locks (https://en.wikipedia.org/wiki/File_locking)?

I think file locks would be hard to implement in a cross-platform way (though I'm not sure about that; someone would have to do a deeper investigation of it).

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber.

We could explore it, but it seems like a ton of overhead for something that could be called very often.

Lastly, is there a reason we have a custom implementation of screen logging? Would it not be better to use an external logging library that can be configured to handle all of this? The log filtering could then be done via the external logging library and would not need to be part of the current implementation.

I don't totally understand this part. We are using an external logging library, spdlog. Maybe we aren't using it to its full extent?

@tylerjw
Copy link
Contributor Author

tylerjw commented Oct 22, 2021

Continuing on the above topic, what do others think of changing the logging system so it uses the rosout publisher directly with the RCLCPP macros. Then if one enables screen or spdlog output the context sets up a subscriber to rosout and handles those outputs in that subscriber.

We could explore it, but it seems like a ton of overhead for something that could be called very often.

The thought was just that you would want to move logging overhead and locking off the thread with the macro to enable users who want to log from threads where waiting on a lock would negatively impact them. As it exists now the thread with the logging macro is already publishing to rosout (in addition to writing to a file with spdout and to stdout).

I don't totally understand this part. We are using an external logging library, spdlog. Maybe we aren't using it to its full extent?

The idea was to use an external library for the stdout output. Here is an example using spdlog for stdout:
https://github.com/gabime/spdlog#asynchronous-logger-with-multi-sinks

@clalancette
Copy link
Contributor

The thought was just that you would want to move logging overhead and locking off the thread with the macro to enable users who want to log from threads where waiting on a lock would negatively impact them. As it exists now the thread with the logging macro is already publishing to rosout (in addition to writing to a file with spdout and to stdout).

Yeah, I'm on board with possibly doing a queue here and then having a separate thread drain that queue and actually get the data out. What seems like a lot of overhead is to also subscribe to rosout; that basically means two trips through the RMW layer (though since the publisher and subscriber are in the same process, the RMW could optimize this).

Regardless, I guess we would have to see the overhead of whatever you are proposing.

The idea was to use an external library for the stdout output. Here is an example using spdlog for stdout: https://github.com/gabime/spdlog#asynchronous-logger-with-multi-sinks

That's fair, I'd be onboard with relying more heavily on spdlog.

@ivanpauno
Copy link
Member

@ivanpauno looking at the commit/PR history, you are the one that added this global logging mutex. Is there something about this I'm not understanding that makes it necessary to implement locking around logging in this way?

AFAIR, there where some crashes caused by the rcl/rcutils logging implementation not being thread safe (I think there are some globals array in rcl logging impl).
It could also be solved by adding some fine grained locking in the places where it's needed in rcl/rcutils, we would need to add some locking primitives in rcutils for that.

One more thing, this mutex is a recursive_mutex. Is there a code path where this is necessary? If so, would it not be better to define the interfaces better so a recursive mutex is not needed? Reference on the origin of recursive mutex: http://www.zaval.org/resources/library/butenhof1.html

IIRC, it was needed.
We were somewhere logging from the logger implementation itself, which caused a deadlock.
That could be solved by avoiding to call the logging macros from the logging implementation (and if we have fine grained locking instead, this probably wouldn't be needed).
I don't recall exactly though, but I remember I first tried with a normal mutex.

@cadmus-to
Copy link

I'm not sure if this should be a separate issue, but it is within the scope of this one.

Under Logging Usage for C++ at the Concept page (https://docs.ros.org/en/rolling/Concepts/About-Logging.html), there probably should be a C++ implementation demo similar to the Python's one. Probably something like:

@clalancette
Copy link
Contributor

I'm not sure if this should be a separate issue, but it is within the scope of this one.

Under Logging Usage for C++ at the Concept page (https://docs.ros.org/en/rolling/Concepts/About-Logging.html), there probably should be a C++ implementation demo similar to the Python's one. Probably something like:

While this is a large issue (and thus could conceivably contain what you are talking about), I think it makes sense to have a separate issue for end-user demos. I'll suggest you open a separate issue just to create demos for the logging functionality; it will be easier to follow-up there for a smaller amount of work.

@tylerjw
Copy link
Contributor Author

tylerjw commented Jan 21, 2022

I'm sorry I haven't followed up on this. I hope to have time to clean this up soon. I agree that some user demos would be nice but are out of the scope of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants