Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(components): improve component error handling #138

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

domire8
Copy link
Member

@domire8 domire8 commented Sep 10, 2024

Description

I wouldn't say that this PR solves the parent issue but at least it removes a feature that is not fully supported, desired and thought through.

Review guidelines

Estimated Time of Review: 5 minutes

Checklist before merging:

  • Confirm that the relevant changelog(s) are up-to-date in case of any user-facing changes

@domire8
Copy link
Member Author

domire8 commented Sep 10, 2024

We could also tackle this differently: We could keep the implementation of the in_error_state predicate in the Component (since it also has that is_finished predicate) and only remove it in the LifecycleComponent where we would trigger the error state transition instead. This would then look like this

@domire8 domire8 closed this Sep 10, 2024
@domire8 domire8 reopened this Sep 10, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 10, 2024
@eeberhard
Copy link
Member

We could also tackle this differently: We could keep the implementation of the in_error_state predicate in the Component (since it also has that is_finished predicate) and only remove it in the LifecycleComponent where we would trigger the error state transition instead. This would then look like this

I like the second suggestion, but I would add some thoughts. Normal components do not really have discrete states and being "in error state" carries no meaning outside of the predicate being set to true. So, for it to be meaningful, we could also stop the step timer or skip the step callback once an error has been raised. We might also consider if it's meaningful to "reset" an error but I think that's very implementation specific and not suited for a component (rather, use a lifecycle component for that). The error predicate should just be a way for components to stop themselves, and the predicate can correspondingly be used to unload them if we want. And by stopping the step callback, it has actual value in being defined here in modulo, since it would also stop publishers.

And regarding lifecycle components, raising an error should put the component into the ERROR_PROCESSING transition state, which in turn triggers the on_error callback. This can be entered either through transition callbacks (i.e. if we raise an exception during on_activate_callback, then the wrapper on_activate method returns an error code to the lifecycle node, which is already the case) or it can be entered from step when the component is active. If you can't directly enter the ERROR_PROCESSING state from step (I haven't looked deeply yet), then you could trigger the shutdown transition but set a property to force the on_shutdown callback to return an error code.

So overall, when it comes to the necessary change, I would:

  • keep the error predicate for normal components
  • remove the error predicate from lifecycle components
  • keep the raise_error method in the base component interface for both components
  • make the raise_error method for normal components:
    • set the error predicate to true
    • stop the step callback
  • make the raise_error method for lifecycle components:
    • set an internal flag
    • trigger the shutdown transition
  • the on_shutdown wrapper for lifecycle components should return an error status if the internal error flag was set, in order to force the component into the error_processing state

@domire8
Copy link
Member Author

domire8 commented Sep 18, 2024

So I tested that with lifecycle components and what happens is that

[component_container_mt-2] [ERROR] [1726646503.924903468] [timer_1]: Failed to execute step function: test
[component_container_mt-2] [DEBUG] [1726646503.925462848] [timer_1]: on_shutdown called from previous state active.
[component_container_mt-2] [ERROR] [1726646503.925865250] [timer_1]: Entering into the error processing transition state.
[component_container_mt-2] [DEBUG] [1726646503.926206065] [timer_1]: on_error called from previous state active.

The component is in state unconfigured after that. Is that what we want or do I also have to explicitly fail the on_error callback if an error has been raised?

@domire8
Copy link
Member Author

domire8 commented Sep 18, 2024

In general, subscriptions will survive these steps, especially in a Component. Do we want to clear those as well?

@eeberhard
Copy link
Member

The component is in state unconfigured after that. Is that what we want or do I also have to explicitly fail the on_error callback if an error has been raised?

That's what we want, yes. The on_error_callback for the derived component to override determines the error recovery behavior. If that method returns true, it means the error is handled and the component goes into the unconfigured state (however, the implementation is responsible for correctly resetting internal properties to unconfigured). If the error callback returns false, it should go straight into finalized state.

@eeberhard
Copy link
Member

eeberhard commented Sep 18, 2024

Actually the default virtual on_error_callback in the base class should return false. Only if it is overridden to implement actual error recovery should it return true

@domire8
Copy link
Member Author

domire8 commented Sep 18, 2024

however, the implementation is responsible for correctly resetting internal properties to unconfigured

That is quite important for us as well though, we need to carefully see what we should reset and what not.

if (!this->handle_shutdown()) {
break;
}
// TODO: reset and finalize all needed properties
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're already here we do also need to think about that quickly

@domire8 domire8 changed the title refactor(components): remove unused error predicate and raise_error refactor(components): improve component error handling Sep 18, 2024
@@ -13,7 +13,8 @@ LifecycleComponent::LifecycleComponent(const rclcpp::NodeOptions& node_options,
LifecycleNode::get_node_parameters_interface(), LifecycleNode::get_node_services_interface(),
LifecycleNode::get_node_time_source_interface(), LifecycleNode::get_node_timers_interface(),
LifecycleNode::get_node_topics_interface(), LifecycleNode::get_node_type_descriptions_interface(),
LifecycleNode::get_node_waitables_interface())) {}
LifecycleNode::get_node_waitables_interface())),
has_error_(true) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
has_error_(true) {}
has_error_(false) {}

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants