ndk 22.9.12 #93

Merged
merged 1 commit into from
Sep 1, 2023


jon-nokia
Contributor

Why I did it

Reduce how often the missing-heartbeat warning is logged. While the heartbeat is missing, the warning is now logged once every 30 minutes.
The SONiC application does not want the linecard to reboot when it is missing the midplane heartbeat. Because the warning would otherwise be logged far more often, the code is modified to log it every 30 minutes.
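
For illustration only, the throttling described above could be sketched as follows. This is not the NDK code, which is not shown in this PR. The class `HeartbeatWarningLimiter`, its `reset` method, and the `simulate` helper are hypothetical names, and resetting when the heartbeat returns is an assumption rather than confirmed behavior.

```python
import time
from typing import Callable, List, Optional

# 30 minutes, taken from the PR description.
LOG_INTERVAL_SECONDS = 30 * 60


class HeartbeatWarningLimiter:
    """Hypothetical helper: allow the first warning, then one per interval."""

    def __init__(self,
                 interval: float = LOG_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._clock = clock
        self._last_logged: Optional[float] = None

    def should_log(self) -> bool:
        """Return True on the first miss and again once the interval has elapsed."""
        now = self._clock()
        if self._last_logged is None or now - self._last_logged >= self._interval:
            self._last_logged = now
            return True
        return False

    def reset(self) -> None:
        """Assumption: clear state when the heartbeat is seen again,
        so that a later outage is logged immediately."""
        self._last_logged = None


def simulate(miss_times: List[float]) -> List[float]:
    """Feed a list of miss timestamps (seconds) through the limiter
    using a fake clock, and return the timestamps that would be logged."""
    current = [0.0]
    limiter = HeartbeatWarningLimiter(clock=lambda: current[0])
    logged = []
    for ts in miss_times:
        current[0] = ts
        if limiter.should_log():
            logged.append(ts)
    return logged
```

With a heartbeat check every 60 seconds over one hour, `simulate(list(range(0, 3601, 60)))` keeps only the misses at 0, 1800, and 3600 seconds.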

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202205

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

@jon-nokia jon-nokia marked this pull request as ready for review August 31, 2023 20:28
@jon-nokia jon-nokia requested a review from lguohan as a code owner August 31, 2023 20:28
@judyjoseph judyjoseph self-requested a review August 31, 2023 23:14
@judyjoseph
Contributor

judyjoseph commented Aug 31, 2023

@jon-nokia so the idea here is that after noticing and logging for the first time (that midplane connectivity is lost), we will wait 30 minutes before logging the next message.

I am referring to these syslog messages. Are we keeping the second log message?

  1. syslog_msg: "Unable to reach slot $USER_SLOT (Supervisor/Lincard) via Midplane"
  2. syslog_msg: "Action set to reboot. Rebooting self"

@mlok-nokia also for comments.

@judyjoseph judyjoseph merged commit 2a86783 into Azure:master Sep 1, 2023
3 checks passed
@mlok-nokia
Contributor

mlok-nokia commented Sep 1, 2023

@judyjoseph Yes. What you described above is correct. There is a second log message.
After the change that stops the LC from rebooting itself when the heartbeat has been missing for 60 seconds, a second log message is logged every 60 seconds on both the Linecard and the Supervisor. This version of the NDK contains the change to log this second message every 30 minutes instead of every 60 seconds.

Supervisor log: Aug 31 05:07:59.877189 ixre-cpm-chassis7 WARNING sr_device_mgr: Unable to reach slot 1 over 10.6.1.100 (missing count: 129)
Linecard log: Aug 30 01:48:35.587291 ixre-egl-board25 WARNING sr_device_mgr: Unable to reach CPM over 10.6.0.100 (missing count: 130).
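
When checking this behavior on a chassis, it may help to pull the fields out of these warnings. The sketch below is a minimal parser whose pattern is inferred only from the two sample lines above, so other message variants may not match. `parse_missing_heartbeat` and the field names are hypothetical, not part of the NDK or SONiC.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from the two sample syslog lines in this thread;
# it is an assumption that other sr_device_mgr warnings share this layout.
LINE_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<host>\S+) WARNING sr_device_mgr: "
    r"Unable to reach (?P<target>.+?) over "
    r"(?P<address>\d{1,3}(?:\.\d{1,3}){3}) "
    r"\(missing count: (?P<count>\d+)\)"
)


def parse_missing_heartbeat(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the parsed fields of a missing-heartbeat warning, or None
    if the line does not look like one."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    fields["count"] = int(match.group("count"))
    return fields


SUPERVISOR_LINE = (
    "Aug 31 05:07:59.877189 ixre-cpm-chassis7 WARNING sr_device_mgr: "
    "Unable to reach slot 1 over 10.6.1.100 (missing count: 129)"
)
LINECARD_LINE = (
    "Aug 30 01:48:35.587291 ixre-egl-board25 WARNING sr_device_mgr: "
    "Unable to reach CPM over 10.6.0.100 (missing count: 130)."
)
```

Comparing the `timestamp` values of consecutive warnings from the same host would then show whether they are spaced roughly 30 minutes apart.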
