ndk 22.9.21 #195

Merged
merged 1 commit into from
Jan 8, 2024

Conversation

jon-nokia
Contributor

@jon-nokia jon-nokia commented Dec 20, 2023

Why I did it

  1. Modify the linecard code to detect an ungraceful Supervisor reboot or removal. If the BDB hardware check fails 3 times, the linecard reboots itself.
  2. Add a function to handle the gRPC call that disables all SFPs, which lets the linecard disable all SFPs when it reboots.
  3. Add a delay before starting sr_dev_mgr on the Supervisor to make sure the linecards shut down all SFPs on the down path when both the linecards and the Supervisor are rebooting.
  4. Handle SFP module read failures better and ensure a minimum time in reset on a SW reset request.
  5. Improve thermal logging.
  6. Modify the Supervisor to call "sudo reboot" when the reboot request comes from the PMON API.
  7. Fix the reboot-cause history issue caused by the change to determine-reboot-cause.service.
  8. Fix all-modules shutdown.

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

@jon-nokia jon-nokia requested a review from lguohan as a code owner December 20, 2023 15:34
@gechiang
Contributor

@jon-nokia Does this new NDK drop have any dependencies? Does any other PR also need to be picked up, or is this a standalone PR?

@jon-nokia
Contributor Author

> @jon-nokia Does this new NDK drop have any dependencies? Does any other PR also need to be picked up, or is this a standalone PR?

It should be merged with sonic-net/sonic-buildimage#17378

@gechiang
Contributor

> > @jon-nokia Does this new NDK drop have any dependencies? Does any other PR also need to be picked up, or is this a standalone PR?
>
> It should be merged with sonic-net/sonic-buildimage#17378

Thanks @jon-nokia! Since it is not possible to merge both at the same time, which one should be merged first to avoid any breakage or regression, whether at build time or run time?

@jon-nokia jon-nokia marked this pull request as draft January 4, 2024 19:03
@jon-nokia jon-nokia changed the title ndk 22.9.19 ndk 22.9.20 Jan 4, 2024
@judyjoseph
Contributor

@jon-nokia @mlok-nokia Could you update the PR summary, as we have diverged a lot from the original design?
Also, please post an update when it is ready for review/merge.

@jon-nokia jon-nokia mentioned this pull request Jan 5, 2024
5 tasks
@jon-nokia jon-nokia marked this pull request as ready for review January 5, 2024 15:10
@jon-nokia
Contributor Author

> @jon-nokia @mlok-nokia Could you update the PR summary, as we have diverged a lot from the original design? Also, please post an update when it is ready for review/merge.

Hi @judyjoseph - PR summary has been updated and these are ready to be integrated. Thanks.

@mlok-nokia
Contributor

> > @jon-nokia @mlok-nokia Could you update the PR summary, as we have diverged a lot from the original design? Also, please post an update when it is ready for review/merge.
>
> Hi @judyjoseph - PR summary has been updated and these are ready to be integrated. Thanks.

This PR should be merged with sonic-net/sonic-buildimage#17483.

@judyjoseph
Contributor

@mlok-nokia "If BDB hardware checking is missing 3 times" -- what is the interval at which we check this?
So is this in addition to the 4-times check for heartbeat (with an interval of 10 sec)?

@mlok-nokia
Contributor

mlok-nokia commented Jan 5, 2024

> @mlok-nokia "If BDB hardware checking is missing 3 times" -- what is the interval at which we check this? So is this in addition to the 4-times check for heartbeat (with an interval of 10 sec)?

The interval is 10 seconds. The heartbeat check and the BDB check happen at the same time. When the heartbeat is missing, we do the BDB check. If the heartbeat and BDB checks fail 3 consecutive times, the linecard reboots itself.
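
For readers following the thread, the behavior described in this reply can be sketched roughly as below. This is only an illustration, not the NDK implementation: `SupervisorMonitor`, `poll`, and the `bdb_check` callable are hypothetical names, and resetting the counter whenever either check succeeds is an assumption, since the thread only says the failures must be consecutive. The separate 4-times heartbeat check mentioned in the question is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

CHECK_INTERVAL_SEC = 10       # polling interval stated in the reply above
MAX_CONSECUTIVE_FAILURES = 3  # threshold stated in the PR description


@dataclass
class SupervisorMonitor:
    """Hypothetical helper: counts consecutive polls in which both the
    heartbeat and the BDB hardware check fail."""
    failures: int = 0

    def poll(self, heartbeat_ok: bool, bdb_check: Callable[[], bool]) -> bool:
        """Process one poll; return True when the linecard should reboot.

        The BDB check is only invoked when the heartbeat is missing.
        """
        if heartbeat_ok or bdb_check():
            # Assumption: any successful check resets the streak.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= MAX_CONSECUTIVE_FAILURES


# Three consecutive polls with both checks failing trigger the reboot decision.
monitor = SupervisorMonitor()
decisions = [monitor.poll(False, lambda: False) for _ in range(3)]
```

A real monitor would call `poll` once every `CHECK_INTERVAL_SEC` seconds, so under these assumptions the reboot decision would come on the third consecutive failed poll.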

@judyjoseph
Contributor

Is the default log level DEBUG? @mlok-nokia - After changing to DEBUG in this commit https://github.com/Nokia-ION/ndk/commit/c074eef1f9f0281328ab8050fd6527014ff4fa55 (line num: 258), will it be printable data?

@mlok-nokia
Contributor

mlok-nokia commented Jan 6, 2024

> Is the default log level DEBUG? @mlok-nokia - After changing to DEBUG in this commit Nokia-ION/ndk@c074eef (line num: 258), will it be printable data?

The log level is info. I think it is printable data, but it could be encoded. Let me add @snider-nokia to this message

@mlok-nokia
Contributor

> > > @jon-nokia Does this new NDK drop have any dependencies? Does any other PR also need to be picked up, or is this a standalone PR?
> >
> > It should be merged with sonic-net/sonic-buildimage#17378
>
> Thanks @jon-nokia! Since it is not possible to merge both at the same time, which one should be merged first to avoid any breakage or regression, whether at build time or run time?

To function correctly, both need to be built into the same image. Merging either one first won't break the build.

@snider-nokia
Contributor

> > Is the default log level DEBUG? @mlok-nokia - After changing to DEBUG in this commit Nokia-ION/ndk@c074eef (line num: 258), will it be printable data?
>
> The log level is info. I think it is printable data, but it could be encoded. Let me add @snider-nokia to this message

The log level defaults to INFO during NDK bringup. After that, DEBUG output is only seen if the NDK logging level has subsequently been raised dynamically (by operator intervention) to DEBUG. When line 258 is hit with logging at DEBUG level, the data is printed as a string of hex bytes.
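
As a rough illustration of that behavior, here is a minimal sketch using Python's standard `logging` module as a stand-in. The NDK's own logging API is not shown in this thread, so the logger name `ndk_example` and the helper `log_payload` are hypothetical. The point is only that a DEBUG statement stays silent at the default INFO level and, once the level is changed to DEBUG, prints the raw bytes as hex.

```python
import logging

# Hypothetical logger name; the NDK's real logger is not shown in this thread.
logger = logging.getLogger("ndk_example")


def format_hex(data: bytes) -> str:
    """Render raw bytes as space-separated hex pairs (Python 3.8+)."""
    return data.hex(" ")


def log_payload(data: bytes) -> None:
    """Log the payload as hex, but only when DEBUG is enabled."""
    # isEnabledFor avoids formatting the payload when DEBUG is off.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("payload: %s", format_hex(data))


if __name__ == "__main__":
    logging.basicConfig()           # attach a stderr handler to the root logger
    logger.setLevel(logging.INFO)   # default level, as at bringup
    log_payload(b"\x01\xab")        # suppressed at INFO
    logger.setLevel(logging.DEBUG)  # operator changes the level at runtime
    log_payload(b"\x01\xab")        # now emitted as hex
```

Hex output keeps the log line printable even when the underlying data is binary or encoded, which is the concern raised in the question.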

@snider-nokia
Contributor

snider-nokia commented Jan 6, 2024

@judyjoseph, with NDK version 22.9.20 we found a problem in testing: oper-up 400G interfaces are not correctly shut down on the way down at LC reboot time (currently that does not happen until the LC is on the way back up again). 100G interfaces do go down promptly when the LC is on the way down for reboot.

We have corrected this issue and will begin OC testing with this fix this evening (the build just completed and the tests will be started shortly). You will therefore need NDK 22.9.21 to have this functionality corrected (400G interfaces forced down promptly at LC reboot time).

We will generate version 22.9.21 and update this PR and #196 with it as soon as we have verified in our testbed that the behavior described above is corrected.

@jon-nokia jon-nokia changed the title ndk 22.9.20 ndk 22.9.21 Jan 6, 2024
@mlok-nokia
Contributor

@judyjoseph We have verified that NDK 22.9.21 works fine and this PR is ready for merge. Thanks.

@judyjoseph judyjoseph merged commit c120bf2 into Azure:master Jan 8, 2024
3 checks passed
@rlhui
Contributor

rlhui commented Jan 9, 2024

also tagging @deepak-singhal0408 on this one.

jon-nokia pushed a commit to jon-nokia/sonic-buildimage-msft that referenced this pull request May 3, 2024
…lly (#18491)

#### Why I did it
src/sonic-gnmi
```
* ad8850d - (HEAD -> master, origin/master, origin/HEAD) Merge pull request Azure#195 from liuh-80/dev/liuh/zmq_dpu_support (2 hours ago) [Hua Liu]
* 9b2dcdb - Improve code coverage (2 days ago) [liuh-80]
* a9e52de - Fix code issue (2 days ago) [liuh-80]
* c2d594e - Handle remove ZMQ client error (2 days ago) [liuh-80]
* 70bb2ac - Merge remote-tracking branch 'origin' into dev/liuh/zmq_dpu_support (2 days ago) [liuh-80]
* 1adbee3 - Remove client when retry connect failed (3 days ago) [liuh-80]
* 359459b - Update mixed_db_client.go (7 days ago) [Hua Liu]
* 70352fa - Get DPU address by connecter (7 days ago) [liuh-80]
* 254a29d - Fix UT (8 days ago) [liuh-80]
* c1186ce - Merge with latest code (8 days ago) [liuh-80]
* e4c0649 - Fix space/tab issue (8 days ago) [liuh-80]
* e7fc8fc - Merge remote-tracking branch 'origin' into dev/liuh/get_dpu_address (9 days ago) [liuh-80]
* 0993451 - Add DbSubscriber (9 days ago) [liuh-80]
* ea5f91d - Improve code to use ConfigDBConnector (2 weeks ago) [liuh-80]
* b978e4d - Merge branch 'master' into dev/liuh/get_dpu_address (2 weeks ago) [Qi Luo]
* 654934f - Fix UT (3 weeks ago) [liuh-80]
* cb3d12f - Fix UT (3 weeks ago) [liuh-80]
* 1e2132b - Improve code coverage and fix PR comments (3 weeks ago) [liuh-80]
* a34d2b5 - Fix code (4 weeks ago) [liuh-80]
* 32ab774 - Fix build issue (4 weeks ago) [liuh-80]
* 6862050 - Add getZmqAddress method (4 weeks ago) [liuh-80]
* 17d7c4a - Fix PR comments (4 weeks ago) [liuh-80]
* 39ae8fe - Fix PR comments (4 weeks ago) [liuh-80]
* 628ce20 - Improve code (4 weeks ago) [liuh-80]
* 5f63e1c - Add getDpuAddress method to supprot multiple DPU (4 weeks ago) [liuh-80]
* 44a8071 - (origin/dev/liuh/get_dpu_address) Add getDpuAddress method (4 weeks ago) [liuh-80]
```
#### How I did it
#### How to verify it
#### Description for the changelog