Added SmartSwitch support in chassisd and enabling chassisd #467

Open
wants to merge 193 commits into
base: master

@rameshraghupathy rameshraghupathy commented Apr 15, 2024

Added SmartSwitch support in chassisd and enabled chassisd for fixed SmartSwitches

Description

chassisd is enabled only for modular chassis. SmartSwitch is a fixed chassis, but it is treated like a modular chassis so that the DPU cards can be managed just like the line cards of a modular chassis. chassisd will be enabled only on the SmartSwitch NPU, so the scope of these changes is limited to the NPU and does not apply to the DPU.

Motivation and Context

chassisd is enabled only for modular chassis. SmartSwitch is a fixed chassis, but it is treated like a modular chassis so that the DPU cards can be managed just like the line cards of a modular chassis. Hence, chassisd is needed for SmartSwitch and is enabled here. Also, some of the table updates and cleanup are not required on the SmartSwitch platform, so the is_smartswitch API is used to enable them selectively.
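
A minimal sketch of the selective gating described above, assuming a boolean obtained from the is_smartswitch API mentioned in the description. The function name and the step names are hypothetical placeholders and are not taken from chassisd itself.

```python
from typing import List


def select_steps(is_smartswitch: bool) -> List[str]:
    """Return the names of the steps to run on this platform.

    All step names are illustrative placeholders, not chassisd functions.
    """
    # Steps assumed to be common to modular chassis and SmartSwitch.
    steps = ["monitor_modules", "handle_config_changes"]
    if not is_smartswitch:
        # Table updates and cleanup that are only needed on a modular chassis.
        steps += ["update_chassis_tables", "cleanup_stale_entries"]
    return steps
```

With this shape, the SmartSwitch path simply skips the modular-only steps instead of branching inside each of them.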

How Has This Been Tested?

  1. Built an image with chassisd enabled and checked that chassisd keeps running without crashing
  2. Verified that the config change handler works as expected by issuing config CLIs to start up and shut down DPUs

Additional Information (Optional)

@oleksandrivantsiv
Collaborator

What are the states supported by the DPUs in the Smart Switch?

@rameshraghupathy
Author

What are the states supported by the DPUs in the Smart Switch?

"dpu_midplane_link_state"
"dpu_control_plane_state"
"dpu_data_plane_state"
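
As a rough illustration of how these three states might be written out, the sketch below builds an update dictionary for one state. It assumes each state has companion `_reason` and `_time` fields with the same prefix, which this PR shows only for the midplane link state; the helper name `build_state_update` and the constant `DPU_STATE_PREFIXES` are hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional

# Prefixes of the three DPU states listed above.
DPU_STATE_PREFIXES = ("dpu_midplane_link", "dpu_control_plane", "dpu_data_plane")


def build_state_update(prefix: str, state: str, reason: str = "",
                       now: Optional[datetime] = None) -> Dict[str, str]:
    """Build a state/reason/time update for one DPU state (hypothetical helper)."""
    if prefix not in DPU_STATE_PREFIXES:
        raise ValueError(f"unknown DPU state prefix: {prefix}")
    # Allow the caller to pass a fixed time, which keeps the helper testable.
    now = now or datetime.now()
    return {
        f"{prefix}_state": state,
        f"{prefix}_reason": reason,  # assumed companion field
        f"{prefix}_time": now.strftime("%Y-%m-%d %H:%M:%S"),  # assumed companion field
    }
```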

accidental removal of a few lines from the original chassisd file
Collaborator

@oleksandrivantsiv oleksandrivantsiv left a comment

as commented

@gpunathilell
Contributor

gpunathilell commented Nov 23, 2024

The REBOOT_CAUSE_DIR is not accessible from the pmon docker where chassisd is running (there's a different /host/reboot created in the docker): REBOOT_CAUSE_DIR = "/host/reboot-cause/module/". The first-boot file, which is created by sonic-host-services, is not seen by chassisd. We need to use a different location that is available in the docker and is persistent across reboots.
Resolved after considering the buildimage PR

@rameshraghupathy
Author

rameshraghupathy commented Nov 23, 2024

The REBOOT_CAUSE_DIR is not accessible from the pmon docker where chassisd is running (there's a different /host/reboot created in the docker): REBOOT_CAUSE_DIR = "/host/reboot-cause/module/". The first-boot file, which is created by sonic-host-services, is not seen by chassisd. We need to use a different location that is available in the docker and is persistent across reboots.

@gpunathilell Have you used the PR? This is needed for pmon docker to mount the volume.

@gpunathilell
Contributor

The REBOOT_CAUSE_DIR is not accessible from the pmon docker where chassisd is running (there's a different /host/reboot created in the docker): REBOOT_CAUSE_DIR = "/host/reboot-cause/module/". The first-boot file, which is created by sonic-host-services, is not seen by chassisd. We need to use a different location that is available in the docker and is persistent across reboots.

@gpunathilell Have you used the PR? This is needed for pmon docker to mount the volume.

No. I see that the location is mounted in pmon in that PR, so I will include it. Please ignore my comment.
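
One quick way to confirm from inside the container whether the directory is visible is a check like the following. This is only a sketch: the helper name `is_dir_usable` is hypothetical, and the path is the one quoted in the comments above.

```python
import os


def is_dir_usable(path: str) -> bool:
    """Return True if path is an existing directory this process can read and write."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # Path quoted in this thread; whether it exists depends on how the
    # container volumes are mounted.
    print(is_dir_usable("/host/reboot-cause/module/"))
```

Running it once on the host and once inside the pmon container would show whether both see the same location.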

updates = {
"dpu_midplane_link_state": state,
"dpu_midplane_link_reason": "",
"dpu_midplane_link_time": datetime.now().strftime("%Y%m%d %H:%M:%S"),
Contributor

@rameshraghupathy I mean this line

Suggested change
- "dpu_midplane_link_time": datetime.now().strftime("%Y%m%d %H:%M:%S"),
+ "dpu_midplane_link_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),

It will then be displayed like this when we run show system-health dpu:
DPU2 Online dpu_midplane_link_state up 2024-11-25 23:38:21
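
For comparison, the two format strings can be tried on a fixed timestamp. This is a small standalone sketch that uses the time from the sample output above rather than datetime.now().

```python
from datetime import datetime

# Fixed timestamp matching the sample output line above.
ts = datetime(2024, 11, 25, 23, 38, 21)

compact = ts.strftime("%Y%m%d %H:%M:%S")   # original format
dashed = ts.strftime("%Y-%m-%d %H:%M:%S")  # suggested format

print(compact)  # 20241125 23:38:21
print(dashed)   # 2024-11-25 23:38:21
```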

@@ -92,6 +96,8 @@ INVALID_IP = '0.0.0.0'
CHASSIS_MODULE_ADMIN_STATUS = 'admin_status'
MODULE_ADMIN_DOWN = 0
MODULE_ADMIN_UP = 1
REBOOT_CAUSE_DIR = "/host/reboot-cause/module/"
Collaborator

@rameshraghupathy rename to MODULE_REBOOT_CAUSE_DIR?

Author

@prgeor Done

Collaborator

@oleksandrivantsiv oleksandrivantsiv left a comment

LGTM, @prgeor to review and approve

@prgeor
Collaborator

prgeor commented Dec 5, 2024

@rameshraghupathy see conflict

@kperumalbfn

@rameshraghupathy could you check the failures?

@kperumalbfn

@vvolam could you review and sign-off?

@rameshraghupathy
Author

@rameshraghupathy see conflict

@prgeor resolved

…e event with the original code and hence the set_initial_state function was required. Now the modified SmartSwitch mode also properly generates the initial config change event trigger, so this function is no longer needed. If it is present, it sends duplicate config change events the first time.
@vvolam

vvolam commented Dec 11, 2024

@rameshraghupathy could you resolve the build issue?

@rameshraghupathy
Author

/azp run


Commenter does not have sufficient privileges for PR 467 in repo sonic-net/sonic-platform-daemons


@vvolam vvolam left a comment

Looks fine to me. Approving based on my knowledge.

@mssonicbld
Collaborator

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@prabhataravind

@prgeor could you please help merge if all comments are addressed?
