Extend wait time in cable firmware download flow #513

Open
wants to merge 1 commit into
base: master

stephenxs
Collaborator

Description

Extend the wait time in the cable firmware download flow

Motivation and Context

The cable firmware download flow is the following:

  1. Start download (CMIS CDB command 0x0101)
  2. Write firmware to LPL or EPL (CMIS CDB command 0x0103 or 0x0104), depending on the module's capability
  3. Complete download and verify image (CMIS CDB command 0x0107)

Each CMIS CDB command is written to the cable, and the host then waits for the cable's response. The cable can take several seconds to handle a command and respond, so the host must delay for a few seconds before reading the response to avoid error messages.
Step 1 already has a 2-second delay, but that is not sufficient on some platforms with certain cables.

This PR extends the delay in step 1 to 5 seconds and introduces a 2-second delay in step 3 so that the cable firmware download completes without errors.
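
For illustration, the flow and the two delays can be sketched as follows. This is a minimal sketch, not the code in this repository: `write_cdb(cmd_id, payload)` and `check_status()` are hypothetical callables standing in for the platform API, and `download_firmware` and `DELAY_AFTER_WRITE` are names made up for the example.

```python
import time

# CDB command IDs as listed in the flow above.
CMD_START_DOWNLOAD = 0x0101
CMD_WRITE_LPL = 0x0103
CMD_WRITE_EPL = 0x0104
CMD_COMPLETE_DOWNLOAD = 0x0107

# Delay (seconds) between writing a command and reading its status.
# Only steps 1 and 3 wait; block writes are not delayed.
DELAY_AFTER_WRITE = {CMD_START_DOWNLOAD: 5, CMD_COMPLETE_DOWNLOAD: 2}


def download_firmware(write_cdb, check_status, blocks, use_epl, sleep=time.sleep):
    """Run the three-step flow.

    write_cdb(cmd_id, payload) and check_status() are hypothetical callables
    standing in for the platform API; they are not the real signatures.
    """
    def run(cmd_id, payload=b""):
        write_cdb(cmd_id, payload)
        delay = DELAY_AFTER_WRITE.get(cmd_id, 0)
        if delay:
            sleep(delay)  # give the cable time before touching the status
        return check_status()

    results = [run(CMD_START_DOWNLOAD)]
    write_cmd = CMD_WRITE_EPL if use_epl else CMD_WRITE_LPL
    results.extend(run(write_cmd, block) for block in blocks)
    results.append(run(CMD_COMPLETE_DOWNLOAD))
    return results
```

Passing `sleep` as a parameter keeps the sketch testable without real waits; the block-write step stays undelayed so the bulk of the download is not slowed down.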

How Has This Been Tested?

Additional Information (Optional)

@stephenxs stephenxs marked this pull request as ready for review November 18, 2024 10:53
@bingwang-ms

@mihirpat1 Can you help review this PR?

@@ -415,6 +415,7 @@ def validate_fw_image(self):
         cmd = bytearray(b'\x01\x07\x00\x00\x00\x00\x00\x00')
         cmd[133-INIT_OFFSET] = self.cdb_chkcode(cmd)
         self.write_cdb(cmd)
+        time.sleep(2)
Collaborator

@stephenxs instead of fixing just the firmware download, would it be better to fix cdb1_chkstatus()? I see that cdb1_chkstatus() already waits up to 60 seconds while checking the CDB status. If there are I2C read errors, will they be treated as the CDB being busy? So why do we need this fix here?

Collaborator Author

@stephenxs stephenxs Nov 25, 2024

@prgeor CDBCheckstatus retries only after it has read a CDB status, such as busy. But the I2C error can occur while reading the CDB status itself.
The extra 2-second delay ensures the cable is ready to provide the status.
We do not need to wait 2 seconds before checking the CDB status of every command, though. Most commands are fast, like writing LPL/EPL when downloading the firmware, and adding wait logic there would slow down the download.
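
For reference, the alternative raised in the question (treating a failed status read the same as busy) could look roughly like the sketch below. It is not the existing cdb1_chkstatus() implementation: `read_status` is a hypothetical callable that returns the status byte or `None` when the I2C read fails, and the only protocol detail assumed is that bit 7 of the CDB status byte signals busy, as defined by CMIS.

```python
import time

STS_BUSY = 0x80  # bit 7 of the CDB status byte (busy flag per CMIS)


def wait_for_cdb(read_status, timeout_s=60.0, poll_s=0.1,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the module reports not-busy or the timeout expires.

    read_status() is a hypothetical callable returning the status byte,
    or None if the I2C read failed. A failed read is treated like busy.
    Returns the final status byte, or None on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        status = read_status()
        if status is not None and not (status & STS_BUSY):
            return status
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

A loop like this only helps if a failed read is silent; if the read itself logs the I2C error, the error messages this PR is trying to avoid would still appear.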

Collaborator

@stephenxs instead of simply hardcoding the delay, can we make use of CMIS advertised timeout values for CDB command completion?

image

Collaborator Author

Hi @prgeor
the motivation of this PR is to avoid I2C error messages under the current infrastructure.

Of course, we could leverage the advertised timeout values, which indicate the maximum time the cable may take to provide the CDB status, and wait for that long. However, that requires much more logic, such as:

  • checking the capability of foreground/background processing
  • tolerating the possibility that it is not supported
  • reading those values in advance and storing them in the database or inside the platform API
  • representing the values in the CLI
  • etc.

That would be a new feature and is beyond the scope of this PR.
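
To make the "not supported" case concrete, the core of such a feature might be a fallback like the sketch below. Everything here is hypothetical: `completion_wait_seconds` is a made-up helper, `advertised_ms` stands for a value that would first have to be read from the module and decoded, and the fixed delays from this PR serve as the fallback.

```python
def completion_wait_seconds(advertised_ms, fallback_s):
    """Pick how long to wait for a CDB command to complete.

    advertised_ms: timeout advertised by the module in milliseconds, or
                   None when the module does not advertise one (hypothetical
                   input; reading and decoding it is not shown here).
    fallback_s:    fixed delay in seconds to use when nothing usable is
                   advertised.
    """
    if advertised_ms is None or advertised_ms <= 0:
        return fallback_s
    return advertised_ms / 1000.0


# Fallbacks mirror the fixed delays in this PR: 5 s after starting the
# download, 2 s after completing it.
start_wait = completion_wait_seconds(None, fallback_s=5)
complete_wait = completion_wait_seconds(8000, fallback_s=2)
```

The capability checks, storage, and CLI representation listed above are deliberately left out of the sketch.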

Collaborator

@prgeor prgeor left a comment

@stephenxs we need a cleaner solution that works for all CMIS-compliant vendors

5 participants