Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update the storage-version-migration script to complete execution in finite time #1682

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ayushsatyam146
Copy link

@ayushsatyam146 ayushsatyam146 commented Sep 16, 2024

Changes

hack/storage-version-migration.sh script is mentioned in the install steps. If that script is executed it runs infinitely without reporting any success or failure. This PR adds changes to put checks in hack/storage-version-migration.sh script to give success or failure response and terminate the execution in finite time.

Fixes #1680

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.

Release Notes

NONE

@pull-request-size pull-request-size bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 16, 2024
@openshift-ci openshift-ci bot added the release-note-none Label for when a PR does not need a release note label Sep 16, 2024
Copy link
Contributor

openshift-ci bot commented Sep 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign adambkaplan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@adambkaplan adambkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR needs a bit of re-work. The script as is will always complete and return false positives.

Comment on lines +75 to +77
if [[ "${jobStatus}" == *"Complete"* ]]; then
echo "Job ${JOB_NAME} has completed successfully!"
exit 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient. condition.type tells you if the status condition is reporting the Complete or Failed . The Complete and Failed condition types should always be populated in the Conditions array.

You need to inspect the condition.status field, which can be True, False, or Unknown.

Comment on lines -70 to -73
while [ "$(kubectl -n shipwright-build get job "${JOB_NAME}" -o json | jq -r '.status.completionTime // ""')" == "" ]; do
echo "[INFO] Storage version migraton job is still running"
sleep 10
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably where the bug is. Per the Job API documentation:

completionTime(Time):

Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. The completion time is set when the job finishes successfully, and only then. The value cannot be updated or removed. The value indicates the same or later point in time as the startTime field.

I suspect that in your case, the job is failing. This script doesn't see a completion time because the Job never succeeds, hence it just runs forever.

See https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#JobStatus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Label for when a PR does not need a release note size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[BUG] storage version migration script runs infinitely
2 participants