Lighter session changed state as ERROR 30 mins after pod deletion, despite 5s LIGHTER_SESSION_TRACK_RUNNING_INTERVAL #1062

Open
HarshithaRmAkamai opened this issue Jun 18, 2024 · 0 comments

Here is our configuration, with LIGHTER_SESSION_TRACK_RUNNING_INTERVAL set to 5 seconds:

  - name: LIGHTER_SESSION_TRACK_RUNNING_INTERVAL
    value: "5s"

This setting determines how frequently tasks update session states such as 'ready', 'idle', 'killed', and others.

During testing:
I deleted the driver pod to observe the state updates, expecting the session state to change to 'killed' within 5 seconds. However, it took nearly 30 minutes for the state to update, and the session eventually showed the state ERROR.

Expectation 1: We expected the session state to transition within 5 seconds of deleting the pod.
Expectation 2: We expected the eventual state to be 'KILLED', not 'ERROR', after the pod deletion.

Actual: It took nearly 30 minutes for the state to transition, and it transitioned to 'ERROR'.
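
To make the delay reproducible, the time between deleting the pod and the state change can be measured by polling the session state. Below is a minimal Python sketch, not an official Lighter tool. The helper names `wait_for_terminal_state` and `make_http_fetcher` are hypothetical, the set of terminal states is an assumption, and the `/lighter/api/sessions/{id}` path with a `state` field in the JSON response is an assumption about the REST API that should be checked against the actual deployment.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Tuple

# Assumption: states treated as "final" for this measurement; adjust as needed.
TERMINAL_STATES = frozenset({"KILLED", "ERROR", "DEAD", "SUCCESS"})


def wait_for_terminal_state(
    fetch_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], float]:
    """Poll fetch_state until it reports a terminal state or the timeout elapses.

    Returns (state, elapsed_seconds); state is None if the timeout was hit.
    """
    start = clock()
    while True:
        state = fetch_state().upper()
        elapsed = clock() - start
        if state in TERMINAL_STATES:
            return state, elapsed
        if elapsed >= timeout:
            return None, elapsed
        sleep(poll_interval)


def make_http_fetcher(base_url: str, session_id: str) -> Callable[[], str]:
    """Build a fetcher that reads the session state over HTTP.

    Hypothetical endpoint layout and response shape; verify both first.
    """
    url = f"{base_url}/lighter/api/sessions/{session_id}"

    def fetch() -> str:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)["state"]

    return fetch
```

Starting `wait_for_terminal_state(make_http_fetcher("http://localhost:8080", "<session-id>"))` immediately after deleting the driver pod would report which state the session ended in and after how many seconds. The host and port here are placeholders.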

We would appreciate help understanding how the Lighter configuration settings influence session states and which actions trigger these state updates.

[Attached screenshot: Screenshot 2024-06-18 at 3 29 47 PM]
