Adding Armada Job Submission Code #43
Should we close this?
In addition to my other review comments, I think we need unit test(s) exercising all the new code/behavior in submit().
Still needs more testing of new code in submit().
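As a rough illustration of the kind of test being asked for, here is a minimal sketch. The `submit` function below is a stand-in written for this example (the real signature in `armada_jupyter/__main__.py` may differ), and `client.send` is a hypothetical method name, not the Armada client API. It patches `time.sleep` so the test does not actually wait.

```python
import time
from unittest import mock


def submit(client, request):
    """Stand-in for the real submit(); the actual signature may differ."""
    job_id = client.send(request)  # hypothetical client method
    time.sleep(3)  # mirrors the fixed wait discussed in this PR
    return job_id


def test_submit_returns_job_id_without_really_sleeping():
    fake_client = mock.Mock()
    fake_client.send.return_value = "job-123"

    # Patch time.sleep so the test runs instantly and we can assert on the wait.
    with mock.patch("time.sleep") as fake_sleep:
        result = submit(fake_client, {"queue": "test"})

    assert result == "job-123"
    fake_client.send.assert_called_once_with({"queue": "test"})
    fake_sleep.assert_called_once_with(3)
```

The same pattern (fake client plus patched sleep) should carry over to the real function once its dependencies are injectable or patchable.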
Added note in code
Added Sleep for job-set-id creation time
Co-authored-by: Kevin Hannon <[email protected]>
A few more comments.
armada_jupyter/__main__.py
    typer.echo(f"Submitted Job {job_id} to Armada")

    # Sleep to make sure that job-set-id is created
    time.sleep(3)
Is there a better way to do this? Sleeping is a pretty brittle way to wait for something.
This is how the API was designed. To the best of my knowledge, from working with the Python client over the summer, there is no other option.
I agree with @Sharpz7. This is a problem we found with the Armada API and their focus on being "eventually consistent".
If this proves to be brittle, we could add an environment variable for this sleep behavior or remove the logic around checking whether the pod is ready.
As I write this, we may want to relax this. Since Armada is a queueing solution, you could actually have days/weeks before your job starts running. Right now we assume that it takes 3 seconds for your job to be queued and running but this is an invalid assumption.
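A minimal sketch of the environment-variable idea, for illustration only. The variable name `ARMADA_JUPYTER_SUBMIT_WAIT_SECONDS` and the helper `get_wait_seconds` are hypothetical and not part of this PR; the default of 3 seconds matches the current hard-coded sleep.

```python
import math
import os
from typing import Mapping

# Hypothetical variable name, chosen for this example only.
WAIT_ENV_VAR = "ARMADA_JUPYTER_SUBMIT_WAIT_SECONDS"
DEFAULT_WAIT_SECONDS = 3.0


def get_wait_seconds(environ: Mapping[str, str] = os.environ) -> float:
    """Read the post-submit wait from the environment, falling back to 3s."""
    raw = environ.get(WAIT_ENV_VAR)
    if raw is None or raw.strip() == "":
        return DEFAULT_WAIT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{WAIT_ENV_VAR} must be a number, got {raw!r}") from None
    # Reject negatives, NaN and infinity, which time.sleep cannot use sensibly.
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"{WAIT_ENV_VAR} must be a finite, non-negative number")
    return value
```

The call site would then become `time.sleep(get_wait_seconds())`, and setting the variable to `0` would effectively disable the wait.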
I think we should change how this operates. All this should do, IMO, is report whether the job is submitted successfully. Beyond that, it's up to the user to check on the status of their jobs.
I think I'd want more discussion with GR internally before I make that kind of call.
Just to clarify, this sleep is waiting until the event stream becomes active, not for the job to start running. I think those are independent. Therefore, this sleep time is pretty consistently under 3 seconds?
Maybe polling is acceptable in this case?
I have done what I said above, with three "tries" to connect to the event stream, 1s between each try. Regardless of the job states, the job should generally always be created within this time.
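For readers following along, the "three tries, 1s between each try" approach can be sketched roughly like this. It is not the code from the PR: `connect_with_retries` is a hypothetical helper, and `connect` stands for whatever call opens the event stream. The sketch also assumes that any exception from `connect` means "not ready yet".

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as error:  # assumption: any failure means "not ready yet"
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait only between tries, not after the last
    raise TimeoutError(
        f"event stream not available after {attempts} attempts"
    ) from last_error
```

Compared with a single fixed `time.sleep(3)`, this returns as soon as the stream is reachable and fails loudly if it never becomes available. Passing `sleep` as a parameter keeps the helper easy to unit test.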
> As I write this, we may want to relax this. Since Armada is a queueing solution, you could actually have days/weeks before your job starts running. Right now we assume that it takes 3 seconds for your job to be queued and running but this is an invalid assumption.
Based on this, I am going to add a config option to toggle whether the user wants to use the event stream to wait or not.
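One possible shape for such a toggle, as a sketch only. `SubmitOptions`, `wait_for_events`, and `submit_job` are hypothetical names, and the two callables stand in for the real submit and event-stream calls, which are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubmitOptions:
    # Hypothetical option: when False, return right after submission.
    wait_for_events: bool = True


def submit_job(
    submit: Callable[[], str],
    wait_until_ready: Callable[[str], None],
    options: SubmitOptions = SubmitOptions(),
) -> str:
    """Submit a job and optionally block on the event stream."""
    job_id = submit()
    if options.wait_for_events:
        wait_until_ready(job_id)
    return job_id
```

With `wait_for_events=False` the function only reports that the job was submitted, which also covers the "just report submission" behavior suggested earlier in the thread.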
Going to split this PR into a docs PR and a code PR.
LGTM
LGTM
Closes #39
Closes #50
Closes #41
Closes #45