Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --err-regex | -er CLI flag to jf job rerun #186

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

janosh
Copy link
Collaborator

@janosh janosh commented Sep 18, 2024

closes #185

happy to add tests if basic idea seems acceptable

@davidwaroquiers
Copy link
Member

Hi @janosh

Thx for the PR. The idea sounds reasonable. I guess you are already aware this is possible through the --query already but with a more "involved" way of doing it (basically write the query you have in the implemented lines directly in the --query). If you think this is something that is sufficiently common, I'm not against having that option (same way that many other already existing options could also be done with a query, but it's much more convenient with the option than writing the query). I'm asking that just to make sure we are not adding too many options (and again, fully in line with this if you think it's a sufficiently common use case) on how to select/filter jobs/flows to rerun or list or else. I'll let @gpetretto comment on the implementation as well (I think he has one small comment on the way it's done).

@janosh
Copy link
Collaborator Author

janosh commented Sep 19, 2024

i've needed secveral times lately and the curly bracket and quote matching in building nested queries is always tedious. i think rerunning based on error message will be a fairly common use case but i understand the concern of proliferating CLI flags. feel free to close this PR unless you believe it's useful.

i noticed the issue that i assume @gpetretto saw as well with the current code issuing the warning:

No filter has been set. This will apply the changes...

will fix if we decide to move forward

@gpetretto
Copy link
Contributor

I am also a bit afraid about the number of filtering options and the possibility that a user might get lost in the list. However, if you think that this might be useful in general we can go on.
My main comment would be to deal with the --err-regex as all the other options used in the filtering, so that it could also eventually be added to the other commands as well( although I do not really expect to be very useful for commands like pause/play. Maybe could be interesting for list.
So I would ask you to add it as a general CLI type in jobflow_remote.cli.types and propagate the argument to JobController._many_jobs_action and JobController._build_query_job. Also make it explicitly incompatible with the query option, both in JobController._many_jobs_action and in execute_multi_jobs_cmd. If it is an option I would rather it behave as all the others. At the moment I have not allowed mixing query with the other options because, while it may may be fine in most cases, I am afraid that simply merging the dictionaries may lead to incosistencies if the query is a bit involved. I am open to discuss this point if it is considered important to mix query with the other options, but then it should be for all of them.

One more note: error only contains messages for problems that originate from the failure of the Job. A job in the REMOTE_ERROR state would have the error message in JobDoc.remote.error. Do you want to consider the filtering to be applied to both of them? Or just to the main error as it is now?

@davidwaroquiers
Copy link
Member

Same here, as you needed to use it a lot lately, it is a use case so let's go forward with it. I agree with @gpetretto's comments and suggestions if you can address them.

Best

@janosh
Copy link
Collaborator Author

janosh commented Sep 20, 2024

thanks a lot! excellent suggestions as always @gpetretto @davidwaroquiers. 👍

i forgot that REMOTE_ERROR error messages are stored in a different field but adding the --err-regex flag to jf job retry was something i wanted to bring up as it seems equally useful there. i'll take a stab at addressing all comments in a few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rerun failed jobs by error message
3 participants