Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First pass at new command line API #63

Merged
merged 10 commits into from
Mar 7, 2024

Conversation

andreww
Copy link
Collaborator

@andreww andreww commented Sep 27, 2023

Only changes the documentation. For discussion.

Other things to do:

  • Agree command line CLI
  • Implement changes
  • Make "cats" a runnable programme (via install process)
  • Update other documentation
  • Add tests!

Copy link
Contributor

@abhidg abhidg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the --scheduler option, will be easy to add others

cats/__init__.py Outdated Show resolved Hide resolved
cats/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A major concern I have regarding the CLI API, ultimately of concern to the PR but perhaps not its current state which is more focused on valid arguments and their scope seemingly at present, which looks good to me, is the machine parsability and the human readability of the possible outputs.

Namely, I strongly suggest that we ensure that for whatever information that a user can request to output, whether a single figure (e.g. with this new suggested --quiet option which I like) or multiple statistics (e.g. run time plus carbon estimate, etc.), there is a means to provide that in a way which could be parsed by code, particularly as a recognised data serialisation format such as JSON or YAML or CSV, etc. (JSON seems most sensible without my having thought much about the choices, but we can discuss this).

We could even provide CLI options so that different formats can be output, e.g. JSON, CSV, since on the code side it is trivial to convert between those once we have constructed a standard format. Which makes me think that the --scheduler option shouldn't control the format, rather we have separate CLI option(s) to set that, e.g. -j for JSON and -y for YAML, or we can have an option for --format and custom formats for a given scheduler could be set as values, e.g. --format=json, --format=at, or similar.

If a --scheduler gets set, it could (and I think this would be useful) automatically set the --format to the appropriate format, but also pass, e.g. pipe, to the relevant scheduler. And for optimum human readability, we can have a dedicated option such as --format=simple or something.

I realise we generally we only be outputting a small number of pieces of information, but regardless having a standard output format means people can extract or forward that information as they wish.

So in short I am currently thinking we have a separate --format option to --schedule(r) option to initiate the actual scheduling, in which case we should allow the -- option so arguments can be passed onto the schedule rather than to CATS.

I also wonder if CATS would benefit from a separate formatting option to cover the datetime format supplied for run time and any other datetimes, since that piece of data in itself can be specified in a myriad of different ways, inside the overall data format of the output.

@andreww
Copy link
Collaborator Author

andreww commented Sep 28, 2023

I like the idea of a --format and a --dateformat option to give us flexibility and allow us to output in a sensible data exchange format. I imagine --scheduler could just be a shorthand wrapper to a combination of other arguments to make the output play nice with other tools.

@abhidg
Copy link
Contributor

abhidg commented Jan 28, 2024

@andreww is this ready to merge?

@andreww
Copy link
Collaborator Author

andreww commented Jan 29, 2024

@abhidg - no, afraid not. Thus far it's just changing the documentation to describe what we want to happen (and that still needs further update following @sadielbartholomew's suggestion). Actually writing the code to make cats behave like this is still something that needs doing (and I don't think we should merge the documentation until it describes the code).

andreww and others added 5 commits February 22, 2024 21:29
Only changes the documentation. For discussion.
Adds --format option for output formatting, currently
only JSON is supported

Change sys.stderr.write to logging calls

Fixes: GreenScheduler#67
Adds a -c (--command) parameter to pass the command to the scheduler

Fixes: GreenScheduler#53
@colinsauze
Copy link
Member

colinsauze commented Feb 26, 2024

I've just had a play around with this and have a few questions about the way it now behaves:

  1. you can specify both --jobinfo and --config, which one takes precedence? The documentation doesn't say. Should we warn if both are specified?
  2. --command seems to be optional, is this intentional? In the old CLI we piped commands to At but that doesn't work now, so isn't this the only way to invoke a command?
  3. The README, "Use with Schedulers" pages of the documentation and animated gif example don't match the new CLI format.
  4. Should it be possible to run without the --scheduler option (currently it is)?
  5. Does anything accept the JSON format that you get when --format is specified? We used to have an At friendly format with a simple start date, this is probably redundant now we have the --scheduler option, but if it was going to be possible then another format for --format would seem to be the right way to achieve that (or should it be a --dateformat? I'm a little confused as to the differentiation between the two).

@abhidg
Copy link
Contributor

abhidg commented Feb 27, 2024

I've just had a play around with this and have a few questions about the way it now behaves:

  1. you can specify both --jobinfo and --config, which one takes precedence? The documentation doesn't say. Should we warn if both are specified?

They control different things: --config is for TDP and specifying CPU/GPU models as well as the API and location (currently only one API) and --jobinfo is job specific info like amount of memory or number of CPUs used.

  1. --command seems to be optional, is this intentional? In the old CLI we piped commands to At but that doesn't work now, so isn't this the only way to invoke a command?

I am good either way. One reasoning for keeping it optional is if the user wants to use CATS just to get the optimal time and not schedule.

  1. The README, "Use with Schedulers" pages of the documentation and animated gif example don't match the new CLI format.

Will update them once the CLI is locked in!

  1. Should it be possible to run without the --scheduler option (currently it is)?

Matter of taste. I am fine either way. We could also default to --scheduler=at such that --scheduler is actually optional and require a command.

  1. Does anything accept the JSON format that you get when --format is specified? We used to have an At friendly format with a simple start date, this is probably redundant now we have the --scheduler option, but if it was going to be possible then another format for --format would seem to be the right way to achieve that (or should it be a --dateformat? I'm a little confused as to the differentiation between the two).

Nothing accepts the JSON format, it is more for piping to something like jq, which we can give examples of -- the --dateformat option controls the date formatting within --format output. The old behaviour of piping a command to at can be expressed for example as:

ls | at -t $(cats --loc OX1 -d 5 --dateformat='%Y%m%d%H%M' --format=json | jq -r '.carbonIntensityOptimal.start'

@abhidg abhidg marked this pull request as ready for review February 28, 2024 19:37
cats/__init__.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@tlestang tlestang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! Thanks for the work.

@abhidg abhidg merged commit d7fa641 into GreenScheduler:main Mar 7, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants