API for async project exec #7666

Open
haraldschilly opened this issue Jul 5, 2024 · 1 comment

Comments

@haraldschilly
Contributor

haraldschilly commented Jul 5, 2024

There is a project_exec in the v1 api … but it has become apparent that it also needs to be possible to run it asynchronously. I.e. a job (command + args) is launched, a unique ID is returned, and later the caller can query by job ID to check the status and get the captured stdout+stderr. (Status codes I can think of: "running" | "error" | "missing" | "success".)

My idea for implementing this is to let the project spawn a sub-process (child_process) without blocking on it. Instead, it stores the spawned process internally in a map, where the key is just a UUID or a simple number. With that, it should be possible to check up on a spawned sub-process by ID. Results & status are stored in memory.
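To make the idea concrete, here is a minimal sketch of such a job table in Node.js. It is not the existing project_exec code: the names `jobs`, `startJob` and `getJob` and the shape of the job record are made up for illustration, and output is not capped here.

```javascript
const { spawn } = require("node:child_process");
const { randomUUID } = require("node:crypto");

// Hypothetical in-memory job table: job ID -> job record.
const jobs = new Map();

// Launch a command without waiting for it and return the job ID right away.
function startJob(command, args = []) {
  const id = randomUUID();
  const job = { status: "running", stdout: "", stderr: "", exitCode: null };
  jobs.set(id, job);

  const child = spawn(command, args);
  // Decode as UTF-8 so multi-byte characters split across chunks stay intact.
  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", (chunk) => (job.stdout += chunk));
  child.stderr.on("data", (chunk) => (job.stderr += chunk));

  // "error" fires e.g. when the command could not be spawned at all.
  child.on("error", (err) => {
    job.status = "error";
    job.stderr += String(err);
  });
  // "close" fires once the process has ended and its stdio streams are closed.
  child.on("close", (code) => {
    job.exitCode = code;
    job.status = code === 0 ? "success" : "error";
  });
  return id;
}

// Look up a job by ID; unknown IDs are reported as "missing".
function getJob(id) {
  return jobs.get(id) ?? { status: "missing" };
}
```

A caller would invoke `startJob("ls", ["-la"])`, keep the returned ID, and poll `getJob(id)` until the status is no longer "running".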

The more complex detail is how to properly route the request from the hub to the project.
My feeling is, it isn't too hard to extend whatever project_exec does right now, but I haven't looked at the code. So, maybe the hard part is already in place? Just needs a bit of "high level" treatment to fit into the v2 framework?

Concerns about memory usage:

  • Another detail is how to deal with too much output. Probably the best idea is to cap stdout and stderr, e.g. at 1 MB, and keep only the last part of the output (aka "tail").
  • There should also be an overall limit on how many such outputs are stored, and the stored output should be deleted from memory once it has been returned by an API call.
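Both memory limits can be sketched in a few lines. The class names `TailBuffer` and `ResultStore` are hypothetical, and evicting the oldest entry when the store is full is an assumption; the proposal only asks for some overall limit.

```javascript
// Keeps only the last `limit` bytes of everything appended (the "tail").
class TailBuffer {
  constructor(limit) {
    this.limit = limit;
    this.buf = Buffer.alloc(0);
    this.truncated = false;
  }

  append(chunk) {
    const joined = Buffer.concat([this.buf, Buffer.from(chunk)]);
    if (joined.length > this.limit) {
      this.truncated = true;
      // Copy the tail so the larger joined buffer can be garbage collected.
      this.buf = Buffer.from(joined.subarray(joined.length - this.limit));
    } else {
      this.buf = joined;
    }
  }

  // Cutting at a byte boundary may split a multi-byte UTF-8 character.
  toString() {
    return this.buf.toString("utf8");
  }
}

// Holds at most `maxEntries` finished results; a result is dropped once read.
class ResultStore {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.results = new Map();
  }

  put(id, result) {
    if (!this.results.has(id) && this.results.size >= this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      // Evicting the oldest entry is an assumption of this sketch.
      const oldest = this.results.keys().next().value;
      this.results.delete(oldest);
    }
    this.results.set(id, result);
  }

  // Return the result and delete it from memory in one step.
  take(id) {
    const result = this.results.get(id);
    this.results.delete(id);
    return result;
  }
}
```

With a 1 MB cap one would create `new TailBuffer(1024 * 1024)` per stream and feed it from the `data` events of the child process.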

As an extra, I think it would also be cool to add some information about the process, similar to /usr/bin/time on Linux. I'm thinking of e.g. the output of process.resourceUsage … so that callers know a bit more about what was going on inside the project.
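For reference, a small sketch of what such statistics could look like. Note that Node's `process.resourceUsage()` describes the calling process, not a spawned child, so this only illustrates the available fields; the helper name `measure` and the shape of `stats` are made up.

```javascript
// Run `fn` and report wall-clock time plus the CPU time and peak memory
// of the *current* Node.js process (not of a child process).
function measure(fn) {
  const startWall = process.hrtime.bigint();
  const before = process.resourceUsage();
  const value = fn();
  const after = process.resourceUsage();
  const wallMs = Number(process.hrtime.bigint() - startWall) / 1e6;
  return {
    value,
    stats: {
      wallMs,
      // userCPUTime and systemCPUTime are reported in microseconds.
      userCpuMs: (after.userCPUTime - before.userCPUTime) / 1000,
      systemCpuMs: (after.systemCPUTime - before.systemCPUTime) / 1000,
      // maxRSS is the peak resident set size in kilobytes.
      maxRssKb: after.maxRSS,
    },
  };
}
```

Per-job statistics for a sub-process would need a different source, for example wrapping the command in /usr/bin/time as mentioned above.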

@williamstein
Contributor

FYI, I think this will be relatively easy to implement, and the hard part is fortunately mostly already done. Deciding on all those parameters is very, very helpful.

What's your motivation for this?

3 participants