Comparison with and discussion of alternative solutions #535

Open
kjohnsen opened this issue May 8, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments


kjohnsen commented May 8, 2024

There are a whole bunch of frameworks for organizing research code/computational pipelines, like redun, WDL, CWL, Nextflow, Snakemake, Reflow. I'm curious how this compares.
https://insitro.github.io/redun/design.html#influences

@kjohnsen kjohnsen changed the title Curious how this compares to other workflow engines How does this compares to other workflow engines? May 8, 2024
Collaborator

frthjf commented May 9, 2024

Thanks for your interest! It's a good question, and the documentation is far from being clear on this, so let me give a brief answer here and leave this issue open as a reminder to myself to improve the documentation.

machinable's focus is on providing a hackable user interface for scientific applications. I like to think of it less as a workflow engine and more as a framework for building meaningful, intuitive 'wrappers' around complex applications. A bit like click, but for Python/Jupyter land as well as the CLI. As far as I can tell, it would make sense to use machinable to build an interface for the much more powerful redun/WDL/... pipelines. Why? Suppose we have a complicated pipeline with many options. What often ends up happening is a user interface like this:

python example.py \
    devices=8 \
    max_epochs=100 \
    data_train=mnist \
    data_val=['cifar'] \
    prepare@transform=cell \
    prepare@val_transform=cell \
    normalization_mean=[0,0,0,0,0,0,0,0] \
    normalization_std=[1,1,1,1,1,1,1,1] \
    ... # and so on as complexity grows

Typing and editing this quickly become tedious, so you often see it refactored into something like this:

python example.py --config ./configs/baseline.json

Better, but now the configuration files themselves become tricky to manage (configs/tuesday-baseline-run-02-second-try.json ...), and the configuration file becomes a user interface in itself.
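
To make the two styles above concrete, here is a minimal sketch of how a script like example.py might combine a JSON config with key=value overrides. The helpers `parse_override` and `load_config` are hypothetical names made up for this illustration and use only the standard library; this is not how machinable or any particular tool implements it.

```python
import json
from typing import Any


def parse_override(arg: str) -> tuple[str, Any]:
    """Split 'key=value' and decode the value as JSON when possible."""
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw  # fall back to a plain string, e.g. data_train=mnist
    return key, value


def load_config(base: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of base with each key=value override applied in order."""
    config = dict(base)
    for arg in overrides:
        key, value = parse_override(arg)
        config[key] = value
    return config


# Stand-in for the contents of a baseline config file (hypothetical values).
baseline = json.loads('{"devices": 1, "max_epochs": 10}')
config = load_config(baseline, ["devices=8", "max_epochs=100", "data_train=mnist"])
print(config)
```

Even in this small form, the script has to own parsing, type coercion and merging, which is the kind of glue that tends to grow along with the number of options.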

Now, what machinable allows you to do is to build an interface for the example application with a small 'project specific language':

machinable get .example "~image_data(transform='cell')" "~norm(0,1)" max_epochs=100 devices="num_gpus()" --launch

The 'configuration file' then ends up just being a regular Python script.

from machinable import get

get('example', [
   "~image_data(transform='cell')", 
   "~norm(0,1)",
   {"max_epochs": 100, "devices": "num_gpus()"}
]).launch()

Since it's Python, you can do things that would be hard to do via the CLI or a config file:

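from machinable import get
from matplotlib.pyplot import plot  # assumption: any plotting helper would do here

x = []
y = []

for num_epochs in [50, 100]:
    # the walrus binds the experiment only if future() returns a truthy value
    if experiment := get('example', [
        "~image_data(transform='cell')",
        "~norm(0,1)",
        {'max_epochs': num_epochs}
    ]).future():
        x.append(num_epochs)
        y.append(experiment.accuracy())

plot(x, y)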

Overall, the rationale is to make interacting with the code easier, more self-documenting and less error prone.
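
For intuition about how a small 'project specific language' such as "~norm(0,1)" can be read, here is a toy sketch that parses such strings with Python's `ast` module. The function `parse_version` is a hypothetical name for this illustration only; it is not machinable's actual implementation and says nothing about how machinable resolves these strings internally.

```python
import ast
from typing import Any


def parse_version(spec: str) -> tuple[str, list[Any], dict[str, Any]]:
    """Parse '~name(arg, key=value)' into (name, args, kwargs).

    Toy illustration only: arguments must be Python literals.
    """
    if not spec.startswith("~"):
        raise ValueError(f"expected a string starting with '~', got {spec!r}")
    node = ast.parse(spec[1:], mode="eval").body
    if isinstance(node, ast.Name):
        # bare form without parentheses, e.g. "~baseline"
        return node.id, [], {}
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"unsupported expression: {spec!r}")
    # literal_eval accepts AST nodes and rejects anything that is not a literal
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {k.arg: ast.literal_eval(k.value) for k in node.keywords}
    return node.func.id, args, kwargs


print(parse_version("~norm(0,1)"))
print(parse_version("~image_data(transform='cell')"))
```

The point of the sketch is only that such strings stay short on the command line while still carrying structured arguments that a project can map onto its own configuration.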

I hope this gives you a vague idea of how machinable fits in the space; hopefully, I'll find some time to update the documentation but in the meantime let me know if you have more questions.

@frthjf frthjf added the documentation Improvements or additions to documentation label May 9, 2024
@frthjf frthjf changed the title How does this compares to other workflow engines? Comparison with and discussion of alternative solutions May 9, 2024
Author

kjohnsen commented Oct 5, 2024

Thanks, that's a lot clearer now! (sorry for the delay; I'm bad at checking GitHub notifications)
