diff --git a/flyte.qmd b/flyte.qmd
index 8ba70a3..a448504 100644
--- a/flyte.qmd
+++ b/flyte.qmd
@@ -65,6 +65,18 @@ workflow_outputs = typing.NamedTuple(
 @task
 def generate_processed_corpus() -> List[List[str]]:
 
+@task
+def train_word2vec_model(training_data: List[List[str]], hyperparams: Word2VecModelHyperparams) -> model_file:
+
+@task
+def train_lda_model(corpus: List[List[str]], hyperparams: LDAModelHyperparams) -> Dict[int, List[str]]:
+
+@task
+def word_similarities(model_ser: FlyteFile[MODELSER_NLP], word: str) -> Dict[str, float]:
+
+@task
+def word_movers_distance(model_ser: FlyteFile[MODELSER_NLP]) -> float:
+
 @workflow
 def nlp_workflow(target_word: str = "computer") -> [Dict[str, float], float, Dict[int, List[str]]]:
     corpus = generate_processed_corpus()
diff --git a/images/ml-workflow-example.svg b/images/ml-workflow-example.svg
new file mode 100644
index 0000000..4fa5633
--- /dev/null
+++ b/images/ml-workflow-example.svg
@@ -0,0 +1,4 @@
+
+
+
+
[SVG markup not recoverable; text labels in images/ml-workflow-example.svg: Triggered by WebUI; Triggered by MLOps; Get Training DataSet; Get Validation DataSet; Get Inference Data; Preprocessing (×3); Optimize Hyperparameters; Training (×3); Validation; Inference; Export Model to ONNX; Upload Model to Registry; Model Registry; Get Model; Deployment; Website; Gather Results; Loop; GPU (×5); Cached (×6)]
\ No newline at end of file
diff --git a/workflows.qmd b/workflows.qmd
index bd5136e..6458ce0 100644
--- a/workflows.qmd
+++ b/workflows.qmd
@@ -67,19 +67,20 @@ K. L. Polsterer, B. Doser, A. Fehlner and S. Trujillo-Gomez [ADASS (2024)]().
 
 ## Requirements on Workflows Orchestration
 
- - Define node requirements (e.g. CPU, memory, GPU)
+ - Define execution requirements (e.g. GPUs, CPUs, memory)
  - Control runtime environment with containers
- - Underlying data pipeline?
+ - Orchestration features
+   - Parallelization: Run independent tasks automatically in parallel
+   - Caching: Avoid recomputing successful tasks
+   - Nesting: Reuse workflows as tasks
+   - Looping: Repeat tasks based on conditions
+   - Scattering: Distribute data to multiple tasks
+   - Conditionals: Branching based on conditions
 
- - Parallelization: Run independent tasks automatically in parallel
- - Caching: Avoid recomputing successful tasks
- - Nesting: Reuse workflows as tasks
- - Looping: Repeat tasks based on conditions
- - Scattering: Distribute data to multiple tasks
- - Conditionals: Branching based on conditions
+## Machine Learning Workflow Example
 
-![](images/flyte-ui_mnist-workflow.png)
+![](images/ml-workflow-example.svg){fig-align="center"}
 
 ## Options to Generate a Workflow