diff --git a/docs/environment_setup.md b/docs/environment_setup.md index 3b259788a..cb5e3c134 100644 --- a/docs/environment_setup.md +++ b/docs/environment_setup.md @@ -30,6 +30,12 @@ In this setup guide, let's run the `examples/basics` project. ```{prompt} bash git clone https://github.com/flyteorg/flytesnacks + +# or if your SSH key is registered on GitHub: +git clone git@github.com:flyteorg/flytesnacks.git + +# or if you use the `gh` tool: +gh repo clone flyteorg/flytesnacks cd flytesnacks/examples/basics pip install -r requirements.txt ``` @@ -67,8 +73,9 @@ pyflyte run basics/hello_world.py my_wf ``` :::{note} -The first couple arguments of `pyflyte run` is in the form of `path/to/script.py `, where -`` is the function decorated with `@workflow` that you want to run. +The first two arguments to `pyflyte run` have the form +`path/to/script.py `, where `` is the function +decorated with `@workflow` that you want to run. ::: To run the workflow on the demo Flyte cluster, all you need to do is supply the `--remote` flag: @@ -103,7 +110,11 @@ option as `--arg-name`. ## Visualizing Workflows -Workflows can be visualized as DAGs on the UI. However, you can visualize workflows on the browser and in the terminal by *just* using your terminal. +Workflows can be visualized as DAGs in the UI. You can also visualize workflows +from your terminal, with the result displayed in your default web browser. This +visualization uses the service at graph.flyte.org to render Graphviz diagrams, +and hence shares your DAG (but not your data or code) with an outside party +(security hint 🔐). To view workflow on the browser: @@ -127,15 +138,20 @@ flytectl get workflows \ basics.basic_workflow.my_wf ``` -Replace `` with version from console UI, it may look something like `BLrGKJaYsW2ME1PaoirK1g==` +Replace `` with the base64-encoded version shown in the console UI, +which looks something like `BLrGKJaYsW2ME1PaoirK1g==`. 
:::{tip} -Running most of the examples in the **User Guide** only requires the default Docker image that ships with Flyte. -Many examples in the {ref}`tutorials` and {ref}`integrations` section depend on additional libraries, `sklearn`, -`pytorch`, or `tensorflow`, which will not work with the default docker image used by `pyflyte run`. -These examples will explicitly show you which images to use for running these examples by passing in the docker -image you want to use with the `--image` option in `pyflyte run`. +Running most of the examples in the **User Guide** only requires the default +Docker image that ships with Flyte. Many examples in the {ref}`tutorials` and +{ref}`integrations` section depend on additional libraries such as `sklearn`, +`pytorch`, or `tensorflow`, which will not work with the default docker image +used by `pyflyte run`. + +These examples will explicitly show you which images to use for running these +examples by passing in the docker image you want to use with the `--image` +option in `pyflyte run`. ::: 🎉 Congrats! Now you can run all the examples in the {ref}`userguide` 🎉 diff --git a/docs/getting_started/package_register.md b/docs/getting_started/package_register.md index 7985060b3..3425990af 100644 --- a/docs/getting_started/package_register.md +++ b/docs/getting_started/package_register.md @@ -269,7 +269,7 @@ By default, the `docker_build.sh` script: - Uses the `PROJECT_NAME` specified in the `pyflyte init` command, which in this case is `my_project`. - Will not use any remote registry. -- Uses the git sha to version your tasks and workflows. +- Uses the git revision SHA1 to version your tasks and workflows. ``` You can override the default values with the following flags: @@ -367,7 +367,7 @@ Let's break down what each flag is doing here: - `--archive`: This argument allows you to pass in a package file, which in this case is `flyte-package.tgz`. 
- `--version`: This is a version string that can be any string, but we recommend - using the git sha in general, especially in production use cases. + using the git revision in general, especially in production use cases. ### Using `pyflyte register` versus `pyflyte package` + `flytectl register` diff --git a/docs/index.md b/docs/index.md index c220bee5b..64d8dd252 100644 --- a/docs/index.md +++ b/docs/index.md @@ -33,8 +33,8 @@ on your local machine. :title: text-muted :animate: fade-in-slide-down -The introduction below is also available on a hosted sandbox environment, where -you can get started with Flyte without installing anything locally. +Union.ai provides a hosted sandbox environment, free of charge, where you can +get started with Flyte without installing anything locally. ```{link-button} https://sandbox.union.ai/ --- @@ -73,10 +73,10 @@ First install [flytekit](https://pypi.org/project/flytekit/), Flyte's Python SDK pip install flytekit flytekitplugins-deck-standard scikit-learn ``` -Then install [flytectl](https://docs.flyte.org/projects/flytectl/en/latest/), +Next install [flytectl](https://docs.flyte.org/projects/flytectl/en/latest/), which the command-line interface for interacting with a Flyte backend. -````{tabbed} Homebrew +````{tabbed} Homebrew (macOS) ```{prompt} bash $ brew install flyteorg/homebrew-tap/flytectl @@ -84,7 +84,7 @@ brew install flyteorg/homebrew-tap/flytectl ```` -````{tabbed} Curl +````{tabbed} Curl (Unix-like) ```{prompt} bash $ curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin @@ -92,6 +92,15 @@ curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin ```` +````{tabbed} Windows + +```{prompt} C:\> +TODO +``` + +```` + + ## Creating a Workflow The first workflow we'll create is a simple model training workflow that consists @@ -99,13 +108,13 @@ of three steps that will: 1. 
🍷 Get the classic [wine dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#wine-recognition-dataset) using [sklearn](https://scikit-learn.org/stable/). -2. 📊 Process the data that simplifies the 3-class prediction problem into a - binary classification problem by consolidating class labels `1` and `2` into - a single class. -3. 🤖 Train a `LogisticRegression` model to learn a binary classifier. +2. 📊 Process the data, simplifying the 3-class prediction problem into a binary + classification problem by consolidating class labels `1` and `2` into a single + class. +3. 🤖 Train a `LogisticRegression` model to create a binary classifier. -First, we'll define three tasks for each of these steps. Create a file called -`example.py` and copy the following code into it. +Let's define three tasks, corresponding to each of these steps. Create a +file called `example.py` and copy the following code into it. ```{code-cell} python :tags: [remove-output] @@ -126,7 +135,9 @@ def get_data() -> pd.DataFrame: @task def process_data(data: pd.DataFrame) -> pd.DataFrame: """Simplify the task from a 3-class to a binary classification problem.""" - return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1)) + df = data.copy() + df.loc[df.target != 0, "target"] = 1 + return df @task def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression: @@ -139,10 +150,11 @@ def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression As we can see in the code snippet above, we defined three tasks as Python functions: `get_data`, `process_data`, and `train_model`. -In Flyte, **tasks** are the most basic unit of compute and serve as the building -blocks 🧱 for more complex applications. A task is a function that takes some -inputs and produces an output. 
We can use these tasks to define a simple model -training workflow: +In Flyte, **tasks** are the most basic "unit of compute" (per Kubernetes +jargon) and serve as the building blocks 🧱 for more complex applications. +At its core, a task is simply a function: it takes inputs and produces an +output. We can use these tasks to define a simple model training workflow: + ```{code-cell} python @workflow @@ -165,7 +177,7 @@ is typically written with inputs and outputs. A **workflow** is also defined as a Python function, and it specifies the flow of data between tasks and, more generally, the dependencies between tasks 🔀. -::::{dropdown} {fa}`info-circle` The code above looks like Python, but what do `@task` and `@workflow` do exactly? +::::{dropdown} {fa}`info-circle` This looks like typical Python, but what do `@task` and `@workflow` do? :title: text-muted :animate: fade-in-slide-down @@ -173,7 +185,7 @@ Flyte `@task` and `@workflow` decorators are designed to work seamlessly with your code-base, provided that the *decorated function is at the top-level scope of the module*. -This means that you can invoke tasks and workflows as regular Python methods and +This means that you can invoke tasks and workflows as regular Python functions and even import and use them in other Python modules or scripts. :::{note} @@ -202,16 +214,19 @@ pyflyte run example.py training_workflow \ :animate: fade-in-slide-down If you're using Bash, you can ignore this 🙂 -You may need to add .local/bin to your PATH variable if it's not already set, -as that's not automatically added for non-bourne shells like fish or xzsh. - -To use pyflyte, make sure to set the /.local/bin directory in PATH +You may need to add .local/bin to your PATH variable if it's not already set; +it may not automatically get added for non-Bourne shells. 
For example, if you +use `fish` or `csh`, you can set this with: :::{code-block} fish -set -gx PATH $PATH ~/.local/bin +set -gx PATH $PATH ~/.local/bin # fish +::: + +:::{code-block} csh +set path = ($path $HOME/.local/bin) # csh/tcsh ::: -::::: +::::: :::::{dropdown} {fa}`info-circle` Why use `pyflyte run` rather than `python example.py`? @@ -223,7 +238,9 @@ set -gx PATH $PATH ~/.local/bin Keyword arguments can be supplied to ``pyflyte run`` by passing in options in the format ``--kwarg value``, and in the case of ``snake_case_arg`` argument -names, you can pass in options in the form of ``--snake-case-arg value``. +names, you can optionally spell them as "kebab case," for example as +``--snake-case-arg value``. + ::::{note} If you want to run a workflow with `python example.py`, you would have to write @@ -347,8 +364,8 @@ There are a few features about FlyteConsole worth pointing out in the GIF above: ## What's Next? Follow the rest of the sections in the documentation to get a better -understanding of the key constructs that make Flyte such a powerful -orchestration tool 💪. +understanding of the key constructs that make Flyte a powerful orchestration +tool 💪. ```{admonition} Recommendation :class: tip diff --git a/examples/basics/basics/hello_world.py b/examples/basics/basics/hello_world.py index 19e4cfac5..219f6da73 100644 --- a/examples/basics/basics/hello_world.py +++ b/examples/basics/basics/hello_world.py @@ -12,8 +12,9 @@ from flytekit import task, workflow # %% [markdown] -# You can change the signature of the workflow to take in an argument like this: - +# You can change the signature of the task to take in an argument like this: +# def say_hello(name: str) -> str: +# return f"hello {name}" # %% @task def say_hello() -> str: @@ -21,10 +22,12 @@ def say_hello() -> str: # %% [markdown] -# You can treat the outputs of a task as you normally would a Python function. Assign the output to two variables -# and use them in subsequent tasks as normal. 
See {py:func}`flytekit.workflow` +# You can treat the outputs of a task as you normally would a Python function. +# Assign the output to two variables and use them in subsequent tasks as normal. +# See {py:func}`flytekit.workflow` # You can change the signature of the workflow to take in an argument like this: - +# def my_wf(name: str) -> str: +# ... # %% @workflow def my_wf() -> str: @@ -49,5 +52,5 @@ def my_wf() -> str: # %% [markdown] -# In the next few examples you'll learn more about the core ideas of Flyte, which are tasks, workflows, and launch -# plans. +# In the next few examples you'll learn more about the core ideas of Flyte, +# which are tasks, workflows, and launch plans. diff --git a/examples/basics/basics/task.py b/examples/basics/basics/task.py index d0f360988..edfe36f0b 100644 --- a/examples/basics/basics/task.py +++ b/examples/basics/basics/task.py @@ -7,9 +7,10 @@ # .. tags:: Basic # ``` # -# Task is a fundamental building block and an extension point of Flyte, which encapsulates the users' code. They possess the following properties: +# A task is a fundamental building block and an extension point of Flyte that +# encapsulates the user's code. Tasks possess the following properties: # -# 1. Versioned (usually tied to the `git sha`) +# 1. Versioned (usually tied to the `git revision`) # 2. Strong interfaces (specified inputs and outputs) # 3. Declarative # 4. Independently executable @@ -17,12 +18,17 @@ # # A task in Flytekit can be of two types: # -# 1. A task that has a Python function associated with it. The execution of the task is equivalent to the execution of this function. -# 2. A task that doesn't have a Python function, e.g., an SQL query or any portable task like Sagemaker prebuilt algorithms, or a service that invokes an API. +# 1. A task that has a Python function associated with it. The execution of the +# task is equivalent to the execution of this function. +# 2. 
A task that doesn't have a Python function, e.g., an SQL query or any +# portable task like SageMaker prebuilt algorithms, or a service that +# invokes an API. # -# Flyte provides multiple plugins for tasks, which can be a backend plugin as well ([Athena](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-aws-athena/flytekitplugins/athena/task.py)). +# Multiple plugins for tasks, including backend plugins, are available in Flyte. +# See, for example, [Athena](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-aws-athena/flytekitplugins/athena/task.py). # -# In this example, you will learn how to write and execute a `Python function task`. Other types of tasks will be covered in the later sections. +# In this example, you will learn how to write and execute a `Python function task`. +# Other types of tasks are covered in later sections. # %% [markdown] # For any task in Flyte, there is one necessary import, which is: # %% @@ -35,13 +41,19 @@ from sklearn.model_selection import train_test_split # %% [markdown] -# The use of the {py:func}`flytekit.task` decorator is mandatory for a ``PythonFunctionTask``. -# A task is essentially a regular Python function, with the exception that all inputs and outputs must be clearly annotated with their types. -# These types are standard Python types, which will be further explained in the {ref}`type-system section `. +# The use of the {py:func}`flytekit.task` decorator is mandatory for +# a ``PythonFunctionTask``. +# A task is a regular Python function, but with a requirement that +# all inputs and outputs must be clearly annotated with their types. +# These types are standard Python types, which are further explained +# in the {ref}`type-system section `. 
+ # %% @task -def train_model(hyperparameters: dict, test_size: float, random_state: int) -> LogisticRegression: +def train_model( + hyperparameters: dict, test_size: float, random_state: int +) -> LogisticRegression: """ Parameters: hyperparameters (dict): A dictionary containing the hyperparameters for the model. @@ -55,7 +67,9 @@ def train_model(hyperparameters: dict, test_size: float, random_state: int) -> L iris = load_iris() # Splitting the data into train and test sets - X_train, _, y_train, _ = train_test_split(iris.data, iris.target, test_size=test_size, random_state=random_state) + X_train, _, y_train, _ = train_test_split( + iris.data, iris.target, test_size=test_size, random_state=random_state + ) # Creating and training the logistic regression model with the given hyperparameters clf = LogisticRegression(**hyperparameters) @@ -74,7 +88,9 @@ def train_model(hyperparameters: dict, test_size: float, random_state: int) -> L # You can execute a Flyte task as any normal function. # %% if __name__ == "__main__": - print(train_model(hyperparameters={"C": 0.1}, test_size=0.2, random_state=42)) + print( + train_model(hyperparameters={"C": 0.1}, test_size=0.2, random_state=42) + ) # %% [markdown] # ## Invoke a Task within a Workflow @@ -87,44 +103,55 @@ def train_model(hyperparameters: dict, test_size: float, random_state: int) -> L @workflow def train_model_wf( - hyperparameters: dict = {"C": 0.1}, test_size: float = 0.2, random_state: int = 42 + hyperparameters: dict = {"C": 0.1}, + test_size: float = 0.2, + random_state: int = 42, ) -> LogisticRegression: """ - This workflow invokes the train_model task with the given hyperparameters, test size and random state. + This workflow invokes the train_model task with the given hyperparameters, + test size and random state. 
""" - return train_model(hyperparameters=hyperparameters, test_size=test_size, random_state=random_state) + return train_model( + hyperparameters=hyperparameters, + test_size=test_size, + random_state=random_state, + ) # %% [markdown] # ```{note} -# When invoking the `train_model` task, you need to use keyword arguments to specify the values for the corresponding parameters. +# When invoking the `train_model` task, you need to use keyword arguments to +# specify the values for the corresponding parameters. # ```` # # ## Use `partial` to provide default arguments to tasks # -# You can use the {py:func}`functools.partial` function to assign default or constant values to the parameters of your tasks. +# You can use the {py:func}`functools.partial` function to assign default or +# constant values to the parameters of your tasks. # %% import functools @workflow -def train_model_wf_with_partial(test_size: float = 0.2, random_state: int = 42) -> LogisticRegression: +def train_model_wf_with_partial( + test_size: float = 0.2, random_state: int = 42 +) -> LogisticRegression: partial_task = functools.partial(train_model, hyperparameters={"C": 0.1}) return partial_task(test_size=test_size, random_state=random_state) -# %% [markdown] -# In this toy example, we're calling the `square` task twice and returning the result. - # %% [markdown] # (single_task_execution)= # # :::{dropdown} Execute a single task *without* a workflow # -# While workflows are typically composed of multiple tasks with dependencies defined by shared inputs and outputs, -# there are cases where it can be beneficial to execute a single task in isolation during the process of developing and iterating on its logic. -# Writing a new workflow definition every time for this purpose can be cumbersome, but executing a single task without a workflow provides a convenient way to iterate on task logic easily. 
+# While workflows are typically composed of multiple tasks with dependencies +# defined by shared inputs and outputs, there are cases where it can be beneficial +# to execute a single task in isolation during the process of developing and +# iterating on its logic. Writing a new workflow definition every time for this +# purpose can be cumbersome, but executing a single task without a workflow +# provides a convenient way to iterate on task logic easily. # # To run a task without a workflow, use the following command: #