Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
DOC-648 flytesnacks edits needed for Neptune and W&B flytekit plugins documentation (#1748)

* update refs

Signed-off-by: nikki everett <[email protected]>

* clean up integrations information architecture

Signed-off-by: nikki everett <[email protected]>

* fix databricks agent title

Signed-off-by: nikki everett <[email protected]>

* move k8s pod plugin to deprecated integrations section and add deprecation notice

Signed-off-by: nikki everett <[email protected]>

---------

Signed-off-by: nikki everett <[email protected]>
Showing 6 changed files with 150 additions and 112 deletions.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,83 +8,79 @@ Flyte is designed to be highly extensible and can be customized in multiple ways | |
Want to contribute an example? Check out the {ref}`Documentation contribution guide <contribute_docs>`. | ||
``` | ||
|
||
## Flytekit Plugins | ||
## Flytekit plugins | ||
|
||
Flytekit plugins are simple plugins that can be implemented purely in python, unit tested locally and allow extending | ||
Flytekit functionality. These plugins can be anything and for comparison can be thought of like | ||
[Airflow Operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html). | ||
Flytekit plugins can be implemented purely in Python, unit tested locally, and allow extending | ||
Flytekit functionality. For comparison, these plugins can be thought of like | ||
[Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html). | ||
|
||
```{list-table} | ||
:header-rows: 0 | ||
:widths: 20 30 | ||
* - {doc}`SQL </auto_examples/sql_plugin/index>` | ||
- Execute SQL queries as tasks. | ||
* - {doc}`Great Expectations </auto_examples/greatexpectations_plugin/index>` | ||
- Validate data with `great_expectations`. | ||
* - {doc}`Papermill </auto_examples/papermill_plugin/index>` | ||
- Execute Jupyter Notebooks with `papermill`. | ||
* - {doc}`Pandera </auto_examples/pandera_plugin/index>` | ||
- Validate pandas dataframes with `pandera`. | ||
* - {doc}`Modin </auto_examples/modin_plugin/index>` | ||
- Scale pandas workflows with `modin`. | ||
* - {doc}`Dolt </auto_examples/dolt_plugin/index>` | ||
- Version your SQL database with `dolt`. | ||
* - {doc}`DBT </auto_examples/dbt_plugin/index>` | ||
- Run and test your `dbt` pipelines in Flyte. | ||
* - {doc}`WhyLogs </auto_examples/whylogs_plugin/index>` | ||
- `whylogs`: the open standard for data logging. | ||
* - {doc}`MLFlow </auto_examples/mlflow_plugin/index>` | ||
- `mlflow`: the open standard for model tracking. | ||
* - {doc}`ONNX </auto_examples/onnx_plugin/index>` | ||
- Convert ML models to ONNX models seamlessly. | ||
* - {doc}`Dolt </auto_examples/dolt_plugin/index>` | ||
- Version your SQL database with `dolt`. | ||
* - {doc}`DuckDB </auto_examples/duckdb_plugin/index>` | ||
- Run analytical queries using DuckDB. | ||
* - {doc}`Weights and Biases </auto_examples/wandb_plugin/index>` | ||
- `wandb`: Machine learning platform to build better models faster. | ||
* - {doc}`Great Expectations </auto_examples/greatexpectations_plugin/index>` | ||
- Validate data with `great_expectations`. | ||
* - {doc}`MLFlow </auto_examples/mlflow_plugin/index>` | ||
- `mlflow`: the open standard for model tracking. | ||
* - {doc}`Modin </auto_examples/modin_plugin/index>` | ||
- Scale pandas workflows with `modin`. | ||
* - {doc}`Neptune </auto_examples/neptune_plugin/index>` | ||
- `neptune`: Neptune is the MLOps stack component for experiment tracking. | ||
* - {doc}`NIM </auto_examples/nim_plugin/index>` | ||
- Serve optimized model containers with NIM. | ||
* - {doc}`Ollama </auto_examples/ollama_plugin/index>` | ||
- Serve fine-tuned LLMs with Ollama in a Flyte workflow. | ||
* - {doc}`ONNX </auto_examples/onnx_plugin/index>` | ||
- Convert ML models to ONNX models seamlessly. | ||
* - {doc}`Pandera </auto_examples/pandera_plugin/index>` | ||
- Validate pandas dataframes with `pandera`. | ||
* - {doc}`Papermill </auto_examples/papermill_plugin/index>` | ||
- Execute Jupyter Notebooks with `papermill`. | ||
* - {doc}`SQL </auto_examples/sql_plugin/index>` | ||
- Execute SQL queries as tasks. | ||
* - {doc}`Weights and Biases </auto_examples/wandb_plugin/index>` | ||
- `wandb`: Machine learning platform to build better models faster. | ||
* - {doc}`WhyLogs </auto_examples/whylogs_plugin/index>` | ||
- `whylogs`: the open standard for data logging. | ||
``` | ||
|
||
:::{dropdown} {fa}`info-circle` Using flytekit plugins | ||
:::{dropdown} {fa}`info-circle` Using Flytekit plugins | ||
:animate: fade-in-slide-down | ||
|
||
Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the | ||
{py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit. | ||
Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the {py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit. | ||
|
||
Flytekit Plugins are lazily loaded and can be released independently like libraries. We follow a convention to name the | ||
plugin like `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example | ||
`flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/). | ||
Flytekit plugins are lazily loaded and can be released independently like libraries. The naming convention is `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example, `flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/). | ||
|
||
You can find the plugins maintained by the core Flyte team [here](https://github.com/flyteorg/flytekit/tree/master/plugins). | ||
::: | ||
|
||
## Native Backend Plugins | ||
## Native backend plugins | ||
|
||
Native Backend Plugins are the plugins that can be executed without any external service dependencies because the compute is | ||
orchestrated by Flyte itself, within its provisioned Kubernetes clusters. | ||
Native backend plugins can be executed without any external service dependencies because the compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters. | ||
|
||
```{list-table} | ||
:header-rows: 0 | ||
:widths: 20 30 | ||
* - {doc}`K8s Pods </auto_examples/k8s_pod_plugin/index>` | ||
- Execute K8s pods for arbitrary workloads. | ||
* - {doc}`K8s Cluster Dask Jobs </auto_examples/k8s_dask_plugin/index>` | ||
- Run Dask jobs on a K8s Cluster. | ||
* - {doc}`K8s Cluster Spark Jobs </auto_examples/k8s_spark_plugin/index>` | ||
- Run Spark jobs on a K8s Cluster. | ||
* - {doc}`Kubeflow PyTorch </auto_examples/kfpytorch_plugin/index>` | ||
- Run distributed PyTorch training jobs using `Kubeflow`. | ||
* - {doc}`Kubeflow TensorFlow </auto_examples/kftensorflow_plugin/index>` | ||
- Run distributed TensorFlow training jobs using `Kubeflow`. | ||
* - {doc}`Kubernetes pods </auto_examples/k8s_pod_plugin/index>` | ||
- Execute Kubernetes pods for arbitrary workloads. | ||
* - {doc}`Kubernetes cluster Dask jobs </auto_examples/k8s_dask_plugin/index>` | ||
- Run Dask jobs on a Kubernetes Cluster. | ||
* - {doc}`Kubernetes cluster Spark jobs </auto_examples/k8s_spark_plugin/index>` | ||
- Run Spark jobs on a Kubernetes Cluster. | ||
* - {doc}`MPI Operator </auto_examples/kfmpi_plugin/index>` | ||
- Run distributed deep learning training jobs using Horovod and MPI. | ||
* - {doc}`Ray Task </auto_examples/ray_plugin/index>` | ||
* - {doc}`Ray </auto_examples/ray_plugin/index>` | ||
- Run Ray jobs on a K8s Cluster. | ||
``` | ||
|
||
|
@@ -98,54 +94,53 @@ orchestrated by Flyte itself, within its provisioned Kubernetes clusters. | |
:header-rows: 0 | ||
:widths: 20 30 | ||
* - {doc}`AWS SageMaker Inference agent </auto_examples/sagemaker_inference_agent/index>` | ||
- Deploy models and create, as well as trigger inference endpoints on AWS SageMaker. | ||
* - {doc}`Airflow agent </auto_examples/airflow_agent/index>` | ||
- Run Airflow jobs in your workflows with the Airflow agent. | ||
* - {doc}`BigQuery agent </auto_examples/bigquery_agent/index>` | ||
- Run BigQuery jobs in your workflows with the BigQuery agent. | ||
* - {doc}`ChatGPT agent </auto_examples/chatgpt_agent/index>` | ||
- Run ChatGPT jobs in your workflows with the ChatGPT agent. | ||
* - {doc}`Databricks </auto_examples/databricks_agent/index>` | ||
* - {doc}`Databricks agent </auto_examples/databricks_agent/index>` | ||
- Run Databricks jobs in your workflows with the Databricks agent. | ||
* - {doc}`Memory Machine Cloud </auto_examples/mmcloud_agent/index>` | ||
* - {doc}`Memory Machine Cloud agent </auto_examples/mmcloud_agent/index>` | ||
- Execute tasks using the MemVerge Memory Machine Cloud agent. | ||
* - {doc}`OpenAI Batch </auto_examples/openai_batch_agent/index>` | ||
- Submit requests for asynchronous batch processing on OpenAI. | ||
* - {doc}`SageMaker Inference </auto_examples/sagemaker_inference_agent/index>` | ||
- Deploy models and create, as well as trigger inference endpoints on SageMaker. | ||
* - {doc}`Sensor </auto_examples/sensor/index>` | ||
* - {doc}`Sensor agent </auto_examples/sensor/index>` | ||
- Run sensor jobs in your workflows with the sensor agent. | ||
* - {doc}`Snowflake </auto_examples/snowflake_agent/index>` | ||
* - {doc}`Snowflake agent </auto_examples/snowflake_agent/index>` | ||
- Run Snowflake jobs in your workflows with the Snowflake agent. | ||
``` | ||
|
||
(external_service_backend_plugins)= | ||
|
||
## External Service Backend Plugins | ||
## External service backend plugins | ||
|
||
As the term suggests, external service backend plugins rely on external services like | ||
[Hive](https://docs.qubole.com/en/latest/user-guide/engines/hive/index.html) for handling the workload defined in the Flyte task that uses the respective plugin. | ||
As the term suggests, these plugins rely on external services to handle the workload defined in the Flyte task that uses the plugin. | ||
|
||
```{list-table} | ||
:header-rows: 0 | ||
:widths: 20 30 | ||
* - {doc}`AWS Athena plugin </auto_examples/athena_plugin/index>` | ||
* - {doc}`AWS Athena </auto_examples/athena_plugin/index>` | ||
- Execute queries using AWS Athena | ||
* - {doc}`AWS Batch plugin </auto_examples/aws_batch_plugin/index>` | ||
* - {doc}`AWS Batch </auto_examples/aws_batch_plugin/index>` | ||
- Running tasks and workflows on AWS batch service | ||
* - {doc}`Flyte Interactive </auto_examples/flyteinteractive_plugin/index>` | ||
- Execute tasks using Flyte Interactive to debug. | ||
* - {doc}`Hive plugin </auto_examples/hive_plugin/index>` | ||
* - {doc}`Hive </auto_examples/hive_plugin/index>` | ||
- Run Hive jobs in your workflows. | ||
``` | ||
|
||
(enable-backend-plugins)= | ||
|
||
::::{dropdown} {fa}`info-circle` Enabling Backend Plugins | ||
::::{dropdown} {fa}`info-circle` Enabling backend plugins | ||
:animate: fade-in-slide-down | ||
|
||
To enable a backend plugin you have to add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` is available under the `tasks > task-plugins` section of FlytePropeller's configuration. | ||
The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/[email protected]/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows, | ||
To enable a backend plugin, you must add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` is available under the `tasks > task-plugins` section of FlytePropeller's configuration. | ||
The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/[email protected]/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows: | ||
|
||
```yaml | ||
tasks: | ||
|
@@ -160,15 +155,15 @@ tasks: | |
container_array: k8s-array | ||
``` | ||
**Finding the `ID` of the Backend Plugin** | ||
**Finding the `ID` of the backend plugin** | ||
|
||
This is a little tricky since you have to look at the source code of the plugin to figure out the `ID`. In the case of Spark, for example, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424) here, defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41). | ||
To find the `ID` of the backend plugin, look at the source code of the plugin. For example, in the case of Spark, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424), defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41). | ||
|
||
:::: | ||
|
||
## SDKs for Writing Tasks and Workflows | ||
## SDKs for writing tasks and workflows | ||
|
||
The {ref}`community <community>` would love to help you with your own ideas of building a new SDK. Currently the available SDKs are: | ||
The {ref}`community <community>` would love to help you build new SDKs. Currently, the available SDKs are: | ||
|
||
```{list-table} | ||
:header-rows: 0 | ||
|
@@ -180,7 +175,7 @@ The {ref}`community <community>` would love to help you with your own ideas of b | |
- The Java/Scala SDK for Flyte. | ||
``` | ||
|
||
## Flyte Operators | ||
## Flyte operators | ||
|
||
Flyte can be integrated with other orchestrators to help you leverage Flyte's | ||
constructs natively within other orchestration tools. | ||
|
@@ -196,42 +191,91 @@ constructs natively within other orchestration tools. | |
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: Flytekit plugins | ||
|
||
DBT </auto_examples/dbt_plugin/index> | ||
Dolt </auto_examples/dolt_plugin/index> | ||
DuckDB </auto_examples/duckdb_plugin/index> | ||
Great Expectations </auto_examples/greatexpectations_plugin/index> | ||
MLFlow </auto_examples/mlflow_plugin/index> | ||
Modin </auto_examples/modin_plugin/index> | ||
Neptune </auto_examples/neptune_plugin/index> | ||
NIM </auto_examples/nim_plugin/index> | ||
Ollama </auto_examples/ollama_plugin/index> | ||
ONNX </auto_examples/onnx_plugin/index> | ||
Pandera </auto_examples/pandera_plugin/index> | ||
Papermill </auto_examples/papermill_plugin/index> | ||
SQL </auto_examples/sql_plugin/index> | ||
Weights & Biases </auto_examples/wandb_plugin/index> | ||
WhyLogs </auto_examples/whylogs_plugin/index> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: Native backend plugins | ||
Kubeflow PyTorch </auto_examples/kfpytorch_plugin/index> | ||
Kubeflow TensorFlow </auto_examples/kftensorflow_plugin/index> | ||
Kubernetes cluster Dask jobs </auto_examples/k8s_dask_plugin/index> | ||
Kubernetes cluster Spark jobs </auto_examples/k8s_spark_plugin/index> | ||
MPI Operator </auto_examples/kfmpi_plugin/index> | ||
Ray </auto_examples/ray_plugin/index> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: Flyte agents | ||
Airflow agent </auto_examples/airflow_agent/index> | ||
AWS Sagemaker inference agent </auto_examples/sagemaker_inference_agent/index> | ||
BigQuery agent </auto_examples/bigquery_agent/index> | ||
ChatGPT agent </auto_examples/chatgpt_agent/index> | ||
Databricks agent </auto_examples/databricks_agent/index> | ||
Memory Machine Cloud agent </auto_examples/mmcloud_agent/index> | ||
OpenAI batch agent </auto_examples/openai_batch_agent/index> | ||
Sensor agent </auto_examples/sensor/index> | ||
Snowflake agent </auto_examples/snowflake_agent/index> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: External service backend plugins | ||
AWS Athena </auto_examples/athena_plugin/index> | ||
AWS Batch </auto_examples/aws_batch_plugin/index> | ||
Flyte Interactive </auto_examples/flyteinteractive_plugin/index> | ||
Hive </auto_examples/hive_plugin/index> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: SDKs for writing tasks and workflows | ||
flytekit <https://flytekit.readthedocs.io/> | ||
flytekit-java <https://github.com/spotify/flytekit-java> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: Flyte operators | ||
Airflow </auto_examples/airflow_plugin/index> | ||
``` | ||
|
||
```{toctree} | ||
:maxdepth: -1 | ||
:hidden: | ||
:caption: Deprecated integrations | ||
/auto_examples/airflow_agent/index | ||
/auto_examples/airflow_plugin/index | ||
/auto_examples/athena_plugin/index | ||
/auto_examples/aws_batch_plugin/index | ||
/auto_examples/bigquery_agent/index | ||
/auto_examples/chatgpt_agent/index | ||
/auto_examples/k8s_dask_plugin/index | ||
/auto_examples/databricks_agent/index | ||
/auto_examples/dbt_plugin/index | ||
/auto_examples/dolt_plugin/index | ||
/auto_examples/duckdb_plugin/index | ||
/auto_examples/flyteinteractive_plugin/index | ||
/auto_examples/greatexpectations_plugin/index | ||
/auto_examples/hive_plugin/index | ||
/auto_examples/k8s_pod_plugin/index | ||
/auto_examples/mlflow_plugin/index | ||
/auto_examples/mmcloud_agent/index | ||
/auto_examples/modin_plugin/index | ||
/auto_examples/kfmpi_plugin/index | ||
/auto_examples/neptune_plugin/index | ||
/auto_examples/nim_plugin/index | ||
/auto_examples/ollama_plugin/index | ||
/auto_examples/onnx_plugin/index | ||
/auto_examples/openai_batch_agent/index | ||
/auto_examples/papermill_plugin/index | ||
/auto_examples/pandera_plugin/index | ||
/auto_examples/kfpytorch_plugin/index | ||
/auto_examples/ray_plugin/index | ||
/auto_examples/sagemaker_inference_agent/index | ||
/auto_examples/sensor/index | ||
/auto_examples/snowflake_agent/index | ||
/auto_examples/k8s_spark_plugin/index | ||
/auto_examples/sql_plugin/index | ||
/auto_examples/kftensorflow_plugin/index | ||
/auto_examples/wandb_plugin/index | ||
/auto_examples/whylogs_plugin/index | ||
Deprecated integrations <deprecated_integrations/index> | ||
BigQuery plugin </auto_examples/bigquery_plugin/index> | ||
Databricks plugin </auto_examples/databricks_plugin/index> | ||
Kubernetes pods </auto_examples/k8s_pod_plugin/index> | ||
Snowflake plugin </auto_examples/snowflake_plugin/index> | ||
``` |
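
The "Enabling backend plugins" dropdown in the diff above says a plugin is enabled by adding its `ID` to the `enabled-plugins` list under `tasks > task-plugins` in FlytePropeller's configuration. As a rough illustration only, the following sketch checks for an `ID` in an already-parsed configuration. The dictionary contents and the helper name `is_plugin_enabled` are assumptions made for this example; they are not part of the commit or of any Flyte API.

```python
# Minimal sketch, assuming the FlytePropeller config has already been parsed
# into a plain dict (for example by a YAML loader). The plugin IDs listed here
# are illustrative assumptions; only "spark" and "k8s-array" appear in the diff.
propeller_config = {
    "tasks": {
        "task-plugins": {
            "enabled-plugins": ["container", "k8s-array", "spark"],
        }
    }
}


def is_plugin_enabled(config: dict, plugin_id: str) -> bool:
    """Hypothetical helper: True if plugin_id is listed under
    tasks > task-plugins > enabled-plugins, False otherwise."""
    task_plugins = config.get("tasks", {}).get("task-plugins", {})
    return plugin_id in task_plugins.get("enabled-plugins", [])


if __name__ == "__main__":
    for candidate in ("spark", "hive"):
        state = "enabled" if is_plugin_enabled(propeller_config, candidate) else "not enabled"
        print(f"{candidate}: {state}")
```

A check like this only confirms that the `ID` string is present in the list; the correct `ID` value still has to be read from the plugin's source code, as the dropdown describes.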