Commit

Refreshing website content from main repo.
GitHub Action Website Snapshot committed Nov 4, 2024
1 parent 96a63a7 commit 5fb7f9c
Showing 13 changed files with 88 additions and 78 deletions.
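Most of the hunks below make the same one-character change. A bare `<` in prose or in a table cell (as in `<2.7` or `<ownership-type\>`) becomes `\<`, while real HTML such as `<details>`, `<summary>`, `<p>` and `<br />` is left alone. The commit message does not say why. One likely reason, stated here only as an assumption, is that an MDX parser reads an unescaped `<` as the start of a JSX tag. The rewrite can be sketched in Python as follows. The tag allowlist and the function name `escape_bare_lt` are hypothetical, and the sketch handles only single-backtick inline code spans, not fenced blocks.

```python
import re

# Hypothetical allowlist: tags this sketch treats as real HTML and leaves alone.
HTML_TAGS = ("details", "summary", "p", "br")

# A "<" that is not already escaped and does not open or close an allowlisted tag.
_BARE_LT = re.compile(r"(?<!\\)<(?!/?(?:%s)\b)" % "|".join(HTML_TAGS))


def escape_bare_lt(line: str) -> str:
    """Escape bare '<' as '\\<', leaving `inline code` spans untouched."""
    parts = line.split("`")
    # Even-indexed parts lie outside backtick code spans.
    for i in range(0, len(parts), 2):
        parts[i] = _BARE_LT.sub(r"\\<", parts[i])
    return "`".join(parts)


print(escape_bare_lt("works mainly for Airflow versions <2.7."))
print(escape_bare_lt("<details>"))
```

Already-escaped `\<` is skipped by the lookbehind, so running the function twice leaves the text unchanged.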
50 changes: 26 additions & 24 deletions blog/column-lineage/index.mdx
@@ -27,31 +27,33 @@ In the process of implementing column-level lineage, Paweł and Julien contribut

An example of a `columnLineage` facet in the outputs array of a lineage event:

{
"namespace": "{namespace of the outputdataset}",
"name": "{name of the output dataset}",
"facets": {
"schema": {
"fields": [
{ "name": "{first column of the output dataset}", "type": "{its type}"},
{ "name": "{second column of the output dataset}", "type": "{its type}"},
...
]
},
"columnLineage": {
"{first column of the output dataset}": {
"inputFields": [
{ "namespace": "{input dataset namespace}", name: "{input dataset name}", "field": "{input dataset column name}"},
... other inputs
],
"transformationDescription": "identical",
"transformationType": "IDENTITY"
},
"{second column of the output dataset}": ...,
...
}
}
```json
{
"namespace": "{namespace of the outputdataset}",
"name": "{name of the output dataset}",
"facets": {
"schema": {
"fields": [
{ "name": "{first column of the output dataset}", "type": "{its type}"},
{ "name": "{second column of the output dataset}", "type": "{its type}"},
...
]
},
"columnLineage": {
"{first column of the output dataset}": {
"inputFields": [
{ "namespace": "{input dataset namespace}", name: "{input dataset name}", "field": "{input dataset column name}"},
... other inputs
],
"transformationDescription": "identical",
"transformationType": "IDENTITY"
},
"{second column of the output dataset}": ...,
...
}
}
}
```
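The template above can also be assembled in code. The following minimal sketch uses plain Python dictionaries rather than any OpenLineage client class. The helper names and the namespace and dataset values in the usage lines are hypothetical. It mirrors the template's shape exactly, tags every column as `IDENTITY`, and rejects lineage entries for columns that are missing from the `schema` facet. Note that the published `ColumnLineageDatasetFacet` schema nests the per-column entries under a `fields` key, so check the current spec before you emit real events.

```python
import json


def column_lineage_facet(mapping):
    """Build a columnLineage facet from {output_column: [(namespace, name, field), ...]}.

    Every column is tagged IDENTITY, as in the template above.
    """
    return {
        out_col: {
            "inputFields": [
                {"namespace": ns, "name": name, "field": field}
                for ns, name, field in inputs
            ],
            "transformationDescription": "identical",
            "transformationType": "IDENTITY",
        }
        for out_col, inputs in mapping.items()
    }


def output_dataset(namespace, name, fields, mapping):
    """Assemble one entry of the outputs array with schema and columnLineage facets."""
    known = {f["name"] for f in fields}
    unknown = sorted(set(mapping) - known)
    if unknown:
        raise ValueError(f"columnLineage refers to columns not in schema: {unknown}")
    return {
        "namespace": namespace,
        "name": name,
        "facets": {
            "schema": {"fields": fields},
            "columnLineage": column_lineage_facet(mapping),
        },
    }


# Hypothetical namespace and dataset names, for illustration only.
ds = output_dataset(
    "postgres://db:5432",
    "public.orders_clean",
    [{"name": "id", "type": "INTEGER"}],
    {"id": [("postgres://db:5432", "public.orders", "id")]},
)
print(json.dumps(ds, indent=2))
```

Serializing with `json.dumps` also quotes every key, including `name`, which appears unquoted in the `inputFields` lines of the template.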

### How it works

3 changes: 2 additions and 1 deletion docs/client/java/partials/java_transport.md
@@ -101,7 +101,8 @@ spark.openlineage.transport.headers.X-Some-Extra-Header=abc
spark.openlineage.transport.compression=gzip
```

<details><summary>URL parsing within Spark integration</summary>
<details>
<summary>URL parsing within Spark integration</summary>
<p>

You can supply http parameters using values in url, the parsed `spark.openlineage.*` properties are located in url as follows:
2 changes: 1 addition and 1 deletion docs/integrations/airflow/airflow.md
@@ -4,7 +4,7 @@ title: Apache Airflow
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
2 changes: 1 addition and 1 deletion docs/integrations/airflow/default-extractors.md
@@ -4,7 +4,7 @@ title: Exposing Lineage in Airflow Operators
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
2 changes: 1 addition and 1 deletion docs/integrations/airflow/extractors/custom-extractors.md
@@ -4,7 +4,7 @@ title: Custom Extractors
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
2 changes: 1 addition and 1 deletion docs/integrations/airflow/extractors/extractor-testing.md
@@ -4,7 +4,7 @@ title: Testing Custom Extractors
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
2 changes: 1 addition and 1 deletion docs/integrations/airflow/job-hierarchy.md
@@ -4,7 +4,7 @@ title: Job Hierarchy
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
2 changes: 1 addition and 1 deletion docs/integrations/airflow/manual.md
@@ -4,7 +4,7 @@ title: Manually Annotated Lineage
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
6 changes: 3 additions and 3 deletions docs/integrations/airflow/older.md
@@ -4,7 +4,7 @@ title: Supported Airflow versions
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
@@ -16,7 +16,7 @@ while the `openlineage-airflow` will primarily be updated for bug fixes.
##### Airflow 2.7+

This package **should not** be used starting with Airflow 2.7.0 and **can not** be used with Airflow 2.8+.
It was designed as Airflow's external integration that works mainly for Airflow versions <2.7.
It was designed as Airflow's external integration that works mainly for Airflow versions \<2.7.
For Airflow 2.7+ use the native Airflow OpenLineage provider
[package](https://airflow.apache.org/docs/apache-airflow-providers-openlineage) `apache-airflow-providers-openlineage`.

@@ -44,6 +44,6 @@ openlineage.lineage_backend.OpenLineageBackend

The OpenLineageBackend does not take into account manually configured inlets and outlets.

##### Airflow <2.1
##### Airflow \<2.1

OpenLineage does not work with versions older than Airflow 2.1.
4 changes: 2 additions & 2 deletions docs/integrations/airflow/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ title: Using the Airflow Integration
---

:::caution
This page is about Airflow's external integration that works mainly for Airflow versions <2.7.
This page is about Airflow's external integration that works mainly for Airflow versions \<2.7.
[If you're using Airflow 2.7+, look at native Airflow OpenLineage provider documentation.](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html) <br /><br />

The ongoing development and enhancements will be focused on the `apache-airflow-providers-openlineage` package,
@@ -14,7 +14,7 @@ while the `openlineage-airflow` will primarily be updated for bug fixes. See [al
#### PREREQUISITES

- [Python 3.8](https://www.python.org/downloads)
- [Airflow >= 2.1,<2.8](https://pypi.org/project/apache-airflow)
- [Airflow >= 2.1,\<2.8](https://pypi.org/project/apache-airflow)

To use the OpenLineage Airflow integration, you'll need a running [Airflow instance](https://airflow.apache.org/docs/apache-airflow/stable/start.html). You'll also need an OpenLineage-compatible [backend](https://github.com/OpenLineage/OpenLineage#scope).

10 changes: 5 additions and 5 deletions docs/integrations/flink.md
@@ -118,11 +118,11 @@ and allows all the configuration features present there to be used. The configur

The following parameters can be specified:

| Parameter | Definition | Example |
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[some_facet1;some_facet1\] |
| openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |
| Parameter | Definition | Example |
|-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[some_facet1;some_facet1\] |
| openlineage.job.owners.\<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |

## Transports

2 changes: 1 addition and 1 deletion docs/integrations/spark/configuration/spark_conf.md
@@ -22,5 +22,5 @@ The following parameters can be specified:
| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
| spark.openlineage.job.owners.\<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
| spark.openlineage.columnLineage.datasetLineageEnabled | Makes the dataset dependencies to be included in their own property `dataset` in the column lineage pattern. If this flag is set to `false`, then the dataset dependencies are merged into `fields` property. The default value is `false`. **It is recommended to set it to `true`** | true |
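The `job.owners.<ownership-type>` rows in the Flink and Spark tables describe a simple mapping. The key suffix becomes the ownership type, and the value becomes the owner name. The hypothetical helper `parse_job_owners` below sketches that mapping in Python. It is not the integrations' actual code. Stripping surrounding double quotes from values is an assumption based on the `team="Some Team"` example. Each entry is shaped as a `{"name", "type"}` pair, which matches the `owners` list of the job ownership facet.

```python
def parse_job_owners(conf, prefix="spark.openlineage.job.owners."):
    """Turn '<prefix><type>=<name>' config entries into owner dicts.

    Hypothetical helper: the suffix after the prefix is the ownership type,
    and the value, with surrounding double quotes removed, is the owner name.
    """
    owners = []
    for key, value in conf.items():
        # Ignore unrelated keys and a bare prefix with no ownership type.
        if key.startswith(prefix) and len(key) > len(prefix):
            owners.append({"name": value.strip('"'), "type": key[len(prefix):]})
    return owners


conf = {
    "spark.openlineage.transport.type": "http",
    "spark.openlineage.job.owners.team": '"Some Team"',
    "spark.openlineage.job.owners.person": "Jane Doe",  # hypothetical owner
}
print(parse_job_owners(conf))
```

For the Flink keys, pass `prefix="openlineage.job.owners."` instead.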