Merge pull request #2454 from MicrosoftDocs/main638683064043714434sync_temp

For protected branches, the push strategy should use a PR and merge to the target branch to work around the git push error.
learn-build-service-prod[bot] authored Nov 27, 2024
2 parents f41e17e + b8ec0fd commit c2d782f
Showing 4 changed files with 150 additions and 108 deletions.
77 changes: 51 additions & 26 deletions data-explorer/kusto/management/data-ingestion/ingest-from-query.md
---
title: Kusto query ingestion (set, append, replace)
description: Learn how to use the .set, .append, .set-or-append, and .set-or-replace commands to ingest data from a query.
ms.reviewer: orspodek
ms.topic: reference
ms.date: 11/24/2024
---
# Ingest from query (.set, .append, .set-or-append, .set-or-replace)

These commands execute a query or a management command and ingest the results of the query into a table.

|Command |If table exists |If table doesn't exist |
|-----------------|------------------------------------|------------------------------------------|
|`.set` |The command fails. |The table is created and data is ingested.|
|`.append` |Data is appended to the table. |The command fails.|
|`.set-or-append` |Data is appended to the table. |The table is created and data is ingested.|
|`.set-or-replace`|Data replaces the data in the table.|The table is created and data is ingested.|

To cancel an ingest from query command, see [`cancel operation`](../cancel-operation-command.md).
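For instance, a long-running command started with the `async` keyword returns an operation ID that can be passed to the cancel command. The following is a minimal sketch, assuming a hypothetical table `MyTable`, a hypothetical source `SourceTable`, and an illustrative operation ID:

```kusto
// Start a potentially long ingestion asynchronously; the command returns an OperationId.
.set-or-append async MyTable <|
    SourceTable
    | where Timestamp > ago(7d)

// Cancel it with the returned OperationId (the GUID here is illustrative).
.cancel operation 3827def6-0773-4f2a-859e-c02cf395deaf
```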

::: moniker range="azure-data-explorer"
> [!NOTE]
> Ingest from query is a [direct ingestion](/azure/data-explorer/ingest-data-overview#direct-ingestion-with-management-commands). As such, it does not include automatic retries. Automatic retries are available when ingesting through the data management service. Use the [ingestion overview](/azure/data-explorer/ingest-data-overview) document to decide which is the most suitable ingestion option for your scenario.
::: moniker-end
::: moniker range="microsoft-fabric"
> [!NOTE]
> Ingest from query is a [direct ingestion](/azure/data-explorer/ingest-data-overview#direct-ingestion-with-management-commands). As such, it does not include automatic retries. Automatic retries are available when ingesting through the data management service.
::: moniker-end

## Permissions

To perform different actions on a table, you need specific permissions:

* To add rows to an existing table using the `.append` command, you need a minimum of Table Ingestor permissions.
* To create a new table using the various `.set` commands, you need a minimum of Database User permissions.
For more information on permissions, see [Kusto role-based access control](../../access-control/role-based-access-control.md).
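As a hedged sketch, a database admin might grant the minimum role needed for `.append` as follows (the table name and principal are hypothetical):

```kusto
// Grant the Table Ingestor role on a specific table to a user principal.
.add table MyTable ingestors ('aaduser=user@contoso.com')
```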

## Performance tips

* Set the `distributed` property to `true` if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn't require serialization. Then, multiple nodes can produce output in parallel. Don't use this flag when query results are small, since it might needlessly generate many small data shards.
* Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
* Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.

## Supported ingestion properties

|Property|Type|Description|
|--|--|--|
|`distributed` | `bool` | If `true`, the command ingests from all nodes executing the query in parallel. Default is `false`. See [performance tips](#performance-tips).|
|`creationTime` | `string` | The `datetime` value, formatted as an ISO8601 `string`, to use at the creation time of the ingested data extents. If unspecified, `now()` is used. When specified, make sure the `Lookback` property in the target table's effective [Extents merge policy](../merge-policy.md) is aligned with the specified value.|
|`extend_schema` | `bool` | If `true`, the command might extend the schema of the table. Default is `false`. This option applies only to `.append`, `.set-or-append`, and `set-or-replace` commands. This option requires at least [Table Admin](../../access-control/role-based-access-control.md) permissions.|
|`recreate_schema` | `bool` | If `true`, the command might recreate the schema of the table. Default is `false`. This option applies only to the `.set-or-replace` command. This option takes precedence over the `extend_schema` property if both are set. This option requires at least [Table Admin](../../access-control/role-based-access-control.md) permissions.|
|`folder` | `string` | The folder to assign to the table. If the table already exists, this property overwrites the table's folder.|
|`ingestIfNotExists` | `string` | If specified, ingestion fails if the table already has data tagged with an `ingest-by:` tag with the same value. For more information, see [ingest-by: tags](../extent-tags.md).|
|`policy_ingestiontime` | `bool` | If `true`, the [Ingestion Time Policy](../show-table-ingestion-time-policy-command.md) is enabled on the table. The default is `true`.|
|`tags` | `string` | A JSON `string` that represents a list of [tags](../extent-tags.md) to associate with the created extent. |
|`docstring` | `string` | A description used to document the table.|
|`persistDetails` | `bool` | If `true`, the command persists its detailed results for retrieval by the [.show operation details](../show-operations.md) command. Defaults to `false`. For example, `with (persistDetails=true)`.|
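Several of these properties can be combined in a single `with (...)` clause. The following sketch is illustrative only; the table names and property values are hypothetical:

```kusto
.set-or-append async MyTable with (
    distributed=true,
    creationTime='2024-11-20T00:00:00Z',
    folder='Ingestion/Examples',
    docstring='Rows ingested from a query',
    tags='["ingest-by:exampleTag"]'
) <|
    SourceTable
    | where Timestamp > ago(1d)
```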

## Schema considerations

* `.set-or-replace` preserves the schema unless one of `extend_schema` or `recreate_schema` ingestion properties is set to `true`.
* `.set-or-append` and `.append` commands preserve the schema unless the `extend_schema` ingestion property is set to `true`.
* Matching the result set schema to that of the target table is based on the column types. There's no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns.
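For example, assuming a target table created as `(Timestamp: datetime, Level: string, Message: string)`, an explicit `project` keeps the query result's columns in the target table's column order (all names here are hypothetical):

```kusto
.append MyLogs <|
    RawLogs
    // Output columns in exactly the target table's order:
    // (Timestamp: datetime, Level: string, Message: string)
    | project Timestamp, Level, Message
```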

> [!CAUTION]
> If the schema is modified, it happens in a separate transaction before the actual data ingestion. This means the schema might be modified even when there is a failure to ingest the data.

## Character limitation

For example, in the following query, the `search` operator generates a column `$table`. To work around this limitation, use `project-rename` to rename the column.

```kusto
.set Texas <| search State has 'Texas' | project-rename tableName=$table
```

## Returns

Returns information on the extents created because of the `.set` or `.append` command.

## Examples

### Create and update table from query source

The following query creates the *:::no-loc text="RecentErrors":::* table with the same schema as *:::no-loc text="LogsTable":::*. It updates *:::no-loc text="RecentErrors":::* with all error logs from *:::no-loc text="LogsTable":::* over the last hour.

```kusto
.set RecentErrors <|
LogsTable
| where Level == "Error" and Timestamp > now() - time(1h)
```

### Create and update table from query source using the *distributed* flag

The following example creates a new table called `OldExtents` in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows) so the *distributed* flag is used. It updates `OldExtents` with `ExtentId` entries from the `MyExtents` table that were created more than 30 days ago.

```kusto
.set async OldExtents with(distributed=true) <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
```

### Append data to table

The following example filters `ExtentId` entries in the `MyExtents` table that were created more than 30 days ago and appends the entries to the `OldExtents` table with associated tags.

```kusto
.append OldExtents with(tags='["TagA","TagB"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
```

### Create or append a table with possibly existing tagged data

The following example either appends to or creates the `OldExtents` table asynchronously. It filters `ExtentId` entries in the `MyExtents` table that were created more than 30 days ago and specifies the tags to append to the new extents with `ingest-by:myTag`. The `ingestIfNotExists` parameter ensures that the ingestion only occurs if the data doesn't already exist in the table with the specified tag.

```kusto
.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
```

### Create table or replace data with associated data

The following query replaces the data in the `OldExtents` table, or creates the table if it doesn't already exist, with `ExtentId` entries in the `MyExtents` table that were created more than 30 days ago. Tag the new extent with `ingest-by:myTag` if the data doesn't already exist in the table with the specified tag.

```kusto
.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
```

### Append data with associated data

The following example appends data to the `OldExtents` table asynchronously, using `ExtentId` entries from the `MyExtents` table that were created more than 30 days ago. It sets a specific creation time for the new extents.

```kusto
.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
```

**Sample output**

The following is a sample of the type of output you might see from your queries.

|ExtentId |OriginalSize |ExtentSize |CompressedSize |IndexSize |RowCount |
|--|--|--|--|--|--|
|23a05ed6-376d-4119-b1fc-6493bcb05563 |1291 |5882 |1568 |4314 |10 |

## Related content

* [Data formats supported for ingestion](../../ingestion-supported-formats.md)
* [Inline ingestion](ingest-inline.md)
* [Ingest from storage](ingest-from-storage.md)
---
title: Kusto.ingest into command (pull data from storage)
description: This article describes the .ingest into command (pull data from storage).
ms.reviewer: orspodek
ms.topic: reference
ms.date: 11/25/2024
---
# Ingest from storage

The `.ingest into` command ingests data into a table by "pulling" the data
from one or more cloud storage files.
For example, the command
can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse
them, and ingest them together into a single target table.
Data is appended to the table
without affecting existing records, and without modifying the table's schema.
You must have at least [Table Ingestor](../../access-control/role-based-access-control.md) permissions to run this command.

## Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.


## Returns

The result of the command is a table with as many records as there are data shards ("extents") generated by the command.
If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.

|Name |Type |Description |
|-----------|----------|-----------------------------------------------------------|
|ExtentId |`guid` |The unique identifier for the data shard that was generated by the command.|
|ItemLoaded |`string` |One or more storage files that are related to this record.|
|Duration |`timespan`|How long it took to perform ingestion. |
|HasErrors |`bool` |Whether or not this record represents an ingestion failure.|
|OperationId|`guid` |A unique ID representing the operation. Can be used with the `.show operation` command.|
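The returned `OperationId` can then be used to check the operation's status; for example (the GUID is illustrative):

```kusto
.show operation 3827def6-0773-4f2a-859e-c02cf395deaf
```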

## Examples

### Azure Blob Storage with shared access signature

The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table `T`. The `...` represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the `h` in front of the string values) are used to ensure that the SAS is never recorded.

```kusto
.ingest into table T (
Expand All @@ -89,7 +87,7 @@ The following example instructs your database to read two blobs from Azure Blob

### Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table `T` using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see [Create a managed identity for storage containers](/azure/ai-services/language-service/native-document-support/managed-identities).

```kusto
.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')
```

### Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL.
.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/file.csv?<<pre signed string>>')
with (format='csv')
```

## Related content

* [Data formats supported for ingestion](../../ingestion-supported-formats.md)
* [Inline ingestion](ingest-inline.md)
* [Ingest from query (.set, .append, .set-or-append, .set-or-replace)](ingest-from-query.md)
* [.show ingestion failures command](../ingestion-failures.md)
* [.show ingestion mapping](../show-ingestion-mapping-command.md)