Repo sync for protected branch #2438

Merged · 19 commits · Nov 13, 2024
2 changes: 1 addition & 1 deletion data-explorer/kusto/functions-library/log-reduce-fl.md
@@ -48,7 +48,7 @@ The function runs multiple passes over the rows to be reduced to common patterns
The Logram algorithm considers 3-tuples and 2-tuples of tokens. If a 3-tuple of tokens is common in the log lines (it appears more than *trigram_th* times), then it's likely that all three tokens are part of the pattern. If the 3-tuple is rare, then it's likely that it contains a variable that should be replaced by a wildcard. For rare 3-tuples, we consider the frequency with which 2-tuples contained in the 3-tuple appear. If a 2-tuple is common (it appears more than *bigram_th* times), then the remaining token is likely to be a parameter, and not part of the pattern.\
The Logram algorithm is easy to parallelize. It requires two passes over the log corpus: the first counts the frequency of each 3-tuple and 2-tuple, and the second applies the logic described above to each entry. To parallelize the algorithm, we only need to partition the log entries and merge the frequency counts from the different workers.

* **Apply [Drain algorithm](https://jiemingzhu.github.io/pub/pjhe_icws2017.pdf)**: this pass is optional and runs only if *use_drain* is true. Drain is a log parsing algorithm based on a truncated-depth prefix tree. Log messages are split according to their length, and for each length the first *tree_depth* tokens of the log message are used to build a prefix tree. If no match for the prefix tokens is found, a new branch is created. If a match for the prefix is found, we search for the most similar pattern among the patterns contained in the tree leaf. Pattern similarity is measured by the ratio of matched nonwildcard tokens out of all tokens. If the similarity of the most similar pattern is above the similarity threshold (the parameter *similarity_th*), then the log entry is matched to the pattern. For that pattern, the function replaces all nonmatching tokens by wildcards. If the similarity of the most similar pattern is below the similarity threshold, a new pattern containing the log entry is created.\
* **Apply [Drain algorithm](https://pinjiahe.github.io/files/pdf/research/ICWS17.pdf)**: this pass is optional and runs only if *use_drain* is true. Drain is a log parsing algorithm based on a truncated-depth prefix tree. Log messages are split according to their length, and for each length the first *tree_depth* tokens of the log message are used to build a prefix tree. If no match for the prefix tokens is found, a new branch is created. If a match for the prefix is found, we search for the most similar pattern among the patterns contained in the tree leaf. Pattern similarity is measured by the ratio of matched nonwildcard tokens out of all tokens. If the similarity of the most similar pattern is above the similarity threshold (the parameter *similarity_th*), then the log entry is matched to the pattern. For that pattern, the function replaces all nonmatching tokens by wildcards. If the similarity of the most similar pattern is below the similarity threshold, a new pattern containing the log entry is created.\
We set the default *tree_depth* to 4 based on testing various logs. Increasing this depth can improve runtime but might degrade pattern accuracy; decreasing it yields more accurate patterns but is slower, as each node performs many more similarity tests.\
Usually, Drain efficiently generalizes and reduces patterns (though it's hard to parallelize). However, because it relies on a prefix tree, it might not be optimal for log entries that contain parameters in their first tokens. This can be resolved in most cases by applying Logram first. A usage sketch follows this list.
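
The following is a minimal, hypothetical usage sketch of these passes in practice. It assumes `log_reduce_fl()` is already defined in the database (or brought into scope with a `let` statement), that `exampleLogs` is a placeholder table with a string column `LogText` holding the raw log lines, and that the name of the column to reduce is passed as the function's first parameter; verify the exact parameter list against the function's declaration before relying on it.

```kusto
// Hypothetical invocation: reduce the raw log lines in exampleLogs.LogText to common patterns.
// The thresholds discussed above (trigram_th, bigram_th, similarity_th, tree_depth, use_drain)
// are tuning parameters of the function; this sketch relies on their default values.
exampleLogs
| project LogText
| invoke log_reduce_fl("LogText")
```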

13 changes: 8 additions & 5 deletions data-explorer/kusto/management/cache-policy.md
@@ -3,7 +3,7 @@ title: Caching policy (hot and cold cache)
description: This article describes caching policy (hot and cold cache).
ms.reviewer: orspodek
ms.topic: reference
ms.date: 08/11/2024
ms.date: 11/11/2024
---
# Caching policy (hot and cold cache)

@@ -21,6 +21,9 @@ The best query performance is achieved when all ingested data is cached. However

Use management commands to alter the caching policy at the [database](alter-database-cache-policy-command.md), [table](alter-table-cache-policy-command.md), or [materialized view](alter-materialized-view-cache-policy-command.md) level.
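
For example, the following commands set the hot cache window at the table and database level. `MyTable` and `MyDatabase` are placeholder names and the values are illustrative; see the linked articles for the full command syntax and additional options.

```kusto
// Keep the last 14 days of MyTable's data in the hot cache.
.alter table MyTable policy caching hot = 14d

// Keep the last 30 days of data in the hot cache for every table in MyDatabase
// that doesn't override the policy at the table level.
.alter database MyDatabase policy caching hot = 30d
```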

> [!NOTE]
> For information about the consumption rate, see [Eventhouse and KQL database consumption](/fabric/real-time-intelligence/real-time-intelligence-consumption).

::: moniker-end

::: moniker range="azure-data-explorer"
@@ -35,13 +38,13 @@ Use management commands to alter the caching policy at the [cluster](alter-clust

## How caching policy is applied

When data is ingested, the system keeps track of the date and time of the ingestion, and of the extent that was created. The extent's ingestion date and time value (or maximum value, if an extent was built from multiple pre-existing extents), is used to evaluate the caching policy.
When data is ingested, the system keeps track of the date and time of the ingestion, and of the extent that was created. The extent's ingestion date and time value (or maximum value, if an extent was built from multiple preexisting extents), is used to evaluate the caching policy.

> [!NOTE]
> You can specify a value for the ingestion date and time by using the ingestion property `creationTime`.
> When doing so, make sure the `Lookback` property in the table's effective [Extents merge policy](merge-policy.md) is aligned with the values you set for `creationTime`.
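
As an illustration, the following ingestion command stamps the ingested data with an explicit creation time, so the hot-cache window is evaluated against that date rather than the actual ingestion date. The table name, blob URI, and timestamp are placeholders.

```kusto
// Hypothetical backfill: ingest historical data and mark it with its original creation time,
// so the caching (and retention) policy treats it as older data.
.ingest into table MyTable (@"https://mystorageaccount.blob.core.windows.net/backfill/old-data.csv.gz")
    with (format="csv", creationTime="2024-01-01T00:00:00Z")
```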

By default, the effective policy is `null`, which means that all the data is considered **hot**. A `null` policy at the table level means that the policy will be inherited from the database. A non-`null` table-level policy overrides a database-level policy.
By default, the effective policy is `null`, which means that all the data is considered **hot**. A `null` policy at the table level means that the policy is inherited from the database. A non-`null` table-level policy overrides a database-level policy.
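
To check which policy is set at each level, you can display the policy objects directly; a `null` table-level policy indicates that the table inherits the database policy. `MyTable` and `MyDatabase` are placeholder names.

```kusto
// Show the caching policy set on a specific table.
.show table MyTable policy caching

// Show the caching policy set at the database level.
.show database MyDatabase policy caching
```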

## Scoping queries to hot cache

@@ -64,7 +67,7 @@ The `default` value indicates use of the default settings, which determine that

If there's a discrepancy between the different methods, then `set` takes precedence over the client request property. Specifying a value for a table reference takes precedence over both.

For example, in the following query, all table references use hot cache data only, except for the second reference to "T", that is scoped to all the data:
For example, in the following query, all table references use hot cache data only, except for the second reference to "T" that is scoped to all the data:

```kusto
set query_datascope="hotcache";
@@ -85,7 +88,7 @@ Example:
* `SoftDeletePeriod` = 56d
* `hot cache policy` = 28d

In the example, the last 28 days of data will be on the SSD and the additional 28 days of data will be stored in Azure blob storage. You can run queries on the full 56 days of data.
In the example, the last 28 days of data is stored on the SSD and the additional 28 days of data is stored in Azure blob storage. You can run queries on the full 56 days of data.
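
A sketch of the commands that would produce this split for a single table follows. `MyTable` is a placeholder, and the retention command omits the optional recoverability setting; see the retention policy documentation for the full syntax.

```kusto
// Keep data queryable for 56 days in total ...
.alter table MyTable policy retention softdelete = 56d

// ... and keep only the most recent 28 days in the hot (SSD) cache.
.alter table MyTable policy caching hot = 28d
```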

## Related content
