partition operator - Azure Data Explorer & Real-Time Intelligence (2024)

  • Article

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.

This operator is useful when you need to perform a subquery only on a subset of rows that belongs to the same partition key, and not query the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or less key partition values.

Syntax

T | partition [ hint.strategy=Strategy ] [ Hints ] by Column ( TransformationSubQuery )

T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }

Learn more about syntax conventions.

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular source.
StrategystringThe value legacy, shuffle, or native. This hint defines the execution strategy of the partition operator.

If no strategy is specified, the legacy strategy is used. For more information, see Strategies.

Columnstring✔️The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuerystring✔️A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is hom*ogenous on the value of Column.

The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements.

SubQueryWithSourcestring✔️A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).

The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements.

HintsstringZero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint nameTypeStrategyDescription
hint.shufflekeystringshuffleThe partition key used to run the partition operator with the shuffle strategy.
hint.materializedboollegacyIf set to true, will materialize the source of the partition operator. The default value is false.
hint.concurrencyintlegacyDetermines how many partitions to run in parallel. The default value is 16.
hint.spreadintlegacyDetermines how to distribute the partitions among cluster nodes. The default value is 1.

For example, if there are N partitions and the spread hint is set to P, then the N partitions will be processed by P different cluster nodes equally in parallel/sequentially depending on the concurrency hint.

Returns

The operator returns a union of the results of the individual subqueries.

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Note

The distinction between the native and shuffle strategies allows the caller to indicate the cardinality and execution strategy of the subquery. This choice may affect how long the subquery takes to complete but doesn't change the end result.

Native strategy

This strategy should be applied when the number of distinct values of the partition key isn't large, roughly in the thousands.

The subquery must be a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There's no restriction on the number of partitions.

To use this strategy, specify hint.strategy=native.

Shuffle strategy

This strategy should be applied when the number of distinct values of the partition key is large, in the millions.

The subquery must be a tabular transformation that doesn't specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There's no restriction on the number of partitions.

To use this strategy, specify hint.strategy=shuffle. For more information about shuffle strategy and performance, see shuffle query.

Supported operators for the native and shuffle strategies

The following list of operators can be used in subqueries with the native or shuffle strategies:

  • count
  • distinct
  • extend
  • make-series
  • mv-apply
  • mv-expand
  • parse
  • parse-where
  • project
  • project-away
  • project-keep
  • project-rename
  • project-reorder
  • reduce
  • sample
  • sample-distinct
  • scan
  • search
  • serialize
  • sort
  • summarize
  • take
  • top
  • top-hitters
  • top-nested
  • where

Note

  • Operators that reference a table source other than the subtable partitions aren't compatible with the native and shuffle strategies. For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy.
  • The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.

Legacy strategy

For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.

To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.

Note

An error will occur if the partition column, Column, contains more than 64 distinct values.

Examples

Find top values

In some cases, it's more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: "WYOMING", "WASHINGTON", "WEST VIRGINIA", and "WISCONSIN".

StormEvents| where State startswith 'W'| partition hint.strategy=native by State ( summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State | top 3 by Events ) 

Output

EventTypeStateEventsInjuries
HailWYOMING1080
High WindWYOMING815
Winter StormWYOMING720
Heavy SnowWASHINGTON820
High WindWASHINGTON5813
WildfireWASHINGTON290
Thunderstorm WindWEST VIRGINIA1801
HailWEST VIRGINIA1030
Winter WeatherWEST VIRGINIA880
Thunderstorm WindWISCONSIN4161
Winter StormWISCONSIN3100
HailWISCONSIN3031

Native strategy

The following query returns the top 2 EventType values by TotalInjuries for each State that starts with 'W':

StormEvents| where State startswith 'W'| partition hint.strategy = native by State ( summarize TotalInjueries = sum(InjuriesDirect) by EventType | top 2 by TotalInjueries )

Output

EventTypeTotalInjueries
Tornado4
Hail1
Thunderstorm Wind1
Excessive Heat0
High Wind13
Lightning5
High Wind5
Avalanche3

Shuffle strategy

The following query returns the top 3 DamagedProperty values foreach EpisodeId and the columns EpisodeId and State.

StormEvents| partition hint.strategy=shuffle by EpisodeId ( top 3 by DamageProperty | project EpisodeId, State, DamageProperty )| count

Output

Count
22345

Legacy strategy with explicit source

The following query runs two subqueries:

  • When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
  • When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.

The final result is the union of these two subqueries.

range x from 1 to 2 step 1| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

T| partition by Dim( as Partition | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric)))
partition operator - Azure Data Explorer & Real-Time Intelligence (2024)

References

Top Articles
Latest Posts
Article information

Author: Prof. An Powlowski

Last Updated:

Views: 6138

Rating: 4.3 / 5 (64 voted)

Reviews: 95% of readers found this page helpful

Author information

Name: Prof. An Powlowski

Birthday: 1992-09-29

Address: Apt. 994 8891 Orval Hill, Brittnyburgh, AZ 41023-0398

Phone: +26417467956738

Job: District Marketing Strategist

Hobby: Embroidery, Bodybuilding, Motor sports, Amateur radio, Wood carving, Whittling, Air sports

Introduction: My name is Prof. An Powlowski, I am a charming, helpful, attractive, good, graceful, thoughtful, vast person who loves writing and wants to share my knowledge and understanding with you.