Revision be7f1e9979c38b1358b0af2b358bacb0bd523c80 authored by Dongjoon Hyun on 24 January 2024, 00:38:45 UTC, committed by Dongjoon Hyun on 24 January 2024, 00:38:53 UTC
### What changes were proposed in this pull request?

This PR aims to fix `spark-daemon.sh` usage by adding `decommission` command.

### Why are the changes needed?

This was missed when SPARK-20628 added `decommission` command at Apache Spark 3.1.0. The command has been used like the following.

https://github.com/apache/spark/blob/0356ac00947282b1a0885ad7eaae1e25e43671fe/sbin/decommission-worker.sh#L41

### Does this PR introduce _any_ user-facing change?

No, this is only a change on usage message.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44856 from dongjoon-hyun/SPARK-46817.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 00a92d328576c39b04cfd0fdd8a30c5a9bc37e36)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
1 parent 05f7aa5
Raw File
sql-ref-syntax-qry-select-clusterby.md
---
layout: global
title: CLUSTER BY Clause
displayTitle: CLUSTER BY Clause
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

### Description

The `CLUSTER BY` clause is used to first repartition the data based
on the input expressions and then sort the data within each partition. This is
semantically equivalent to performing a
[DISTRIBUTE BY](sql-ref-syntax-qry-select-distribute-by.html) followed by a
[SORT BY](sql-ref-syntax-qry-select-sortby.html). This clause only ensures that the
resultant rows are sorted within each partition and does not guarantee a total order of output.

### Syntax

```sql
CLUSTER BY { expression [ , ... ] }
```

### Parameters

* **expression**

    Specifies combination of one or more values, operators and SQL functions that results in a value.

### Examples

```sql
CREATE TABLE person (name STRING, age INT);
INSERT INTO person VALUES
    ('Zen Hui', 25),
    ('Anil B', 18),
    ('Shone S', 16),
    ('Mike A', 25),
    ('John A', 18),
    ('Jack N', 16);

-- Reduce the number of shuffle partitions to 2 to illustrate the behavior of `CLUSTER BY`.
-- It's easier to see the clustering and sorting behavior with less number of partitions.
SET spark.sql.shuffle.partitions = 2;

-- Select the rows with no ordering. Please note that without any sort directive, the results
-- of the query is not deterministic. It's included here to show the difference in behavior
-- of a query when `CLUSTER BY` is not used vs when it's used. The query below produces rows
-- where age column is not sorted.
SELECT age, name FROM person;
+---+-------+
|age|   name|
+---+-------+
| 16|Shone S|
| 25|Zen Hui|
| 16| Jack N|
| 25| Mike A|
| 18| John A|
| 18| Anil B|
+---+-------+

-- Produces rows clustered by age. Persons with same age are clustered together.
-- In the query below, persons with age 18 and 25 are in first partition and the
-- persons with age 16 are in the second partition. The rows are sorted based
-- on age within each partition.
SELECT age, name FROM person CLUSTER BY age;
+---+-------+
|age|   name|
+---+-------+
| 18| John A|
| 18| Anil B|
| 25|Zen Hui|
| 25| Mike A|
| 16|Shone S|
| 16| Jack N|
+---+-------+
```

### Related Statements

* [SELECT Main](sql-ref-syntax-qry-select.html)
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
* [OFFSET Clause](sql-ref-syntax-qry-select-offset.html)
* [CASE Clause](sql-ref-syntax-qry-select-case.html)
* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
* [UNPIVOT Clause](sql-ref-syntax-qry-select-unpivot.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
back to top