Revision 22dae38b5e27e783732a37b537c80b1202adc288 authored by Bruce Robbins on 23 June 2022, 00:07:43 UTC, committed by Yuming Wang on 23 June 2022, 00:46:53 UTC
Change `LimitPushDownThroughWindow` so that it no longer supports pushing down a limit through a window using percent_rank.

Given a query with a limit of _n_ rows, and a window whose child produces _m_ rows, pushing the limit below the window makes percent_rank compute over only _n_ rows, so it labels the _nth_ row as 100% rather than the _mth_ row.

This behavior conflicts with Spark 3.1.3, Hive 2.3.9 and Prestodb 0.268.

Assume this data:
```
create table t1 stored as parquet as
select *
from range(101);
```
And also assume this query:
```
select id, percent_rank() over (order by id) as pr
from t1
limit 3;
```
With Spark 3.2.1, 3.3.0, and master, the limit is applied before the percent_rank:
```
0	0.0
1	0.5
2	1.0
```
With Spark 3.1.3, Hive 2.3.9, and Prestodb 0.268, the limit is applied _after_ percent_rank:

Spark 3.1.3:
```
0	0.0
1	0.01
2	0.02
```
Hive 2.3.9:
```
0: jdbc:hive2://localhost:10000> select id, percent_rank() over (order by id) as pr
from t1
limit 3;
. . . . . . . . . . . . . . . .> . . . . . . . . . . . . . . . .> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+-----+-------+
| id  |  pr   |
+-----+-------+
| 0   | 0.0   |
| 1   | 0.01  |
| 2   | 0.02  |
+-----+-------+
3 rows selected (4.621 seconds)
0: jdbc:hive2://localhost:10000>
```

Prestodb 0.268:
```
 id |  pr
----+------
  0 |  0.0
  1 | 0.01
  2 | 0.02
(3 rows)
```
With this PR, Spark will apply the limit after percent_rank.
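
The two orderings can be reproduced outside Spark with a small Python sketch. It is not Spark code: the `percent_rank` helper below is a hypothetical stand-in that assumes distinct values (no ties) and uses the usual definition `(rank - 1) / (rows - 1)`, and the list of 101 integers mirrors the `t1` table above.

```python
def percent_rank(values):
    """Return (value, percent_rank) pairs, ordered by value.

    Assumes all values are distinct, so rank is just position + 1 and
    percent_rank reduces to position / (row_count - 1).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n <= 1:
        # A single row (or no rows) has percent_rank 0.0 by definition.
        return [(v, 0.0) for v in ordered]
    return [(v, i / (n - 1)) for i, v in enumerate(ordered)]


ids = list(range(101))  # same contents as t1: 0..100

# Limit applied AFTER the window: rank over all 101 rows, then keep 3.
limit_after = percent_rank(ids)[:3]

# Limit applied BEFORE the window: keep 3 rows, then rank only those.
limit_before = percent_rank(ids[:3])

print(limit_after)   # [(0, 0.0), (1, 0.01), (2, 0.02)]
print(limit_before)  # [(0, 0.0), (1, 0.5), (2, 1.0)]
```

The denominator is the number of rows the window sees, which is why moving the limit below the window changes the result for percent_rank.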

There is no user-facing change, besides changing percent_rank's behavior to be more like Spark 3.1.3, Hive, and Prestodb.

Tested with new unit tests.

Closes #36951 from bersprockets/percent_rank_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit a63ce5676e79f15903e9fd533a26a6c3ec9bf7a8)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
1 parent 497d17f