Revision - 182bc85 - [SPARK-25714] Fix Null Handling in the Optimizer rule [...] - origin: https://github.com/apache/spark

https://github.com/apache/spark

18 August 2024, 22:37:34 UTC

Revision 182bc85f2db0b3268b9b93ff91210811b00e1636 authored by gatorsmile on 13 October 2018, 04:02:38 UTC, committed by gatorsmile on 13 October 2018, 04:03:20 UTC

[SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification

## What changes were proposed in this pull request?
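```Scala
    import org.apache.spark.sql.SaveMode
    import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

    val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
    df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
    val df2 = spark.read.parquet("/tmp/test1")
    df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
```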

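Before the PR, the query returns both rows. After the fix, it returns only `Row("abc", 1)`. For the second row, `col1` is NULL, so both `col1 = 'abc'` and `col1 != 'abc'` evaluate to NULL; the whole predicate is therefore NULL and the row must be filtered out. This PR fixes the NULL handling in the optimizer rule BooleanSimplification; the bug was introduced in the Spark 1.6 release.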
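The difference comes from SQL's three-valued logic. As a minimal sketch, assuming the faulty simplification had the shape `a OR (NOT a AND b)` => `a OR b` (inferred from the example predicate, not taken from the rule's source), the plain-Scala model below represents SQL NULL as `None` in an `Option[Boolean]`. The object and method names (`ThreeValuedLogic`, `original`, `rewritten`) are hypothetical, and nothing here uses the Spark API. It enumerates every TRUE/FALSE/NULL input pair and prints the pairs where the two predicates disagree.

```scala
// Hypothetical model of SQL three-valued logic; None stands for SQL NULL.
// Assumption: the faulty rewrite was a OR (NOT a AND b) => a OR b.
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  // SQL AND: FALSE wins over NULL, NULL wins over TRUE.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: TRUE wins over NULL, NULL wins over FALSE.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL NOT: NULL stays NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  // The predicate as written: a OR (NOT a AND b).
  def original(a: SqlBool, b: SqlBool): SqlBool = or(a, and(not(a), b))

  // The simplified predicate: a OR b (equivalent only when a is never NULL).
  def rewritten(a: SqlBool, b: SqlBool): SqlBool = or(a, b)

  def main(args: Array[String]): Unit = {
    val values: Seq[SqlBool] = Seq(Some(true), Some(false), None)
    // Print every input pair on which the two predicates disagree.
    for (a <- values; b <- values if original(a, b) != rewritten(a, b))
      println(s"a=$a b=$b original=${original(a, b)} rewritten=${rewritten(a, b)}")
  }
}
```

Running `main` prints a single line, `a=None b=Some(true) original=None rewritten=Some(true)`. That is exactly the `(null, 3)` row: `a` (`col1 = 'abc'`) is NULL and `b` (`col2 == 3`) is TRUE. A filter keeps a row only when its predicate is TRUE, so the original predicate drops the row while the simplified one keeps it. In this model the rewrite is only safe when `a` cannot be NULL.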

## How was this patch tested?
Added test cases

Closes #22702 from gatorsmile/fixBooleanSimplify2.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(cherry picked from commit c9ba59d38e2be17b802156b49d374a726e66c6b9)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

1 parent 5324a85
