https://github.com/apache/spark
Revision 92a71a667dd3e13664015f2a9dd2a39e2c1514eb authored by Josh Rosen on 10 May 2017, 23:50:57 UTC, committed by Xiao Li on 10 May 2017, 23:51:16 UTC
## What changes were proposed in this pull request?

There's a latent corner-case bug in PySpark UDF evaluation where executing a `BatchPythonEvaluation` with a single multi-argument UDF where _at least one argument value is repeated_ will crash at execution with a confusing error.

This problem was introduced in #12057: the code there has a fast path for handling a "batch UDF evaluation consisting of a single Python UDF", but that branch incorrectly assumes that a single UDF won't have repeated arguments and therefore skips the code for unpacking arguments from the input row (whose schema may not necessarily match the UDF inputs due to de-duplication of repeated arguments which occurred in the JVM before sending UDF inputs to Python).

This fix here is simply to remove this special-casing: it turns out that the code in the "multiple UDFs" branch just so happens to work for the single-UDF case because Python treats `(x)` as equivalent to `x`, not as a single-argument tuple.

## How was this patch tested?

New regression test in `pyspark.python.sql.tests` module (tested and confirmed that it fails before my fix).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #17927 from JoshRosen/SPARK-20685.

(cherry picked from commit 8ddbc431d8b21d5ee57d3d209a4f25e301f15283)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
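The failure mode described in the commit message can be reproduced in miniature without Spark. The sketch below is a simplified model, not the actual PySpark worker code: `dedupe_inputs`, `eval_with_offsets`, and `eval_fast_path` are hypothetical names chosen for illustration. It assumes the JVM side sends each distinct input expression once, together with per-argument offsets into that de-duplicated row, and it contrasts unpacking by offsets with a shortcut that passes the row through unchanged.

```python
def dedupe_inputs(arg_exprs):
    """Model of the JVM-side step: keep each distinct input expression once
    and record, for every UDF argument, its offset into the distinct list."""
    distinct = []
    offsets = []
    for expr in arg_exprs:
        if expr not in distinct:
            distinct.append(expr)
        offsets.append(distinct.index(expr))
    return distinct, offsets


def eval_with_offsets(func, offsets, row):
    """General path: rebuild the full argument list from the de-duplicated row."""
    return func(*[row[offset] for offset in offsets])


def eval_fast_path(func, row):
    """Shortcut (hypothetical): assume the row already matches the UDF's
    arguments one-to-one, which is wrong when an argument is repeated."""
    return func(*row)


def add(x, y):
    return x + y


# A call shaped like add(a, a): column "a" is sent only once.
distinct, offsets = dedupe_inputs(["a", "a"])
row = (3,)  # one value per distinct input expression

result = eval_with_offsets(add, offsets, row)

try:
    eval_fast_path(add, row)
    fast_path_failed = False
except TypeError:
    # add() receives one argument instead of two.
    fast_path_failed = True

# Parentheses alone do not create a tuple; the trailing comma does.
paren_is_plain_value = (5) == 5 and not isinstance((5), tuple)
comma_makes_tuple = isinstance((5,), tuple)
```

In this model the offset-based path works for any number of arguments, repeated or not, which mirrors the commit's point that the general branch can also serve the single-UDF case.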
1 parent bdc08ab
Tip revision: 92a71a667dd3e13664015f2a9dd2a39e2c1514eb authored by Josh Rosen on 10 May 2017, 23:50:57 UTC
[SPARK-20685] Fix BatchPythonEvaluation bug in case of single UDF w/ repeated arg.
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graphx | ||
launcher | ||
licenses | ||
mesos | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
yarn | ||
.gitattributes | -rw-r--r-- | 40 bytes |
.gitignore | -rw-r--r-- | 1.2 KB |
.travis.yml | -rw-r--r-- | 1.7 KB |
CONTRIBUTING.md | -rw-r--r-- | 995 bytes |
LICENSE | -rw-r--r-- | 17.4 KB |
NOTICE | -rw-r--r-- | 24.1 KB |
README.md | -rw-r--r-- | 3.7 KB |
appveyor.yml | -rw-r--r-- | 1.8 KB |
pom.xml | -rw-r--r-- | 98.5 KB |
scalastyle-config.xml | -rw-r--r-- | 16.7 KB |