Revision - 3dd3a62 - [SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup [...]

Revision 3dd3a623f293bc7fd4937c95f06b967fa187b0f1 authored by Nicholas Chammas on 04 December 2019, 01:10:30 UTC, committed by HyukjinKwon on 04 December 2019, 01:10:30 UTC

[SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader

### What changes were proposed in this pull request?

As a follow-up to #24830, this PR adds the `recursiveFileLookup` option to the Python DataFrameReader API.

### Why are the changes needed?

This PR maintains Python feature parity with Scala.

### Does this PR introduce any user-facing change?

Yes.

Before this PR, you'd only be able to use this option as follows:

```python
spark.read.option("recursiveFileLookup", True).text("test-data").show()
```

With this PR, you can reference the option from within the format-specific method:

```python
spark.read.text("test-data", recursiveFileLookup=True).show()
```

This option now also shows up in the Python API docs.

### How was this patch tested?

I tested this manually by creating the following directories with dummy data:

```
test-data
├── 1.txt
└── nested
   └── 2.txt
test-parquet
├── nested
│  ├── _SUCCESS
│  ├── part-00000-...-.parquet
├── _SUCCESS
├── part-00000-...-.parquet
```

I then ran the following tests and confirmed the output looked good:

```python
spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
spark.read.text("test-data", recursiveFileLookup=True).show()
spark.read.csv("test-data", recursiveFileLookup=True).show()
```

`python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm happy to add my tests there, though it seems we have been deferring testing like this to the Scala side of things.

Closes #26718 from nchammas/SPARK-27990-recursiveFileLookup-python.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

1 parent f3abee3

Files
Changes

Permalinks

File	Mode	Size
.github
R
assembly
bin
build
common
conf
core
data
dev
docs
examples
external
graph
graphx
hadoop-cloud
launcher
licenses
licenses-binary
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.gitattributes	-rw-r--r--	40 bytes
.gitignore	-rw-r--r--	1.4 KB
CONTRIBUTING.md	-rw-r--r--	997 bytes
LICENSE	-rw-r--r--	13.1 KB
LICENSE-binary	-rw-r--r--	22.8 KB
NOTICE	-rw-r--r--	2.0 KB
NOTICE-binary	-rw-r--r--	56.3 KB
README.md	-rw-r--r--	4.6 KB
appveyor.yml	-rw-r--r--	3.0 KB
pom.xml	-rw-r--r--	111.9 KB
scalastyle-config.xml	-rw-r--r--	19.1 KB

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...

[SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader

README.md