Revision d7b268ab3264b892c4477cf8af30fb78c2694748 authored by herman on 03 December 2019, 10:25:49 UTC, committed by herman on 03 December 2019, 10:25:49 UTC
### What changes were proposed in this pull request? Observable metrics are named arbitrary aggregate functions that can be defined on a query (Dataframe). As soon as the execution of a Dataframe reaches a completion point (e.g. finishes batch query or reaches streaming epoch) a named event is emitted that contains the metrics for the data processed since the last completion point. A user can observe these metrics by attaching a listener to spark session, it depends on the execution mode which listener to attach: - Batch: `QueryExecutionListener`. This will be called when the query completes. A user can access the metrics by using the `QueryExecution.observedMetrics` map. - (Micro-batch) Streaming: `StreamingQueryListener`. This will be called when the streaming query completes an epoch. A user can access the metrics by using the `StreamingQueryProgress.observedMetrics` map. Please note that we currently do not support continuous execution streaming. ### Why are the changes needed? This enabled observable metrics. ### Does this PR introduce any user-facing change? Yes. It adds the `observe` method to `Dataset`. ### How was this patch tested? - Added unit tests for the `CollectMetrics` logical node to the `AnalysisSuite`. - Added unit tests for `StreamingProgress` JSON serialization to the `StreamingQueryStatusAndProgressSuite`. - Added integration tests for streaming to the `StreamingQueryListenerSuite`. - Added integration tests for batch to the `DataFrameCallbackSuite`. Closes #26127 from hvanhovell/SPARK-29348. Authored-by: herman <herman@databricks.com> Signed-off-by: herman <herman@databricks.com>
1 parent 075ae1e
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graph | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.gitattributes | -rw-r--r-- | 40 bytes |
.gitignore | -rw-r--r-- | 1.4 KB |
CONTRIBUTING.md | -rw-r--r-- | 997 bytes |
LICENSE | -rw-r--r-- | 13.1 KB |
LICENSE-binary | -rw-r--r-- | 22.8 KB |
NOTICE | -rw-r--r-- | 2.0 KB |
NOTICE-binary | -rw-r--r-- | 56.3 KB |
README.md | -rw-r--r-- | 4.6 KB |
appveyor.yml | -rw-r--r-- | 3.0 KB |
pom.xml | -rw-r--r-- | 111.6 KB |
scalastyle-config.xml | -rw-r--r-- | 19.1 KB |
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...