Revision d7b268ab3264b892c4477cf8af30fb78c2694748 authored by herman on 03 December 2019, 10:25:49 UTC, committed by herman on 03 December 2019, 10:25:49 UTC
### What changes were proposed in this pull request?
Observable metrics are named arbitrary aggregate functions that can be defined on a query (Dataframe). As soon as the execution of a Dataframe reaches a completion point (e.g. finishes batch query or reaches streaming epoch) a named event is emitted that contains the metrics for the data processed since the last completion point.

A user can observe these metrics by attaching a listener to spark session, it depends on the execution mode which listener to attach:
- Batch: `QueryExecutionListener`. This will be called when the query completes. A user can access the metrics by using the `QueryExecution.observedMetrics` map.
- (Micro-batch) Streaming: `StreamingQueryListener`. This will be called when the streaming query completes an epoch. A user can access the metrics by using the `StreamingQueryProgress.observedMetrics` map. Please note that we currently do not support continuous execution streaming.

### Why are the changes needed?
This enabled observable metrics.

### Does this PR introduce any user-facing change?
Yes. It adds the `observe` method to `Dataset`.

### How was this patch tested?
- Added unit tests for the `CollectMetrics` logical node to the `AnalysisSuite`.
- Added unit tests for `StreamingProgress` JSON serialization to the `StreamingQueryStatusAndProgressSuite`.
- Added integration tests for streaming to the `StreamingQueryListenerSuite`.
- Added integration tests for batch to the `DataFrameCallbackSuite`.

Closes #26127 from hvanhovell/SPARK-29348.

Authored-by: herman <herman@databricks.com>
Signed-off-by: herman <herman@databricks.com>
1 parent 075ae1e
History
File Mode Size
LICENSE-AnchorJS.txt -rw-r--r-- 1.1 KB
LICENSE-CC0.txt -rw-r--r-- 6.9 KB
LICENSE-bootstrap.txt -rw-r--r-- 550 bytes
LICENSE-cloudpickle.txt -rw-r--r-- 1.6 KB
LICENSE-copybutton.txt -rw-r--r-- 2.4 KB
LICENSE-d3.min.js.txt -rw-r--r-- 1.4 KB
LICENSE-dagre-d3.txt -rw-r--r-- 1.0 KB
LICENSE-datatables.txt -rw-r--r-- 1.0 KB
LICENSE-graphlib-dot.txt -rw-r--r-- 1.0 KB
LICENSE-heapq.txt -rw-r--r-- 2.4 KB
LICENSE-join.txt -rw-r--r-- 1.5 KB
LICENSE-jquery.txt -rw-r--r-- 1.1 KB
LICENSE-json-formatter.txt -rw-r--r-- 547 bytes
LICENSE-matchMedia-polyfill.txt -rw-r--r-- 149 bytes
LICENSE-modernizr.txt -rw-r--r-- 1.1 KB
LICENSE-mustache.txt -rw-r--r-- 1.2 KB
LICENSE-py4j.txt -rw-r--r-- 1.4 KB
LICENSE-respond.txt -rw-r--r-- 1.0 KB
LICENSE-sbt-launch-lib.txt -rw-r--r-- 1.5 KB
LICENSE-sorttable.js.txt -rw-r--r-- 937 bytes
LICENSE-vis.txt -rw-r--r-- 406 bytes

back to top