Revision 7a04def920438ef0e08b66a95befeec981e5571e authored by Xianyang Liu on 07 August 2017, 09:04:53 UTC, committed by Wenchen Fan on 07 August 2017, 09:05:02 UTC
## What changes were proposed in this pull request?

We should reset numRecordsWritten to zero after DiskBlockObjectWriter.commitAndGet called.
Because when `revertPartialWritesAndClose` be called, we decrease the written records in `ShuffleWriteMetrics` . However, we decreased the written records to zero, this should be wrong, we should only decreased the number reords after the last `commitAndGet` called.

## How was this patch tested?
Modified existing test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Xianyang Liu <xianyang.liu@intel.com>

Closes #18830 from ConeyLiu/DiskBlockObjectWriter.

(cherry picked from commit 534a063f7c693158437d13224f50d4ae789ff6fb)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 098aaec
Raw File
mllib-pmml-model-export.md
---
layout: global
title: PMML model export - RDD-based API
displayTitle: PMML model export - RDD-based API
---

* Table of contents
{:toc}

## `spark.mllib` supported models

`spark.mllib` supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).

The table below outlines the `spark.mllib` models that can be exported to PMML and their equivalent PMML model.

<table class="table">
  <thead>
    <tr><th>`spark.mllib` model</th><th>PMML model</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>KMeansModel</td><td>ClusteringModel</td>
    </tr>    
    <tr>
      <td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
    </tr>
    <tr>
      <td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
    </tr>
  </tbody>
</table>

## Examples
<div class="codetabs">

<div data-lang="scala" markdown="1">
To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.

As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats.

Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for details on the API.

Here a complete example of building a KMeansModel and print it out in PMML format:
{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}

For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.

</div>

</div>
back to top