https://github.com/apache/spark

Revision | Message | Commit Date
c69d97c [maven-release-plugin] prepare release v1.0.0-rc11 26 May 2014, 06:46:48 UTC
caed16e Updated CHANGES.txt 26 May 2014, 06:16:25 UTC
6d34a6a Revert "[maven-release-plugin] prepare release v1.0.0-rc11" This reverts commit 2f1dc868e5714882cf40d2633fb66772baf34789. 26 May 2014, 06:10:14 UTC
73ffd1e Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 832dc594e7666f1d402334f8015ce29917d9c888. 26 May 2014, 06:09:20 UTC
18c77cb HOTFIX: Add no-arg SparkContext constructor in Java Self-explanatory. Author: Patrick Wendell <pwendell@gmail.com> Closes #878 from pwendell/java-constructor and squashes the following commits: 2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java (cherry picked from commit b6d22af040073cd611b0fcfdf8a5259c0dfd854c) Signed-off-by: Aaron Davidson <aaron@databricks.com> 26 May 2014, 03:13:46 UTC
a3976a2 [SQL] Minor: Introduce SchemaRDD#aggregate() for simple aggregations ```scala rdd.aggregate(Sum('val)) ``` is just shorthand for ```scala rdd.groupBy()(Sum('val)) ``` but seems more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows. Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches. Author: Aaron Davidson <aaron@databricks.com> Closes #874 from aarondav/schemardd and squashes the following commits: e9e68ee [Aaron Davidson] Add comment db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations (cherry picked from commit c3576ffcd7910e38928f233a824dd9e037cde05f) Signed-off-by: Reynold Xin <rxin@apache.org> 26 May 2014, 01:37:52 UTC
5107a6f SPARK-1903 Document Spark's network connections https://issues.apache.org/jira/browse/SPARK-1903 Author: Andrew Ash <andrew@andrewash.com> Closes #856 from ash211/SPARK-1903 and squashes the following commits: 6e7782a [Andrew Ash] Add the technology used on each port 1d9b5d3 [Andrew Ash] Document port for history server 56193ee [Andrew Ash] spark.ui.port becomes worker.ui.port and master.ui.port a774c07 [Andrew Ash] Wording in network section 90e8237 [Andrew Ash] Use real :toc instead of the hand-written one edaa337 [Andrew Ash] Master -> Standalone Cluster Master 57e8869 [Andrew Ash] Port -> Default Port 3d4d289 [Andrew Ash] Title to title case c7d42d9 [Andrew Ash] [WIP] SPARK-1903 Add initial port listing for documentation a416ae9 [Andrew Ash] Word wrap to 100 lines (cherry picked from commit 0659529614c804e0c04efc59cb67dab3a6cdc9d9) Signed-off-by: Reynold Xin <rxin@apache.org> 26 May 2014, 00:15:53 UTC
07f34ca Fix PEP8 violations in Python mllib. Author: Reynold Xin <rxin@apache.org> Closes #871 from rxin/mllib-pep8 and squashes the following commits: 848416f [Reynold Xin] Fixed a typo in the previous cleanup (c -> sc). a8db4cd [Reynold Xin] Fix PEP8 violations in Python mllib. (cherry picked from commit d33d3c61ae9e4551aed0217e525a109e678298f2) Signed-off-by: Reynold Xin <rxin@apache.org> 26 May 2014, 00:15:28 UTC
8891495 Python docstring update for sql.py. Mostly related to the following two rules in PEP8 and PEP257: - Line length < 72 chars. - First line should be a concise description of the function/class. Author: Reynold Xin <rxin@apache.org> Closes #869 from rxin/docstring-schemardd and squashes the following commits: 7cf0cbc [Reynold Xin] Updated sql.py for pep8 docstring. 0a4aef9 [Reynold Xin] Merge branch 'master' into docstring-schemardd 6678937 [Reynold Xin] Python docstring update for sql.py. (cherry picked from commit 14f0358b2a0a9b92526bdad6d501ab753459eaa0) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 23:04:23 UTC
3368397 Fix PEP8 violations in examples/src/main/python. Author: Reynold Xin <rxin@apache.org> Closes #870 from rxin/examples-python-pep8 and squashes the following commits: 2829e84 [Reynold Xin] Fix PEP8 violations in examples/src/main/python. (cherry picked from commit d79c2b28e17ec0b15198aaedd2e1f403d81f717e) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 21:48:41 UTC
832dc59 [maven-release-plugin] prepare for next development iteration 25 May 2014, 10:18:51 UTC
2f1dc86 [maven-release-plugin] prepare release v1.0.0-rc11 25 May 2014, 10:18:41 UTC
7273bfc Added license header for tox.ini. (cherry picked from commit fa541f32c5b92e6868a9c99cbb2c87115d624d23) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 08:50:15 UTC
aeffc20 SPARK-1822: Some minor cleanup work on SchemaRDD.count() Minor cleanup following #841. Author: Reynold Xin <rxin@apache.org> Closes #868 from rxin/schema-count and squashes the following commits: 5442651 [Reynold Xin] SPARK-1822: Some minor cleanup work on SchemaRDD.count() (cherry picked from commit d66642e3978a76977414c2fdaedebaad35662667) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 08:45:01 UTC
291567d Added PEP8 style configuration file. This sets the max line length to 100 as a PEP8 exception. Author: Reynold Xin <rxin@apache.org> Closes #872 from rxin/pep8 and squashes the following commits: 2f26029 [Reynold Xin] Added PEP8 style configuration file. (cherry picked from commit 5c7faecd75ea59454ad3209390ac078e6cf6e4a6) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 08:32:22 UTC
64d0fb5 [SPARK-1822] SchemaRDD.count() should use query optimizer Author: Kan Zhang <kzhang@apache.org> Closes #841 from kanzhang/SPARK-1822 and squashes the following commits: 2f8072a [Kan Zhang] [SPARK-1822] Minor style update cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer (cherry picked from commit 6052db9dc10c996215658485e805200e4f0cf549) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 07:06:57 UTC
7e59335 spark-submit: add exec at the end of the script Add an 'exec' at the end of the spark-submit script, to avoid keeping a bash process hanging around while it runs. This makes ps look a little bit nicer. Author: Colin Patrick Mccabe <cmccabe@cloudera.com> Closes #858 from cmccabe/SPARK-1907 and squashes the following commits: 7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script (cherry picked from commit 6e9fb6320bec3371bc9c010ccbc1b915f500486b) Signed-off-by: Reynold Xin <rxin@apache.org> 25 May 2014, 05:39:34 UTC
b5e9686 [SPARK-1886] check executor id existence when executor exits Author: Zhen Peng <zhenpeng01@baidu.com> Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits: cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exits (cherry picked from commit 4e4831b8facc186cda6ef31040ccdeab48acbbb7) Signed-off-by: Aaron Davidson <aaron@databricks.com> 25 May 2014, 03:40:38 UTC
9ff4224 Revert "[maven-release-plugin] prepare release v1.0.0-rc10" This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c. 25 May 2014, 02:23:15 UTC
f856b8c Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8. 25 May 2014, 02:22:54 UTC
8406092 Updated CHANGES.txt 25 May 2014, 02:20:13 UTC
217bd56 SPARK-1911: Emphasize that Spark jars should be built with Java 6. This commit requires the user to manually say "yes" when building Spark without Java 6. The prompt can be bypassed with a flag (e.g. if the user is scripting around make-distribution). Author: Patrick Wendell <pwendell@gmail.com> Closes #859 from pwendell/java6 and squashes the following commits: 4921133 [Patrick Wendell] Adding Pyspark Notice fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6. (cherry picked from commit 75a03277704f8618a0f1c41aecfb1ebd24a8ac1a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 25 May 2014, 01:27:13 UTC
12f5ecc [SPARK-1900 / 1918] PySpark on YARN is broken If I run the following on a YARN cluster ``` bin/spark-submit sheep.py --master yarn-client ``` it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file: ``` bin/spark-submit file:/path/to/sheep.py --master yarn-client ``` However, this also fails. This time it is because python does not understand URI schemes. This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it. Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending. Author: Andrew Or <andrewor14@gmail.com> Closes #853 from andrewor14/submit-paths and squashes the following commits: 0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH 323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell 3c36587 [Andrew Or] Improve error messages (minor) 854aa6a [Andrew Or] Guard against NPE if user gives pathological paths 6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in 3bb0359 [Andrew Or] Update more comments (minor) 2a1f8a0 [Andrew Or] Update comments (minor) 6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths a68c4d1 [Andrew Or] Handle Windows python file path correctly 427a250 [Andrew Or] Resolve paths properly for Windows a591a4a [Andrew Or] Update tests for resolving URIs 6c8621c [Andrew Or] Move resolveURIs to Utils db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths f542dce [Andrew Or] Fix outdated tests 691c4ce [Andrew Or] Ignore special primary resource names 5342ac7 [Andrew Or] Add missing space in error message 02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly (cherry picked from commit 5081a0a9d47ca31900ea4de570de2cbb0e063105) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 25 May 2014, 01:02:22 UTC
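A minimal sketch of the path-resolution idea this patch describes; `resolveURI` here is an illustrative stand-in, not the exact helper the PR moved into Utils:

```scala
import java.io.File
import java.net.URI
import scala.util.Try

// If the argument already carries a scheme (file:, hdfs:, http:), keep it;
// otherwise treat it as a local file and resolve to an absolute file: URI.
// The Try also absorbs strings that are not valid URIs (e.g. Windows C:\ paths).
def resolveURI(path: String): URI =
  Try(new URI(path)).toOption
    .filter(_.getScheme != null) // bare paths parse fine but have no scheme
    .getOrElse(new File(path).getAbsoluteFile.toURI)

resolveURI("sheep.py")                // e.g. file:/current/dir/sheep.py
resolveURI("hdfs://nn:8020/sheep.py") // left untouched
```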
9be103a Update LBFGSSuite.scala the same reason as https://github.com/apache/spark/pull/588 Author: baishuo(白硕) <vc_java@hotmail.com> Closes #815 from baishuo/master and squashes the following commits: 6876c1e [baishuo(白硕)] Update LBFGSSuite.scala (cherry picked from commit a08262d8769808dd3a8ee1b1e80fbf6ac13a557c) Signed-off-by: Reynold Xin <rxin@apache.org> 23 May 2014, 20:02:49 UTC
6541ca2 Updated scripts for auditing releases - Added script to automatically generate change list CHANGES.txt - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive` - Added SBT projects for testing functionality of `spark-sql` and `spark-hive` - Fixed issues in existing tests that might have come up because of changes in Spark 1.0 Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #844 from tdas/update-dev-scripts and squashes the following commits: 25090ba [Tathagata Das] Added missing license e2e20b3 [Tathagata Das] Updated tests for auditing releases. (cherry picked from commit b2bdd0e505f1ae3d39c46139f17bd43779ece635) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 23 May 2014, 03:49:25 UTC
c3b4065 [SPARK-1896] Respect spark.master (and --master) before MASTER in spark-shell The hierarchy for configuring the Spark master in the shell is as follows: ``` MASTER > --master > spark.master (spark-defaults.conf) ``` This is inconsistent with the way we run normal applications, which is: ``` --master > spark.master (spark-defaults.conf) > MASTER ``` I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark). Author: Andrew Or <andrewor14@gmail.com> Closes #846 from andrewor14/shell-master and squashes the following commits: 2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL (cherry picked from commit cce77457e00aa5f1f4db3d50454cf257efb156ed) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 23 May 2014, 03:32:43 UTC
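The precedence this patch establishes, sketched as plain Scala; the names and the local fallback are illustrative, the real change lives in the shell's option handling:

```scala
// --master beats spark.master from spark-defaults.conf, which beats MASTER.
def chooseMaster(
    masterFlag: Option[String], // --master command-line flag
    confMaster: Option[String], // spark.master from spark-defaults.conf
    masterEnv: Option[String]   // MASTER environment variable
): String =
  masterFlag.orElse(confMaster).orElse(masterEnv).getOrElse("local[*]")

// chooseMaster(Some("spark://host:7077"), None, Some("local")) == "spark://host:7077"
```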
23cc40e [SPARK-1897] Respect spark.jars (and --jars) in spark-shell Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jar` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way. Author: Andrew Or <andrewor14@gmail.com> Closes #849 from andrewor14/shell-jars and squashes the following commits: 928a7e6 [Andrew Or] ',' -> "," (minor) afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit c6da113 [Andrew Or] Do not set spark.jars to "" d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell (cherry picked from commit 8edbee7d1b4afc192d97ba192a5526affc464205) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 23 May 2014, 03:25:53 UTC
a566216 Fix UISuite unit test that fails under Jenkins contention Perhaps due to zombie processes on Jenkins, it seems that at least 10 Spark ports are in use. It also doesn't matter whether the chosen port increases; it could in fact go down -- the only thing that matters is that it selects a different port rather than failing to bind. Changed the test to match this. Thanks to @andrewor14 for helping diagnose this. Author: Aaron Davidson <aaron@databricks.com> Closes #857 from aarondav/tiny and squashes the following commits: c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention (cherry picked from commit f9f5fd5f4e81828a3e0c391892e0f28751568843) Signed-off-by: Reynold Xin <rxin@apache.org> 22 May 2014, 22:11:12 UTC
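An illustrative sketch of the behavior the test now asserts -- binding succeeds on some other free port rather than failing; the retry helper below is hypothetical:

```scala
import java.net.{BindException, ServerSocket}

def bindWithRetry(startPort: Int, maxRetries: Int = 10): ServerSocket = {
  var attempt = 0
  while (true) {
    try {
      // Any free port will do; whether it is higher or lower is irrelevant.
      return new ServerSocket(startPort + attempt)
    } catch {
      case e: BindException =>
        attempt += 1
        if (attempt > maxRetries) throw e
    }
  }
  throw new IllegalStateException("unreachable")
}
```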
79cd26c [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode. Sends secondary jars to the distributed cache of all containers and adds the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0). `spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing! I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested Mesos yet. CC: @dbtsai @sryza Author: Xiangrui Meng <meng@databricks.com> Closes #848 from mengxr/yarn-classpath and squashes the following commits: 23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid conflicts; append $CWD/ and $CWD/* to the classpath; remove unused methods a40f6ed [Xiangrui Meng] standalone -> cluster 65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client 11e5354 [Xiangrui Meng] minor changes 3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn (cherry picked from commit dba314029b4c9d72d7e48a2093b39edd01931f57) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 22 May 2014, 08:53:45 UTC
75af8bd Configuration documentation updates 1. Add `<code>` to configuration options 2. List env variables in tabular format to be consistent with other pages. 3. Moved the Viewing Spark Properties section up. This is against branch-1.0, but should be cherry picked into master as well. Author: Reynold Xin <rxin@apache.org> Closes #851 from rxin/doc-config and squashes the following commits: 28ac0d3 [Reynold Xin] Add <code> to configuration options, and list env variables in a table. 22 May 2014, 01:49:12 UTC
6e7934e [SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys. When tables are equi-joined by multiple keys, `HashJoin` should be used, but currently `CartesianProduct` followed by `Filter` is used instead. The join keys are paired by `And` expressions, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits: fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys. (cherry picked from commit bb88875ad52e8209c25e8350af1fe4b7159086ae) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 22:38:13 UTC
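The splitting the fix applies, sketched with stand-in expression types (Catalyst's real splitConjunctivePredicates has the same recursive shape):

```scala
// Stand-ins for Catalyst expression nodes, for illustration only.
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class Pred(desc: String) extends Expr

// Recursively flatten a tree of Ands into its individual conjuncts, so each
// equality predicate can be inspected as a candidate join key.
def splitConjunctivePredicates(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjunctivePredicates(l) ++ splitConjunctivePredicates(r)
  case other     => Seq(other)
}

// splitConjunctivePredicates(And(Pred("a = b"), Pred("c = d")))
//   == Seq(Pred("a = b"), Pred("c = d"))
```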
30d1df5 [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark Author: Kan Zhang <kzhang@apache.org> Closes #697 from kanzhang/SPARK-1519 and squashes the following commits: 4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark (cherry picked from commit f18fd05b513b136363c94adb3e5b841f8bf48134) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 20:27:07 UTC
9b8f772 [Typo] Stoped -> Stopped Author: Andrew Or <andrewor14@gmail.com> Closes #847 from andrewor14/yarn-typo and squashes the following commits: c1906af [Andrew Or] Stoped -> Stopped (cherry picked from commit ba5d4a99425a2083fea2a9759050c5e770197e23) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 18:59:29 UTC
bc6bbfa [Minor] Move JdbcRDDSuite to the correct package It was in the wrong package Author: Andrew Or <andrewor14@gmail.com> Closes #839 from andrewor14/jdbc-suite and squashes the following commits: f948c5a [Andrew Or] cache -> cache() b215279 [Andrew Or] Move JdbcRDDSuite to the correct package (cherry picked from commit 7c79ef7d43de258ad9a5de15c590132bd78ce8dd) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 08:25:38 UTC
7295dd9 [Docs] Correct example of creating a new SparkConf The example code on the configuration page currently does not compile. Author: Andrew Or <andrewor14@gmail.com> Closes #842 from andrewor14/conf-docs and squashes the following commits: aabff57 [Andrew Or] Correct example of creating a new SparkConf (cherry picked from commit 1014668f2727863fe46f9c75201ee459d093bf0c) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 08:24:51 UTC
364c14a [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark. Author: Sumedh Mungee <smungee@gmail.com> Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits: 26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class (cherry picked from commit 6e337380fc47071fc7fb28d744e8209c729fe1e9) Signed-off-by: Reynold Xin <rxin@apache.org> 21 May 2014, 08:23:04 UTC
67dd53d [maven-release-plugin] prepare for next development iteration 20 May 2014, 18:03:42 UTC
d807023 [maven-release-plugin] prepare release v1.0.0-rc10 20 May 2014, 18:03:34 UTC
b4d93d3 [Hotfix] Blacklisted flaky HiveCompatibility test `lateral_view_outer` query sometimes returns a different set of 10 rows. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #838 from tdas/hive-test-fix2 and squashes the following commits: 9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test. (cherry picked from commit 7f0cfe47f4709843d70ceccc25dee7551206ce0d) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 17:27:28 UTC
0d98842 Revert "[maven-release-plugin] prepare release v1.0.0-rc9" This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75. 20 May 2014, 06:15:20 UTC
3f3e988 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d. 20 May 2014, 06:13:45 UTC
1c00f2a Updated CHANGES.txt 20 May 2014, 06:12:24 UTC
6cbe2a3 [Spark 1877] ClassNotFoundException when loading RDD with serialized objects Updated version of #821 Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com> Closes #835 from tdas/SPARK-1877 and squashes the following commits: f346f71 [Tathagata Das] Addressed Patrick's comments. fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects (cherry picked from commit 52eb54d02403a3c37d84b9da7cc1cdb261048cf8) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 05:36:37 UTC
1c6c8b5 [SPARK-1874][MLLIB] Clean up MLlib sample data 1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`. 2. Embedded instructions in the help message of those example apps. Per discussion with Matei on the JIRA page, new example data is under `data/mllib`. Author: Xiangrui Meng <meng@databricks.com> Closes #833 from mengxr/mllib-sample-data and squashes the following commits: 59f0a18 [Xiangrui Meng] add sample binary classification data 3c2f92f [Xiangrui Meng] add linear regression data 050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example (cherry picked from commit bcb9dce6f444a977c714117811bce0c54b417650) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 04:29:50 UTC
78b6e6f SPARK-1689: Spark application should die when removed by Master scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such. Author: Aaron Davidson <aaron@databricks.com> Closes #832 from aarondav/i-love-u and squashes the following commits: 9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master (cherry picked from commit b0ce22e071da4cc62ec5e29abf7b1299b8e4a6b0) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 04:01:18 UTC
875c54f [SPARK-1875] NoClassDefFoundError: StringUtils when building with hadoop 1.x and hive Author: witgo <witgo@qq.com> Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits: ef7231d [witgo] review commit ead3c3b [witgo] SPARK-1875: NoClassDefFoundError: StringUtils when building against Hadoop 1 (cherry picked from commit 6a2c5c610c259f62cb12d8cfc18bf59cdb334bb2) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 02:40:41 UTC
00563e1 SPARK-1879. Increase MaxPermSize since some of our builds have many classes See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop 2 and Hive ran out of PermGen space in spark-shell once those were combined with the Scala compiler. Note that users can still override it by setting their own Java options: with this change, their options come later in the command string than -XX:MaxPermSize=128m. Author: Matei Zaharia <matei@databricks.com> Closes #823 from mateiz/spark-1879 and squashes the following commits: 6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes (cherry picked from commit 5af99d7617ba3b9fbfdb345ef9571b7dd41f45a1) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 20 May 2014, 01:42:47 UTC
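A sketch of the ordering guarantee mentioned above, under the assumption (true for HotSpot) that the last occurrence of an -XX flag wins; reading user options from SPARK_JAVA_OPTS here is illustrative of that era's configuration, not the exact script logic:

```scala
// Defaults come first, user-supplied options later, so a user-specified
// -XX:MaxPermSize overrides the 128m default.
val defaultJavaOpts = Seq("-XX:MaxPermSize=128m")
val userJavaOpts = sys.env.get("SPARK_JAVA_OPTS")
  .toSeq.flatMap(_.split("\\s+")).filter(_.nonEmpty)
val javaOpts = defaultJavaOpts ++ userJavaOpts
```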
901102c SPARK-1878: Fix the incorrect initialization order JIRA: https://issues.apache.org/jira/browse/SPARK-1878 Author: zsxwing <zsxwing@gmail.com> Closes #822 from zsxwing/SPARK-1878 and squashes the following commits: 4a47e27 [zsxwing] SPARK-1878: Fix the incorrect initialization order (cherry picked from commit 1811ba8ccb580979aa2e12019e6a82805f09ab53) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 19 May 2014, 23:41:53 UTC
111c121 [SPARK-1876] Windows fixes to deal with latest distribution layout changes - Look for JARs in the right place - Launch examples the same way as on Unix - Load datanucleus JARs if they exist - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was) Author: Matei Zaharia <matei@databricks.com> Closes #819 from mateiz/win-fixes and squashes the following commits: d558f96 [Matei Zaharia] Fix comment 228577b [Matei Zaharia] Review comments d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly 144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout (cherry picked from commit 7b70a7071894dd90ea1d0091542b3e13e7ef8d3a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 19 May 2014, 22:02:52 UTC
ecab8a2 [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0 Some improvements to MLlib guide: 1. [SPARK-1872] Update API links for unidoc. 2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display. 3. Add more Java/Python examples. Author: Xiangrui Meng <meng@databricks.com> Closes #816 from mengxr/mllib-doc and squashes the following commits: ec2e407 [Xiangrui Meng] format scala example for ALS cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types 4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles 561fdc0 [Xiangrui Meng] add a displayTitle option to global layout 195d06f [Xiangrui Meng] add Java example for summary stats and minor fix 9f1ff89 [Xiangrui Meng] update java api links in mllib-basics 7dad18e [Xiangrui Meng] update java api links in NB 3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python 35bdeb9 [Xiangrui Meng] api/mllib -> api/scala e4afaa8 [Xiangrui Meng] explicity state what might change (cherry picked from commit df0aa8353ab6d3b19d838c6fa95a93a64948309f) Signed-off-by: Matei Zaharia <matei@databricks.com> 19 May 2014, 00:01:06 UTC
8e8b351 SPARK-1873: Add README.md file when making distributions Author: Patrick Wendell <pwendell@gmail.com> Closes #818 from pwendell/reamde and squashes the following commits: 4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions (cherry picked from commit 4ce479324bdcf603806fc90b5b0f4968c6de690e) Signed-off-by: Matei Zaharia <matei@databricks.com> 18 May 2014, 23:52:06 UTC
e06e4b0 Fix spark-submit path in spark-shell & pyspark Author: Neville Li <neville@spotify.com> Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits: 0dc33ed [Neville Li] Fix spark-submit path in pyspark becec64 [Neville Li] Fix spark-submit path in spark-shell 18 May 2014, 20:31:23 UTC
f8e6119 [maven-release-plugin] prepare for next development iteration 17 May 2014, 06:37:58 UTC
920f947 [maven-release-plugin] prepare release v1.0.0-rc9 17 May 2014, 06:37:50 UTC
8088911 Revert "[maven-release-plugin] prepare release v1.0.0-rc8" This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc. 17 May 2014, 06:10:53 UTC
e98bc19 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3. 17 May 2014, 06:10:46 UTC
3b3d7c8 Make deprecation warning less severe Just a small change. I think it's good not to scare people who are using the old options. Author: Patrick Wendell <pwendell@gmail.com> Closes #810 from pwendell/warnings and squashes the following commits: cb8a311 [Patrick Wendell] Make deprecation warning less severe (cherry picked from commit 442808a7482b81c8de887c901b424683da62022e) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 17 May 2014, 05:59:13 UTC
03b4242 [SPARK-1824] Remove <master> from Python examples A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too. Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example, ``` bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512] ``` Author: Andrew Or <andrewor14@gmail.com> Closes #802 from andrewor14/python-examples and squashes the following commits: cf50b9f [Andrew Or] De-indent python comments (minor) 50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction c362f69 [Andrew Or] Update docs to use spark-submit for python applications 7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples 427a5f0 [Andrew Or] Update docs d32072c [Andrew Or] Remove <master> from examples + update usages (cherry picked from commit cf6cbe9f76c3b322a968c836d039fc5b70d4ce43) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 17 May 2014, 05:36:51 UTC
318739a [SPARK-1808] Route bin/pyspark through Spark submit **Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. **Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit (cherry picked from commit 4b8ec6fcfd7a7ef0857d5b21917183c181301c95) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 17 May 2014, 05:35:02 UTC
9cd12f3 Version bump of spark-ec2 scripts This will allow us to change things in spark-ec2 related to the 1.0 release. Author: Patrick Wendell <pwendell@gmail.com> Closes #809 from pwendell/spark-ec2 and squashes the following commits: 59117fb [Patrick Wendell] Version bump of spark-ec2 scripts (cherry picked from commit c0ab85d7320cea90e6331fb03a70349bc804c1b1) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 17 May 2014, 04:43:39 UTC
a16a19f SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors. Author: Michael Armbrust <michael@databricks.com> Closes #808 from marmbrus/confClasspath and squashes the following commits: 4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors. (cherry picked from commit a80a6a139e729ee3f81ec4f0028e084d2d9f7e82) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 17 May 2014, 03:25:24 UTC
2ba6711 Tweaks to Mesos docs - Mention Apache downloads first - Shorten some wording Author: Matei Zaharia <matei@databricks.com> Closes #806 from mateiz/doc-update and squashes the following commits: d9345cd [Matei Zaharia] typo a179f8d [Matei Zaharia] Tweaks to Mesos docs (cherry picked from commit fed6303f29250bd5e656dbdd731b38938c933a61) Signed-off-by: Matei Zaharia <matei@databricks.com> 17 May 2014, 00:35:17 UTC
386b31c [SQL] Implement between in hql Author: Michael Armbrust <michael@databricks.com> Closes #804 from marmbrus/between and squashes the following commits: ae24672 [Michael Armbrust] add golden answer. d9997ef [Michael Armbrust] Implement between in hql. 9bd4433 [Michael Armbrust] Better error on parse failures. (cherry picked from commit 032d6632ad4ab88c97c9e568b63169a114220a02) Signed-off-by: Reynold Xin <rxin@apache.org> 16 May 2014, 18:47:07 UTC
ff47cdc bugfix: overflow in graphx Edge compare function Author: Zhen Peng <zhenpeng01@baidu.com> Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits: 8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare 413c258 [Zhen Peng] there may be an overflow when subtracting two Longs (cherry picked from commit fa6de408a131a3e84350a60af74a92c323dfc5eb) Signed-off-by: Reynold Xin <rxin@apache.org> 16 May 2014, 18:38:55 UTC
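Why subtraction makes an unsafe comparator, in an illustrative snippet (not the actual Edge ordering code):

```scala
// Broken: the difference of two Longs can overflow and flip sign.
def brokenCompare(x: Long, y: Long): Int = (x - y).signum

// Safe: compare without subtracting.
def safeCompare(x: Long, y: Long): Int = java.lang.Long.compare(x, y)

val a = Long.MaxValue
val b = -1L
brokenCompare(a, b) // -1: overflow makes a compare as less than b, which is wrong
safeCompare(a, b)   //  1: correct, a > b
```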
e5436b8 [maven-release-plugin] prepare for next development iteration 16 May 2014, 08:19:00 UTC
80eea0f [maven-release-plugin] prepare release v1.0.0-rc8 16 May 2014, 08:18:53 UTC
610615b Revert "[maven-release-plugin] prepare release v1.0.0-rc7" This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464. 16 May 2014, 07:09:48 UTC
a16f46f Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6. 16 May 2014, 07:09:43 UTC
eec4dd8 SPARK-1862: Support for MapR in the Maven build. Author: Patrick Wendell <pwendell@gmail.com> Closes #803 from pwendell/mapr-support and squashes the following commits: 8df60e4 [Patrick Wendell] SPARK-1862: Support for MapR in the Maven build. (cherry picked from commit 17702e280c4b0b030870962fcb3d50c3085ae862) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 16 May 2014, 06:32:24 UTC
eac4ee8 [Spark-1461] Deferred Expression Evaluation (short-circuit evaluation) This patch unifies the foldable & nullable interface for Expression. 1) Non-deterministic UDFs (like Rand()) cannot be folded. 2) Short-circuiting significantly improves performance in expression evaluation; however, stateful UDFs must not be skipped by short-circuit evaluation (e.g. in the expression col1 > 0 and row_sequence() < 1000, row_sequence() cannot be skipped even if col1 > 0 is false). I borrowed the concept of DeferredObject from Hive; it has two kinds of child classes (EagerResult / DeferredResult): the former requires evaluation to be triggered before it is created, while the latter triggers evaluation when its get() method is first called. Author: Cheng Hao <hao.cheng@intel.com> Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits: d2729de [Cheng Hao] Fix the codestyle issues a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation (cherry picked from commit a20fea98811d98958567780815fcf0d4fb4e28d4) Signed-off-by: Reynold Xin <rxin@apache.org> 16 May 2014, 05:12:49 UTC
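A rough sketch of the two DeferredObject flavors the description names, EagerResult and DeferredResult; the implementation below is illustrative, not Hive's or this patch's actual code:

```scala
sealed trait DeferredObject[T] { def get(): T }

// Evaluation is triggered before construction; get() just hands back the value.
final class EagerResult[T](value: T) extends DeferredObject[T] {
  def get(): T = value
}

// Evaluation is deferred until get() is first called, then cached, so a
// short-circuited branch that is never read is never evaluated.
final class DeferredResult[T](eval: () => T) extends DeferredObject[T] {
  private lazy val value = eval()
  def get(): T = value
}
```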
5441471 SPARK-1860: Do not cleanup application work/ directories by default This causes an unrecoverable error for applications that are running for longer than 7 days that have jars added to the SparkContext, as the jars are cleaned up even though the application is still running. Author: Aaron Davidson <aaron@databricks.com> Closes #800 from aarondav/shitty-defaults and squashes the following commits: a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/ directories by default (cherry picked from commit bb98ecafce196ecc5bc3a1e4cc9264df7b752c6a) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 16 May 2014, 04:38:25 UTC
a2742d8 Typos in Spark Author: Huajian Mao <huajianmao@gmail.com> Closes #798 from huajianmao/patch-1 and squashes the following commits: 208a454 [Huajian Mao] A typo in Task 1b515af [Huajian Mao] A typo in the message (cherry picked from commit 94c5139607ec876782e594012a108ebf55fa97db) Signed-off-by: Reynold Xin <rxin@apache.org> 16 May 2014, 01:20:24 UTC
2e418f5 Fixes a misplaced comment. Fixes a misplaced comment from #785. @pwendell Author: Prashant Sharma <prashant.s@imaginea.com> Closes #788 from ScrapCodes/patch-1 and squashes the following commits: 3ef6a69 [Prashant Sharma] Update package-info.java 67d9461 [Prashant Sharma] Update package-info.java (cherry picked from commit e1e3416c4e5f6f32983597d74866dbb809cf6a5e) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 23:59:45 UTC
ffa9c49 [SQL] Fix tiny/small ints from HiveMetastore. Author: Michael Armbrust <michael@databricks.com> Closes #797 from marmbrus/smallInt and squashes the following commits: 2db9dae [Michael Armbrust] Fix tiny/small ints from HiveMetastore. (cherry picked from commit a4aafe5f9fb191533400caeafddf04986492c95f) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 23:50:49 UTC
22f261a SPARK-1803 Replaced colon in filenames with a dash This patch replaces colon in several filenames with dash to make these filenames Windows compatible. Author: Stevo Slavić <sslavic@gmail.com> Author: Stevo Slavic <sslavic@gmail.com> Closes #739 from sslavic/SPARK-1803 and squashes the following commits: 3ec66eb [Stevo Slavic] Removed extra empty line which was causing test to fail b967cc3 [Stevo Slavić] Aligned tests and names of test resources 2b12776 [Stevo Slavić] Fixed a typo in file name 1c5dfff [Stevo Slavić] Replaced colon in file name with dash 8f5bf7f [Stevo Slavić] Replaced colon in file name with dash c5b5083 [Stevo Slavić] Replaced colon in file name with dash a49801f [Stevo Slavić] Replaced colon in file name with dash 401d99e [Stevo Slavić] Replaced colon in file name with dash 40a9621 [Stevo Slavić] Replaced colon in file name with dash 4774580 [Stevo Slavić] Replaced colon in file name with dash 004f8bb [Stevo Slavić] Replaced colon in file name with dash d6a3e2c [Stevo Slavić] Replaced colon in file name with dash b585126 [Stevo Slavić] Replaced colon in file name with dash 028e48a [Stevo Slavić] Replaced colon in file name with dash ece0507 [Stevo Slavić] Replaced colon in file name with dash 84f5d2f [Stevo Slavić] Replaced colon in file name with dash 2fc7854 [Stevo Slavić] Replaced colon in file name with dash 9e1467d [Stevo Slavić] Replaced colon in file name with dash (cherry picked from commit e66e31be51f396c8f6b7a45119b8b31c4d8cdf79) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 23:45:06 UTC
3587057 SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark can read Avro files Author: Sandy Ryza <sandy@cloudera.com> Closes #795 from sryza/sandy-spark-1851 and squashes the following commits: 79c8227 [Sandy Ryza] SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark can read Avro files (cherry picked from commit 08e7606a964e3d1ac1d565f33651ff0035c75044) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 23:35:52 UTC
bc9a96e [SPARK-1741][MLLIB] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans `model.predict` returns an RDD of a Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users. Added tests for KMeans, LinearRegression, and NaiveBayes. Will update examples after https://github.com/apache/spark/pull/653 gets merged. cc: @srowen Author: Xiangrui Meng <meng@databricks.com> Closes #670 from mengxr/predict-javardd and squashes the following commits: b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd 43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans (cherry picked from commit d52761d67f42ad4d2ff02d96f0675fb3ab709f38) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 19:00:26 UTC
f9eeddc [SPARK-1819] [SQL] Fix GetField.nullable. `GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #757 from ueshin/issues/SPARK-1819 and squashes the following commits: 8781a11 [Takuya UESHIN] Modify a test to use named parameters. 5bfc77d [Takuya UESHIN] Fix GetField.nullable. (cherry picked from commit 94c9d6f59859ebc77fae112c2c42c64b7a4d7f83) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 18:21:39 UTC
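The rule the fix encodes, as a standalone sketch; the case classes are stand-ins for Catalyst's expression and field types, not the real API:

```scala
case class StructField(name: String, nullable: Boolean)
case class StructExpr(nullable: Boolean)

// A field access yields null when either the struct value itself or the
// field within it may be null, hence the disjunction.
def getFieldNullable(child: StructExpr, field: StructField): Boolean =
  child.nullable || field.nullable
```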
7515367 [SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of Scala collections. When I execute `orderBy` or `limit` for a `SchemaRDD` including `ArrayType` or `MapType`, `SparkSqlSerializer` throws the following exception: ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.$colon$colon ``` or ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.Vector ``` or ``` com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.HashMap$HashTrieMap ``` and so on. This is because serializer registrations for the concrete collection classes are missing in `SparkSqlSerializer`. I believe it should use `AllScalaRegistrar`, which covers serializers for the concrete `Seq` and `Map` classes backing `ArrayType` and `MapType`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #790 from ueshin/issues/SPARK-1845 and squashes the following commits: d1ed992 [Takuya UESHIN] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of Scala collections. (cherry picked from commit db8cc6f28abe4326cea6f53feb604920e4867a27) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 18:20:40 UTC
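A minimal sketch of the registration the patch adds, using Twitter chill's AllScalaRegistrar; the surrounding Kryo setup is illustrative:

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar

val kryo = new Kryo()
kryo.setRegistrationRequired(false)
// Registers serializers for concrete Scala collection classes such as
// scala.collection.immutable.$colon$colon, Vector, and HashMap$HashTrieMap,
// none of which have the no-arg constructors Kryo relies on by default.
new AllScalaRegistrar().apply(kryo)
```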
aa5f989 SPARK-1846 Ignore logs directory in RAT checks https://issues.apache.org/jira/browse/SPARK-1846 Author: Andrew Ash <andrew@andrewash.com> Closes #793 from ash211/SPARK-1846 and squashes the following commits: 3f50db5 [Andrew Ash] SPARK-1846 Ignore logs directory in RAT checks (cherry picked from commit 3abe2b734a5578966f671c34f1de34b4446b90f1) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 18:05:56 UTC
88f1da3 HOTFIX: Don't build Javadoc in Maven when creating releases. Because we've added java package descriptions in some packages that don't have any Java files, running the Javadoc target hits this issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4492654 To fix this I've simply removed the javadoc target when publishing releases. 15 May 2014, 07:48:58 UTC
c4746aa [maven-release-plugin] prepare for next development iteration 15 May 2014, 06:53:21 UTC
9212b3e [maven-release-plugin] prepare release v1.0.0-rc7 15 May 2014, 06:53:14 UTC
aa2ac70 Revert "[maven-release-plugin] prepare release v1.0.0-rc6" This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82. 15 May 2014, 06:27:41 UTC
a28e373 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb. 15 May 2014, 06:27:36 UTC
31b853c fix different versions of commons-lang dependency and apache/spark#746 addendum Author: witgo <witgo@qq.com> Closes #754 from witgo/commons-lang and squashes the following commits: 3ebab31 [witgo] merge master f3b8fa2 [witgo] merge master 2083fae [witgo] repeat definition 5599cdb [witgo] multiple version of sbt dependency c1b66a1 [witgo] fix different versions of commons-lang dependency (cherry picked from commit bae07e36a6e0fb7982405316646b452b4ff06acc) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 05:26:34 UTC
c02d614 Package docs This is a few changes based on the original patch by @scrapcodes. Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #785 from pwendell/package-docs and squashes the following commits: c32b731 [Patrick Wendell] Changes based on Prashant's patch c0463d3 [Prashant Sharma] added eof new line ce8bf73 [Prashant Sharma] Added eof new line to all files. 4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs (cherry picked from commit 46324279dae2fa803267d788f7c56b0ed643b4c8) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 05:24:50 UTC
f2faa37 Documentation: Encourage use of reduceByKey instead of groupByKey. Author: Patrick Wendell <pwendell@gmail.com> Closes #784 from pwendell/group-by-key and squashes the following commits: 9b4505f [Patrick Wendell] Small fix 6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey. (cherry picked from commit 21570b463388194877003318317aafd842800cac) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 05:24:20 UTC
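The guidance above in a minimal word-count sketch (assumes a SparkContext `sc` is already in scope):

```scala
val pairs = sc.parallelize(Seq("a", "b", "a", "c", "a")).map(w => (w, 1))

// Preferred: reduceByKey combines values map-side before the shuffle.
val counts = pairs.reduceByKey(_ + _)

// Discouraged for aggregation: groupByKey ships every value across the
// network before anything is summed.
val countsViaGroup = pairs.groupByKey().mapValues(_.sum)
```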
976784b Add language tabs and Python version to interactive part of quick-start This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit. Author: Matei Zaharia <matei@databricks.com> Closes #782 from mateiz/spark-1567-extra and squashes the following commits: 6f8f2aa [Matei Zaharia] tweaks 9ed9874 [Matei Zaharia] tweaks ae67c3e [Matei Zaharia] tweak b303ba3 [Matei Zaharia] tweak 1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide (cherry picked from commit f10de042b8e86adf51b70bae2d8589a5cbf02935) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 04:45:37 UTC
ba87123 [SPARK-1840] SparkListenerBus prints out scary error message when terminated normally Running the SparkPi example gave this error. ``` Pi is roughly 3.14374 14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus scala.runtime.NonLocalReturnControl$mcV$sp ``` This is due to the catch-all in the SparkListenerBus, which logged the control throwable used by the Scala runtime. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #783 from tdas/controlexception-fix and squashes the following commits: a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions. (cherry picked from commit ad4e60ee7e2c49c24a9972312915f7f7253c7679) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 15 May 2014, 04:14:21 UTC
9f0f2ec default task number misleading in several places `private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = { new HashPartitioner(numPartitions) }` This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is determined by the config property spark.default.parallelism; the property "spark.default.parallelism" refers to https://github.com/apache/spark/pull/389 Author: Chen Chao <crazyjvm@gmail.com> Closes #766 from CrazyJvm/patch-7 and squashes the following commits: 0b7efba [Chen Chao] Update streaming-programming-guide.md cc5b66c [Chen Chao] default task number misleading in several places (cherry picked from commit 2f639957f0bf70dddf1e698aa9e26007fb58bc67) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 01:20:33 UTC
fdf9717 [SPARK-1826] fix the head notation of package object dsl Author: wangfei <scnbwf@yeah.net> Closes #765 from scwf/dslfix and squashes the following commits: d2d1a9d [wangfei] Update package.scala 66ff53b [wangfei] fix the head notation of package object dsl (cherry picked from commit 44165fc91a31e6293a79031c89571e139d2c5356) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 00:59:22 UTC
5ca3096 [Typo] propertes -> properties Author: andrewor14 <andrewor14@gmail.com> Closes #780 from andrewor14/submit-typo and squashes the following commits: e70e057 [andrewor14] propertes -> properties (cherry picked from commit 9ad096d55a3d8410f04056ebc87dbd8cba391870) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 00:55:00 UTC
d6f1a75 [SPARK-1696][MLLIB] use alpha in dense dspr It doesn't affect existing code because only `alpha = 1.0` is used in the code. Author: Xiangrui Meng <meng@databricks.com> Closes #778 from mengxr/mllib-dspr-fix and squashes the following commits: a37402e [Xiangrui Meng] use alpha in dense dspr (cherry picked from commit e3d72a74ad007c2bf279d6a74cdaca948bdf0ddd) Signed-off-by: Reynold Xin <rxin@apache.org> 15 May 2014, 00:18:38 UTC
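What BLAS dspr computes -- A := alpha * x * xᵀ + A on a packed upper triangle -- sketched as a plain loop; MLlib delegates the real work to a BLAS implementation, so this is illustrative only:

```scala
// packedA stores the upper triangle column by column:
// index k walks (0,0), (0,1), (1,1), (0,2), (1,2), (2,2), ...
def dspr(alpha: Double, x: Array[Double], packedA: Array[Double]): Unit = {
  val n = x.length
  var k = 0
  for (j <- 0 until n; i <- 0 to j) {
    packedA(k) += alpha * x(i) * x(j) // the fix: alpha is now applied
    k += 1
  }
}
```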
31faec7 [FIX] do not load defaults when testing SparkConf in pyspark The default constructor loads default properties, which can fail the test. Author: Xiangrui Meng <meng@databricks.com> Closes #775 from mengxr/pyspark-conf-fix and squashes the following commits: 83ef6c4 [Xiangrui Meng] do not load defaults when testing SparkConf in pyspark (cherry picked from commit 94c6c06ea13032b80610b3f54401d2ef2aa4874a) Signed-off-by: Reynold Xin <rxin@apache.org> 14 May 2014, 21:57:24 UTC
8e13ab2 SPARK-1833 - Have an empty SparkContext constructor. This is nicer than relying on new SparkContext(new SparkConf()) Author: Patrick Wendell <pwendell@gmail.com> Closes #774 from pwendell/spark-context and squashes the following commits: ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor. (cherry picked from commit 65533c7ec03e7eedf5cd9756822863ab6f034ec9) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 14 May 2014, 19:53:42 UTC
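The convenience this adds, sketched; configuration then comes from system properties and defaults rather than an explicit SparkConf:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Only one SparkContext per JVM in practice; the two lines show the
// before/after spelling, not code to run together.
val sc = new SparkContext()                    // new: no-arg convenience
// val sc = new SparkContext(new SparkConf()) // old: explicit empty conf
```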
530bdf7 SPARK-1829 Sub-second durations shouldn't round to "0 s" Show "99 ms" for durations up to 99 ms, and "0.1 s" through "0.9 s" for durations from 0.1 s up to 0.9 s. https://issues.apache.org/jira/browse/SPARK-1829 Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0 Author: Andrew Ash <andrew@andrewash.com> Closes #768 from ash211/spark-1829 and squashes the following commits: 1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately (cherry picked from commit a3315d7f4c7584dae2ee0aa33c6ec9e97b229b48) Signed-off-by: Reynold Xin <rxin@apache.org> 14 May 2014, 19:01:22 UTC
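The formatting rule as a sketch; the boundaries follow the commit message, while the helper name and the whole-second branch are hypothetical:

```scala
def formatDuration(ms: Long): String = {
  if (ms < 100) s"$ms ms"                       // "99 ms" up to 99 ms
  else if (ms < 1000) s"${(ms / 100) / 10.0} s" // "0.1 s" through "0.9 s"
  else s"${ms / 1000} s"                        // whole seconds beyond that
}

formatDuration(99)  // "99 ms"
formatDuration(450) // "0.4 s"
```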
379f733 Fix: sbt test throws a java.lang.OutOfMemoryError: PermGen space Author: witgo <witgo@qq.com> Closes #773 from witgo/sbt_javaOptions and squashes the following commits: 26c7d38 [witgo] Improve sbt configuration (cherry picked from commit fde82c1549c78f1eebbb21ec34e60befbbff65f5) Signed-off-by: Reynold Xin <rxin@apache.org> 14 May 2014, 18:19:43 UTC
e480bcf [maven-release-plugin] prepare for next development iteration 14 May 2014, 17:50:40 UTC
54133ab [maven-release-plugin] prepare release v1.0.0-rc6 14 May 2014, 17:50:33 UTC