https://github.com/apache/spark

3693ae5 [maven-release-plugin] prepare release v1.1.1-rc2 19 November 2014, 20:10:56 UTC
aa3c794 Update CHANGES.txt for 1.1.1-rc2 19 November 2014, 19:35:43 UTC
16bf5f3 [SPARK-4480] Avoid many small spills in external data structures (1.1) This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1. Author: Andrew Or <andrew@databricks.com> Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits: f2e552c [Andrew Or] Fix tests 7012595 [Andrew Or] Avoid many small spills 19 November 2014, 18:45:42 UTC
e22a759 [SPARK-4380] Log more precise number of bytes spilled (1.1) This is the branch-1.1 version of #3243. Author: Andrew Or <andrew@databricks.com> Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits: 36ec152 [Andrew Or] Log more precise representation of bytes in spilling code 19 November 2014, 04:15:00 UTC
f9739b9 [SPARK-4468][SQL] Backports #3334 to branch-1.1 Author: Cheng Lian <lian@databricks.com> Closes #3338 from liancheng/spark-3334-for-1.1 and squashes the following commits: bd17512 [Cheng Lian] Backports #3334 to branch-1.1 19 November 2014, 01:40:24 UTC
ae9b1f6 [SPARK-4433] fix a race condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in getPartitions and it causes a deadlock in DAGScheduler.getPreferredLocs (synced). The fix is to compute `startIndices` during construction. This should be applied to branch-1.0, branch-1.1, and branch-1.2. pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #3291 from mengxr/SPARK-4433 and squashes the following commits: c284d9f [Xiangrui Meng] fix a race condition in zipWithIndex (cherry picked from commit bb46046154a438df4db30a0e1fd557bd3399ee7b) Signed-off-by: Xiangrui Meng <meng@databricks.com> 19 November 2014, 00:26:16 UTC
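The deadlock pattern and its fix can be illustrated with a small sketch (illustrative names only, not the actual ZippedWithIndexRDD code): the per-partition start offsets are computed eagerly in the constructor, on the caller's thread, rather than lazily inside getPartitions where the scheduler may already hold a lock.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper: eagerly compute the start index of each partition so that
// no job is ever triggered from inside getPartitions / getPreferredLocs.
class EagerStartIndices(sc: SparkContext) {
  val data = sc.parallelize(1 to 10, numSlices = 4)

  // Runs a small counting job *during construction*, not lazily.
  val startIndices: Array[Long] = {
    val counts = data.mapPartitions(it => Iterator(it.size.toLong)).collect()
    counts.scanLeft(0L)(_ + _).init // partition i starts at the sum of counts 0..i-1
  }
}
```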
91b5fa8 [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1) This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs. This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits: 786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager 18 November 2014, 20:09:18 UTC
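A hedged sketch of the leak-avoidance pattern described here (field and method names are illustrative, not ConnectionManager's actual code): the timeout task holds only the message id and a WeakReference to the promise, and is scheduled on Netty's HashedWheelTimer, so a cancelled task that has not yet fired cannot pin the full message in memory.

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit
import scala.concurrent.Promise
import io.netty.util.{HashedWheelTimer, Timeout, TimerTask}

object AckTimeoutSketch {
  private val timer = new HashedWheelTimer()

  // Capture only the message id and a weak reference to the promise, never the message itself.
  def scheduleAckTimeout(messageId: Int, promise: Promise[Unit], timeoutMs: Long): Timeout = {
    val promiseRef = new WeakReference(promise)
    timer.newTimeout(new TimerTask {
      override def run(timeout: Timeout): Unit = {
        Option(promiseRef.get()).foreach { p =>
          p.tryFailure(new java.io.IOException(s"No ACK received for message $messageId"))
        }
      }
    }, timeoutMs, TimeUnit.MILLISECONDS)
  }
}
```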
aa9ebda [SPARK-4467] Partial fix for fetch failure in sort-based shuffle (1.1) This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR. Author: Andrew Or <andrew@databricks.com> Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits: 486fc49 [Andrew Or] Reset `elementsRead` 18 November 2014, 02:10:49 UTC
e4f5695 Revert "[maven-release-plugin] prepare release v1.1.1-rc1" This reverts commit 72a4fdbe82203b962fe776d0edaed7f56898cb02. 17 November 2014, 19:49:48 UTC
cf8d0ef Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 685bdd2b7e584c84e7d39e40de2d5f30c5388cb5. 17 November 2014, 19:49:33 UTC
b528367 Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file" This reverts commit 098f83c7ccd7dad9f9228596da69fe5f55711a52. 17 November 2014, 19:25:38 UTC
4b1c77c [branch-1.1][SPARK-4355] OnlineSummarizer doesn't merge mean correctly andrewor14 This backports the bug fix in #3220 . It would be good if we can get it in 1.1.1. But this is minor. Author: Xiangrui Meng <meng@databricks.com> Closes #3251 from mengxr/SPARK-4355-1.1 and squashes the following commits: 33886b6 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-1.1' into SPARK-4355-1.1 91fe1a3 [Xiangrui Meng] fix OnlineSummarizer.merge when other.mean is zero 13 November 2014, 23:36:03 UTC
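The underlying arithmetic is a count-weighted merge of the two means; a minimal sketch (not the MultivariateOnlineSummarizer code itself) shows the formula, which must also hold when one side's mean is exactly 0.0:

```scala
// Count-weighted merge of two running means.
case class MeanSummary(count: Long, mean: Double) {
  def merge(other: MeanSummary): MeanSummary = {
    val totalCount = count + other.count
    if (totalCount == 0L) this
    else MeanSummary(totalCount, (mean * count + other.mean * other.count) / totalCount)
  }
}

// MeanSummary(3, 2.0).merge(MeanSummary(2, 0.0)).mean == 1.2, not 2.0
```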
685bdd2 [maven-release-plugin] prepare for next development iteration 13 November 2014, 01:54:40 UTC
72a4fdb [maven-release-plugin] prepare release v1.1.1-rc1 13 November 2014, 01:54:34 UTC
6f7b1bc Revert "[maven-release-plugin] prepare release v1.1.1-rc1" This reverts commit 3f9e073ff0bb18b6079fda419d4e9dbf594545b0. 13 November 2014, 01:14:04 UTC
6f34fa0 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 6de888129fcfe6e592458a4217fc66140747b54f. 13 November 2014, 01:13:55 UTC
ba6d81d [Release] Correct make-distribution.sh log path 13 November 2014, 00:51:34 UTC
88bc482 [Release] Bring audit scripts up-to-date This involves a few main changes: - Log all output message to the log file. Previously the log file was not useful because it did not indicate progress. - Remove hive-site.xml in sbt_hive_app to avoid interference - Add the appropriate repositories for new dependencies 13 November 2014, 00:30:58 UTC
6de8881 [maven-release-plugin] prepare for next development iteration 12 November 2014, 20:20:29 UTC
3f9e073 [maven-release-plugin] prepare release v1.1.1-rc1 12 November 2014, 20:20:23 UTC
8fe1c8c Revert "[maven-release-plugin] prepare release v1.1.1-rc1" This reverts commit 7029301778895427216f2e0710c6e72a523c0897. 12 November 2014, 19:39:27 UTC
d3b808f Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit db22a9e2cb51eae2f8a79648ce3c6bf4fecdd641. 12 November 2014, 19:39:19 UTC
db22a9e [maven-release-plugin] prepare for next development iteration 12 November 2014, 19:01:46 UTC
7029301 [maven-release-plugin] prepare release v1.1.1-rc1 12 November 2014, 19:01:36 UTC
e3a5ee9 [Release] Log build output for each distribution 12 November 2014, 18:24:07 UTC
86c285c Revert "[maven-release-plugin] prepare release v1.1.1-rc1" This reverts commit 837deabebf0714e3f3aca135d77169cc825824f3. 12 November 2014, 18:20:12 UTC
837deab [maven-release-plugin] prepare release v1.1.1-rc1 12 November 2014, 08:43:11 UTC
4ac5679 Revert "[maven-release-plugin] prepare release v1.1.1-rc1" This reverts commit f3e62ffa4ccea62911207b918ef1c23c1f50467f. Conflicts: pom.xml 12 November 2014, 08:08:20 UTC
9d13735 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 5c0032a471d858fb010b1737ea14375f1af3ed88. 12 November 2014, 08:07:49 UTC
45a01b6 Revert "SPARK-3039: Allow spark to be built using avro-mapred for hadoop2" This reverts commit 78887f94a0ae9cdcfb851910ab9c7d51a1ef2acb. Conflicts: pom.xml 12 November 2014, 08:04:30 UTC
5c0032a [maven-release-plugin] prepare for next development iteration 11 November 2014, 23:52:40 UTC
f3e62ff [maven-release-plugin] prepare release v1.1.1-rc1 11 November 2014, 23:52:33 UTC
131c626 Update CHANGES.txt 11 November 2014, 23:11:55 UTC
bf867c3 [SPARK-4295][External]Fix exception in SparkSinkSuite Handle exception in SparkSinkSuite, please refer to [SPARK-4295] Author: maji2014 <maji3@asiainfo.com> Closes #3177 from maji2014/spark-4295 and squashes the following commits: 312620a [maji2014] change a new statement for spark-4295 24c3d21 [maji2014] add log4j.properties for SparkSinkSuite and spark-4295 c807bf6 [maji2014] Fix exception in SparkSinkSuite (cherry picked from commit f8811a5695af2dfe156f07431288db7b8cd97159) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 11 November 2014, 10:19:02 UTC
b2cb357 [branch-1.1][SPARK-3990] add a note on ALS usage Because we switched back to Kryo in #3187 , we need to leave a note about the workaround. Author: Xiangrui Meng <meng@databricks.com> Closes #3190 from mengxr/SPARK-3990-1.1 and squashes the following commits: d4818f3 [Xiangrui Meng] fix python style 53725b0 [Xiangrui Meng] add a note about SPARK-3990 56ad70e [Xiangrui Meng] add a note about SPARK-3990 11 November 2014, 06:39:09 UTC
11798d0 [BRANCH-1.1][SPARK-2652] change the default spark.serializer in pyspark back to Kryo This reverts #2916 . We shouldn't change the default settings in a minor release. JoshRosen davies Author: Xiangrui Meng <meng@databricks.com> Closes #3187 from mengxr/SPARK-2652-1.1 and squashes the following commits: 372166b [Xiangrui Meng] change the default spark.serializer in pyspark back to Kryo 11 November 2014, 06:21:14 UTC
d313be8 [SPARK-4330][Doc] Link to proper URL for YARN overview In running-on-yarn.md there is a link to the YARN overview, but the URL points to the YARN alpha docs; it should point to the stable docs. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3196 from sarutak/SPARK-4330 and squashes the following commits: 30baa21 [Kousuke Saruta] Fixed running-on-yarn.md to point to the proper URL for YARN (cherry picked from commit 3c07b8f08240bafcdff5d174989fb433f4bc80b6) Signed-off-by: Matei Zaharia <matei@databricks.com> 11 November 2014, 06:18:22 UTC
8a1d818 [SQL] Backport backtick and smallint JDBC fixes to 1.1 Author: Michael Armbrust <michael@databricks.com> Author: ravipesala <ravindra.pesala@huawei.com> Author: scwf <wangfei1@huawei.com> Closes #3199 from marmbrus/backport1.1 and squashes the following commits: 019a0dd [Michael Armbrust] Drop incorrectly ported test cases 4c9f3e6 [ravipesala] [SPARK-3708][SQL] Backticks aren't handled correctly in aliases 064750d [scwf] [SPARK-3704][SQL] Fix ColumnValue type for Short values in thrift server f4e17cd [ravipesala] [SPARK-3834][SQL] Backticks not correctly handled in subquery aliases 11 November 2014, 03:51:28 UTC
01d233e Update versions for 1.1.1 release 11 November 2014, 02:40:34 UTC
be0cc99 [SPARK-3495][SPARK-3496] Backporting block replication fixes made in master to branch 1.1 The original PR was #2366 This backport was non-trivial because Spark 1.1 uses ConnectionManager instead of NioBlockTransferService, which required slight modification to unit tests. Other than that the code is exactly same as in the original PR. Please refer to discussion in the original PR if you have any thoughts. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3191 from tdas/replication-fix-branch-1.1-backport and squashes the following commits: 593214a [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-1.1' into branch-1.1 2ed927f [Tathagata Das] Fixed error in unit test. de4ff73 [Tathagata Das] [SPARK-3495] Block replication fails continuously when the replication target node is dead AND [SPARK-3496] Block replication by mistake chooses driver as target 11 November 2014, 02:23:02 UTC
3d889df [SPARK-3954][Streaming] Optimization to FileInputDStream when converting files to RDDs There are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. files-size.foreach. This is very time-consuming when there are lots of files, so the correction collapses the 3 loops over the files sequence into only one loop. Author: surq <surq@asiainfo.com> Closes #2811 from surq/SPARK-3954 and squashes the following commits: 321bbe8 [surq] updated the code style. The style from [for...yield] to [files.map(file=>{})] 88a2c20 [surq] Merge branch 'master' of https://github.com/apache/spark into SPARK-3954 178066f [surq] modify code's style. [Exceeds 100 columns] 626ef97 [surq] remove redundant import(ArrayBuffer) 739341f [surq] promote the speed of converting files to RDDs (cherry picked from commit ce6ed2abd14de26b9ceaa415e9a42fbb1338f5fa) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 11 November 2014, 01:38:22 UTC
64945f8 [SPARK-3971][SQL] Backport #2843 to branch-1.1 This PR backports #2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried. 6 other commits on which #2843 depends were also backported, they are: - #2887 for `SessionState` lifecycle control - #2675, #2823 & #3060 for major test suite refactoring and bug fixes - #2164, for Parquet test suites updates - #2493, for reading `spark.sql.*` configurations Author: Cheng Lian <lian@databricks.com> Author: Cheng Lian <lian.cs.zju@gmail.com> Author: Michael Armbrust <michael@databricks.com> Closes #3113 from liancheng/get-info-for-1.1 and squashes the following commits: d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1 0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created. 3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing 7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite 0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure 5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2 11 November 2014, 01:04:40 UTC
b3ef06b [SPARK-4308][SQL] Follow up of #3175 for branch 1.1 PR #3175 is for master branch only and can't be backported to branch 1.1 directly because Hive 0.13.1 support. Author: Cheng Lian <lian@databricks.com> Closes #3176 from liancheng/fix-op-state-for-1.1 and squashes the following commits: 8791d87 [Cheng Lian] This is a follow up of #3175 for branch 1.1 11 November 2014, 00:57:34 UTC
86b1bd0 [SPARK-2548][HOTFIX][Streaming] Removed use of o.a.s.streaming.Durations in branch 1.1 Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3188 from tdas/branch-1.1 and squashes the following commits: f1996d3 [Tathagata Das] [SPARK-2548][HOTFIX] Removed use of o.a.s.streaming.Durations 10 November 2014, 22:13:42 UTC
254b135 Update RecoverableNetworkWordCount.scala Trying this example, I missed the moment when the checkpoint was initiated Author: comcmipi <pitonak@fns.uniba.sk> Closes #2735 from comcmipi/patch-1 and squashes the following commits: b6d8001 [comcmipi] Update RecoverableNetworkWordCount.scala 96fe274 [comcmipi] Update RecoverableNetworkWordCount.scala (cherry picked from commit 0340c56a921d4eb4bc9058e25e926721f8df594c) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 10 November 2014, 20:34:20 UTC
cdcf546 SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I fixed a few minor doc/formatting issues along the way I believe. Author: Sean Owen <sowen@cloudera.com> Closes #2564 from srowen/SPARK-2548 and squashes the following commits: 0d0bf29 [Sean Owen] Update checkpoint call as in https://github.com/apache/spark/pull/2735 35f23e3 [Sean Owen] Remove old comment about running in standalone mode 179b3c2 [Sean Owen] Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples (cherry picked from commit 3a02d416cd82a7a942fd6ff4a0e05ff070eb218a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 10 November 2014, 19:48:18 UTC
dc38def [SPARK-4169] [Core] Accommodate non-English Locales in unit tests For me the core tests failed because there are two locale dependent parts in the code. Look at the Jira ticket for details. Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686 ? Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de> Closes #3036 from numbnut/core-test-fix and squashes the following commits: 1fb0d04 [Niklas Wilcke] Fixing locale dependend code and tests (cherry picked from commit ed8bf1eac548577c4bbad7ce3f7f301a2f52ef17) Signed-off-by: Andrew Or <andrew@databricks.com> 10 November 2014, 19:37:59 UTC
78cd3ab [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()` then I don't think it makes sense to allow `stop() start()`. The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures. Prior discussions: - https://github.com/apache/spark/pull/3053#discussion-diff-19710333R490 - https://github.com/apache/spark/pull/3121#issuecomment-61927353 Author: Josh Rosen <joshrosen@databricks.com> Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits: dbcc929 [Josh Rosen] Address more review comments bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before. 03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called. 832a7f4 [Josh Rosen] Address review comment 5142517 [Josh Rosen] Add tests; improve Scaladoc. 813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49 5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet. (cherry picked from commit 7b41b17f3296eea3282efbdceb6b28baf128287d) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 09 November 2014, 02:12:02 UTC
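A small state-machine sketch (illustrative, not the real StreamingContext internals) of the behaviour this change enforces: start() after stop() is rejected, while stop() is always safe and releases the underlying resources even if start() was never called.

```scala
class LifecycleSketch {
  private object State extends Enumeration { val Initialized, Started, Stopped = Value }
  private var state = State.Initialized

  def start(): Unit = synchronized {
    state match {
      case State.Initialized => state = State.Started
      case State.Started     => throw new IllegalStateException("already started")
      case State.Stopped     => throw new IllegalStateException("cannot start() after stop()")
    }
  }

  def stop(): Unit = synchronized {
    // Always stop underlying resources (e.g. the SparkContext), whatever the current state.
    state = State.Stopped
  }
}
```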
4895f65 [SPARK-4304] [PySpark] Fix sort on empty RDD This PR fix sortBy()/sortByKey() on empty RDD. This should be back ported into 1.1/1.2 Author: Davies Liu <davies@databricks.com> Closes #3162 from davies/fix_sort and squashes the following commits: 84f64b7 [Davies Liu] add tests 52995b5 [Davies Liu] fix sortByKey() on empty RDD (cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: python/pyspark/tests.py 08 November 2014, 04:55:12 UTC
4fb26df Update JavaCustomReceiver.java Fixes an array index out of bounds error Author: xiao321 <1042460381@qq.com> Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 07 November 2014, 20:57:38 UTC
0a40eac [SPARK-4249][GraphX] fix a problem with EdgePartitionBuilder in GraphX At first srcIds is not initialized and all entries are 0, so edgeArray(0).srcId is used to initialize currSrcId. Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #3138 from lianhuiwang/SPARK-4249 and squashes the following commits: 3f4e503 [lianhuiwang] fix a problem of EdgePartitionBuilder in Graphx (cherry picked from commit d15c6e9dc2860bbe56e31ddf71218ccc6d5c841d) Signed-off-by: Ankur Dave <ankurdave@gmail.com> 06 November 2014, 18:48:32 UTC
c58c1bb [SPARK-4158] Fix for missing resources. Mesos offers may not contain all resources, and Spark needs to check to ensure they are present and sufficient. Spark may throw an erroneous exception when resources aren't present. Author: Brenden Matthews <brenden@diddyinc.com> Closes #3024 from brndnmtthws/fix-mesos-resource-misuse and squashes the following commits: e5f9580 [Brenden Matthews] [SPARK-4158] Fix for missing resources. (cherry picked from commit cb0eae3b78d7f6f56c0b9521ee48564a4967d3de) Signed-off-by: Andrew Or <andrew@databricks.com> 06 November 2014, 00:03:06 UTC
590a943 SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3034 from jongyoul/SPARK-3223 and squashes the following commits: 42b2ed3 [Jongyoul Lee] SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer (cherry picked from commit f7ac8c2b1de96151231617846b7468d23379c74a) Signed-off-by: Andrew Or <andrew@databricks.com> 05 November 2014, 23:50:13 UTC
44751af [branch-1.1][SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample Port #3010 to branch-1.1. Author: Xiangrui Meng <meng@databricks.com> Closes #3104 from mengxr/SPARK-4148-1.1 and squashes the following commits: 684c002 [Xiangrui Meng] apply SPARK-4148 to branch-1.1 05 November 2014, 18:30:10 UTC
1b282cd [SPARK-4115][GraphX] Add overridden count for edge counting of EdgeRDD. Accumulate sizes of all the EdgePartitions just like the VertexRDD. Author: luluorta <luluorta@gmail.com> Closes #2975 from luluorta/graph-edge-count and squashes the following commits: 86ef0e5 [luluorta] Add overridden count for edge counting of EdgeRDD. (cherry picked from commit ee29ef3800438501e0ff207feb00a28973fc0769) Signed-off-by: Reynold Xin <rxin@databricks.com> 01 November 2014, 08:23:00 UTC
abdb90b [SPARK-4097] Fix the race condition of 'thread' There is a chance that `thread` is null when calling `thread.interrupt()`. ```Scala override def cancel(): Unit = this.synchronized { _cancelled = true if (thread != null) { thread.interrupt() } } ``` Should put `thread = null` into a `synchronized` block to fix the race condition. Author: zsxwing <zsxwing@gmail.com> Closes #2957 from zsxwing/SPARK-4097 and squashes the following commits: edf0aee [zsxwing] Add comments to explain the lock c5cfeca [zsxwing] Fix the race condition of 'thread' (cherry picked from commit e7fd80413d531e23b6c4def0ee32e52a39da36fa) Signed-off-by: Reynold Xin <rxin@databricks.com> 29 October 2014, 21:43:48 UTC
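A sketch of the complete fix (the field names follow the snippet in the message; the surrounding task plumbing is illustrative): the assignment of `thread`, including clearing it when the task finishes, happens under the same lock that cancel() takes, so interrupt() can never race with the field being nulled out.

```scala
class CancellableTask {
  private var _cancelled = false
  private var thread: Thread = null

  def run(body: => Unit): Unit = {
    val proceed = this.synchronized {
      if (_cancelled) false
      else { thread = Thread.currentThread(); true }
    }
    if (proceed) {
      try body finally {
        this.synchronized { thread = null } // clear the field inside the lock, not outside
      }
    }
  }

  def cancel(): Unit = this.synchronized {
    _cancelled = true
    if (thread != null) thread.interrupt()
  }
}
```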
f0c5717 [SPARK-4065] Add check for IPython on Windows This fix employs logic similar to the bash launcher (pyspark) to check if IPYTHON=1, and if so launch ipython with options in IPYTHON_OPTS. This fix assumes that ipython is available in the system Path, and can be invoked with a plain "ipython" command. Author: Michael Griffiths <msjgriffiths@gmail.com> Closes #2910 from msjgriffiths/pyspark-windows and squashes the following commits: ef34678 [Michael Griffiths] Change build message to comply with [SPARK-3775] 361e3d8 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows 9ce72d1 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows (cherry picked from commit 2f254dacf4b7ab9c59c7cef59fd364ca682162ae) Signed-off-by: Andrew Or <andrew@databricks.com> 28 October 2014, 19:47:33 UTC
286f1ef [SPARK-4107] Fix incorrect handling of read() and skip() return values (branch-1.1 backport) `read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues, too. This patch fixes several cases where we mis-handle these methods' return values. This is a backport of #2969 to `branch-1.1`. Author: Josh Rosen <joshrosen@databricks.com> Closes #2974 from JoshRosen/spark-4107-branch-1.1-backport and squashes the following commits: d82c05b [Josh Rosen] [SPARK-4107] Fix incorrect handling of read() and skip() return values 28 October 2014, 19:30:12 UTC
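The general defensive pattern behind this fix, as a sketch: InputStream.read() (and skip()) may legally return fewer bytes than asked for, so callers have to loop until the requested amount has been consumed or EOF is reached.

```scala
import java.io.{EOFException, InputStream}

def readFully(in: InputStream, buf: Array[Byte], offset: Int, length: Int): Unit = {
  var readSoFar = 0
  while (readSoFar < length) {
    val n = in.read(buf, offset + readSoFar, length - readSoFar)
    if (n < 0) throw new EOFException(s"Reached EOF after $readSoFar of $length bytes")
    readSoFar += n
  }
}
```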
dee3317 [SPARK-4110] Wrong comments about default settings in spark-daemon.sh In spark-daemon.sh, there are the following comments. # SPARK_CONF_DIR Alternate conf dir. Default is ${SPARK_PREFIX}/conf. # SPARK_LOG_DIR Where log files are stored. PWD by default. But I think the default value for SPARK_CONF_DIR is `${SPARK_HOME}/conf` and for SPARK_LOG_DIR is `${SPARK_HOME}/logs`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2972 from sarutak/SPARK-4110 and squashes the following commits: 5a171a2 [Kousuke Saruta] Fixed wrong comments (cherry picked from commit 44d8b45a38c8d934628373a3b21084432516ee00) Signed-off-by: Andrew Or <andrew@databricks.com> 28 October 2014, 19:29:12 UTC
2ef2f5a [MLlib] SPARK-3987: add test case on objective value for NNLS Also update step parameter to pass the proposed test Author: coderxiang <shuoxiangpub@gmail.com> Closes #2965 from coderxiang/nnls-test and squashes the following commits: 24b06f9 [coderxiang] add test case on objective value for NNLS; update step parameter to pass the test (cherry picked from commit 7e3a1ada86e6adf1ddd4d8a321824daf5f3b2c75) Signed-off-by: Xiangrui Meng <meng@databricks.com> 28 October 2014, 02:44:02 UTC
2eb9d7c Fix build breakage introduced by 6c10c2770c718287f9cc2af4109b701fa1057b70 26 October 2014, 03:33:17 UTC
c1989aa Revert "[SPARK-4056] Upgrade snappy-java to 1.1.1.5" This reverts commit b7541ae89c3db71979f11f2f0b2cb737cb5d1fb3. 26 October 2014, 00:09:01 UTC
b7541ae [SPARK-4056] Upgrade snappy-java to 1.1.1.5 This upgrades snappy-java to 1.1.1.5, which improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see https://github.com/xerial/snappy-java/issues/89). Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #2911 from JoshRosen/upgrade-snappy-java and squashes the following commits: adec96c [Josh Rosen] Use snappy-java 1.1.1.5 cc953d6 [Josh Rosen] [SPARK-4056] Upgrade snappy-java to 1.1.1.4 (cherry picked from commit 898b22ab1fe90e8a3935b19566465046f2256fa6) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: pom.xml 25 October 2014, 00:22:40 UTC
6c10c27 [SPARK-4080] Only throw IOException from [write|read][Object|External] If classes implementing Serializable or Externalizable interfaces throw exceptions other than IOException or ClassNotFoundException from their (de)serialization methods, then this results in an unhelpful "IOException: unexpected exception type" rather than the actual exception that produced the (de)serialization error. This patch fixes this by adding a utility method that re-wraps any uncaught exceptions in IOException (unless they are already instances of IOException). Author: Josh Rosen <joshrosen@databricks.com> Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits: cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External]. (cherry picked from commit 6c98c29ae0033556fd4424f41d1de005c509e511) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala 24 October 2014, 22:21:08 UTC
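A minimal sketch of the kind of utility described (the real helper's name and placement may differ): any non-IOException escaping a (de)serialization method is re-wrapped so Java serialization surfaces the real cause instead of "unexpected exception type".

```scala
import java.io.IOException

def tryOrIOException[T](block: => T): T = {
  try block catch {
    case e: IOException => throw e          // already allowed by the serialization contract
    case e: Throwable   => throw new IOException(e)
  }
}

// Illustrative use inside a Serializable class:
// private def writeObject(out: java.io.ObjectOutputStream): Unit = tryOrIOException {
//   out.defaultWriteObject()
// }
```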
59297e9 [SPARK-4006] In long running contexts, we encountered the situation of double register without a remove in between. The cause for that is unknown, and assumed a temp network issue. However, since the second register is with a BlockManagerId on a different port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor returns Some. This inconsistency is caught in a conditional statement that does System.exit(1), which is a huge robustness issue for us. The fix - simply remove the old id from both maps during register when this happens. We are mimicking the behavior of expireDeadHosts(), by doing local cleanup of the maps before trying to add new ones. Also - added some logging for register and unregister. This is just like https://github.com/apache/spark/pull/2886 except it's on branch-1.1 Author: Tal Sliwowicz <tal.s@taboola.com> Closes #2915 from tsliwowicz/branch-1.1-block-mgr-removal and squashes the following commits: d122236 [Tal Sliwowicz] [SPARK-4006] In long running contexts, we encountered the situation of double registe... 24 October 2014, 20:51:25 UTC
80dde80 [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file In deploy.ClientArguments.isValidJarUrl, the url is checked as follows. def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar") So it allows URLs like 'hdfs:file.jar' (no authority). Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2925 from sarutak/uri-syntax-check-improvement and squashes the following commits: cf06173 [Kousuke Saruta] Improved URI syntax checking (cherry picked from commit 098f83c7ccd7dad9f9228596da69fe5f55711a52) Signed-off-by: Andrew Or <andrew@databricks.com> 24 October 2014, 20:09:08 UTC
386fc46 [SPARK-4076] Parameter expansion in spark-config is wrong In sbin/spark-config.sh, parameter expansion is used to extract source root as follows. this="${BASH_SOURCE-$0}" I think, the parameter expansion should be ":" instead of "". If we use "-" and BASH_SOURCE="", (empty character is set, not unset), "" (empty character) is set to $this. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2930 from sarutak/SPARK-4076 and squashes the following commits: 32a0370 [Kousuke Saruta] Fixed wrong parameter expansion (cherry picked from commit 30ea2868e7afbec20bfc83818249b6d2d7dc6aec) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: sbin/spark-config.sh 24 October 2014, 20:05:47 UTC
926f8ca [SPARK-2652] [PySpark] do not use KryoSerializer as default serializer KryoSerializer cannot serialize customized classes without explicit registration, so using it as the default serializer in PySpark introduces regressions in MLlib. cc mengxr Author: Davies Liu <davies@databricks.com> Closes #2916 from davies/revert and squashes the following commits: 43eb6d3 [Davies Liu] do not use KryoSerializer as default serializer (cherry picked from commit 809c785bcc33e684a68ea14240a466def864199a) Signed-off-by: Xiangrui Meng <meng@databricks.com> 24 October 2014, 06:58:15 UTC
5e191fa [SPARK-3426] Fix sort-based shuffle error when spark.shuffle.compress and spark.shuffle.spill.compress settings are different This PR fixes SPARK-3426, an issue where sort-based shuffle crashes if the `spark.shuffle.spill.compress` and `spark.shuffle.compress` settings have different values. The problem is that sort-based shuffle's read and write paths use different settings for determining whether to apply compression. ExternalSorter writes runs to files using `TempBlockId` ids, which causes `spark.shuffle.spill.compress` to be used for enabling compression, but these spilled files end up being shuffled over the network and read as shuffle files using `ShuffleBlockId` by BlockStoreShuffleFetcher, which causes `spark.shuffle.compress` to be used for enabling decompression. As a result, this leads to errors when these settings disagree. Based on the discussions in #2247 and #2178, it sounds like we don't want to remove the `spark.shuffle.spill.compress` setting. Therefore, I've tried to come up with a fix where `spark.shuffle.spill.compress` is used to compress data that's read and written locally and `spark.shuffle.compress` is used to compress any data that will be fetched / read as shuffle blocks. To do this, I split `TempBlockId` into two new id types, `TempLocalBlockId` and `TempShuffleBlockId`, which map to `spark.shuffle.spill.compress` and `spark.shuffle.compress`, respectively. ExternalAppendOnlyMap also used temp blocks for spilling data. It looks like ExternalSorter was designed to be a generic sorter but its configuration already happens to be tied to sort-based shuffle, so I think it's fine if we use `spark.shuffle.compress` to compress its spills; we can move the compression configuration to the constructor in a later commit if we find that ExternalSorter is being used in other contexts where we want different configuration options to control compression. To summarize: **Before:** | | ExternalAppendOnlyMap | ExternalSorter | |-------|------------------------------|------------------------------| | Read | spark.shuffle.spill.compress | spark.shuffle.compress | | Write | spark.shuffle.spill.compress | spark.shuffle.spill.compress | **After:** | | ExternalAppendOnlyMap | ExternalSorter | |-------|------------------------------|------------------------| | Read | spark.shuffle.spill.compress | spark.shuffle.compress | | Write | spark.shuffle.spill.compress | spark.shuffle.compress | Thanks to andrewor14 for debugging this with me! Author: Josh Rosen <joshrosen@databricks.com> Closes #2890 from JoshRosen/SPARK-3426 and squashes the following commits: 1921cf6 [Josh Rosen] Minor edit for clarity. c8dd8f2 [Josh Rosen] Add comment explaining use of createTempShuffleBlock(). 2c687b9 [Josh Rosen] Fix SPARK-3426. 91e7e40 [Josh Rosen] Combine tests into single test of all combinations 76ca65e [Josh Rosen] Add regression test for SPARK-3426. Conflicts: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala 22 October 2014, 22:10:14 UTC
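A sketch of the resulting rule (block-id classes simplified to plain case classes here, not the real BlockId hierarchy): compression for a block is chosen by how the block will eventually be read, so temp blocks that are later served as shuffle data follow spark.shuffle.compress while purely local temp blocks follow spark.shuffle.spill.compress.

```scala
sealed trait BlockIdSketch
case class ShuffleBlockIdSketch(id: String)     extends BlockIdSketch
case class TempShuffleBlockIdSketch(id: String) extends BlockIdSketch // later fetched as shuffle data
case class TempLocalBlockIdSketch(id: String)   extends BlockIdSketch // only ever read back locally

def shouldCompress(block: BlockIdSketch, conf: Map[String, Boolean]): Boolean = block match {
  case _: ShuffleBlockIdSketch | _: TempShuffleBlockIdSketch =>
    conf.getOrElse("spark.shuffle.compress", true)
  case _: TempLocalBlockIdSketch =>
    conf.getOrElse("spark.shuffle.spill.compress", true)
}
```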
eb62094 [SPARK-3877][YARN] Throw an exception when application is not successful so that the exit code will be set to 1 (for branch-1.1) This is a patch to fix SPARK-3877 in branch-1.1. See also #2732 Author: zsxwing <zsxwing@gmail.com> Closes #2748 from zsxwing/SPARK-3877-branch-1.1 and squashes the following commits: 3701984 [zsxwing] Remove System.exit from Client.scala 8681881 [zsxwing] [SPARK-3877] Throw an exception when application is not successful so that the exit code will be set to 1 22 October 2014, 22:08:28 UTC
457ef59 [SPARK-4010][Web UI] Spark UI returns 500 in yarn-client mode The problem was caused by #1966 CC YanTangZhai andrewor14 Author: GuoQiang Li <witgo@qq.com> Closes #2858 from witgo/SPARK-4010 and squashes the following commits: 9866fbf [GuoQiang Li] Spark UI returns 500 in yarn-client mode (cherry picked from commit 51afde9d8b8a67958c4632a13af143d7c7fd1f04) Signed-off-by: Andrew Or <andrewor14@gmail.com> 20 October 2014, 18:04:07 UTC
12a61d8 [SPARK-3948][Shuffle]Fix stream corruption bug in sort-based shuffle Kernel 2.6.32 bug will lead to unexpected behavior of transferTo in copyStream, and this will corrupt the shuffle output file in sort-based shuffle, which will somehow introduce PARSING_ERROR(2), deserialization error or offset out of range. Here fix this by adding append flag, also add some position checking code. Details can be seen in [SPARK-3948](https://issues.apache.org/jira/browse/SPARK-3948). Author: jerryshao <saisai.shao@intel.com> Closes #2824 from jerryshao/SPARK-3948 and squashes the following commits: be0533a [jerryshao] Address the comments a82b184 [jerryshao] add configuration to control the NIO way of copying stream e17ada2 [jerryshao] Fix kernel 2.6.32 bug led unexpected behavior of transferTo (cherry picked from commit c7aeecd08fd329085760fa89025ec0d9c04f5e3f) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/util/Utils.scala 20 October 2014, 17:22:11 UTC
2cd40db [SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport) This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket). This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues. Author: Josh Rosen <joshrosen@apache.org> Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits: f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf. b562451 [Josh Rosen] Remove unused jobConfCacheKey field. dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task. 19 October 2014, 07:31:06 UTC
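A minimal sketch of the guarded cloning (the config key comes from the message; the helper itself is illustrative): when spark.hadoop.cloneConf is enabled, each task gets its own JobConf copy instead of sharing one mutable instance.

```scala
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkConf

def jobConfForTask(sparkConf: SparkConf, shared: JobConf): JobConf = {
  if (sparkConf.getBoolean("spark.hadoop.cloneConf", defaultValue = false)) {
    new JobConf(shared) // private copy per task; avoids Configuration thread-safety issues
  } else {
    shared              // default: preserve the old shared-instance behaviour
  }
}
```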
327404d SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are Author: Sean Owen <sowen@cloudera.com> Closes #2805 from srowen/SPARK-3926 and squashes the following commits: ecb78ee [Sean Owen] Fix conflict between java.io.Serializable and use of Scala's Serializable f4717f9 [Sean Owen] Oops, fix compile problem ae1b36f [Sean Owen] Expand to cover Maps returned from other Java API methods as well 51c26c2 [Sean Owen] Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are 18 October 2014, 19:40:55 UTC
0d958f1 [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version). This is a backport of SPARK-3606 to branch-1.1. Some of the code had to be duplicated since branch-1.1 doesn't have the cleanup work that was done to the Yarn codebase. I don't know whether the version issue in yarn/alpha/pom.xml was intentional, but I couldn't compile the code without fixing it. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #2497 from vanzin/SPARK-3606-1.1 and squashes the following commits: 4fd3c27 [Marcelo Vanzin] Remove unused imports. 75cde8c [Marcelo Vanzin] Scala is weird. b27ebda [Marcelo Vanzin] Review feedback. 72ceafb [Marcelo Vanzin] Undelete needed import. 61162a6 [Marcelo Vanzin] Use separate config for each param instead of json. 3b7205f [Marcelo Vanzin] Review feedback. b3b3e50 [Marcelo Vanzin] [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version). 17 October 2014, 07:53:15 UTC
35875e9 [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes JobProgressPage could not show Fair Scheduler Pools section sometimes. SparkContext starts webui and then postEnvironmentUpdate. Sometimes JobProgressPage is accessed between webui starting and postEnvironmentUpdate, then the lazy val isFairScheduler will be false. The Fair Scheduler Pools section will not display any more. Author: yantangzhai <tyz0303@163.com> Author: YanTangZhai <hakeemzhai@tencent.com> Closes #1966 from YanTangZhai/SPARK-3067 and squashes the following commits: d4323f8 [yantangzhai] update [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes 8a00106 [YanTangZhai] Merge pull request #6 from apache/master b6391cc [yantangzhai] revert [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes d2226cd [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes cbcba66 [YanTangZhai] Merge pull request #3 from apache/master aac7f7b [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes cdef539 [YanTangZhai] Merge pull request #1 from apache/master (cherry picked from commit dedace83f35cba0f833d962acbd75572318948c4) Signed-off-by: Andrew Or <andrewor14@gmail.com> 17 October 2014, 02:27:32 UTC
2c41170 [SPARK-3890][Docs]remove redundant spark.executor.memory in doc Introduced in https://github.com/pwendell/spark/commit/f7e79bc42c1635686c3af01eef147dae92de2529, I'm not sure why we need two spark.executor.memory here. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #2745 from WangTaoTheTonic/redundantconfig and squashes the following commits: e7564dc [WangTao] too long line fdbdb1f [WangTaoTheTonic] trivial workaround d06b6e5 [WangTaoTheTonic] remove redundant spark.executor.memory in doc (cherry picked from commit e7f4ea8a52f0d3d56684b4f9caadce978eac4816) Signed-off-by: Andrew Or <andrewor14@gmail.com> 17 October 2014, 02:13:06 UTC
61e5903 [SQL]typo in HiveFromSpark Author: Kun Li <jacky.likun@gmail.com> Closes #2809 from jackylk/patch-1 and squashes the following commits: 46c926b [Kun Li] typo in HiveFromSpark (cherry picked from commit be2ec4a91d14f48e6323989fb0e0226a9d65bf7e) Signed-off-by: Andrew Or <andrewor14@gmail.com> 17 October 2014, 02:00:19 UTC
925e22d SPARK-3807: SparkSql does not work for tables created using custom serde SparkSql crashes on selecting tables using custom serde. Example: ---------------- CREATE EXTERNAL TABLE table_name PARTITIONED BY ( a int) ROW FORMAT 'SERDE "org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer" with serdeproperties("serialization.format"="org.apache.thrift.protocol.TBinaryProtocol","serialization.class"="ser_class") STORED AS SEQUENCEFILE; The following exception is seen on running a query like 'select * from table_name limit 1': ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68) at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80) at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86) at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364) at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402) at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406) at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406) at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.NullPointerException Author: chirag <chirag.aggarwal@guavus.com> Closes #2674 from chiragaggarwal/branch-1.1 and squashes the following commits: 370c31b [chirag] SPARK-3807: Add a test case to validate the fix. 1f26805 [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde (Incorporated Review Comments) ba4bc0c [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde 5c73b72 [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde 13 October 2014, 20:47:26 UTC
4fc6638 [SPARK-3899][Doc] fix wrong links in streaming doc There are three [Custom Receiver Guide] links in the streaming doc; the first is wrong. Author: w00228970 <wangfei1@huawei.com> Author: wangfei <wangfei1@huawei.com> Closes #2749 from scwf/streaming-doc and squashes the following commits: 0cd76b7 [wangfei] update link to jump to the Akka-specific section 45b0646 [w00228970] wrong link in streaming doc (cherry picked from commit 92e017fb894be1e8e2b2b5274fec4c31a7a4412e) Signed-off-by: Josh Rosen <joshrosen@apache.org> 13 October 2014, 06:36:09 UTC
a36116c [SPARK-3905][Web UI] The keys for sorting the columns of the Executor page, Stage page, and Storage page are incorrect Author: GuoQiang Li <witgo@qq.com> Closes #2763 from witgo/SPARK-3905 and squashes the following commits: 17d7990 [GuoQiang Li] The keys for sorting the columns of the Executor page, Stage page, and Storage page are incorrect (cherry picked from commit b4a7fa7a663c462bf537ca9d63af0dba6b4a8033) Signed-off-by: Josh Rosen <joshrosen@apache.org> 13 October 2014, 05:49:12 UTC
0e32579 [SPARK-3121] Wrong implementation of implicit bytesWritableConverter val path = ... //path to seq file with BytesWritable as type of both key and value val file = sc.sequenceFile[Array[Byte],Array[Byte]](path) file.take(1)(0)._1 This prints incorrect content of the byte array: the actual content starts correctly, but some "random" bytes and zeros are appended. BytesWritable has two methods: getBytes() - returns the content of the whole internal array, which is often longer than the actual value stored and usually contains the rest of previous longer values. copyBytes() - returns just the beginning of the internal array, determined by the internal length property. It looks like the implicit conversion between BytesWritable and Array[Byte] uses getBytes instead of the correct copyBytes. dbtsai Author: Jakub Dubovský <james64@inMail.sk> Author: Dubovsky Jakub <dubovsky@avast.com> Closes #2712 from james64/3121-bugfix and squashes the following commits: f85d24c [Jakub Dubovský] Test name changed, comments added 1b20d51 [Jakub Dubovský] Import placed correctly 406e26c [Jakub Dubovský] Scala style fixed f92ffa6 [Dubovsky Jakub] performance tuning 480f9cd [Dubovsky Jakub] Bug 3121 fixed (cherry picked from commit fc616d51a510f82627b5be949a5941419834cf70) Signed-off-by: Josh Rosen <joshrosen@apache.org> 13 October 2014, 05:03:50 UTC
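The distinction can be made concrete with a small sketch: only the first getLength() bytes of the array returned by getBytes() are valid, so a correct conversion copies exactly that prefix (copyBytes() does the same on Hadoop versions that provide it).

```scala
import org.apache.hadoop.io.BytesWritable

def toByteArray(bw: BytesWritable): Array[Byte] =
  java.util.Arrays.copyOf(bw.getBytes, bw.getLength) // valid prefix only, not the padded backing array
```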
5a21e3e SPARK-3716 [GraphX] Update Analytics.scala for partitionStrategy assignment Previously, when the val partitionStrategy was created it called a function in the Analytics object which was a copy of the PartitionStrategy.fromString() method. This function has been removed, and the assignment of partitionStrategy now uses the PartitionStrategy.fromString method instead. In this way, it better matches the declarations of edge/vertex StorageLevel variables. Author: NamelessAnalyst <NamelessAnalyst@users.noreply.github.com> Closes #2569 from NamelessAnalyst/branch-1.1 and squashes the following commits: c24ff51 [NamelessAnalyst] Update Analytics.scala 12 October 2014, 21:18:55 UTC
18ef22a [SPARK-3711][SQL] Optimize where in clause filter queries The In case class is replaced by a InSet class in case all the filters are literals, which uses a hashset instead of Sequence, thereby giving significant performance improvement (earlier the seq was using a worst case linear match (exists method) since expressions were assumed in the filter list) . Maximum improvement should be visible in case small percentage of large data matches the filter list. Author: Yash Datta <Yash.Datta@guavus.com> Closes #2561 from saucam/branch-1.1 and squashes the following commits: 4bf2d19 [Yash Datta] SPARK-3711: 1. Fix code style and import order 2. Fix optimization condition 3. Add tests for null in filter list 4. Add test case that optimization is not triggered in case of attributes in filter list afedbcd [Yash Datta] SPARK-3711: 1. Add test cases for InSet class in ExpressionEvaluationSuite 2. Add class OptimizedInSuite on the lines of ConstantFoldingSuite, for the optimized In clause 0fc902f [Yash Datta] SPARK-3711: UnaryMinus will be handled by constantFolding bd84c67 [Yash Datta] SPARK-3711: Incorporate review comments. Move optimization of In clause to Optimizer.scala by adding a rule. Add appropriate comments 430f5d1 [Yash Datta] SPARK-3711: Optimize the filter list in case of negative values as well bee98aa [Yash Datta] SPARK-3711: Optimize where in clause filter queries 09 October 2014, 20:08:35 UTC
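The effect of the optimization can be sketched independently of the Catalyst classes: testing membership against a Seq is a linear scan per row, while a HashSet built once from the literal filter list gives constant-time lookups.

```scala
val literals: Seq[Int] = 1 to 10000

val asSeq: Seq[Int] = literals       // In-style:    O(n) contains/exists per row
val asSet: Set[Int] = literals.toSet // InSet-style: O(1) expected lookup per row

def matchesSlow(v: Int): Boolean = asSeq.contains(v)
def matchesFast(v: Int): Boolean = asSet.contains(v)
```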
09d6a81 [SPARK-3844][UI] Truncate appName in WebUI if it is too long Truncate appName in WebUI if it is too long. Author: Xiangrui Meng <meng@databricks.com> Closes #2707 from mengxr/truncate-app-name and squashes the following commits: 87834ce [Xiangrui Meng] move scala import below java c7111dc [Xiangrui Meng] truncate appName in WebUI if it is too long (cherry picked from commit 86b392942daf61fed2ff7490178b128107a0e856) Signed-off-by: Andrew Or <andrewor14@gmail.com> 09 October 2014, 07:00:32 UTC
a44af73 [SPARK-3788] [yarn] Fix compareFs to do the right thing for HDFS namespaces (1.1 version). HA and viewfs use namespaces instead of host names, so you can't resolve them since that will fail. So be smarter to avoid doing unnecessary work. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #2650 from vanzin/SPARK-3788-1.1 and squashes the following commits: 174bf71 [Marcelo Vanzin] Update comment. 0e36be7 [Marcelo Vanzin] Use Objects.equal() instead of ==. 772aead [Marcelo Vanzin] [SPARK-3788] [yarn] Fix compareFs to do the right thing for HA, federation (1.1 version). 08 October 2014, 13:51:17 UTC
a1f833f [SPARK-3829] Make the Spark logo image on the header of HistoryPage a link to HistoryPage's page #1 There is a Spark logo on the header of HistoryPage. We can have many HistoryPages if we run 20+ applications, so I think it's useful if the logo is a link to page 1 of the HistoryPage. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2690 from sarutak/SPARK-3829 and squashes the following commits: 908c109 [Kousuke Saruta] Removed extra space. 00bfbd7 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3829 dd87480 [Kousuke Saruta] Made header Spark log image as a link to History Server's top page. (cherry picked from commit b69c9fb6fb048509bbd8430fb697dc3a5ca4fe59) Signed-off-by: Andrew Or <andrewor14@gmail.com> 07 October 2014, 23:55:11 UTC
e8afb73 [SPARK-3777] Display "Executor ID" for Tasks in Stage page Now the Stage page only displays "Executor"(host) for tasks. However, there may be more than one Executor running on the same host. Currently, when some task is hung, I only know the host of the faulty executor, so I have to check all executors on that host. Adding "Executor ID" to the Tasks table would be helpful to locate the faulty executor. Here is the new page: ![add_executor_id_for_tasks](https://cloud.githubusercontent.com/assets/1000778/4505774/acb9648c-4afa-11e4-8826-8768a0a60cc9.png) Author: zsxwing <zsxwing@gmail.com> Closes #2642 from zsxwing/SPARK-3777 and squashes the following commits: 37945af [zsxwing] Put Executor ID and Host into one cell 4bbe2c7 [zsxwing] [SPARK-3777] Display "Executor ID" for Tasks in Stage page (cherry picked from commit 446063eca98ae56d1ac61415f4c6e89699b8db02) Signed-off-by: Andrew Or <andrewor14@gmail.com> 07 October 2014, 23:00:31 UTC
5531830 [SPARK-3731] [PySpark] fix memory leak in PythonRDD The parent.getOrCompute() of PythonRDD is executed in a separated thread, it should release the memory reserved for shuffle and unrolling finally. Author: Davies Liu <davies.liu@gmail.com> Closes #2668 from davies/leak and squashes the following commits: ae98be2 [Davies Liu] fix memory leak in PythonRDD (cherry picked from commit bc87cc410fae59660c13b6ae1c14204df77237b8) Signed-off-by: Josh Rosen <joshrosen@apache.org> Conflicts: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala 07 October 2014, 20:05:29 UTC
267c7be [SPARK-3825] Log more detail when unrolling a block fails Before: ``` 14/10/06 16:45:42 WARN CacheManager: Not enough space to cache partition rdd_0_2 in memory! Free memory is 481861527 bytes. ``` After: ``` 14/10/07 11:08:24 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory! (computed 68.8 MB so far) 14/10/07 11:08:24 INFO MemoryStore: Memory use = 1088.0 B (blocks) + 445.1 MB (scratch space shared across 8 thread(s)) = 445.1 MB. Storage limit = 459.5 MB. ``` Author: Andrew Or <andrewor14@gmail.com> Closes #2688 from andrewor14/cache-log-message and squashes the following commits: 28e33d6 [Andrew Or] Shy away from "unrolling" 5638c49 [Andrew Or] Grammar 39a0c28 [Andrew Or] Log more detail when unrolling a block fails (cherry picked from commit 553737c6e6d5ffa3b52a9888444f4beece5c5b1a) Signed-off-by: Andrew Or <andrewor14@gmail.com> 07 October 2014, 19:52:27 UTC
3a7875d [SPARK-3808] PySpark fails to start in Windows Modified syntax error of *.cmd script. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #2669 from tsudukim/feature/SPARK-3808 and squashes the following commits: 7f804e6 [Masayoshi TSUZUKI] [SPARK-3808] PySpark fails to start in Windows (cherry picked from commit 12e2551ea1773ae19559ecdada35d23608e6b0ec) Signed-off-by: Andrew Or <andrewor14@gmail.com> 07 October 2014, 18:53:42 UTC
82ab4a7 [SPARK-3827] Very long RDD names are not rendered properly in web UI With Spark SQL we generate very long RDD names. These names are not properly rendered in the web UI. This PR fixes the rendering issue. [SPARK-3827] #comment Linking PR with JIRA Author: Hossein <hossein@databricks.com> Closes #2687 from falaki/sparkTableUI and squashes the following commits: fd06409 [Hossein] Limit width of cell when RDD name is too long (cherry picked from commit d65fd554b4de1dbd8db3090b0e50994010d30e78) Signed-off-by: Josh Rosen <joshrosen@apache.org> 07 October 2014, 18:46:43 UTC
964e3aa [SPARK-3792][SQL] Enable JavaHiveQLSuite Do not use TestSQLContext in JavaHiveQLSuite, since that may lead to two SparkContexts in one JVM, and enable JavaHiveQLSuite Author: scwf <wangfei1@huawei.com> Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits: be35c91 [scwf] enable JavaHiveQLSuite (cherry picked from commit 58f5361caaa2f898e38ae4b3794167881e20a818) Signed-off-by: Michael Armbrust <michael@databricks.com> 06 October 2014, 00:50:11 UTC
c068d90 SPARK-1656: Fix potential resource leaks JIRA: https://issues.apache.org/jira/browse/SPARK-1656 Author: zsxwing <zsxwing@gmail.com> Closes #577 from zsxwing/SPARK-1656 and squashes the following commits: c431095 [zsxwing] Add a comment and fix the code style 2de96e5 [zsxwing] Make sure file will be deleted if exception happens 28b90dc [zsxwing] Update to follow the code style 4521d6e [zsxwing] Merge branch 'master' into SPARK-1656 afc3383 [zsxwing] Update to follow the code style 071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks (cherry picked from commit a7c73130f1b6b0b8b19a7b0a0de5c713b673cd7b) Signed-off-by: Andrew Or <andrewor14@gmail.com> 05 October 2014, 16:56:32 UTC
d9cf4d0 [SPARK-3597][Mesos] Implement `killTask`. The MesosSchedulerBackend did not previously implement `killTask`, resulting in an exception. Author: Brenden Matthews <brenden@diddyinc.com> Closes #2453 from brndnmtthws/implement-killtask and squashes the following commits: 23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`. (cherry picked from commit 32fad4233f353814496c84e15ba64326730b7ae7) Signed-off-by: Andrew Or <andrewor14@gmail.com> 05 October 2014, 16:49:35 UTC
e4ddede [SPARK-3774] typo comment in bin/utils.sh Modified the comment of bin/utils.sh. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #2639 from tsudukim/feature/SPARK-3774 and squashes the following commits: 707b779 [Masayoshi TSUZUKI] [SPARK-3774] typo comment in bin/utils.sh (cherry picked from commit e5566e05b1ac99aa6caf1701e47ebcdb68a002c6) Signed-off-by: Andrew Or <andrewor14@gmail.com> 03 October 2014, 20:12:45 UTC
f130256 [SPARK-3775] Not suitable error message in spark-shell.cmd Modified some sentences of the error messages in bin\*.cmd. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits: 3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd (cherry picked from commit 358d7ffd01b4a3fbae313890522cf662c71af6e5) Signed-off-by: Andrew Or <andrewor14@gmail.com> 03 October 2014, 20:10:13 UTC
6f15097 [SPARK-3535][Mesos] Fix resource handling. Author: Brenden Matthews <brenden@diddyinc.com> Closes #2401 from brndnmtthws/master and squashes the following commits: 4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling. (cherry picked from commit a8c52d5343e19731909e73db5de151a324d31cd5) Signed-off-by: Andrew Or <andrewor14@gmail.com> 03 October 2014, 19:58:17 UTC
d5af9e1 [SPARK-3696] Do not override the user-defined conf_dir https://issues.apache.org/jira/browse/SPARK-3696 We check whether SPARK_CONF_DIR is already defined before assignment. Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #2541 from WangTaoTheTonic/confdir and squashes the following commits: c3f31e0 [WangTaoTheTonic] Do not override the user-defined conf_dir (cherry picked from commit 9d320e222c221e5bb827cddf01a83e64a16d74ff) Signed-off-by: Andrew Or <andrewor14@gmail.com> Conflicts: sbin/spark-config.sh 03 October 2014, 17:47:47 UTC
5d991db SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR Update of PR #997. With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env). Author: EugenCepoi <cepoi.eugen@gmail.com> Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits: 0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd 77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR (cherry picked from commit f0811f928e5b608e1a2cba3b6828ba0ed03b701d) Signed-off-by: Andrew Or <andrewor14@gmail.com> 03 October 2014, 17:03:24 UTC