https://github.com/apache/spark

Revision  Message  Commit Date
4322c0b [maven-release-plugin] prepare release v0.9.2-rc1 17 July 2014, 07:48:28 UTC
351f35e [branch-0.9] Update CHANGES.txt Author: Xiangrui Meng <meng@databricks.com> Closes #1459 from mengxr/v0.9.2-rc and squashes the following commits: 6fa5a65 [Xiangrui Meng] update CHANGES.txt 17 July 2014, 07:02:43 UTC
c9a22e8 [branch-0.9] bump versions for v0.9.2 release candidate Manually update some version numbers. Author: Xiangrui Meng <meng@databricks.com> Closes #1458 from mengxr/v0.9.2-rc and squashes the following commits: 2c38419 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-0.9' into v0.9.2-rc 7d0fb76 [Xiangrui Meng] change tree/master to tree/branch-0.9 in docs ea2b205 [Xiangrui Meng] update version in SparkBuild 162af66 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-0.9' into v0.9.2-rc bc87035 [Xiangrui Meng] bump version numbers to 0.9.2 17 July 2014, 06:53:40 UTC
60f4b3b [branch-0.9] Fix github links in docs We moved example code in v1.0. The links are no longer valid if still pointing to `tree/master`. Author: Xiangrui Meng <meng@databricks.com> Closes #1456 from mengxr/links-0.9 and squashes the following commits: b7b9260 [Xiangrui Meng] change tree/master to tree/branch-0.9 in docs 17 July 2014, 06:39:02 UTC
7edee34 [SPARK-1112, 2156] (0.9 edition) Use correct akka frame size and overhead amounts. backport #1172 to branch-0.9. Author: Patrick Wendell <pwendell@gmail.com> Closes #1455 from mengxr/akka-fix-0.9 and squashes the following commits: a99f201 [Patrick Wendell] backport PR #1172 to branch-0.9 17 July 2014, 04:30:50 UTC
0116dee [SPARK-2433][MLLIB] fix NaiveBayesModel.predict This is the same as https://github.com/apache/spark/pull/463 , which I forgot to merge into branch-0.9. Author: Xiangrui Meng <meng@databricks.com> Closes #1453 from mengxr/nb-transpose-0.9 and squashes the following commits: bc53ce8 [Xiangrui Meng] fix NaiveBayes 17 July 2014, 03:12:09 UTC
8e5604b [SPARK-2362] Fix for newFilesOnly logic in file DStream The newFilesOnly logic should be inverted: the logic should be that if the flag newFilesOnly==true then only start reading files older than current time. As the code is now if newFilesOnly==true then it will start to read files that are older than 0L (that is: every file in the directory). Author: Gabriele Nizzoli <mail@nizzoli.net> Closes #1077 from gabrielenizzoli/master and squashes the following commits: 4f1d261 [Gabriele Nizzoli] Fix for newFilesOnly logic in file DStream (cherry picked from commit e6f7bfcfbf6aff7a9f8cd8e0a2166d0bf62b0912) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 08 July 2014, 21:25:33 UTC
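A minimal Scala sketch of the idea behind this fix, assuming `newFilesOnly == true` means files already present when the stream starts should be skipped; the names here are hypothetical, not the actual FileInputDStream fields:

```scala
object NewFilesOnlySketch {
  // Choose the modification-time threshold below which files are ignored.
  // Pre-fix, the branches were effectively swapped, so newFilesOnly == true
  // used a threshold of 0L and picked up every file already in the directory.
  def initialIgnoreThreshold(newFilesOnly: Boolean, startTimeMs: Long): Long =
    if (newFilesOnly) startTimeMs else 0L

  def main(args: Array[String]): Unit = {
    val start = System.currentTimeMillis()
    assert(initialIgnoreThreshold(newFilesOnly = true, start) == start)  // skip pre-existing files
    assert(initialIgnoreThreshold(newFilesOnly = false, start) == 0L)    // read everything
    println("ok")
  }
}
```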
57873ef SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark JIRA: https://issues.apache.org/jira/browse/SPARK-2282 This issue is caused by a buildup of sockets in the TIME_WAIT state of TCP, a state that lasts for some time after a connection closes. The fix simply allows sockets in TIME_WAIT to be reused, avoiding the buildup that results from rapidly creating and closing these sockets. Author: Aaron Davidson <aaron@databricks.com> Closes #1220 from aarondav/SPARK-2282 and squashes the following commits: 2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark (cherry picked from commit 97a0bfe1c0261384f09d53f9350de52fb6446d59) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 04 July 2014, 06:02:56 UTC
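The actual change is on the PySpark side, but the underlying idea translates directly to the JVM; a hedged Scala sketch using `java.net.ServerSocket`, with hypothetical names:

```scala
import java.net.{InetSocketAddress, ServerSocket}

object ReuseAddressSketch {
  def boundServerSocket(port: Int): ServerSocket = {
    val server = new ServerSocket()
    // Allow binding even while old sockets on this address sit in TIME_WAIT,
    // so rapid open/close cycles do not exhaust the address.
    server.setReuseAddress(true)
    server.bind(new InetSocketAddress("127.0.0.1", port))
    server
  }

  def main(args: Array[String]): Unit = {
    val s = boundServerSocket(0) // port 0: let the OS pick a free port
    println(s"bound to ${s.getLocalPort}")
    s.close()
  }
}
```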
c37e9ed [SPARK-2350] Don't NPE while launching drivers Prior to this change, we could throw a NPE if we launch a driver while another one is waiting, because removing from an iterator while iterating over it is not safe. Author: Aaron Davidson <aaron@databricks.com> Closes #1289 from aarondav/master-fail and squashes the following commits: 1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers (cherry picked from commit 586feb5c9528042420f678f78bacb6c254a5eaf8) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 04 July 2014, 06:00:30 UTC
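A small Scala sketch of the safe pattern (hypothetical names, not the Master's actual code): snapshot the elements to act on first, then mutate the collection, instead of removing entries while iterating over them:

```scala
import scala.collection.mutable.ArrayBuffer

object SafeRemovalSketch {
  // Hypothetical stand-in for the waiting-driver bookkeeping: removing elements
  // from a collection while iterating over it can corrupt the traversal, so we
  // collect what to launch first and remove only afterwards.
  def launchReady(waiting: ArrayBuffer[String], canLaunch: String => Boolean): Seq[String] = {
    val toLaunch = waiting.filter(canLaunch)   // snapshot, no mutation during traversal
    waiting --= toLaunch                       // mutate only after the traversal is done
    toLaunch
  }

  def main(args: Array[String]): Unit = {
    val waiting = ArrayBuffer("driver-1", "driver-2", "driver-3")
    println(launchReady(waiting, _ != "driver-2"))  // driver-1 and driver-3 are launched
    println(waiting)                                // only driver-2 is left waiting
  }
}
```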
0d3d5ce [SPARK-1516] Throw exception in YARN client instead of calling System.exit directly. All the changes are in the package "org.apache.spark.deploy.yarn": 1) Throw IllegalArgumentException in ClientArguments instead of exiting directly. 2) In Client's main method, if an exception is caught, exit with code 1, otherwise exit with code 0. 3) In YarnClientSchedulerBackend's start method, if IllegalArgumentException is caught, exit with code 1, otherwise rethrow the exception. 4) Fix some message typos in Client.scala. After the fix, if users integrate the Spark YARN client into their applications, the application will no longer be terminated when an argument is wrong or when the run finishes. +CC dbtsai mengxr Author: John Zhao <codeboyyong@gmail.com> Closes #1099 from codeboyyong/branch-0.9 and squashes the following commits: 00144b5 [John Zhao] use e.printStackTrace() to replace "Console.err.println(e.getMessage)" so that the client console gets more useful information when something is wrong. addcecb [John Zhao] [SPARK-1516] Throw exception in yarn client instead of run system.exit directly. 03 July 2014, 22:17:51 UTC
b3f4245 HOTFIX: Removing outdated Python path in testing tool. 28 June 2014, 01:19:16 UTC
9509819 [SPARK-1912] fix compress memory issue during reduce When we need to read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Say a reducer task needs to read 1000 local shuffle blocks: it will first prepare to read those 1000 blocks, which means creating 1000 compression stream instances to wrap them. But initializing a compression instance allocates some memory, and having many instances alive at the same time is a problem. In practice the reducer reads the shuffle blocks one by one, so we can initialize the compression instances lazily. Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com> Closes #860 from cloud-fan/fix-compress and squashes the following commits: 0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator' 07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class 2c8adb2 [Wenchen Fan(Cloud)] add inline comment 8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce 25 June 2014, 20:32:35 UTC
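A hedged Scala sketch of the lazy-initialization idea, using JDK GZIP streams as a stand-in for LZF/Snappy; each decompression stream is created only when the consumer actually reaches that block:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object LazyDecompressSketch {
  def compress(bytes: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(bytes); gz.close()
    bos.toByteArray
  }

  // Instead of eagerly wrapping every block in a decompression stream up front
  // (allocating per-stream buffers for all of them at once), build each stream
  // lazily via the iterator, only when that block is consumed.
  def lazyStreams(blocks: Seq[Array[Byte]]): Iterator[InputStream] =
    blocks.iterator.map(b => new GZIPInputStream(new ByteArrayInputStream(b)))

  def main(args: Array[String]): Unit = {
    val blocks = Seq("foo", "bar").map(s => compress(s.getBytes("UTF-8")))
    for (in <- lazyStreams(blocks)) {
      println(scala.io.Source.fromInputStream(in, "UTF-8").mkString)
      in.close()
    }
  }
}
```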
ef8501d SPARK-2241: quote command line args in ec2 script To preserve quoted command line args (in case options have space in them). Author: Ori Kremer <ori.kremer@gmail.com> Closes #1169 from orikremer/quote_cmd_line_args and squashes the following commits: 67e2aa1 [Ori Kremer] quote command line args (cherry picked from commit 9fc373e3a9a8ba7bea9df0950775f48918f63a8a) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 23 June 2014, 03:24:20 UTC
2a2eace HOTFIX: bug caused by #941 This patch should have qualified the use of PIPE. This needs to be back ported into 0.9 and 1.0. Author: Patrick Wendell <pwendell@gmail.com> Closes #1108 from pwendell/hotfix and squashes the following commits: 711c58d [Patrick Wendell] HOTFIX: bug caused by #941 (cherry picked from commit b2ebf429e24566c29850c570f8d76943151ad78c) Signed-off-by: Xiangrui Meng <meng@databricks.com> 17 June 2014, 22:09:55 UTC
8e9f479 SPARK-1990: added compatibility for python 2.6 for ssh_read command https://issues.apache.org/jira/browse/SPARK-1990 There were some posts on the lists that spark-ec2 does not work with Python 2.6. In addition, we should check the Python version at the top of the script and exit if it's too old. Author: Anant <anant.asty@gmail.com> Closes #941 from anantasty/SPARK-1990 and squashes the following commits: 4ca441d [Anant] Implemented check_output within the module to work with python 2.6 c6ed85c [Anant] added compatibility for python 2.6 for ssh_read command Conflicts: ec2/spark_ec2.py 17 June 2014, 06:46:10 UTC
706e38f [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly A Flume event sent to Spark will fail if the body is too large and numHeaders is greater than zero. Author: joyyoj <sunshch@gmail.com> Closes #951 from joyyoj/master and squashes the following commits: f4660c5 [joyyoj] [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly (cherry picked from commit 29660443077619ee854025b8d0d3d64181724054) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 11 June 2014, 00:26:45 UTC
cc95d97 Spark 1384 - Fix spark-shell on yarn access to secure hdfs - branch-0.9 only Author: Thomas Graves <tgraves@apache.org> Closes #287 from tgravescs/SPARK-1384 and squashes the following commits: ae9162a [Thomas Graves] SPARK-1384 - fix spark-shell on yarn access to secure HDFS 10 June 2014, 06:07:25 UTC
1d3aab9 [SPARK-1870] Made deployment with --jars work in yarn-standalone mode. Ported from 1.0 branch to 0.9 branch. Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Author: DB Tsai <dbtsai@dbtsai.com> Closes #1013 from dbtsai/branch-0.9 and squashes the following commits: c5696f4 [DB Tsai] fix line too long b085f10 [DB Tsai] Make sure that empty string is filtered out when we get secondary jars 3cc1085 [DB Tsai] changed from var to val ab94aa1 [DB Tsai] Code formatting. 0956af9 [DB Tsai] Ported SPARK-1870 from 1.0 branch to 0.9 branch 10 June 2014, 05:56:24 UTC
51f677e SPARK-2043: ExternalAppendOnlyMap doesn't always find matching keys The current implementation reads one key with the next hash code as it finishes reading the keys with the current hash code, which may cause it to miss some matches of the next key. This can cause operations like join to give the wrong result when reduce tasks spill to disk and there are hash collisions, as values won't be matched together. This PR fixes it by not reading in that next key, using a peeking iterator instead. Author: Matei Zaharia <matei@databricks.com> Closes #986 from mateiz/spark-2043 and squashes the following commits: 0959514 [Matei Zaharia] Added unit test for having many hash collisions 892debb [Matei Zaharia] SPARK-2043: don't read a key with the next hash code in ExternalAppendOnlyMap, instead use a buffered iterator to only read values with the current hash code. (cherry picked from commit b45c13e7d798f97b92f1a6329528191b8d779c4f) Signed-off-by: Matei Zaharia <matei@databricks.com> 06 June 2014, 06:02:11 UTC
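A minimal Scala sketch of the buffered/peeking-iterator approach (hypothetical names): `head` lets us inspect the next element's key without consuming the first element of the following group:

```scala
object PeekingIteratorSketch {
  // Read all consecutive elements that share the leading element's key without
  // consuming the first element of the next group -- the bug fixed here came
  // from reading one element too many when switching hash codes.
  def takeGroup[A, K](it: BufferedIterator[A], key: A => K): Vector[A] = {
    val k = key(it.head)                  // peek, do not consume
    val group = Vector.newBuilder[A]
    while (it.hasNext && key(it.head) == k) group += it.next()
    group.result()
  }

  def main(args: Array[String]): Unit = {
    val it = List(("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)).iterator.buffered
    while (it.hasNext) println(takeGroup(it, (p: (String, Int)) => p._1))
  }
}
```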
6634a34 SPARK-1790: Update EC2 scripts to support r3 instance types Author: Varakhedi Sujeet <svarakhedi@gopivotal.com> Closes #960 from sujeetv/ec2-r3 and squashes the following commits: 3cb9fd5 [Varakhedi Sujeet] SPARK-1790: Update EC2 scripts to support r3 instance (cherry picked from commit 11ded3f66f178e4d8d2b23491dd5e0ea23bcf719) Conflicts: ec2/spark_ec2.py 04 June 2014, 23:06:10 UTC
41e7853 [SPARK-1468] Modify the partition function used by partitionBy. Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. Associated JIRA at https://issues.apache.org/jira/browse/SPARK-1468 Author: Erik Selin <erik.selin@jadedpixel.com> Closes #371 from tyro89/consistent_hashing and squashes the following commits: 201c301 [Erik Selin] Make partitionBy use a tweaked version of hash as its default partition function since the python hash function does not consistently assign the same value to None across python processes. (cherry picked from commit 8edc9d0330c94b50e01956ae88693cff4e0977b2) Signed-off-by: Matei Zaharia <matei@databricks.com> 03 June 2014, 20:31:37 UTC
e03af41 SPARK-1917: fix PySpark import of scipy.special functions https://issues.apache.org/jira/browse/SPARK-1917 Author: Uri Laserson <laserson@cloudera.com> Closes #866 from laserson/SPARK-1917 and squashes the following commits: d947e8c [Uri Laserson] Added test for scipy.special importing 1798bbd [Uri Laserson] SPARK-1917: fix PySpark import of scipy.special Conflicts: python/pyspark/tests.py 31 May 2014, 22:08:00 UTC
563bfe1 SPARK-1935: Explicitly add commons-codec 1.5 as a dependency (for branch-0.9). This is for branch 0.9. Author: Yin Huai <huai@cse.ohio-state.edu> Closes #912 from yhuai/SPARK-1935-branch-0.9 and squashes the following commits: d7f0f7c [Yin Huai] Explicitly add commons-codec 1.5 as a dependency. 31 May 2014, 05:12:17 UTC
a92900c SPARK-1188: Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by my measurements. It also simplifies the code. As far as I can tell the object re-use was nothing but premature optimization. I did actual benchmarks for all the included changes, and there is no performance difference. I am not sure where to put the benchmarks. Does Spark not have a benchmark suite? This is an example benchmark I did: test("benchmark") { val builder = new EdgePartitionBuilder[Int] for (i <- (1 to 10000000)) { builder.add(i.toLong, i.toLong, i) } val p = builder.toEdgePartition p.map(_.attr + 1).iterator.toList } It ran for 10 seconds both before and after this change. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #276 from darabos/spark-1188 and squashes the following commits: 574302b [Daniel Darabos] Restore "manual" copying in EdgePartition.map(Iterator). Add comment to discourage novices like myself from trying to simplify the code. 4117a64 [Daniel Darabos] Revert EdgePartitionSuite. 4955697 [Daniel Darabos] Create a copy of the Edge objects in EdgeRDD.compute(). This avoids exposing the object re-use, while still enables the more efficient behavior for internal code. 4ec77f8 [Daniel Darabos] Add comments about object re-use to the affected functions. 2da5e87 [Daniel Darabos] Restore object re-use in EdgePartition. 0182f2b [Daniel Darabos] Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (SPARK-1188) and has no performance impact in my measurements. It also simplifies the code. c55f52f [Daniel Darabos] Tests that reproduce the problems from SPARK-1188. (cherry picked from commit 78236334e4ca7518b6d7d9b38464dbbda854a777) Signed-off-by: Reynold Xin <rxin@apache.org> 29 May 2014, 07:42:55 UTC
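A self-contained Scala sketch of why iterator object re-use is dangerous (not the GraphX code itself): materializing the re-using iterator yields N references to the same mutated object:

```scala
object ObjectReuseSketch {
  final class Edge(var src: Long = 0L, var dst: Long = 0L)

  // Iterator that re-uses a single mutable Edge: cheap, but collecting the
  // results silently yields N references to the *same* (last) edge.
  def reusingIterator(pairs: Array[(Long, Long)]): Iterator[Edge] = {
    val shared = new Edge()
    pairs.iterator.map { case (s, d) => shared.src = s; shared.dst = d; shared }
  }

  // Safe variant: allocate a fresh Edge per element.
  def copyingIterator(pairs: Array[(Long, Long)]): Iterator[Edge] =
    pairs.iterator.map { case (s, d) => new Edge(s, d) }

  def main(args: Array[String]): Unit = {
    val pairs = Array((1L, 2L), (3L, 4L))
    println(reusingIterator(pairs).toList.map(e => (e.src, e.dst)))  // List((3,4), (3,4)) -- corrupted
    println(copyingIterator(pairs).toList.map(e => (e.src, e.dst))) // List((1,2), (3,4))
  }
}
```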
aef6390 [SPARK-1712]: TaskDescription instance is too big causes Spark to hang Author: witgo <witgo@qq.com> Closes #694 from witgo/SPARK-1712_new and squashes the following commits: 0f52483 [witgo] review commit 83ce29b [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 52e6752 [witgo] reset test SparkContext 63636b6 [witgo] review commit 44a59ee [witgo] review commit 3b6d48c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 926bd6a [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 9a5cfad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 03cc562 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new b0930b0 [witgo] review commit b1174bd [witgo] merge master f76679b [witgo] merge master 689495d [witgo] fix scala style bug 1d35c3c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 062c182 [witgo] fix small bug for code style 0a428cf [witgo] add unit tests 158b2dc [witgo] review commit 4afe71d [witgo] review commit 9e4ffa7 [witgo] review commit 1d35c7d [witgo] fix hang 7965580 [witgo] fix Statement order 0e29eac [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 3ea1ca1 [witgo] remove duplicate serialize 743a7ad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 86e2048 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new 2a89adc [witgo] SPARK-1712: TaskDescription instance is too big causes Spark to hang Conflicts: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala 28 May 2014, 23:41:47 UTC
234a378 Spark 1916 The changes could be ported back to 0.9 as well. Changing in.read to in.readFully to read the whole input stream rather than the first 1020 bytes. This should be OK considering that Flume caps the body size at 32K by default. Author: David Lemieux <david.lemieux@radialpoint.com> Closes #865 from lemieud/SPARK-1916 and squashes the following commits: a265673 [David Lemieux] Updated SparkFlumeEvent to read the whole stream rather than the first X bytes. (cherry picked from commit 0b769b73fb7ae314325857138a2d3138ed157908) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 28 May 2014, 22:51:43 UTC
7633949 SPARK-1145: Memory mapping with many small blocks can cause JVM allocation failures This includes some minor code clean-up as well. The main change is that small files are not memory mapped. There is a nicer way to write that code block using Scala's `Try` but to make it easy to back port and as simple as possible, I opted for the more explicit but less pretty format. Author: Patrick Wendell <pwendell@gmail.com> Closes #43 from pwendell/block-iter-logging and squashes the following commits: 1cff512 [Patrick Wendell] Small issue from merge. 49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging 4943351 [Patrick Wendell] Added a test and feedback on mateis review a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers. b76b95f [Patrick Wendell] Review feedback 4e1514e [Patrick Wendell] Don't memory map for small files d238b88 [Patrick Wendell] Some logging and clean-up Conflicts: core/src/main/scala/org/apache/spark/storage/BlockManager.scala core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala docs/configuration.md 12 May 2014, 00:39:21 UTC
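A hedged Scala sketch of the read path this describes, with a hypothetical size threshold: blocks smaller than the cutoff are read into ordinary heap buffers, larger ones are memory-mapped:

```scala
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

object MapOrReadSketch {
  // Hypothetical threshold; the point is that memory-mapping many tiny blocks
  // can exhaust JVM allocation limits, so small blocks go into heap buffers.
  val minMemoryMapBytes: Long = 2L * 1024 * 1024

  def readBlock(path: String, offset: Long, length: Long): ByteBuffer = {
    val file = new RandomAccessFile(path, "r")
    try {
      if (length < minMemoryMapBytes) {
        val buf = ByteBuffer.allocate(length.toInt)
        file.getChannel.position(offset)
        while (buf.remaining() > 0) {
          if (file.getChannel.read(buf) == -1) throw new java.io.EOFException(path)
        }
        buf.rewind()  // ready for reading by the caller
        buf
      } else {
        file.getChannel.map(FileChannel.MapMode.READ_ONLY, offset, length)
      }
    } finally {
      file.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val tmp = java.io.File.createTempFile("block", ".bin")
    java.nio.file.Files.write(tmp.toPath, "hello block".getBytes("UTF-8"))
    val buf = readBlock(tmp.getPath, offset = 0L, length = tmp.length())
    val out = new Array[Byte](buf.remaining()); buf.get(out)
    println(new String(out, "UTF-8"))
    tmp.delete()
  }
}
```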
c9f40d0 Update version to 0.9.2-SNAPSHOT in sbt 11 May 2014, 23:54:54 UTC
bea2be3 SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables. Author: Sandeep <sandeep@techaddict.me> Closes #707 from techaddict/SPARK-1775 and squashes the following commits: 18d8ebf [Sandeep] SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables. 10 May 2014, 05:58:32 UTC
9e2c59e [SPARK-1685] Cancel retryTimer on restart of Worker or AppClient See https://issues.apache.org/jira/browse/SPARK-1685 for a more complete description, but in essence: If the Worker or AppClient actor restarts before successfully registering with Master, multiple retryTimers will be running, which will lead to less than the full number of registration retries being attempted before the new actor is forced to give up. Author: Mark Hamstra <markhamstra@gmail.com> Closes #602 from markhamstra/SPARK-1685 and squashes the following commits: 11cc088 [Mark Hamstra] retryTimer -> registrationRetryTimer 69c348c [Mark Hamstra] Cancel retryTimer on restart of Worker or AppClient 06 May 2014, 19:56:29 UTC
45561cd [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems. Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user. Note this hasn't been fully tested yet. Need to test in standalone mode. Putting this up for people to look at and possibly test. I don't have access to a mesos cluster. This is alternative to https://github.com/apache/spark/pull/607 Author: Thomas Graves <tgraves@apache.org> Closes #621 from tgravescs/SPARK-1676 and squashes the following commits: 244d55a [Thomas Graves] fix line length 44163d4 [Thomas Graves] Rework 9398853 [Thomas Graves] change to have doAs in executor higher up. (cherry picked from commit 3d0a02dff3011e8894d98d903cd086bc95e56807) Signed-off-by: Aaron Davidson <aaron@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/WorkerLauncher.scala yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/WorkerLauncher.scala 03 May 2014, 18:13:06 UTC
54c3b7e version number fix Self-explanatory. Author: Nan Zhu <CodingCat@users.noreply.github.com> Closes #467 from CodingCat/branch-0.9 and squashes the following commits: ba36109 [Nan Zhu] remove out-dated comments 9a8810e [Nan Zhu] version number fix 22 April 2014, 06:42:47 UTC
9e89789 Small syntax error from previous backport 13 April 2014, 21:32:22 UTC
4a325e1 Update WindowedDStream.scala Update the content of the exception thrown when windowDuration is not a multiple of parent.slideDuration. Author: baishuo(白硕) <vc_java@hotmail.com> Closes #390 from baishuo/windowdstream and squashes the following commits: 533c968 [baishuo(白硕)] Update WindowedDStream.scala Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala 12 April 2014, 03:40:43 UTC
19cf2f7 Fixed typo on Spark quick-start docs. 08 April 2014, 01:27:46 UTC
69fc97d SPARK-1432: Make sure that all metadata fields are properly cleaned While working on spark-1337 with @pwendell, we noticed that not all of the metadata maps in JobProgessListener were being properly cleaned. This could lead to a (hypothetical) memory leak issue should a job run long enough. This patch aims to address the issue. Author: Davis Shepherd <davis@conviva.com> Closes #338 from dgshep/master and squashes the following commits: a77b65c [Davis Shepherd] In the contex of SPARK-1337: Make sure that all metadata fields are properly cleaned (cherry picked from commit a3c51c6ea2320efdeb2a6a5c1cd11d714f8994aa) Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala 07 April 2014, 17:04:58 UTC
139fc1a SPARK-1421. Make MLlib work on Python 2.6 The reason it wasn't working was passing a bytearray to stream.write(), which is not supported in Python 2.6 but is in 2.7. (This array came from NumPy when we converted data to send it over to Java). Now we just convert those bytearrays to strings of bytes, which preserves nonprintable characters as well. Author: Matei Zaharia <matei@databricks.com> Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits: f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7 a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6 (cherry picked from commit 0b855167818b9afd2d2aa9f617b9861d77b2425d) Signed-off-by: Matei Zaharia <matei@databricks.com> 06 April 2014, 03:52:22 UTC
d4df076 Update documentation for work around for SPARK-1384 This is to workaround accessing secure hdfs from spark-shell in yarn-client mode. Note this only applies to branch-0.9 and is intended to be included in the documentation for 0.9.1. The real fix after 0.9.1 is included in https://github.com/apache/spark/pull/287 Author: Thomas Graves <tgraves@apache.org> Closes #314 from tgravescs/docFix09rc3 and squashes the following commits: 222e848 [Thomas Graves] Update documentation for work around for SPARK-1384 05 April 2014, 01:26:51 UTC
7f727cf SPARK-1337: Application web UI garbage collects newest stages Simple fix... Author: Patrick Wendell <pwendell@gmail.com> Closes #320 from pwendell/stage-clean-up and squashes the following commits: 29be62e [Patrick Wendell] SPARK-1337: Application web UI garbage collects newest stages instead old ones (cherry picked from commit ee6e9e7d863022304ac9ced405b353b63accb6ab) Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala 04 April 2014, 05:34:21 UTC
d9c7a80 [SPARK-1134] Fix and document passing of arguments to IPython This is based on @dianacarroll's previous pull request https://github.com/apache/spark/pull/227, and @joshrosen's comments on https://github.com/apache/spark/pull/38. Since we do want to allow passing arguments to IPython, this does the following: * It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see https://github.com/ipython/ipython/pull/5226, but no released version has that fix.) * If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`. * The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it. This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes https://github.com/ipython/ipython/pull/5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython. @joshrosen you should probably take the final call on this. Author: Diana Carroll <dcarroll@cloudera.com> Closes #294 from mateiz/spark-1134 and squashes the following commits: 747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied (cherry picked from commit a599e43d6e0950f6b6b32150ce264a8c2711470c) Signed-off-by: Matei Zaharia <matei@databricks.com> 03 April 2014, 22:48:51 UTC
28e7643 Spark 1162 Implemented takeOrdered in pyspark. Since Python does not have a library for a max-heap, and the usual tricks like inverting values do not work for all cases, we have our own max-heap implementation. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #97 from ScrapCodes/SPARK-1162/pyspark-top-takeOrdered2 and squashes the following commits: 35f86ba [Prashant Sharma] code review 2b1124d [Prashant Sharma] fixed tests e8a08e2 [Prashant Sharma] Code review comments. 49e6ba7 [Prashant Sharma] SPARK-1162 added takeOrdered to pyspark (cherry picked from commit c1ea3afb516c204925259f0928dfb17d0fa89621) Signed-off-by: Matei Zaharia <matei@databricks.com> 03 April 2014, 22:42:31 UTC
a6c955a fix path for jar, make sed actually work on OSX Author: Nick Lanham <nick@afternight.org> Closes #264 from nicklan/make-distribution-fixes and squashes the following commits: 172b981 [Nick Lanham] fix path for jar, make sed actually work on OSX (cherry picked from commit 75d46be5d61fb92a6db2efb9e3a690716ef521d3) Signed-off-by: Matei Zaharia <matei@databricks.com> 28 March 2014, 20:33:44 UTC
4afbd19 Make sed do -i '' on OSX I don't have access to an OSX machine, so if someone could test this that would be great. Author: Nick Lanham <nick@afternight.org> Closes #258 from nicklan/osx-sed-fix and squashes the following commits: a6f158f [Nick Lanham] Also make mktemp work on OSX 558fd6e [Nick Lanham] Make sed do -i '' on OSX (cherry picked from commit 632c322036b123c6f72e0c8b87d50e08bec3a1ab) Signed-off-by: Matei Zaharia <matei@databricks.com> 28 March 2014, 05:45:14 UTC
3470af3 [maven-release-plugin] prepare for next development iteration 27 March 2014, 05:14:59 UTC
4c43182 [maven-release-plugin] prepare release v0.9.1-rc3 27 March 2014, 05:14:46 UTC
348f54b Updated CHANGES.txt 27 March 2014, 04:56:17 UTC
ea5da04 Revert "[maven-release-plugin] prepare release v0.9.1-rc2" This reverts commit 1197280acf1322165301259dd825f44e22a323bc. 27 March 2014, 04:53:07 UTC
d16e863 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 1f785d46e4e3df716dc836e38933dc0a30028496. 27 March 2014, 04:51:40 UTC
4901604 [SPARK-1327] GLM needs to check addIntercept for intercept and weights GLM needs to check addIntercept for intercept and weights. The current implementation always uses the first weight as intercept. Added a test for training without adding intercept. JIRA: https://spark-project.atlassian.net/browse/SPARK-1327 Author: Xiangrui Meng <meng@databricks.com> Closes #236 from mengxr/glm and squashes the following commits: bcac1ac [Xiangrui Meng] add two tests to ensure {Lasso, Ridge}.setIntercept will throw an exceptions a104072 [Xiangrui Meng] remove protected to be compatible with 0.9 0e57aa4 [Xiangrui Meng] update Lasso and RidgeRegression to parse the weights correctly from GLM mark createModel protected mark predictPoint protected d7f629f [Xiangrui Meng] fix a bug in GLM when intercept is not used (cherry picked from commit d679843a39bb4918a08a5aebdf113ac8886a5275) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> 27 March 2014, 02:30:57 UTC
2f90dc5 SPARK-1322, top in pyspark should sort result in descending order. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #235 from ScrapCodes/SPARK-1322/top-rev-sort and squashes the following commits: f316266 [Prashant Sharma] Minor change in comment. 58e58c6 [Prashant Sharma] SPARK-1322, top in pyspark should sort result in descending order. 26 March 2014, 18:15:02 UTC
1f785d4 [maven-release-plugin] prepare for next development iteration 26 March 2014, 09:26:45 UTC
1197280 [maven-release-plugin] prepare release v0.9.1-rc2 26 March 2014, 09:26:40 UTC
7495dba Updated CHANGES.txt 26 March 2014, 09:10:57 UTC
da87240 [SPARK-782] Made Spark use existing shaded ASM and removed Spark's ASM dependency This ports the changes in #100 to branch 0.9. However, unlike that PR, it does not exclude ASM from all dependencies of Spark, to ensure compatibility in branch 0.9. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #232 from tdas/asm and squashes the following commits: 999bb6f [Tathagata Das] Made Spark use existing shaded ASM and removed Spark's ASM dependency. 26 March 2014, 04:35:36 UTC
55abe72 Revert "[maven-release-plugin] prepare release v0.9.1-rc1" This reverts commit 81c6a06c796a87aaeb5f129f36e4c3396e27d652. 25 March 2014, 22:01:52 UTC
b94f997 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 12e237e4bc8a307db3b823d8705cb1e28569c461. 25 March 2014, 22:01:36 UTC
12e237e [maven-release-plugin] prepare for next development iteration 24 March 2014, 06:56:16 UTC
81c6a06 [maven-release-plugin] prepare release v0.9.1-rc1 24 March 2014, 06:56:10 UTC
60ddb34 Removed all occurrences of incubator from all the pom.xml. 24 March 2014, 06:31:59 UTC
f176b03 Updated CHANGES.txt file. 23 March 2014, 20:16:50 UTC
5e7ac0d Fix to Stage UI to display numbers on progress bar Fixes an issue on Stage UI to display numbers on progress bar which are today hidden behind the progress bar div. Please refer to the attached images to see the issue. ![screen shot 2014-03-21 at 4 48 46 pm](https://f.cloud.github.com/assets/563652/2489083/8c127e80-b153-11e3-807c-048ebd45104b.png) ![screen shot 2014-03-21 at 4 49 00 pm](https://f.cloud.github.com/assets/563652/2489084/8c12cf5c-b153-11e3-8747-9d93ff6fceb4.png) Author: Emtiaz Ahmed <emtiazahmed@gmail.com> Closes #201 from emtiazahmed/master and squashes the following commits: a7964fe [Emtiaz Ahmed] Fix to Stage UI to display numbers on progress bar (cherry picked from commit 646e55405b433fdedc9601dab91f99832b641f87) Signed-off-by: Aaron Davidson <aaron@databricks.com> 22 March 2014, 01:07:05 UTC
8856076 SPARK-1284: Fix improper use of SimpleDateFormat `SimpleDateFormat` is not thread-safe. Some places use the same SimpleDateFormat object across multiple threads without any safeguard, which can cause the Web UI to display incorrect dates. This PR creates a new `SimpleDateFormat` every time one is needed. Another solution is using `ThreadLocal` to store a `SimpleDateFormat` in each thread. If this PR impacts performance, I can change to the latter. Author: zsxwing <zsxwing@gmail.com> Closes #179 from zsxwing/SPARK-1278 and squashes the following commits: 21fabd3 [zsxwing] SPARK-1278: Fix improper use of SimpleDateFormat Conflicts: core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala core/src/main/scala/org/apache/spark/util/FileLogger.scala 21 March 2014, 23:39:23 UTC
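A short Scala sketch of the thread-safety issue and the `ThreadLocal` alternative mentioned above (names are illustrative, not the patched classes):

```scala
import java.text.SimpleDateFormat
import java.util.Date

object DateFormatSketch {
  // SimpleDateFormat keeps internal mutable state, so sharing one instance across
  // threads can produce garbled dates. Either create a new instance per call (as
  // this commit does) or keep one per thread via ThreadLocal, as shown here.
  private val format = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss")
  }

  def formatDate(d: Date): String = format.get().format(d)

  def main(args: Array[String]): Unit =
    println(formatDate(new Date()))
}
```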
d68549e [SPARK-1273] use doi links in mllib-guide Author: Xiangrui Meng <meng@databricks.com> Closes #198 from mengxr/branch-0.9 and squashes the following commits: 39c74ff [Xiangrui Meng] use doi links in mllib-guide 21 March 2014, 21:35:32 UTC
8b1e793 Removed incubating from Spark version in all the pom.xml. 21 March 2014, 01:02:55 UTC
8a882ef Bumped versions to Spark 0.9.1 Self explanatory! Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #190 from tdas/branch-0.9-version-bump and squashes the following commits: 34576ee [Tathagata Das] Bumped versions to Spark 0.9.1 20 March 2014, 23:55:35 UTC
c6630d3 SPARK-1032. If Yarn app fails before registering, app master stays around long after This reopens https://github.com/apache/incubator-spark/pull/648 against the new repo. Author: Sandy Ryza <sandy@cloudera.com> Closes #28 from sryza/sandy-spark-1032 and squashes the following commits: 5953f50 [Sandy Ryza] SPARK-1032. If Yarn app fails before registering, app master stays around long after Conflicts: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 20 March 2014, 21:50:44 UTC
748f002 SPARK-1051. On YARN, executors don't doAs submitting user This reopens https://github.com/apache/incubator-spark/pull/538 against the new repo Author: Sandy Ryza <sandy@cloudera.com> Closes #29 from sryza/sandy-spark-1051 and squashes the following commits: 708ce49 [Sandy Ryza] SPARK-1051. doAs submitting user in YARN 20 March 2014, 19:48:05 UTC
1e36690 [SPARK-1285] Backporting updates to streaming docs to branch 0.9 Cherrypicked updates that have been added to master branch Author: Aaron Kimball <aaron@magnify.io> Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Chen Chao <crazyjvm@gmail.com> Author: Andrew Or <andrewor14@gmail.com> Closes #183 from tdas/branch-0.9-streaming-docs and squashes the following commits: e1a988f [Tathagata Das] Added clean to run-tests 98c3e98 [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-0.9' into branch-0.9-streaming-docs d792351 [Chen Chao] maintain arbitrary state data for each key e708f74 [Aaron Kimball] SPARK-1173. (#2) Fix typo in Java streaming example. 156bcd7 [Aaron Kimball] SPARK-1173. Improve scala streaming docs. 8849a96 [Andrew Or] Fix typos in Spark Streaming programming guide fbd66a5 [Chen Chao] Merge pull request #579 from CrazyJvm/patch-1. 20 March 2014, 19:27:47 UTC
1cc979e [SPARK-1273] MLlib bug fixes, improvements, and doc updates for v0.9.1 Cherry-picked a few MLlib commits that are bug fixes, optimization, or doc updates for the v0.9.1 release. JIRA: https://spark-project.atlassian.net/browse/SPARK-1273 Author: Xiangrui Meng <meng@databricks.com> Author: Sean Owen <sowen@cloudera.com> Author: Andrew Tulloch <andrew@tullo.ch> Author: Chen Chao <crazyjvm@gmail.com> Closes #175 from mengxr/branch-0.9 and squashes the following commits: d8928ea [Xiangrui Meng] add Apache header to LocalSparkContext a66d386 [Xiangrui Meng] Merge remote-tracking branch 'apache/branch-0.9' into branch-0.9 a899894 [Xiangrui Meng] [SPARK-1237, 1238] Improve the computation of YtY for implicit ALS 46fe493 [Xiangrui Meng] [SPARK-1260]: faster construction of features with intercept 6340a18 [Sean Owen] MLLIB-22. Support negative implicit input in ALS f27441a [Chen Chao] MLLIB-24: url of "Collaborative Filtering for Implicit Feedback Datasets" in ALS is invalid now a26ac90 [Sean Owen] Merge pull request #460 from srowen/RandomInitialALSVectors 0564985 [Andrew Tulloch] Fixed import order 2512e67 [Andrew Tulloch] LocalSparkContext for MLlib 20 March 2014, 02:05:26 UTC
a4eef65 [SPARK-1275] Made dev/run-tests executable. This was causing Jenkins tests to fail for PRs against branch 0.9. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #178 from tdas/branch-0.9-fix and squashes the following commits: a633bce [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-0.9' into branch-0.9-fix 9b043cc [Tathagata Das] Made dev/run-tests executable. 19 March 2014, 23:10:45 UTC
72875b2 Update the yarn alpha version to 0.9.1-incubating-SNAPSHOT 19 March 2014, 17:41:11 UTC
250ec27 SPARK-1203 fix saving to hdfs from yarn Author: Thomas Graves <tgraves@apache.org> Closes #173 from tgravescs/SPARK-1203 and squashes the following commits: 4fd5ded [Thomas Graves] adding import 964e3f7 [Thomas Graves] SPARK-1203 fix saving to hdfs from yarn 19 March 2014, 13:19:47 UTC
d385b5a bugfix: Wrong "Duration" in "Active Stages" in stages page If a stage that has completed once loses part of its data, it will be resubmitted. When that happens, it appears that stage.completionTime > stage.submissionTime. Author: shiyun.wxm <shiyun.wxm@taobao.com> Closes #170 from BlackNiuza/duration_problem and squashes the following commits: a86d261 [shiyun.wxm] two space indent c0d7b24 [shiyun.wxm] change the style 3b072e1 [shiyun.wxm] fix scala style f20701e [shiyun.wxm] bugfix: "Duration" in "Active Stages" in stages page (cherry picked from commit d55ec86de2e96f7dc9d1dd107daa35c3823791ec) Signed-off-by: Reynold Xin <rxin@apache.org> 19 March 2014, 08:42:42 UTC
7ec78bc [SPARK-1274] Add dev scripts to merge PRs and create releases from master to branch-0.9 All the files are one-to-one copied from master. Only the Spark version numbers were changed. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #176 from tdas/branch-0.9 and squashes the following commits: fb1b913 [Tathagata Das] Changed version numbers for 0.9.1 release. 3303d5e [Tathagata Das] Copied Apache release scripts in master to branch-0.9 19 March 2014, 05:09:16 UTC
0183ddd Bundle tachyon: SPARK-1269 This should all work as expected with the current version of the tachyon tarball (0.4.1) Author: Nick Lanham <nick@afternight.org> Closes #137 from nicklan/bundle-tachyon and squashes the following commits: 2eee15b [Nick Lanham] Put back in exec, start tachyon first 738ba23 [Nick Lanham] Move tachyon out of sbin f2f9bc6 [Nick Lanham] More checks for tachyon script 111e8e1 [Nick Lanham] Only try tachyon operations if tachyon script exists 0561574 [Nick Lanham] Copy over web resources so web interface can run 4dc9809 [Nick Lanham] Update to tachyon 0.4.1 0a1a20c [Nick Lanham] Add scripts using tachyon tarball (cherry picked from commit a18ea00f3af0fa4c6b2c59933e22b6c9f0f636c8) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 19 March 2014, 05:05:18 UTC
20d9458 [Spark-1261] add instructions for running python examples to doc overview page Author: Diana Carroll <dcarroll@cloudera.com> Closes #162 from dianacarroll/SPARK-1261 and squashes the following commits: 14ac602 [Diana Carroll] typo in python example text 5121e3e [Diana Carroll] Add explanation of how to run Python examples to main doc overview page 18 March 2014, 00:37:03 UTC
4562140 SPARK-1244: Throw exception if map output status exceeds frame size This is a very small change on top of @andrewor14's patch in #147. Author: Patrick Wendell <pwendell@gmail.com> Author: Andrew Or <andrewor14@gmail.com> Closes #152 from pwendell/akka-frame and squashes the following commits: e5fb3ff [Patrick Wendell] Reversing test order 393af4c [Patrick Wendell] Small improvement suggested by Andrew Or 8045103 [Patrick Wendell] Breaking out into two tests 2b4e085 [Patrick Wendell] Consolidate Executor use of akka frame size c9b6109 [Andrew Or] Simplify test + make access to akka frame size more modular 281d7c9 [Andrew Or] Throw exception on spark.akka.frameSize exceeded + Unit tests (cherry picked from commit 796977acdb5c96ca5c08591657137fb3e44d2e94) Conflicts: core/src/test/scala/org/apache/spark/AkkaUtilsSuite.scala core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala 17 March 2014, 21:06:28 UTC
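A hedged Scala sketch of the guard this adds, with a hypothetical helper: fail fast with an explicit error when the serialized reply would exceed the configured frame size, instead of letting the oversized message vanish:

```scala
object FrameSizeCheckSketch {
  // Hypothetical frame-size guard: rather than letting an oversized Akka message
  // be dropped silently (leaving the job to hang), fail fast with a clear error.
  def checkedReply(serialized: Array[Byte], maxFrameBytes: Int): Array[Byte] = {
    if (serialized.length > maxFrameBytes) {
      throw new IllegalArgumentException(
        s"Map output statuses were ${serialized.length} bytes which exceeds " +
        s"the frame size of $maxFrameBytes bytes; consider increasing spark.akka.frameSize")
    }
    serialized
  }

  def main(args: Array[String]): Unit = {
    val ok = checkedReply(Array.fill(10)(0: Byte), maxFrameBytes = 1024)
    println(s"sent ${ok.length} bytes")
  }
}
```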
af7e8b1 SPARK-1240: handle the case of empty RDD when takeSample https://spark-project.atlassian.net/browse/SPARK-1240 It seems the current implementation does not handle the empty-RDD case when running takeSample. In this patch, before calling sample() inside the takeSample API, I add a check for this case and return an empty Array when the RDD is empty; also in sample(), I add a check for invalid fraction values. I also add several lines to the test case for this scenario. Author: CodingCat <zhunansjtu@gmail.com> Closes #135 from CodingCat/SPARK-1240 and squashes the following commits: fef57d4 [CodingCat] fix the same problem in PySpark 36db06b [CodingCat] create new test cases for takeSample from an empty RDD 810948d [CodingCat] further fix a40e8fb [CodingCat] replace if with require ad483fd [CodingCat] handle the case with empty RDD when take sample Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala 17 March 2014, 05:40:22 UTC
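A minimal Scala sketch of the guard, with a hypothetical signature over a plain Seq rather than an RDD: validate the request and short-circuit on empty input before sampling:

```scala
import scala.reflect.ClassTag
import scala.util.Random

object TakeSampleSketch {
  // Sketch of the added checks (hypothetical names): reject invalid requests with
  // require, and return an empty Array instead of sampling an empty collection.
  def takeSample[A: ClassTag](data: Seq[A], num: Int, seed: Long = 42L): Array[A] = {
    require(num >= 0, s"negative number of elements requested: $num")
    if (num == 0 || data.isEmpty) Array.empty[A]
    else new Random(seed).shuffle(data).take(num).toArray
  }

  def main(args: Array[String]): Unit = {
    println(takeSample(Seq.empty[Int], 3).length)    // 0, no exception
    println(takeSample(Seq(1, 2, 3, 4, 5), 3).toSeq) // three sampled elements
  }
}
```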
1dc1e98 SPARK-977 Added Python RDD.zip function was raised earlier as a part of apache/incubator-spark#486 Author: Prabin Banka <prabin.banka@imaginea.com> Closes #76 from prabinb/python-api-zip and squashes the following commits: b1a31a0 [Prabin Banka] Added Python RDD.zip function Conflicts: python/pyspark/rdd.py 17 March 2014, 05:16:17 UTC
249930a Spark-1163, Added missing Python RDD functions Author: prabinb <prabin.banka@imaginea.com> Closes #92 from prabinb/python-api-rdd and squashes the following commits: 51129ca [prabinb] Added missing Python RDD functions Added __repr__ function to StorageLevel class. Added doctest for RDD.getStorageLevel(). Conflicts: python/pyspark/rdd.py 17 March 2014, 05:14:53 UTC
4480505 SPARK-1168, Added foldByKey to pyspark. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #115 from ScrapCodes/SPARK-1168/pyspark-foldByKey and squashes the following commits: db6f67e [Prashant Sharma] SPARK-1168, Added foldByKey to pyspark. 17 March 2014, 05:13:33 UTC
e74e79a Updated link for pyspark examples in docs Author: Jyotiska NK <jyotiska123@gmail.com> Closes #22 from jyotiska/pyspark_docs and squashes the following commits: 426136c [Jyotiska NK] Updated link for pyspark examples 17 March 2014, 05:12:51 UTC
ef74e44 SPARK-1019: pyspark RDD take() throws an NPE Author: Patrick Wendell <pwendell@gmail.com> Closes #112 from pwendell/pyspark-take and squashes the following commits: daae80e [Patrick Wendell] SPARK-1019: pyspark RDD take() throws an NPE (cherry picked from commit 4ea23db0efff2f39ac5b8f0bd1d9a6ffa3eceb0d) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 13 March 2014, 06:17:17 UTC
87e4dd5 Fix example bug: compile error Author: jianghan <jianghan@xiaomi.com> Closes #132 from pooorman/master and squashes the following commits: 54afbe0 [jianghan] Fix example bug: compile error (cherry picked from commit 31a704004f9b4ad34f92ae5c95ae6e90d0ab62c7) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 13 March 2014, 02:46:48 UTC
51a77e9 SPARK-1162 Added top in python. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #93 from ScrapCodes/SPARK-1162/pyspark-top-takeOrdered and squashes the following commits: ece1fa4 [Prashant Sharma] Added top in python. (cherry picked from commit b8afe3052086547879ebf28d6e36207e0d370710) Signed-off-by: Matei Zaharia <matei@databricks.com> 12 March 2014, 22:57:54 UTC
7049164 Version fix in pom file 11 March 2014, 21:48:01 UTC
6cbd580 Log4j build fix on 0.9 branch Spark should include log4j by default. Downstream packagers can exclude log4j if they want to use another logging backend. 11 March 2014, 18:53:29 UTC
0c91927 SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues... This patch removes Ganglia integration from the default build. It allows users willing to link against LGPL code to use Ganglia by adding build flags or linking against a new Spark artifact called spark-ganglia-lgpl. This brings Spark in line with the Apache policy on LGPL code enumerated here: https://www.apache.org/legal/3party.html#options-optional Author: Patrick Wendell <pwendell@gmail.com> Closes #108 from pwendell/ganglia and squashes the following commits: 326712a [Patrick Wendell] Responding to review feedback 5f28ee4 [Patrick Wendell] SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues. (cherry picked from commit 16788a654246067fd966033b5dc9bc0d4c759b70) Conflicts: core/pom.xml dev/audit-release/sbt_app_core/src/main/scala/SparkApp.scala dev/create-release/create-release.sh pom.xml project/SparkBuild.scala 11 March 2014, 18:24:21 UTC
6f0db0a For outputformats that are Configurable, call setConf before sending data to them. [SPARK-1108] This allows us to use, e.g. HBase's TableOutputFormat with PairRDDFunctions.saveAsNewAPIHadoopFile, which otherwise would throw NullPointerException because the output table name hasn't been configured. Note this bug also affects branch-0.9 Author: Bryn Keller <bryn.keller@intel.com> Closes #638 from xoltar/SPARK-1108 and squashes the following commits: 7e94e7d [Bryn Keller] Import, comment, and format cleanup per code review 7cbcaa1 [Bryn Keller] For outputformats that are Configurable, call setConf before sending data to them. This allows us to use, e.g. HBase TableOutputFormat, which otherwise would throw NullPointerException because the output table name hasn't been configured (cherry picked from commit 4d880304867b55a4f2138617b30600b7fa013b14) Conflicts: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala 10 March 2014, 00:47:46 UTC
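The gist in a hedged Scala sketch (assumes hadoop-common on the classpath; the configuration key shown is illustrative): if the output format implements Hadoop's `Configurable`, hand it the job `Configuration` via `setConf` before writing:

```scala
import org.apache.hadoop.conf.{Configurable, Configuration}

object ConfigurableOutputFormatSketch {
  // Some Hadoop OutputFormats (e.g. HBase's TableOutputFormat) implement Configurable
  // and read settings such as the output table name from the Configuration handed to
  // setConf -- so setConf must be called before the instance is used.
  def configureIfNeeded(outputFormat: AnyRef, conf: Configuration): Unit =
    outputFormat match {
      case c: Configurable => c.setConf(conf)
      case _               => // nothing to do
    }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("example.output.table", "my_table")  // hypothetical key and value
    val fmt = new Configurable {
      private var c: Configuration = _
      override def setConf(conf: Configuration): Unit = c = conf
      override def getConf: Configuration = c
    }
    configureIfNeeded(fmt, conf)
    println(fmt.getConf.get("example.output.table"))
  }
}
```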
0f0d044 SPARK-1190: Do not initialize log4j if slf4j log4j backend is not being used Author: Patrick Wendell <pwendell@gmail.com> Closes #107 from pwendell/logging and squashes the following commits: be21c11 [Patrick Wendell] Logging fix (cherry picked from commit e59a3b6c415b95e8137f5a154716b12653a8aed0) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 09 March 2014, 00:02:56 UTC
0fc0fdb SPARK-1184: Update the distribution tar.gz to include spark-assembly jar See JIRA for details. Author: Mark Grover <mark@apache.org> Closes #78 from markgrover/SPARK-1184 and squashes the following commits: 12b78e6 [Mark Grover] SPARK-1184: Update the distribution tar.gz to include spark-assembly jar (cherry picked from commit cda381f88cc03340fdf7b2d681699babbae2a56e) Conflicts: assembly/pom.xml 06 March 2014, 00:55:36 UTC
7ea89ec SPARK-1109 wrong API docs for pyspark map function Author: Prashant Sharma <prashant.s@imaginea.com> Closes #73 from ScrapCodes/SPARK-1109/wrong-API-docs and squashes the following commits: 1a55b58 [Prashant Sharma] SPARK-1109 wrong API docs for pyspark map function (cherry picked from commit 02836657cfec50bc6cc357541e40f8d36c90b352) Signed-off-by: Matei Zaharia <matei@databricks.com> 04 March 2014, 23:33:00 UTC
267d96c Add Jekyll tag to isolate "production-only" doc components. (0.9 version) Author: Patrick Wendell <pwendell@gmail.com> Closes #57 from pwendell/jekyll-prod-0.9 and squashes the following commits: 69a7614 [Patrick Wendell] Add Jekyll tag to isolate "production-only" doc components. 03 March 2014, 02:18:44 UTC
f2bf44a Removed reference to incubation in Spark user docs. Author: Reynold Xin <rxin@apache.org> Closes #2 from rxin/docs and squashes the following commits: 08bbd5f [Reynold Xin] Removed reference to incubation in Spark user docs. (cherry picked from commit 40e080a68a8fd025435e9ff84fa9280b4aba4dcf) Conflicts: docs/_config.yml 28 February 2014, 05:14:18 UTC
bc5e7d7 [SPARK-1089] fix the regression problem on ADD_JARS in 0.9 https://spark-project.atlassian.net/browse/SPARK-1089 copied from JIRA, reported by @ash211 "Using the ADD_JARS environment variable with spark-shell used to add the jar to both the shell and the various workers. Now it only adds to the workers and importing a custom class in the shell is broken. The workaround is to add custom jars to both ADD_JARS and SPARK_CLASSPATH. We should fix ADD_JARS so it works properly again. See various threads on the user list: https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201402.mbox/%3CCAJbo4neMLiTrnm1XbyqomWmp0m+EUcg4yE-txuRGSVKOb5KLeA@mail.gmail.com%3E (another one that doesn't appear in the archives yet titled "ADD_JARS not working on 0.9")" The reason for this bug is two-fold: 1) in the current implementation of SparkILoop.scala, settings.classpath is not set properly when the process() method is invoked; 2) due to the weird behaviour of Scala 2.10 (I personally think it is a bug), if we simply set the value of a PathSettings object (like settings.classpath), its isDefault flag is not set to true (this flag shows whether the variable was modified), so the PathResolver loads the default CLASSPATH environment variable value to calculate the path (see https://github.com/scala/scala/blob/2.10.x/src/compiler/scala/tools/util/PathResolver.scala#L215). What we have to do is set this flag manually (https://github.com/CodingCat/incubator-spark/blob/e3991d97ddc33e77645e4559b13bf78b9e68239a/repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L884) Author: CodingCat <zhunansjtu@gmail.com> Closes #13 from CodingCat/SPARK-1089 and squashes the following commits: 8af81e7 [CodingCat] impose non-null settings 9aa2125 [CodingCat] code cleaning ce36676 [CodingCat] code cleaning e045582 [CodingCat] fix the regression problem on ADD_JARS in 0.9 (cherry picked from commit 345df5f4a9c16a6a87440afa2b09082fc3d224bd) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 27 February 2014, 07:42:53 UTC
349764d Removed reference to incubation in README.md. Author: Reynold Xin <rxin@apache.org> Closes #1 from rxin/readme and squashes the following commits: b3a77cd [Reynold Xin] Removed reference to incubation in README.md. (cherry picked from commit 84f7ca138165ca413897dada35c602676b0a614f) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 27 February 2014, 00:53:56 UTC
886a466 SPARK-1115: Catch depickling errors This surrounds the complete worker code in a try/except block so we catch any error that arrives. An example would be the depickling failing for some reason. @JoshRosen Author: Bouke van der Bijl <boukevanderbijl@gmail.com> Closes #644 from bouk/catch-depickling-errors and squashes the following commits: f0f67cc [Bouke van der Bijl] Lol indentation 0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block (cherry picked from commit 12738c1aec136acd7f2e3e2f8f2b541db0890630) Signed-off-by: Josh Rosen <joshrosen@apache.org> 26 February 2014, 22:53:30 UTC
6fe72dd SPARK-1135: fix broken anchors in docs A recent PR that added Java vs Scala tabs for streaming also inadvertently added some bad code to a document.ready handler, breaking our other handler that manages scrolling to anchors correctly with the floating top bar. As a result the section title ended up always being hidden below the top bar. This removes the unnecessary JavaScript code. Author: Matei Zaharia <matei@databricks.com> Closes #3 from mateiz/doc-links and squashes the following commits: e2a3488 [Matei Zaharia] SPARK-1135: fix broken anchors in docs 26 February 2014, 19:56:12 UTC
0661cdc Fix removal from shuffleToMapStage to search for a key-value pair with our stage instead of using our shuffleID. 25 February 2014, 01:01:21 UTC
5e74b8e SPARK-1124: Fix infinite retries of reduce stage when a map stage failed In the previous code, if you had a failing map stage and then tried to run reduce stages on it repeatedly, the first reduce stage would fail correctly, but the later ones would mistakenly believe that all map outputs are available and start failing infinitely with fetch failures from "null". 25 February 2014, 01:00:47 UTC