https://github.com/apache/spark

62b3158 Merge pull request #583 from colorant/zookeeper. Minor fix for ZooKeeperPersistenceEngine to use configured working dir Author: Raymond Liu <raymond.liu@intel.com> Closes #583 and squashes the following commits: 91b0609 [Raymond Liu] Minor fix for ZooKeeperPersistenceEngine to use configured working dir (cherry picked from commit 68b2c0d02dbdca246ca686b871c06af53845d5b5) Signed-off-by: Aaron Davidson <aaron@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/deploy/master/ZooKeeperPersistenceEngine.scala 12 February 2014, 06:48:48 UTC
c89b71a Merge pull request #453 from shivaram/branch-0.8-SparkR Backport changes used in SparkR to 0.8 branch Backports two changes from the master branch: 1. Adding collectPartition to JavaRDD and using it from Python as well 2. Making broadcast id public. 23 January 2014, 08:18:56 UTC
38bf786 Restore takePartition to PythonRDD, context.py This is to avoid removing functions in minor releases. 23 January 2014, 07:28:54 UTC
f3cc3a7 Merge pull request #496 from pwendell/master Fix bug in worker clean-up in UI Introduced in d5a96fec (/cc @aarondav). This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers. (cherry picked from commit a1cd185122602c96fb8ae16c0b506702283bf6e2) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 23 January 2014, 03:38:02 UTC
691dfef Make broadcast id public for use in R frontend 17 January 2014, 05:28:56 UTC
3ef68e4 Add comment explaining collectPartitions's use 17 January 2014, 03:21:18 UTC
91e6e5b Make collectPartitions take an array of partitions Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface. 17 January 2014, 03:21:08 UTC
5092bae Add collectPartition to JavaRDD interface. Also remove takePartition from PythonRDD and use collectPartition in rdd.py. Conflicts: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala python/pyspark/context.py python/pyspark/rdd.py 17 January 2014, 03:20:57 UTC
5c443ad Merge pull request #320 from kayousterhout/erroneous_failed_msg Remove erroneous FAILED state for killed tasks. Currently, when tasks are killed, the Executor first sends a status update for the task with a "KILLED" state, and then sends a second status update with a "FAILED" state saying that the task failed due to an exception. The second FAILED state is misleading/unnecessary, and occurs due to a NonLocalReturnControl exception that is thrown because of the way we kill tasks. This commit eliminates that problem. I'm not at all sure that this is the best way to fix this problem, so alternate suggestions welcome. @rxin guessing you're the right person to look at this. (cherry picked from commit 0475ca8f81b6b8f21fdb841922cd9ab51cfc8cc3) Signed-off-by: Reynold Xin <rxin@apache.org> 02 January 2014, 23:17:55 UTC
88c565d Merge pull request #281 from kayousterhout/local_indirect_fix Handle IndirectTaskResults in LocalScheduler This fixes a bug where large results aren't correctly handled when running in local mode. Not doing this in master because the Local/Cluster scheduler consolidation is expected to go into 0.9, which will fix this issue (see #127) 21 December 2013, 03:15:54 UTC
d7bf08c Fixed test failure by adding the exception to the abort message 20 December 2013, 18:19:03 UTC
6183102 Handle IndirectTaskResults in LocalScheduler 20 December 2013, 08:43:20 UTC
df5fada Merge pull request #273 from rxin/top Fixed a performance problem in RDD.top and BoundedPriorityQueue BoundedPriority was actually traversing the entire queue to calculate the size, resulting in bad performance in insertion. This should also cherry pick cleanly into branch-0.8. (cherry picked from commit f4effb375e93993be1777ebb423c100ea8422f24) Signed-off-by: Reynold Xin <rxin@apache.org> 18 December 2013, 06:27:48 UTC
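The data structure behind RDD.top can be sketched outside Spark. This is a hedged Python analogue (the real class is Scala's BoundedPriorityQueue; the names and details here are illustrative): it keeps only the k largest elements seen, and its size is available in O(1) — the commit above fixed a version that recomputed the size by traversing the queue on every insertion.

```python
import heapq

class BoundedPriorityQueue:
    # Keeps only the max_size largest elements.  len() is O(1); the
    # performance bug fixed above came from computing size by walking
    # the whole queue on every insertion.
    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []  # min-heap; the smallest retained element is at [0]

    def offer(self, item):
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            # Evict the current smallest and insert the new element.
            heapq.heapreplace(self._heap, item)

    def __len__(self):
        return len(self._heap)

    def sorted_desc(self):
        return sorted(self._heap, reverse=True)
```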
f898238 Merge pull request #271 from ewencp/really-force-ssh-pseudo-tty-0.8 Force pseudo-tty allocation in spark-ec2 script. ssh commands need the -t argument repeated twice if there is no local tty, e.g. if the process running spark-ec2 uses nohup and the parent process exits. Without this change, if you run the script this way (e.g. using nohup from a cron job), it will fail setting up the nodes because some of the ssh commands complain about missing ttys and then fail. (This version is for the 0.8 branch. I've filed a separate request for master since changes to the script caused the patches to be different.) 16 December 2013, 23:23:25 UTC
2e2ead4 Force pseudo-tty allocation in spark-ec2 script. ssh commands need the -t argument repeated twice if there is no local tty, e.g. if the process running spark-ec2 uses nohup and the parent process exits. 16 December 2013, 16:16:56 UTC
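The spark-ec2 change above boils down to repeating ssh's `-t` flag: a single `-t` only requests a tty when stdin is a terminal, while `-t -t` forces allocation even without a local tty. A minimal Python sketch of how such a command line could be assembled (the helper name and extra options are hypothetical, not the actual spark-ec2 code):

```python
def ssh_command(host, remote_cmd, identity_file=None):
    # "-t -t": force pseudo-tty allocation even when the local process
    # has no controlling terminal (e.g. launched via nohup from cron).
    cmd = ["ssh", "-t", "-t", "-o", "StrictHostKeyChecking=no"]
    if identity_file:
        cmd += ["-i", identity_file]
    cmd += [host, remote_cmd]
    return cmd

# A caller would run it with subprocess.check_call(ssh_command(...)).
```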
8f56390 Version updates not handled by maven release plug-in 12 December 2013, 18:49:35 UTC
8ce9bd8 [maven-release-plugin] prepare for next development iteration 10 December 2013, 22:36:04 UTC
b87d31d [maven-release-plugin] prepare release v0.8.1-incubating 10 December 2013, 22:35:56 UTC
628ca85 Small bug fix in YARN build patch 10 December 2013, 22:24:26 UTC
9415d2d Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit 909a9e4d11eccea03980a8ed7ba7f9f27c68e33a. 10 December 2013, 22:22:34 UTC
d101dfe Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 7ca53692c821ead3a50520fe51e229d9fc1f856d. 10 December 2013, 22:22:31 UTC
7ca5369 [maven-release-plugin] prepare for next development iteration 10 December 2013, 21:22:17 UTC
909a9e4 [maven-release-plugin] prepare release v0.8.1-incubating 10 December 2013, 21:22:11 UTC
b0e50f8 Merge pull request #250 from pwendell/master README incorrectly suggests the build sources spark-env.sh This is misleading because the build doesn't source that file. IMO it's better to force people to specify build environment variables on the command line always, like we do in every example, so I'm just removing this doc. (cherry picked from commit d2efe13574090e93c600adeacc7f6356bc196e6c) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 10 December 2013, 21:06:32 UTC
c45a267 Small fix from prior commit 10 December 2013, 08:08:02 UTC
22fcb78 Add missing dependencies 10 December 2013, 08:05:44 UTC
8129328 Updating CHANGES and one fix from last merge 10 December 2013, 07:13:15 UTC
d03589d Merge pull request #248 from colorant/branch-0.8 Fix POM file for mvn assembly on hadoop 2.2 Yarn This is the fix for maven YARN build on hadoop 2.2 10 December 2013, 07:10:00 UTC
d614945 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit 7e5564cc788c33a3048914d61e90eff88b1a3903. 10 December 2013, 07:08:44 UTC
52c0890 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 216b473df0d6912f4905204402cfe04568e0560c. 10 December 2013, 07:08:33 UTC
e468f81 Fix POM file for mvn assembly on hadoop 2.2 Yarn 10 December 2013, 05:30:57 UTC
216b473 [maven-release-plugin] prepare for next development iteration 10 December 2013, 02:07:00 UTC
7e5564c [maven-release-plugin] prepare release v0.8.1-incubating 10 December 2013, 02:06:54 UTC
4567295 Merge pull request #246 from pwendell/master Add missing license headers I found this when doing further audits on the 0.8.1 release candidate. (cherry picked from commit 6169fe14a140146602fb07cfcd13eee6efad98f9) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 10 December 2013, 00:59:44 UTC
b6754ab Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit c88a9916a183e7a57b53537531620bbde6d8869a. 10 December 2013, 00:59:44 UTC
1e8b044 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit af7498870a4eed0f6e4b9fe37cc69edc022d0d8c. 10 December 2013, 00:59:44 UTC
af74988 [maven-release-plugin] prepare for next development iteration 09 December 2013, 05:46:45 UTC
c88a991 [maven-release-plugin] prepare release v0.8.1-incubating 09 December 2013, 05:46:33 UTC
5ab8e04 Updating CHANGES file 09 December 2013, 05:36:53 UTC
4b2769b Merge pull request #195 from dhardy92/fix_DebScriptPackage [Deb] fix packaging of Spark classes by adding the org.apache prefix in scripts embedded in the .deb (cherry picked from commit d992ec6d9be30e624c8edb2a50c193ac3cfbab7a) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 09 December 2013, 05:36:52 UTC
fde5347 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit 00d5c734dd4b12e128295518e4bd620fdb13bed7. 09 December 2013, 05:36:30 UTC
7a72b60 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 4c555328dd782efc2ab97ae35ea2f3a5b00cb450. 09 December 2013, 05:36:27 UTC
4c55532 [maven-release-plugin] prepare for next development iteration 09 December 2013, 05:25:06 UTC
00d5c73 [maven-release-plugin] prepare release v0.8.1-incubating 09 December 2013, 05:24:49 UTC
71d76a0 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit bf23794a766d4f94076d2417f128f15465f25495. 09 December 2013, 04:50:52 UTC
dcc678f Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 4ece27df4243a7b0ba2791c3c7bece5aed86d342. 09 December 2013, 04:50:49 UTC
4ece27d [maven-release-plugin] prepare for next development iteration 08 December 2013, 19:37:12 UTC
bf23794 [maven-release-plugin] prepare release v0.8.1-incubating 08 December 2013, 19:37:06 UTC
1bc7259 Minor documentation fixes 08 December 2013, 19:12:03 UTC
c7058d1 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit e88e6369d9ac55dff75c230ed5bc96c995b1d620. 08 December 2013, 19:11:49 UTC
408f50b Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 7f7ac64e2847b7cbdb9274fae75fed981601e7d7. 08 December 2013, 19:11:45 UTC
7f7ac64 [maven-release-plugin] prepare for next development iteration 08 December 2013, 10:35:36 UTC
e88e636 [maven-release-plugin] prepare release v0.8.1-incubating 08 December 2013, 10:35:29 UTC
51cd2f0 Merge pull request #243 from pwendell/branch-0.8 Improve CHANGES.txt file in branch 0.8 This makes the format consistent with the 0.8.0 release which was nicer. 08 December 2013, 06:53:53 UTC
871ab60 Use consistent CHANGES.txt format 08 December 2013, 06:51:35 UTC
c14f373 Merge pull request #241 from pwendell/master Update broken links and add HDP 2.0 version string I ran a link checker on the UI and found several broken links. (cherry picked from commit 1f4a4bccf3cf7376c634bad2ebadfdd4c6f78195) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 08 December 2013, 06:35:31 UTC
473cba2 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit fba873857133fb87cd53dc4cb0501eea1bd7edbf. 08 December 2013, 05:41:28 UTC
c761914 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 91505f3f2ace8b26e4dae90d362563bf2faa8fbf. 08 December 2013, 05:41:24 UTC
91505f3 [maven-release-plugin] prepare for next development iteration 07 December 2013, 21:05:11 UTC
fba8738 [maven-release-plugin] prepare release v0.8.1-incubating 07 December 2013, 21:05:05 UTC
1d3fa31 Revert "[maven-release-plugin] prepare release v0.8.1-incubating" This reverts commit 0f059bd62d1c840713fac0d9c6ee6d9165682c72. 07 December 2013, 20:54:05 UTC
a669605 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit d0b9fce12d586c464306072fb210cb44b97dddd3. 07 December 2013, 20:53:58 UTC
9c9e71e Merge pull request #241 from pwendell/branch-0.8 Fix race condition in JobLoggerSuite [0.8 branch] I found this when running the tests locally. It's similar to a race condition found when making the 0.8.0 release. 07 December 2013, 20:47:26 UTC
295734f Fix race condition in JobLoggerSuite 07 December 2013, 20:40:18 UTC
d0b9fce [maven-release-plugin] prepare for next development iteration 07 December 2013, 20:31:33 UTC
0f059bd [maven-release-plugin] prepare release v0.8.1-incubating 07 December 2013, 20:31:25 UTC
30bcd84 Clean-up of changes file 07 December 2013, 20:06:08 UTC
92597c0 Merge pull request #240 from pwendell/master SPARK-917 Improve API links in nav bar (cherry picked from commit 6494d62fe40ac408b14de3f0f3de8ec896a0ae6e) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 19:58:00 UTC
d6e5eab typo fix 07 December 2013, 09:15:20 UTC
cfca70e Merge pull request #236 from pwendell/shuffle-docs Adding disclaimer for shuffle file consolidation (cherry picked from commit 1b38f5f2774982d524742e987b6cef26ccaae676) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 09:15:20 UTC
80cc4ff Merge pull request #237 from pwendell/formatting-fix Formatting fix This is a single-line change. The diff appears larger here due to github being out of sync. (cherry picked from commit 10c3c0c6524d0cf6c59b6f2227bf316cdeb7d06c) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 09:15:20 UTC
4a6aae3 Merge pull request #235 from pwendell/master Minor doc fixes and updating README (cherry picked from commit e5d5728b72e58046cc175ab06b5f1c7be4957711) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 09:15:20 UTC
2642312 Merge pull request #234 from alig/master Updated documentation about the YARN v2.2 build process (cherry picked from commit 241336add5be07fca5ff6c17eed368df7d0c3e3c) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 09:15:20 UTC
07470d1 Small fix for Harvey's patch 07 December 2013, 09:15:20 UTC
2d3eae2 Merge pull request #199 from harveyfeng/yarn-2.2 Hadoop 2.2 migration Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches. Short description for each set of commits: a98f5a0 - "Misc style changes in the 'yarn' package" a67ebf4 - "A few more style fixes in the 'yarn' package" Both of these are some minor style changes, such as fixing lines over 100 chars, to the existing YARN code. ab8652f - "Add a 'new-yarn' directory ... " Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here. 4f1c3fa - "Hadoop 2.2 YARN API migration ..." API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too. Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141. a1a1c62 - "Add optional Hadoop 2.2 settings in sbt build ... " If Spark should be built against Hadoop 2.2, then: a) the `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory. b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing. There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though... (cherry picked from commit 72b696156c8662cae2cef4b943520b4be86148ea) Conflicts: project/SparkBuild.scala streaming/pom.xml 07 December 2013, 09:15:19 UTC
1e9d084 Merge pull request #101 from colorant/yarn-client-scheduler For SPARK-527, support spark-shell when running on YARN (synced to trunk and resubmitted here). In the current YARN mode, the application runs inside the Application Master as a user program, so the whole Spark context lives on a remote node. That approach cannot support applications that involve local interaction and need to run where they are launched. So in this pull request I have added a YarnClientClusterScheduler and backend. With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes via a thin AM that only launches executors and monitors the Driver Actor status, so that when the client app is done it can finish the YARN application as well. This enables spark-shell to run on YARN, and also lets other Spark applications run their Spark context locally with the master URL "yarn-client". Thus e.g. SparkPi can print its result on the local console instead of in the log of the remote machine where the AM runs. Docs are also updated to show how to use this yarn-client mode. (cherry picked from commit eb4296c8f7561aaf8782479dd5cd7c9320b7fa6b) Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala 07 December 2013, 09:15:19 UTC
20d1f8b Merge pull request #191 from hsaputra/removesemicolonscala Cleanup to remove semicolons (;) from Scala code -) The main reason for this PR is to remove semicolons from single statements of Scala code. -) Remove unused imports as I see them -) Fix the ASF comment header in some files (bad copy-paste, I suppose) (cherry picked from commit 4b895013cc965b37d44fd255656da470a3d2c222) Conflicts: examples/src/main/scala/org/apache/spark/streaming/examples/MQTTWordCount.scala Squash into 191 07 December 2013, 09:15:09 UTC
2b76315 Merge pull request #178 from hsaputra/simplecleanupcode Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some unused imports as I found them -) Remove ";" from import statements -) Remove () at the end of side-effect-free method calls like size. (cherry picked from commit 1b5b358309a5adfc12b75b0ebb4254ad8e69f5a0) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 08:02:14 UTC
ee22be0 Merge pull request #189 from tgravescs/sparkYarnErrorHandling Improve Spark on YARN error handling Improve CLI error handling and only allow a certain number of worker failures before failing the application. This will help prevent users from doing foolish things and their jobs running forever. For instance, using 32-bit Java while trying to allocate 8G containers loops forever without this change; now it errors out after a certain number of retries. The number of tries is configurable. Also increase the frequency at which we ping the RM, to increase the speed at which we get containers if they die. The YARN MR app defaults to pinging the RM every second, so the default of 5 seconds here is fine. But that is configurable as well in case people want to change it. I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems; I couldn't think of any, and tested on a standalone cluster as well as YARN. (cherry picked from commit aa638ed9c140174a47df082ed5631ffe8e624ee6) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 07 December 2013, 07:29:38 UTC
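The bounded-retry policy described above can be sketched generically: keep asking for containers, but abort after a configurable number of worker failures instead of looping forever. A hedged Python illustration (the function and parameter names are hypothetical, not Spark's actual YARN client code):

```python
def allocate_with_retry(request_container, max_worker_failures=3):
    # Retry container allocation, but give up after max_worker_failures
    # failed attempts rather than retrying indefinitely.
    failures = 0
    while True:
        if request_container():
            return True
        failures += 1
        if failures >= max_worker_failures:
            raise RuntimeError(
                "max number of worker failures reached (%d)" % failures)
```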
d77c337 Merge pull request #232 from markhamstra/FiniteWait jobWaiter.synchronized before jobWaiter.wait, otherwise ``IllegalMonitorStateException`` in ``SimpleFutureAction#ready``. (cherry picked from commit 078049877e123fe7e4c4553e36055de572cab7c4) Signed-off-by: Reynold Xin <rxin@apache.org> 06 December 2013, 07:30:11 UTC
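The monitor rule behind this fix can be demonstrated with Python's `threading.Condition`, which enforces the same discipline as Java's intrinsic locks: waiting without holding the lock raises `RuntimeError` in Python where Java throws `IllegalMonitorStateException`. A small sketch of the wrong and right patterns:

```python
import threading

cond = threading.Condition()
done = False

def worker():
    global done
    with cond:              # the Python analogue of jobWaiter.synchronized
        done = True
        cond.notify_all()

# Wrong: calling wait() without holding the lock.
raised_without_lock = False
try:
    cond.wait(timeout=0.1)
except RuntimeError:
    raised_without_lock = True

# Right: acquire the monitor first, then wait in a loop; wait()
# releases the lock while blocked so the worker can make progress.
t = threading.Thread(target=worker)
with cond:
    t.start()
    while not done:
        cond.wait()
t.join()
```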
17ca8a1 Merge pull request #231 from pwendell/branch-0.8 Bumping version numbers for 0.8.1 release This bumps various version numbers for the release. Note that we don't bump any of the pom.xml files because they get automatically updated as part of the maven release plug-ins. 05 December 2013, 22:32:01 UTC
d80a489 Bumping version numbers for 0.8.1 release 05 December 2013, 21:10:19 UTC
47fce43 Merge pull request #228 from pwendell/master Document missing configs and set shuffle consolidation to false. (cherry picked from commit 5d460253d6080d871cb71efb112ea17be0873771) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 05 December 2013, 20:33:02 UTC
27212ad Revert "[maven-release-plugin] prepare release spark-parent-0.8.1-incubating" This reverts commit 15c356c362529347ea87b95e7a6008e0391faceb. 05 December 2013, 01:26:36 UTC
a35e186 Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit 2dd1e8f8613b7fab409ed8c72b70c48539d54904. 05 December 2013, 01:26:32 UTC
2dd1e8f [maven-release-plugin] prepare for next development iteration 05 December 2013, 00:42:46 UTC
15c356c [maven-release-plugin] prepare release spark-parent-0.8.1-incubating 05 December 2013, 00:41:55 UTC
03edfa5 Change log for release 0.8.1-incubating 05 December 2013, 00:28:06 UTC
cc33f9f Merge pull request #227 from pwendell/master Fix small bug in web UI and minor clean-up. There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes. (cherry picked from commit 182f9baeed8e4cc62ca14ae04413394477a7ccfb) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 04 December 2013, 23:57:47 UTC
ba44f21 Merge pull request #223 from rxin/transient Mark partitioner, name, and generator field in RDD as @transient. As part of the effort to reduce serialized task size. (cherry picked from commit d6e5473872f405a6f4e466705e33cf893af915c1) Signed-off-by: Patrick Wendell <pwendell@gmail.com> 04 December 2013, 23:56:58 UTC
31da065 Merge pull request #95 from aarondav/perftest Minor: Put StoragePerfTester in org/apache/ (cherry picked from commit a51359c917a9ebe379b32ebc53fd093c454ea195) Signed-off-by: Reynold Xin <rxin@apache.org> 04 December 2013, 22:01:13 UTC
daaaee1 Merge pull request #218 from JoshRosen/spark-970-pyspark-unicode-error Fix UnicodeEncodeError in PySpark saveAsTextFile() (SPARK-970) This fixes [SPARK-970](https://spark-project.atlassian.net/browse/SPARK-970), an issue where PySpark's saveAsTextFile() could throw UnicodeEncodeError when called on an RDD of Unicode strings. Please merge this into master and branch-0.8. (cherry picked from commit 8a3475aed66617772f4e98e9f774b109756eb391) Signed-off-by: Reynold Xin <rxin@apache.org> 03 December 2013, 22:22:05 UTC
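The essence of the SPARK-970 fix, sketched outside Spark (the helper name is hypothetical): make sure text is explicitly encoded as UTF-8 before it reaches a byte-oriented writer, instead of relying on an implicit ASCII conversion, which is what raised `UnicodeEncodeError` in Python 2.

```python
def to_utf8_bytes(record):
    # In Python 2, the byte-oriented write path implicitly did
    # str(u"...") and raised UnicodeEncodeError on non-ASCII text.
    # Encoding explicitly as UTF-8 always succeeds.
    if isinstance(record, bytes):
        return record
    return str(record).encode("utf-8")
```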
8b091fe Merge pull request #181 from BlackNiuza/fix_tasks_number correct number of tasks in ExecutorsUI Index `a` is not `execId` here (cherry picked from commit f568912f85f58ae152db90f199c1f3a002f270c1) Signed-off-by: Reynold Xin <rxin@apache.org> 03 December 2013, 05:28:13 UTC
d21266e Merge pull request #219 from sundeepn/schedulerexception Scheduler quits when newStage fails The current scheduler thread does not handle exceptions from newStage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no scheduler. (cherry picked from commit 740922f25d5f81617fbe02c7bcd1610d6426bbef) Signed-off-by: Reynold Xin <rxin@apache.org> 01 December 2013, 20:47:30 UTC
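The defensive pattern the fix above implies — a failure while building a stage should abort that job, not kill the scheduler loop — can be sketched in a few lines of Python (a toy model, not Spark's actual DAGScheduler):

```python
def run_jobs(jobs, new_stage):
    # An exception from new_stage fails only the offending job; the
    # loop itself keeps running, so the "scheduler" never dies and
    # leaves the cluster hanging.
    results = []
    for job in jobs:
        try:
            results.append(("succeeded", new_stage(job)))
        except Exception as exc:
            results.append(("failed", str(exc)))
    return results
```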
be9c176 Merge pull request #201 from rxin/mappartitions Use the proper partition index in mapPartitionsWithIndex mapPartitionsWithIndex uses TaskContext.partitionId as the partition index. TaskContext.partitionId used to be identical to the partition index in an RDD. However, pull request #186 introduced a scenario (with partition pruning) where the two can differ. This pull request uses the right partition index in all mapPartitionsWithIndex related calls. Also removed the extra MapPartitionsWithContextRDD and put all the mapPartitions related functionality in MapPartitionsRDD. (cherry picked from commit 14bb465bb3d65f5b1034ada85cfcad7460034073) Signed-off-by: Reynold Xin <rxin@apache.org> 26 November 2013, 18:27:41 UTC
9949561 Merge pull request #197 from aarondav/patrick-fix Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed. (cherry picked from commit 972171b9d93b07e8511a2da3a33f897ba033484b) Signed-off-by: Reynold Xin <rxin@apache.org> 24 November 2013, 23:51:23 UTC
c59ce18 Merge pull request #200 from mateiz/hash-fix AppendOnlyMap fixes - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`) - Some other small optimizations throughout (see commit comments) (cherry picked from commit 718cc803f7e0600c9ab265022eb6027926a38010) Signed-off-by: Reynold Xin <rxin@apache.org> 24 November 2013, 03:04:00 UTC
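The AppendOnlyMap issue above is a classic open-addressing pathology: keys whose raw hash codes follow a regular pattern pile into long probe chains unless the bits are mixed first. A self-contained Python demonstration (the mixer constants are illustrative, in the spirit of MurmurHash3's 32-bit finalizer; the exact function Spark adopted may differ):

```python
def rehash(h):
    # Finalizer-style bit mixer: spreads regular input patterns across
    # the full 32-bit range (illustrative constants).
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def linear_probe_steps(keys, hash_fn, capacity):
    # Total extra probe steps to insert all keys into an open-addressed
    # table -- a proxy for the long chaining the commit describes.
    table = [None] * capacity
    mask = capacity - 1
    steps = 0
    for k in keys:
        pos = hash_fn(k) & mask
        while table[pos] is not None:
            pos = (pos + 1) & mask
            steps += 1
        table[pos] = k
    return steps

# Keys with a regular stride collide heavily under a weak (identity)
# hash, but spread out once the bits are mixed.
keys = [k * 8 for k in range(8192)]
capacity = 1 << 14
naive_steps = linear_probe_steps(keys, lambda k: k, capacity)
mixed_steps = linear_probe_steps(keys, rehash, capacity)
```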
d7ab87e Merge pull request #193 from aoiwelle/patch-1 Fix Kryo Serializer buffer documentation inconsistency The documentation here is inconsistent with the coded default and other documentation. (cherry picked from commit 086b097e33a2ce622ec6352819bccc92106f43b7) Signed-off-by: Reynold Xin <rxin@apache.org> 22 November 2013, 02:27:16 UTC
d7c6a00 Merge pull request #196 from pwendell/master TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles. (cherry picked from commit f20093c3afa68439b1c9010de189d497df787c2a) Signed-off-by: Reynold Xin <rxin@apache.org> 22 November 2013, 02:13:37 UTC
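The close()/flush() delegation bug above is easy to model: a timing wrapper that swallows those calls leaks file handles. A hedged Python sketch (the class name is hypothetical; the real class is Scala's TimeTrackingOutputStream):

```python
import time

class TimeTrackingWriter:
    # Times write() calls, but must also delegate flush() and close()
    # to the wrapped stream -- the omission fixed in the commit above
    # left a huge number of files open during shuffles.
    def __init__(self, inner):
        self.inner = inner
        self.write_time = 0.0

    def write(self, data):
        start = time.perf_counter()
        self.inner.write(data)
        self.write_time += time.perf_counter() - start

    def flush(self):
        self.inner.flush()

    def close(self):
        self.inner.close()
```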
f678e10 Merge branch 'master' of github.com:tbfenet/incubator-spark PartitionPruningRDD is using index from parent I was getting an ArrayIndexOutOfBoundsException after doing a union on a pruned RDD. The index it used for the partition was the index in the original RDD, not the new pruned RDD. (cherry picked from commit 2fead510f74b962b293de4d724136c24a9825271) Signed-off-by: Reynold Xin <rxin@apache.org> 20 November 2013, 23:17:28 UTC
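The index mix-up fixed above can be modeled in a few lines: a pruned dataset keeps a subset of parent partitions but must renumber them from zero, otherwise downstream code indexes past the pruned array (a toy model, not the actual PartitionPruningRDD):

```python
def prune(partitions, keep):
    # Keep only the parent partitions listed in `keep`, renumbering
    # them 0..n-1.  Reporting the parent's indices instead (the bug
    # above) caused ArrayIndexOutOfBoundsException after a union.
    return [(new_idx, partitions[parent_idx])
            for new_idx, parent_idx in enumerate(keep)]

parent_partitions = [["a"], ["b"], ["c"], ["d"]]
pruned = prune(parent_partitions, keep=[1, 3])
```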