2.0.2 / 27th of January, 2017
=============================

This is a patch-level release that fixes two issues:

* Allow "skip" to be set on MongoInputSplit (HADOOP-304)
* Correctly handle renaming nested fields in Hive (HADOOP-303)
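
Setting the skip is a configuration-level change. A minimal sketch, assuming a
"mongo.input.skip" property that mirrors a "mongo.input.limit" property for the
limit added in 1.5.1 (both property names are assumptions, not confirmed by
these notes):

```java
import org.apache.hadoop.conf.Configuration;

import com.mongodb.hadoop.util.MongoConfigUtil;

public class SkipConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/test.in");
        // Hypothetical property name: skip the first 1000 documents when
        // reading the input split, the counterpart of the limit setting.
        conf.setInt("mongo.input.skip", 1000);
    }
}
```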

Thanks to mkrstic for the patch for HADOOP-304!

For complete details on the issues resolved in 2.0.2, consult the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/17932

2.0.1 / 30th of August, 2016
============================

This is a patch-level release that adds the noTimeout option to the cursor
used by MongoPaginatingSplitter. More details of the issue can be found
on the ticket's Jira page: https://jira.mongodb.org/browse/HADOOP-295
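
Jobs opt into this splitter through the splitter-class property. A minimal
sketch, assuming the "mongo.splitter.class" property name (an assumption based
on the connector's configuration naming, not stated in these notes):

```java
import org.apache.hadoop.conf.Configuration;

import com.mongodb.hadoop.util.MongoConfigUtil;

public class PaginatingSplitterExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/test.in");
        // Hypothetical property name: compute splits by paginating the
        // collection. As of 2.0.1 the splitter's cursor is opened with
        // noTimeout, so slow split calculation no longer loses the cursor.
        conf.set("mongo.splitter.class",
                 "com.mongodb.hadoop.splitter.MongoPaginatingSplitter");
    }
}
```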

2.0.0 / 15th of August, 2016
============================

This is a major release introducing several new features. As a major release,
it removes several deprecated methods and objects, breaking the API in ways
that should not affect most users of the Hadoop tools.

Some of the major new features introduced in this version include:

* Ability to collocate Hadoop nodes and MongoDB shards for data locality (HADOOP-202)
* Add GridFSInputFormat (HADOOP-272); see the sketch after this list
* Add MongoSampleSplitter (HADOOP-283)
* Support document replacement (HADOOP-263)
* Add back support for Hadoop 1.2.x (HADOOP-246)
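
Of the features above, GridFSInputFormat is the most visible in job code. A
minimal wiring sketch, assuming the class lives alongside the connector's other
input formats and that the input URI points at the GridFS bucket's "fs.files"
collection (both are assumptions, not confirmed by these notes):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

import com.mongodb.hadoop.GridFSInputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class GridFSReadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: point the standard input URI at the bucket's .files
        // collection so the input format can enumerate the stored files.
        MongoConfigUtil.setInputURI(conf,
            "mongodb://localhost:27017/test.fs.files");
        Job job = Job.getInstance(conf, "gridfs-read");
        job.setInputFormatClass(GridFSInputFormat.class);
        // ... configure mapper, reducer, and output as usual, then submit.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```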

For complete details on the issues resolved in 2.0.0, consult the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/15622

This version is identical to 2.0.0-rc0.

2.0.0-rc0 / 26th of June, 2016
==============================

This is a major release introducing several new features. As a major release,
it removes several deprecated methods and objects, breaking the API in ways
that should not affect most users of the Hadoop tools.

Some of the major new features introduced in this version include:

* Ability to collocate Hadoop nodes and MongoDB shards for data locality (HADOOP-202)
* Add GridFSInputFormat (HADOOP-272)
* Add MongoSampleSplitter (HADOOP-283)
* Support document replacement (HADOOP-263)
* Add back support for Hadoop 1.2.x (HADOOP-246)

For complete details on the issues resolved in 2.0.0, consult the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/15622

This release candidate became the stable 2.0.0 version.

1.5.2 / 28th of March, 2016
===========================

This release fixes a couple of issues when using the "pymongo-spark" library,
including a bug where datetimes were decoded to java.util.GregorianCalendar
and another bug where pymongo-spark was not working in non-local Spark setups.

For complete details on the issues resolved in 1.5.2, consult the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/16602

1.5.1 / 9th of March, 2016
==========================

This release features a few fixes from 1.5.0, including patching a few MongoDB
connection leaks, avoiding a warning when using MongoUpdateStorage with Pig, and
allowing a limit to be set on MongoInputSplits.

For complete details on the issues resolved in 1.5.1, consult the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/16544

1.5.0 / 23rd of February, 2016
==============================

This release features major improvements to Pig, Hive, and Spark. Pig and Hive
both have the ability to push down simple queries and projections to MongoDB,
potentially saving time and memory when running MapReduce jobs. New included
UDFs allow writing MongoDB-specific types from Pig jobs and extracting timestamp
information from ObjectIds. A new "pymongo-spark" library (under
spark/src/main/python) allows using PyMongo objects with the connector, greatly
simplifying the Python interface to Spark when running with MongoDB.
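
pymongo-spark sits on top of the same Hadoop InputFormat that Java and Scala
jobs use directly. For comparison, a minimal sketch of the lower-level path it
simplifies (database and collection names are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;

public class SparkMongoRead {
    public static void main(String[] args) {
        Configuration mongoConf = new Configuration();
        mongoConf.set("mongo.input.uri",
                      "mongodb://localhost:27017/test.docs");
        JavaSparkContext sc = new JavaSparkContext("local", "mongo-read");
        // Each record is a (document _id, document) pair decoded as BSON.
        JavaPairRDD<Object, BSONObject> docs = sc.newAPIHadoopRDD(
            mongoConf, MongoInputFormat.class, Object.class, BSONObject.class);
        System.out.println("documents: " + docs.count());
        sc.stop();
    }
}
```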

For a complete list of tickets resolved in this release, see the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/15466

Changes from rc0:

   * [HADOOP-255] Return null early in getTypeForBSON if input is null.

1.5.0-rc0 / 1st of February, 2016
=================================

This release features major improvements to Pig, Hive, and Spark. Pig and Hive
both have the ability to push down simple queries and projections to MongoDB,
potentially saving time and memory when running MapReduce jobs. New included
UDFs allow writing MongoDB-specific types from Pig jobs and extracting timestamp
information from ObjectIds. A new "pymongo-spark" library (under
spark/src/main/python) allows using PyMongo objects with the connector, greatly
simplifying the Python interface to Spark when running with MongoDB.

For a complete list of tickets resolved in this release, see the release notes
on Jira: https://jira.mongodb.org/browse/HADOOP/fixforversion/15466

1.4.1 / 29th of September, 2015
===============================

This is a minor release containing improvements and bug fixes on top of 1.4.0.

  * [HADOOP-231] (Python) Streaming reports success but output collection stays empty
  * [HADOOP-226] HiveException: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Timestamp
  * [HADOOP-219] Do not log the username:password portion of the mongo connection URI to hadoop logs

1.4 / 2nd of July, 2015
=======================

  * [HADOOP-206] Update progress inside MongoOutputCommitter so that Hadoop doesn't time out the commit

This stable release also includes all features and fixes from the 1.4-rc0 release described below.

1.4-rc0 / 18th of June, 2015
============================

  * [HADOOP-204] Allow concurrent access to MongoRecordReader instances
  * [HADOOP-201] Support mongo.auth.uri in StandaloneMongoSplitter
  * [HADOOP-196] Update Hadoop dependencies
  * [HADOOP-195] 3.0 Java driver compatibility
  * [HADOOP-188] Support MapWritable
  * [HADOOP-179] When mongo.output.uri has a replica set specified, mongo-hadoop fails
  * [HADOOP-175] Records dropped due to incorrectly computed file splits
  * [HADOOP-173] Bulk write support from MongoOutputFormat (see the wiring sketch after this list)
  * [HADOOP-170] Pig integration doesn't call close() on Client
  * [HADOOP-153] Add capability of BSONLoader.java to parse UUID
  * [HADOOP-152] NumberFormatExceptions when splitting on a sharded, replica set cluster
  * [HADOOP-151] Fix MongoUpdateWritable serialization
  * [HADOOP-150] Use Primary read preference when sending splitVector command in StandaloneMongoSplitter
  * [HADOOP-143] MongoConfigUtil.getCollection() creates orphaned MongoClients
  * [HADOOP-110] Add no-args constructor for all splitters for multi-collection input
  * [HADOOP-98] handle binary types in pig schema mode
  * [HADOOP-94] BSONLoader failing to parse binary data
  * [HADOOP-93] Processing GUID data when importing data from mongo to HDFS using Pig
  * [HADOOP-82] Use OutputCommitter with MongoOutputFormat
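
Tickets like HADOOP-173 and HADOOP-82 change how writes flow through
MongoOutputFormat, but the job wiring itself stays the canonical pattern. A
minimal sketch (database and collection names are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class MongoToMongoJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/test.in");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/test.out");
        Job job = Job.getInstance(conf, "mongo-to-mongo");
        job.setInputFormatClass(MongoInputFormat.class);
        // Writes are batched as bulk operations (HADOOP-173) and go through
        // an OutputCommitter (HADOOP-82) rather than being sent one by one.
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```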

1.0.0 / 2012-04-09 
==================

  * Fixed file distribution for streaming addon files
  * Fixed Thrift dep for cdh3.
  * Add treasury yield example build support.
  * Added a streaming example M/R job with the Enron email corpus
  * HADOOP-29 - removes excessive logging for each tuple stored in MongoDB (RJurney)
  * Streaming: Add support for python generators in reduce functions (MLew)
  * Pig: Fix for exporting tuples to mongodb as map
  * Fixed CDH4 build flags to correct compilation step.
  * Fixed Hadoop build for dependencies across versions.
  * Added a "load-sample-data" task to use for loading samples into mongo for testing/demos
  * Hadoop 0.22.x support now works for those who need it (although I believe it's a deprecated branch)
  * Stock Apache 0.23.x now builds, using the actual 0.23.1 release...  insanity around the MapReduce dep
  * added twitter hashtag examples
  * Relocated the Pymongo_Hadoop module to a new "language_Support" subdirectory. Created a setup.py file to build an egg / package. Available on PyPI as 'pymongo_hadoop'.
  * Fixed pymongo_hadoop output to use BSON.encode
  * Added support to streaming for the -file flag to distribute files out to the cluster if they don't exist.
  * Make InputFormat and OutputFormat implied on Streaming jobs, defaulting to the Mongo ones.
  * Streaming now builds as a fat assembly jar and works.
  * Added a 0.23 / cdh4 build. No longer allow raw "cdh" or "cloudera" build artifacts, to avoid confusion as to 'which Cloudera?'
  * Added a 0.23 build based on Cloudera's current distro (should be binary compatible with stock)
  * If combiner is not specified, do not pass it to Hadoop.  While the combiner should be optional, giving Hadoop a null combiner will result in a NullPointerException.

1.0.0-rc0 / 2012-02-12
======================

  * Initial release candidate