https://github.com/voldemort/voldemort
Revision 66bfe3b85f810510210210d4095535c18dd14517 authored by Arunachalam Thirupathi on 22 March 2016, 18:19:37 UTC, committed by Arunachalam Thirupathi on 24 March 2016, 18:29:35 UTC
Issue : After the Hadoop FileSystem object is created, its validity is
verified by performing a sample operation. If that operation fails, the
FileSystem object is leaked. Garbage collection would normally reclaim
it, but all FileSystem objects are cached, so the cache keeps it
reachable and it is never freed. When the Voldemort server is used with
the secure webhdfs (swebhdfs) file system, it leaks enough memory to
eventually kill the servers.

Voldemort previously leaked webhdfs file system handles as well, but
those handles are apparently very cheap. SWebHdfs handles, however,
have the security certificate embedded in them, which makes them very
large.

Heap Dump analysis
WebHdfsFileSystem - 3768 Objects - 80 MB
SWebHdfsFileSystem - 1748 Objects - 3 GB

Solution :
1) Disable FileSystem caching, for the following reasons.
 The Hadoop FileSystem class caches FileSystem objects keyed on
 the scheme, authority and UserGroupInformation.

 By default a new UserGroupInformation is generated for each
 call, so the cache is never hit. If a FileSystem is not closed
 correctly in that case, its handle is leaked.

 If the UserGroupInformation is re-used instead, the same
 FileSystem object is shared between HdfsFetcher and
 HdfsFailedFetchLock. Each HdfsFetcher / HdfsFailedFetchLock
 closes the FileSystem object when it finishes, even though others
 might still be using it. This causes random failures.

 Since caching works correctly in neither case, it is disabled.
 Caching should be re-enabled only if the UserGroupInformation is
 re-used and the close bug is fixed.

2) Clean up the file handles in the error cases. All handles were
traced and are now closed on the error paths.
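
The two points above can be illustrated with a small, self-contained
sketch. It does not use Hadoop: ToyFileSystem, ToyFsCache, FetchLeakDemo
and the authority string are hypothetical names invented for the
illustration, and the cache compares users by identity as an assumption
standing in for "a new UserGroupInformation per call never hits the
cache". It is a model of the behaviour described in this message, not
the code changed by this commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Toy stand-in for a file system handle; NOT Hadoop's FileSystem. */
final class ToyFileSystem {
    private final boolean healthy;
    private boolean open = true;

    ToyFileSystem(boolean healthy) { this.healthy = healthy; }

    /** Models the sample operation that validates a freshly created handle. */
    void probe() {
        if (!healthy) throw new IllegalStateException("sample operation failed");
    }

    boolean isOpen() { return open; }

    void close() { open = false; }
}

/** Toy cache keyed on scheme, authority and user (users compared by identity: an assumption). */
final class ToyFsCache {
    private static final class Key {
        final String scheme;
        final String authority;
        final Object user;

        Key(String scheme, String authority, Object user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority) && user == k.user;
        }

        @Override public int hashCode() {
            return Objects.hash(scheme, authority, System.identityHashCode(user));
        }
    }

    private final Map<Key, ToyFileSystem> cache = new HashMap<>();
    private final boolean cachingEnabled;

    ToyFsCache(boolean cachingEnabled) { this.cachingEnabled = cachingEnabled; }

    ToyFileSystem get(String scheme, String authority, Object user) {
        if (!cachingEnabled) return new ToyFileSystem(true); // private handle per caller
        return cache.computeIfAbsent(new Key(scheme, authority, user), k -> new ToyFileSystem(true));
    }

    int size() { return cache.size(); }
}

final class FetchLeakDemo {
    static final String SCHEME = "swebhdfs";
    static final String AUTHORITY = "namenode.example:50470"; // hypothetical authority

    /** Caching on, fresh user per call, nothing closed: each call pins one more cache entry. */
    static int entriesAfter(int calls) {
        ToyFsCache cache = new ToyFsCache(true);
        for (int i = 0; i < calls; i++) cache.get(SCHEME, AUTHORITY, new Object());
        return cache.size();
    }

    /** Two components re-use one user; one closes its handle. Is the other's still open? */
    static boolean otherHandleSurvives(boolean cachingEnabled) {
        ToyFsCache cache = new ToyFsCache(cachingEnabled);
        Object user = new Object();
        ToyFileSystem fetcherFs = cache.get(SCHEME, AUTHORITY, user);
        ToyFileSystem lockFs = cache.get(SCHEME, AUTHORITY, user);
        lockFs.close();
        return fetcherFs.isOpen();
    }

    /** Point 2: if validation fails, close the new handle before rethrowing. */
    static ToyFileSystem openValidated(Supplier<ToyFileSystem> factory) {
        ToyFileSystem fs = factory.get();
        try {
            fs.probe();
            return fs;
        } catch (RuntimeException e) {
            fs.close();
            throw e;
        }
    }

    /** Returns true when a handle that fails validation ends up closed. */
    static boolean closedAfterFailedValidation() {
        ToyFileSystem[] created = new ToyFileSystem[1];
        try {
            openValidated(() -> created[0] = new ToyFileSystem(false));
            return false; // validation unexpectedly succeeded
        } catch (IllegalStateException expected) {
            return !created[0].isOpen();
        }
    }

    public static void main(String[] args) {
        System.out.println("cache entries after 3 calls: " + entriesAfter(3));
        System.out.println("shared handle survives (cache on): " + otherHandleSurvives(true));
        System.out.println("shared handle survives (cache off): " + otherHandleSurvives(false));
        System.out.println("closed after failed validation: " + closedAfterFailedValidation());
    }
}
```

In Hadoop itself, caching can be turned off per scheme with a
configuration property of the form fs.<scheme>.impl.disable.cache;
whether this commit uses that property is not stated in this message.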
1 parent af3d605
Hdfs FileSystem handles are leaked in Fetch
File Mode Size
.settings
bin
clients
config
contrib
docs
example
gradle
private-lib
src
test
voldemort-contrib
voldemort-protobuf
.gitignore -rw-r--r-- 242 bytes
CONTRIBUTORS -rw-r--r-- 659 bytes
LICENSE -rw-r--r-- 11.1 KB
NOTES -rw-r--r-- 2.5 KB
NOTICE -rw-r--r-- 8.1 KB
README.md -rw-r--r-- 4.7 KB
build.gradle -rw-r--r-- 21.8 KB
build.xml -rw-r--r-- 1.7 KB
gradle.properties -rw-r--r-- 1.1 KB
gradlew -rwxr-xr-x 5.0 KB
gradlew.bat -rw-r--r-- 2.3 KB
release_notes.txt -rw-r--r-- 49.0 KB
settings.gradle -rw-r--r-- 149 bytes
tomcat-tasks.properties -rw-r--r-- 420 bytes
web.xml -rw-r--r-- 1.1 KB
