I/O

MakeProject

Object Merging

We introduced a new explicit interface for providing merging capability. If a class has a method with the name and signature:
   Long64_t Merge(TCollection *input, TFileMergeInfo*);
it will be used by a TFileMerger (and thus by PROOF) to merge one or more other objects into the current object. Merge should return a negative value if the merging failed.

If this method does not exist, the TFileMerger will use a method with the name and signature:

   Long64_t Merge(TCollection *input);
TClass now provides quick access to these merging functions via TClass::GetMerge. The wrapper function is automatically created by rootcint and can be installed via TClass::SetMerge. The wrapper function should have the signature/type ROOT::MergeFunc_t:
   Long64_t (*)(void *thisobj, TCollection *input, TFileMergeInfo*);
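As an illustration of what such a wrapper does, the following is a minimal standalone sketch, not the code that rootcint generates. MockCollection, MockMergeInfo, MockMergeFunc_t, Counter and CallThroughWrapper are hypothetical stand-ins introduced here so that the example compiles without ROOT. Only the shape of the function pointer follows the signature quoted above.

```cpp
#include <vector>

typedef long long Long64_t;

// Hypothetical stand-ins; the real TCollection and TFileMergeInfo come from ROOT.
struct MockCollection { std::vector<void*> items; };
struct MockMergeInfo {};

// Same shape as the ROOT::MergeFunc_t signature quoted above.
typedef Long64_t (*MockMergeFunc_t)(void *thisobj, MockCollection *input, MockMergeInfo *info);

// Hypothetical mergeable class: merging adds up the counts of the inputs.
struct Counter {
    Long64_t fCount;
    explicit Counter(Long64_t n) : fCount(n) {}

    Long64_t Merge(MockCollection *input, MockMergeInfo *) {
        if (!input) return -1;  // a negative value signals failure
        for (void *item : input->items)
            fCount += static_cast<Counter*>(item)->fCount;
        return fCount;
    }
};

// Type-erased wrapper: recovers the concrete type and forwards to Merge.
Long64_t CounterMergeWrapper(void *thisobj, MockCollection *input, MockMergeInfo *info) {
    return static_cast<Counter*>(thisobj)->Merge(input, info);
}

// A caller that only knows the function pointer, not the class.
Long64_t CallThroughWrapper(MockMergeFunc_t func, void *target, MockCollection *input) {
    MockMergeInfo info;
    return func(target, input, &info);
}
```

The point of the indirection is that the caller needs no compile-time knowledge of the class being merged; it only holds a function pointer and an untyped object pointer.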
We added the new Merge function to TTree and THStack. We also added the new Merge function to TQCommand, as the existing TQCommand::Merge does not have the right semantics (in part because TQCommand is a collection).

In TFileMerger, we added a PrintLevel to allow hadd to request more output than a regular TFileMerger produces.

We removed all hard dependencies of TFileMerger on TH1 and TTree. (Soft dependencies still exist to be able to disable the merging of TTrees and to be able to disable the AutoAdd behavior of TH1.)

The TFileMergeInfo object can be used inside the Merge function to pass information between successive calls to Merge (see below). In particular, it contains:

   TDirectory  *fOutputDirectory;  // Target directory where the merged object will be written.
   Bool_t       fIsFirst;          // True if this is the first call to Merge for this series of objects.
   TString      fOptions;          // Additional text-based options being passed down to customize the merge.
   TObject     *fUserData;         // Place holder to pass extra information.  This object will be deleted at the end of each series of objects.
The default in TFileMerger is to call Merge once for every object in the series (i.e. the collection has exactly one element) in order to save memory, by not having all the objects in memory at the same time.

However, for histograms the default is to first load all the objects and then merge them in one go; this is customizable when creating the TFileMerger object.
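The one-object-per-call protocol described above can be sketched without ROOT as follows. This is an illustrative model under stated assumptions, not the TFileMerger implementation: MockHist, MockCollection, MockMergeInfo, fExpectedBins and MergeOneByOne are hypothetical names, and fExpectedBins merely stands in for state that real code could carry between calls via fUserData.

```cpp
#include <cstddef>
#include <vector>

typedef long long Long64_t;

struct MockHist;

// Hypothetical stand-ins for TCollection and TFileMergeInfo.
struct MockCollection { std::vector<MockHist*> items; };
struct MockMergeInfo {
    bool fIsFirst = true;           // true only for the first call of a series
    std::size_t fExpectedBins = 0;  // stand-in for state kept between calls
};

// Hypothetical mergeable object: bin contents are added element-wise.
struct MockHist {
    std::vector<double> fBins;
    Long64_t fEntries = 0;

    Long64_t Merge(MockCollection *input, MockMergeInfo *info) {
        if (!input || !info) return -1;
        // On the first call, record the layout later inputs must match.
        if (info->fIsFirst) info->fExpectedBins = fBins.size();
        for (MockHist *h : input->items) {
            if (!h || h->fBins.size() != info->fExpectedBins) return -1;
            for (std::size_t i = 0; i < fBins.size(); ++i) fBins[i] += h->fBins[i];
            fEntries += h->fEntries;
        }
        return fEntries;
    }
};

// Driver: one Merge call per object, so the collection always holds one element.
Long64_t MergeOneByOne(MockHist &target, const std::vector<MockHist*> &series) {
    MockMergeInfo info;
    Long64_t result = target.fEntries;
    for (MockHist *h : series) {
        MockCollection single;
        single.items.push_back(h);
        result = target.Merge(&single, &info);
        if (result < 0) return result;  // negative value: the merge failed
        info.fIsFirst = false;
    }
    return result;
}
```

In this model only one input object needs to be alive per call, which is the memory saving the default mode aims for; the all-at-once mode used for histograms would instead pass the whole series in a single collection.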

LZMA Compression and Compression Level Setting

ROOT I/O now supports the LZMA compression algorithm in addition to the ZLIB compression algorithm. LZMA compression typically results in smaller files, but takes more CPU time to compress data. To use the new feature, the external XZ package must be installed when ROOT is configured and built: download version 5.0.3 from tukaani.org and make sure to configure it with -fPIC:
   ./configure CFLAGS='-fPIC'
Then the client C++ code must call routines to explicitly request LZMA compression.

ZLIB compression is still the default.

Setting the Compression Level and Algorithm
There are three equivalent ways to set the compression level and algorithm. For example, to set the compression to the LZMA algorithm and compression level 5:
  1. TFile f(filename, option, title);
    f.SetCompressionSettings(ROOT::CompressionSettings(ROOT::kLZMA, 5));
    
  2. TFile f(filename, option, title, ROOT::CompressionSettings(ROOT::kLZMA, 5));
    
  3. TFile f(filename, option, title);
    f.SetCompressionAlgorithm(ROOT::kLZMA);
    f.SetCompressionLevel(5);
    
These methods work for TFile, TBranch, TMessage, TSocket, and TBufferXML. The compression algorithm and level settings only affect data compressed after they have been set. TFile passes its settings to a TTree's branches only at the time the branches are created. This can be overridden by explicitly setting the level and algorithm for the branch. These classes also have the following methods to access the algorithm and level for compression:
Int_t GetCompressionAlgorithm() const;
Int_t GetCompressionLevel() const;
Int_t GetCompressionSettings() const;
If the compression level is set to 0, then no compression will be done. All of the currently supported algorithms allow the level to be set to any value from 1 to 9. The higher the level, the larger the compression factors will be (smaller compressed data size). The tradeoff is that for higher levels more CPU time is used for compression and possibly more memory. The ZLIB algorithm takes less CPU time during compression than the LZMA algorithm, but the LZMA algorithm usually delivers higher compression factors.
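To make the relationship between the three setters concrete, here is a small standalone sketch of how an algorithm and a level can be combined into one integer setting. The encoding (algorithm * 100 + level) and the clamping of the level are assumptions made for this sketch and are not stated in these notes. PackSettings, AlgorithmOf and LevelOf are hypothetical helper names, and the enumerator values mirror the ones listed for Compression.h.

```cpp
namespace sketch {

// Values as listed for the enumeration in core/zip/inc/Compression.h.
enum ECompressionAlgorithm {
    kUseGlobalSetting   = 0,
    kZLIB               = 1,
    kLZMA               = 2,
    kOldCompressionAlgo = 3
};

// Assumed encoding: hundreds carry the algorithm, the remainder the level.
inline int PackSettings(ECompressionAlgorithm algorithm, int level) {
    if (level < 0)  level = 0;   // 0 means no compression
    if (level > 99) level = 99;  // keep the level inside its two digits
    return static_cast<int>(algorithm) * 100 + level;
}

// Inverse helpers, analogous in spirit to the getters listed above.
inline int AlgorithmOf(int settings) { return settings / 100; }
inline int LevelOf(int settings)     { return settings % 100; }

} // namespace sketch
```

Under this assumption, setting the algorithm and the level separately and setting the combined value in one call describe the same state, which is why the three forms can be equivalent.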

The header file core/zip/inc/Compression.h declares the function "CompressionSettings" and the enumeration for the algorithms. Currently the following selections can be made for the algorithm: kZLIB (1), kLZMA (2), kOldCompressionAlgo (3), and kUseGlobalSetting (0). The last option refers to an older interface used to control the algorithm, which is maintained for backward compatibility. The following function is defined in core/zip/inc/Bits.h and sets the global variable:

R__SetZipMode(int algorithm);
If the algorithm is set to kUseGlobalSetting (0), the global variable controls the algorithm for compression operations. This is the default, and the default value of the global variable is kZLIB.
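The fallback rule can be modelled in a few lines. This is a standalone sketch of the rule as described here, not ROOT's code: gSketchZipMode, SketchSetZipMode and EffectiveAlgorithm are hypothetical names, with SketchSetZipMode playing the role that R__SetZipMode plays for the real global variable.

```cpp
// Values as listed for the enumeration in core/zip/inc/Compression.h.
enum ESketchAlgorithm {
    kUseGlobalSetting   = 0,
    kZLIB               = 1,
    kLZMA               = 2,
    kOldCompressionAlgo = 3
};

// Hypothetical stand-in for the global variable; it defaults to kZLIB.
int gSketchZipMode = kZLIB;

// Hypothetical stand-in for R__SetZipMode.
void SketchSetZipMode(int algorithm) { gSketchZipMode = algorithm; }

// An explicit per-object choice wins; otherwise the global value applies.
int EffectiveAlgorithm(int requested) {
    return requested == kUseGlobalSetting ? gSketchZipMode : requested;
}
```

With nothing configured, the effective algorithm is kZLIB; an object that explicitly selects kLZMA is unaffected by later changes to the global value.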

Asynchronous Prefetching

The prefetching mechanism uses two new classes (TFilePrefetch and TFPBlock) to fetch a block of tree entries in advance. A dedicated thread takes care of actually transferring the blocks and making them available to the main requesting thread. As a result, the time the main thread spends waiting for data before processing decreases considerably. Besides the prefetching mechanism, there is also a local caching option. Both capabilities are disabled by default and must be explicitly enabled by the user.

To enable prefetching, the user must set the rootrc variable TFile.AsyncPrefetching as follows: gEnv->SetValue("TFile.AsyncPrefetching", 1). Only when prefetching is enabled can the user set a local cache directory in which the transferred file will be saved. On subsequent reads of the same file, the system uses the local copy from the cache. To set up a local cache directory, the client can use the following commands:

TString cachedir="file:/tmp/xcache/";
// or using xrootd on port 2000  
// TString cachedir="root://localhost:2000//tmp/xrdcache1/";
gEnv->SetValue("Cache.Directory", cachedir.Data());
The TFilePrefetch class is responsible for actually reading and storing the requests received from the main thread. It also creates the working thread which will transfer all the information. Apart from managing the block requests, it also deals with caching the blocks on the local machine and retrieving them when necessary.

The TFPBlock class represents the encapsulation of a block request. It contains the chunks to be prefetched and also serves as a container for the information read.
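The overlap between transfer and processing that the prefetching thread provides can be illustrated with a generic standalone sketch. It does not use TFilePrefetch or TFPBlock and makes no claim about how they are implemented: FetchBlock and ProcessAllBlocks are hypothetical names, and std::async with std::launch::async stands in for the working thread.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a slow remote read: block i holds the values
// [i * blockSize, (i + 1) * blockSize).
std::vector<int> FetchBlock(int blockIndex, int blockSize) {
    std::vector<int> block(blockSize);
    std::iota(block.begin(), block.end(), blockIndex * blockSize);
    return block;
}

// Processes nBlocks blocks while the next block is fetched in the background.
long long ProcessAllBlocks(int nBlocks, int blockSize) {
    long long sum = 0;
    if (nBlocks <= 0) return sum;

    // Start fetching the first block before any processing begins.
    std::future<std::vector<int>> next =
        std::async(std::launch::async, FetchBlock, 0, blockSize);

    for (int i = 0; i < nBlocks; ++i) {
        // Wait only if the background fetch has not finished yet.
        std::vector<int> current = next.get();

        // Request the following block, then process the current one.
        if (i + 1 < nBlocks)
            next = std::async(std::launch::async, FetchBlock, i + 1, blockSize);

        for (int v : current) sum += v;
    }
    return sum;
}
```

When fetching is slow relative to processing, the main loop waits in get() for less time than it would with a purely sequential read, which is the effect the prefetching mechanism is after.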