\documentstyle[11pt,fullpage]{article} \def\theenumi{\alph{enumi}} \def\labelenumi{(\alph{enumi})} \markright{Formal Spec for POSSO Memory Management} \pagestyle{myheadings} \begin{document} \begin{verbatim} POSSO FILE HEADER type: FORMAL SPECIFICATION (C++ library) title: POSSO Customisable Memory Management (CMM) date: 1 November 1995 author: Tito Flagella and Giuseppe Attardi email: tito@di.unipi.it address: Dipartimento di Informatica, Universita' di Pisa, Corso Italia 40, I-56125 Pisa Italy \end{verbatim} \section{Summary of changes from previous version} \begin{itemize} \item significant rewrite to provide multiple heaps as described in the USENIX C++ paper; \item eliminated GC header for objects; \item the space for tables is recycled; \item the collector consists of just two files cmm.h, cmm.cc. \end{itemize} \subsection {Known Shortcomings} \begin{itemize} \item Heap programming still undocumented \item Multiple inheritance not yet supported \item Finalization Protocol missing \end{itemize} \section {Introduction} This is the POSSO specification for Customisable Memory Management (CMM). CMM is a {\tt C++} Memory Management facility, aiming to allow full integration between coexisting different memory management techniques. Refer to [Attardi 93] for a detailed paper on CMM. This document is intended as a brief reference guide to the use of CMM. CMM can be used at two levels: the user level, and the programmer level. The user will usually look at CMM as a library of garbage collecting algorithms. CMM provides the user with facilities to define a {\tt C++} class as ``garbage collected'' and to define the particular garbage collector for that class. \section {The User Interface} Memory in the CMM consists of several heaps of various kinds, called {\em Heaps}. Each kind of Heap is implemented as a {\tt C++} class and provides a specific allocation and reclamation policy. For instance, one Heap may perform memory management through a two space stop-and-copy collector, another Heap may work with a LIFO discipline, and a third may implement a generational garbage collector. Each Heap is a collection of not necessarily contiguous memory chunks. Classes whose objects are to be dynamically allocated and subject to garbage collection are called {\em dynamic\/} classes. Their instances are called {\em dynamic\/} objects. A dynamic object is allocated inside a single Heap. The CMM user interface provides constructs to specify in which Heap to allocate objects. A dynamic class must derived from the class {\tt CmmObject} provided by the CMM. The default collector calls the method {\tt traverse} on dynamic objects to identify their internal pointers to other dynamic objects. Users have to provide {\tt traverse} methods for each class whose data members contain pointers to dynamic objects. {\tt traverse} must be defined according to some well defined rules (given below), because it implements the interface between the CMM and user defined dynamic objects. The following is the function prototype for {\tt traverse}: \begin{verbatim} void traverse() \end{verbatim} The {\tt Heap} argument is present because the same objects can be traversed from garbage collectors of different heaps. Final users of CMM need not care about this parameter since they can just follow a few simple rules to define the {\tt traverse} member function. These rules ensure that superclasses or class objects contained in the class are correctly handled. The following example illustrates these rules, then we explain what the rules are. Suppose the following dynamic classes were defined: \begin{verbatim} class BigNum : public CmmObject { long data; BigNum *next; // Rule (a) applies void traverse(); } class monomial : BigNum // Rule (c) applies { PowerProduct pp; // Rule (b) applies void traverse(); } \end{verbatim} Since a {\tt BigNum} stores a pointer to GCable object in {\tt next} this has to be followed by {\tt traverse}, thus we obtain the traversal function: \begin{verbatim} void BigNum::traverse() { CmmHeap::heap->scavenge(next); // Applying rule (a) } \end{verbatim} Because {\tt monomial} inherits from {\tt BigNum}, the method {\tt traverse} for this base class must be invoked; finally, since a {\tt monomial} contains a {\tt BigNum} in {\tt pp}, this object must be traversed as well: \begin{verbatim} void monomial::traverse() { BigNum::traverse(); // Appling rule (c) pp.traverse(); // Applying rule (b) } \end{verbatim} In summary the rules are: \begin{enumerate} \item for a class containing a pointer to a dynamic object, say \verb|class C { DynObj* x; }|, the method \verb|C::traverse| must contain \verb|Cmm::heap->scavenge(x)| \item for a class containing an instance of a dynamic object, say \verb|class C { DynObj x; }|, the method \verb|C::traverse| must contain \verb|x.traverse()| \item for a class derived from another dynamic class, say \verb|class C:DynBase {...}|, the method \verb|C::traverse| must contain \verb|DynBase::traverse()|. \end{enumerate} Preprocessing [Edelson 92] or compiler support [Samples 92] could be adopted to reduce hand coding from the user and risks of subtle errors in programs. We do not address these issues here since within POSSO the burden of following the above rules has been considered affordable. \subsection{Dynamic Objects} When creating a dynamic object one must specify in which Heap to allocate it. The parameter {\tt heap} can be supplied in the standard {\tt C++} placement syntax for the {\tt new} operator: \begin{verbatim} p = new(heap) Person(name, age); \end{verbatim} If the user does not specify any heap, the default heap {\tt heap} is used: \begin{verbatim} p = new Person(name, age); \end{verbatim} which is equivalent to: \begin{verbatim} p = new(heap) Person(name, age); \end{verbatim} where {\tt heap} is a global variable initialised to the default heap, but can be set by the user to any other heap. When creating a dynamic object, the programmer can dynamically decide where to allocate the memory. Consider the following example: \begin{verbatim} class Person : public CmmObject { ... }; Person *obj; Heap *other; .... if ( /* I prefer the default heap */ ) obj = new Person(arg); else obj = new(other) Person(arg); \end{verbatim} \subsection{Dynamic Variable Sized Objects} To support variable sized objects, the class {\tt CmmVarObject} is available. This is a derived class from {\tt CmmObject}, providing a different operator {\tt new} with an extra parameter for the size of the object. Classes derived from {\tt CmmVarObject} are called {\it dynamic variable sized classes\/}. The following example illustrates their use: \begin{verbatim} class container : public CmmVarObject { int header; int table[1]; // size of this array is determined upon object creation }; container *obj; ExtraSize = 256 * sizeof(int); obj = new(ExtraSize) container; // this object is created in the default heap Heap *other; obj = new(ExtraSize, other) container; // this object is created in heap other \end{verbatim} The object created, {\tt obj}, now has room for a table of 256 integers. \subsubsection*{Warning} Note that variable sized dynamic classes must be used more carefully than normal classes. In particular, the variable sized field must be the last data field in the class (and all data members must have the same accessibility), local variables must never be of variable size, and other classes must not be derived from a variable sized class. In fact, the compiler does not know about the extra memory allocated so that extra memory may be lost or overwritten if any of the above rules is broken. Despite these restrictions there are some cases in which the use of such classes cannot be avoided. The usual way to achieve this effect in {\tt C++} is to allocate dynamically the extra size in the object's constructor, but in this way the extra memory allocated would be neither a {\tt C++} object nor a {\tt CmmObject}. \subsection {Array of CmmObjects} The following class can be used to create arrays of CmmObjects like follows: \begin{verbatim} GcArray myVector = * new (100) GcArray ; \end{verbatim} The operator [] can be used to get CmmObjects. \begin{verbatim} myVector[i]->print(); \end{verbatim} or: \begin{verbatim} MyClass mc = myVector[3]; \end{verbatim} \subsection{A sample program} \begin{verbatim} #include "tempheap.h" class cell : CmmObject // This class is a GC class { int x; public: cell *next; // This field is a pointer to a GC class, // and must be traversed. void traverse(); // Because "cell" has internal pointers // traverse must be defined. }; void cell::traverse() { Cmm::heap->scavenge((CmmObject **)&next); // traverse scavenge the internal pointer // next to reach other cells. } main() { Heap *myHeap = new TempHeap(100); // Create the new heap myHeap. // Here you can use any of the predefined // heaps. cell *t = new cell; // Create a new cell. // Because you have not specified an heap // with new, the global variable heap is used. // heap is initialized to the Default heap. t->next = new (myHeap) cell; // Create another cell, but in myHeap heap = myHeap; t->next->next = new cell; // Setting heap to myHeap, you can allocate cells // from myHeap, without specifing it as a parameter // of new. heap->collect(); // Collecting on myHeap Cmm::theDefaultHeap->collect(); // Collecting on the default heap. heap = Cmm::theDefaultHeap; // Reset heap before returning. } \end{verbatim} \section {heaps} The CMM programmer interface is for use of implementors of additional Heaps. A new heap must be derived from the abstract class {\tt CmmHeap} and must supply definitions for its pure virtual functions: so the creation of a Heap involves writing a garbage collector ({\tt collect}), the memory allocation strategy (allocator and reclaimer) and the strategy used by the collector to act on the internal pointers encountered during traversal ({\tt scavenge}). This feature of CMM is still under testing. Currently the user must use one of the predefined Heaps. The heaps currently available are: \begin{itemize} \item{{\tt DefaultHeap}} This heap embeds the Bartlett's mostly copying garbage collector [Bartlett 88]. It is predefined in CMM as the default heap. Users are not allowed to create new instances of {\tt DefaultHeap}. This heap is accessible through variable {\tt Cmm::theDefaultHeap}. \item{{\tt TempHeap}} \begin{itemize} \item A {\tt TempHeap} is created using: \begin{verbatim} new TempHeap(unsigned words); \end{verbatim} It is a variable size stack-like heap, implemented as a number of containers, some for the ``from space'', and others for the ``to space''. \item Allocation is done in a stack-like way. \item Deallocation is done by the garbage collector, compacting the stack. \item The root set is defined by the user. \item The garbage collector is a copying collector. \end{itemize} \item{{\tt TempHeap}} \begin{itemize} \item A TempHeap is created using: \begin{verbatim} new TempHeap(unsigned words); \end{verbatim} It is a variable size stack-like heap, implemented as a list of containers. {\tt words} specifies the size of each container. \item Allocation is implemented in a stack-like way. Any object must be smaller than the size of each container ({\tt words}). \item Deallocation is done by the garbage collector, compacting the stack. \item The root set is defined by the user. \item The garbage collector is a copying collector. \end{itemize} \end{itemize} \subsection{The root set} Many heaps (like {\tt TempHeap}) require the user to explicitly register the possible roots. To support that, the class {\tt CmmHeap} contains an instance of the class {\tt RootSet} supporting the following operations: \begin{verbatim} void set(CmmObject *); void unset(CmmObject *); void setp(CmmObject **); void unsetp(CmmObject **); \end{verbatim} {\tt setp} and {\tt unsetp} are used to (un)register pointers to GC objects as roots. {\tt set} and {\tt unsetp} are used to (un)register GC objects as roots. Consider the following example: \begin{verbatim} cell GlobalRoot; // Define a cell variable main() { cell *LocalRoot = new cell; // Define a cell pointer tempheap *myHeap = new TempHeap(10000); // Create a new heap myHeap->roots.setp((CmmObject **)&LocalRoot); // Register the pointer as a root myHeap->roots.set(&GlobalRoot); // Register the cell as a root LocalRoot->next = new(myHeap) cell; // Allocates some new cells GlobalRoot.next = new(myHeap) cell; myHeap->collect(); // The collector will identify // any allocated cell, starting // traversing from cell LocalRoot // and GlobalRoot myHeap->roots.unsetp((CmmObject **)&LocalRoot); // Deregister the local root. } \end{verbatim} \section{References} \begin{description} \item [{[Bartlett 88]}] Joel F. Bartlett ``Compacting garbage collection with ambiguous roots'' Tech. Rep. 88/2, DEC Western Research Laboratory, Palo Alto, California, February 1988. \item [{[Bartlett 89]}] Joel F. Bartlett ``Mostly-copying collection picks up generations and {\tt C++}'', Tech. Rep. TN-12, DEC Western Research Laboratory, Palo Alto, California, October 1989. \item [{[Boehm 88]}] H.-J. Boehm and M. Wiser ``Garbage collection in an uncooperative environment'', Software Practice and Experience, 18(9), 1988, 807-820. \item [{[Edelson 92]}] D.R. Edelson ``Precompiling {\tt C++} for garbage collection'', in {\em Memory Management}, Y. Bekkers and J. Cohen (Eds.), Lecture Notes in Computer Science, n. 637, Springer-Verlag, 1992, 299-314. \item [{[Edelson 92b]}] D.R. Edelson ``A mark-and-sweep collector for {\tt C++}'', Proc. of ACM Conference on Principle of Programming Languages, 1992. \item [{[Samples 92]}] A.D. Samples ``GC-cooperative {\tt C++}'', Lecture Notes in Computer Science, n. 637, Springer-Verlag, 1992, 315-329. \item [{[Wentworth 90]}] E. P. Wentworth ``Pitfalls of conservative garbage collection'', Software Practice and Experience, 20(7), 1990, 719-727. \item [{[Wilson 92]}] P.R. Wilson ``Uniprocessor garbage collection techniques'', in {\em Memory Management}, Y. Bekkers and J. Cohen (Eds.), Lecture Notes in Computer Science, n. 637, Springer-Verlag, 1992, 1-42. \end{description} \end{document}