https://github.com/Unipisa/CMM
Raw File
Tip revision: cfa29670480cf0d5f09f5e3055e530dec3e6ff65 authored by Giuseppe Attardi on 24 November 1996, 18:13:45 UTC
1.7 -
Tip revision: cfa2967
CMMguide.tex
\documentstyle[11pt,fullpage]{article}
\def\theenumi{\alph{enumi}}
\def\labelenumi{(\alph{enumi})}
\markright{Formal Spec for POSSO Memory Management}
\pagestyle{myheadings}
\begin{document}
\begin{verbatim}
POSSO FILE HEADER
type:     FORMAL SPECIFICATION (C++ library)
title:    POSSO Customisable Memory Management (CMM)
date:     1 November 1995
author:   Tito Flagella and Giuseppe Attardi
email:    tito@di.unipi.it
address:  Dipartimento di Informatica,
          Universita' di Pisa,
          Corso Italia 40, I-56125 Pisa
          Italy
\end{verbatim}

\section{Summary of changes from previous version}
\begin{itemize}
\item significant rewrite to provide multiple heaps as described in the
USENIX C++ paper;
\item eliminated GC header for objects;
\item the space for tables is recycled;
\item the collector consists of just two files cmm.h, cmm.cc.
\end{itemize}

\subsection {Known Shortcomings}
\begin{itemize}
\item Heap programming still undocumented
\item Multiple inheritance not yet supported
\item Finalization Protocol missing
\end{itemize}

\section{Introduction}

This is the POSSO specification for Customisable Memory Management (CMM).  CMM
is a {\tt C++} Memory Management facility, aiming to allow full integration
between coexisting different memory management techniques.  Refer to [Attardi
93] for a detailed paper on CMM.

This document is intended as a brief reference guide to the use of CMM.  CMM
can be used at two levels: the user level, and the programmer level.  The user
will usually look at CMM as a library of garbage collecting algorithms.  CMM
provides the user with facilities to define a {\tt C++} class as ``garbage
collected'' and to define the particular garbage collector for that class.

\section{The User Interface}

Memory in the CMM consists of several heaps of various kinds, called {\em
  Heaps}.  Each kind of Heap is implemented as a {\tt C++} class and provides
a specific allocation and reclamation policy.  For instance, one Heap may
perform memory management through a two space stop-and-copy collector, another
Heap may work with a LIFO discipline, and a third may implement a generational
garbage collector.  Each Heap is a collection of not necessarily contiguous
memory chunks.

Classes whose objects are to be dynamically allocated and subject to garbage
collection are called {\em dynamic\/} classes.  Their instances are called {\em
dynamic\/} objects.  A dynamic object is allocated inside a single Heap.
The CMM user interface provides constructs to specify in which Heap to
allocate objects.


A dynamic class must derived from the class {\tt CmmObject} provided by the CMM.
The default collector calls the method {\tt traverse} on dynamic objects to
identify their internal pointers to other dynamic objects.  Users have to
provide {\tt traverse} methods for each class whose data members contain
pointers to dynamic objects.  {\tt traverse} must be defined according to some
well defined rules (given below), because it implements the interface between
the CMM and user defined dynamic objects.  The following is the function
prototype for {\tt traverse}:
\begin{verbatim}
void traverse()
\end{verbatim}
The {\tt Heap} argument is present because the same objects can be traversed
from garbage collectors of different heaps.

Final users of CMM need not care about this parameter since they can just
follow a few simple rules to define the {\tt traverse} member function.  These
rules ensure that superclasses or class objects contained in the class are
correctly handled.  The following example illustrates these rules, then we
explain what the rules are.  Suppose the following dynamic classes were
defined:
\begin{verbatim}
class BigNum : public CmmObject
{
  long data;
  BigNum *next;                         // Rule (a) applies
  void traverse();
}

class monomial : BigNum                 // Rule (c) applies
{
  PowerProduct pp;                      // Rule (b) applies
  void traverse();
}
\end{verbatim}
Since a {\tt BigNum} stores a pointer to a collectable object in {\tt next}
this has to be followed by {\tt traverse}, thus we obtain the traversal
function:

\begin{verbatim}
void BigNum::traverse()
{
  CmmHeap::heap->scavenge(next);             // Applying rule (a)
}
\end{verbatim}
Because {\tt monomial} inherits from {\tt BigNum}, the method {\tt traverse}
for this base class must be invoked; finally, since a {\tt monomial} contains
a {\tt BigNum} in {\tt pp}, this object must be traversed as well:

\begin{verbatim}
void monomial::traverse()
{
  BigNum::traverse();               // Appling rule (c)
  pp.traverse();                    // Applying rule (b)
}
\end{verbatim}

In summary the rules are:

\begin{enumerate}
\item for a class containing a pointer to a dynamic object, say \verb|class C
  { DynObj* x; }|, the method \verb|C::traverse| must contain
  \verb|Cmm::heap->scavenge(x)|
\item for a class containing an instance of a dynamic object, say \verb|class
  C { DynObj x; }|, the method \verb|C::traverse| must contain
  \verb|x.traverse()|
\item for a class derived from another dynamic class, say \verb|class
  C:DynBase {...}|, the method \verb|C::traverse| must contain
  \verb|DynBase::traverse()|.
\end{enumerate}

Preprocessing [Edelson 92] or compiler support [Samples 92] could be
adopted to reduce hand coding from the user and risks of subtle errors
in programs. We do not address these issues here since within POSSO
the burden of following the above rules has been considered affordable.


\subsection{Dynamic Objects}

When creating a dynamic object one must specify in which Heap to allocate it.
The parameter {\tt heap} can be supplied in the standard {\tt C++} placement
syntax for the {\tt new} operator:
\begin{verbatim}
p = new(heap) Person(name, age);
\end{verbatim}
If the user does not specify any heap, the default heap {\tt heap} is used:
\begin{verbatim}
p = new Person(name, age);
\end{verbatim}
which is equivalent to:
\begin{verbatim}
p = new(heap) Person(name, age);
\end{verbatim}
where {\tt heap} is a global variable initialised to the default heap, but can
be set by the user to any other heap.

When creating a dynamic object, the programmer can dynamically decide where to
allocate the memory.  Consider the following example:

\begin{verbatim}
class Person : public CmmObject
{
...
};

Person *obj;
Heap *other;
....
if ( /* I prefer the default heap */ )
   obj = new Person(arg);
else obj = new(other) Person(arg);
\end{verbatim}


\subsection{Dynamic Variable Sized Objects}

To support variable sized objects, the class {\tt CmmVarObject} is available.
This is a derived class from {\tt CmmObject}, providing a different operator
{\tt new} with an extra parameter for the size of the object.  Classes derived
from {\tt CmmVarObject} are called {\it dynamic variable sized classes\/}.
The following example illustrates their use:

\begin{verbatim}
class container : public CmmVarObject
{
  int header;
  int table[1];  // size of this array is determined upon object creation
};

container *obj;

ExtraSize = 256 * sizeof(int);

obj = new(ExtraSize) container; // this object is created in the default heap

Heap *other;
obj = new(ExtraSize, other) container; // this object is created in heap other
\end{verbatim}
The object created, {\tt obj}, now has room for a table of 256 integers.

\subsubsection*{Warning}

Note that certain restrictions apply to variable size dynamic classes.  In
particular, the variable size field must be the last data field in the class
(and all data members must have the same accessibility), no local variable can
be declared of variable size class (but of course pointers to such class can
be used), and no classes should be derived from a variable sized class.  In
fact, the compiler does not know about the extra memory allocated so that
extra memory may be lost or overwritten if any of the above rules is broken.

Despite these restrictions there are some cases in which the use of such
classes cannot be avoided. The usual way to achieve this effect in {\tt C++}
is to allocate dynamically the extra size in the object's constructor, but in
this way the extra memory allocated would be neither a {\tt C++} object nor a
{\tt CmmObject}.

\subsection{Array of CmmObject's}

The following class can be used to create arrays of CmmObject's like follows:

\begin{verbatim}
       CmmArray<MyClass> myVector = * new (100) CmmArray<MyClass> ;
\end{verbatim}
       
The operator [] can be used to get CmmObjects.

\begin{verbatim}
       myVector[i]->print();
\end{verbatim}
or:
\begin{verbatim}
       MyClass mc = myVector[3];
\end{verbatim}

\subsection{A sample program}

\begin{verbatim}
#include "tempheap.h"

class cell : CmmObject          // This class is a GC class
{
  int x;

public: 
  cell *next;                  // This field is a pointer to a GC class,
                               // and must be traversed.

  void traverse();             // Because "cell" has internal pointers
                               // traverse must be defined.
};

void cell::traverse()
 { 
   Cmm::heap->scavenge((CmmObject **)&next); 
                               // traverse scavenge the internal pointer
                               // next to reach other cells.
 }

main()
{
  Heap *myHeap = new TempHeap(100);
                               // Create the new heap myHeap.
                               // Here you can use any of the predefined
                               // heaps.

  cell *t = new cell;
                               // Create a new cell. 
                               // Because you have not specified an heap
                               // with new, the global variable heap is used.
                               // heap is initialized to the Default heap.

  t->next = new (myHeap) cell;
                               // Create another cell, but in myHeap

  heap = myHeap;
  t->next->next = new cell;
                               // Setting heap to myHeap, you can allocate cells
                               // from myHeap, without specifing it as a parameter
                               // of new.

  heap->collect();
                               // Collecting on myHeap
  Cmm::theDefaultHeap->collect();
                               // Collecting on the default heap.
  heap = Cmm::theDefaultHeap;  // Reset heap before returning.
}
\end{verbatim}

\section{Heaps}

The CMM programmer interface is for use of implementors of additional Heaps. A
new heap must be derived from the abstract class {\tt CmmHeap} and must supply
definitions for its pure virtual functions: so the creation of a Heap involves
writing a garbage collector ({\tt collect}), the memory allocation strategy
(allocator and reclaimer) and the strategy used by the collector to act on the
internal pointers encountered during traversal ({\tt scavenge}).

This feature of CMM is still under testing. Currently the user must use one of
the predefined Heaps. The heaps currently available are:

\begin{itemize}
\item{{\tt DefaultHeap}} This heap embeds the Bartlett's mostly copying
  garbage collector [Bartlett 88].  It is predefined in CMM as the default
  heap.  Users are not allowed to create new instances of {\tt DefaultHeap}.
  This heap is accessible through variable {\tt Cmm::theDefaultHeap}.

\item{{\tt TempHeap}}
\begin{itemize}

\item A {\tt TempHeap} is created using:
\begin{verbatim}
                new TempHeap(unsigned words);
\end{verbatim}
  It is a variable size stack-like heap, implemented as a number of
  containers, some for the ``from space'', and others for the ``to space''.

\item Allocation is done in a stack-like way.

\item Deallocation is done by the garbage collector, compacting the stack.

\item The root set is defined by the user.

\item The garbage collector is a copying collector.
\end{itemize}

\item{{\tt TempHeap}}
\begin{itemize}

\item A TempHeap is created using:
\begin{verbatim}
                new TempHeap(unsigned words);
\end{verbatim}
  It is a variable size stack-like heap, implemented as a list of containers.
  {\tt words} specifies the size of each container.

\item Allocation is implemented in a stack-like way. Any object must be
  smaller than the size of each container ({\tt words}).

\item Deallocation is done by the garbage collector, compacting the stack.

\item The root set is defined by the user.

\item The garbage collector is a copying collector.
\end{itemize}

\end{itemize}

\subsection{The root set}

Many heaps (like {\tt TempHeap}) require the user to explicitly
register the possible roots.
To support that, the class {\tt CmmHeap} contains an instance of the class
{\tt RootSet} supporting the following operations:

\begin{verbatim}

void set(CmmObject *);
void unset(CmmObject *);

void setp(CmmObject **);
void unsetp(CmmObject **);
\end{verbatim}

{\tt setp} and {\tt unsetp} are used to (un)register pointers to GC objects as
roots.  {\tt set} and {\tt unsetp} are used to (un)register GC objects as
roots.  Consider the following example:

\begin{verbatim}
cell GlobalRoot;                            
                                        // Define a cell variable

main()
{
  cell *LocalRoot = new cell;               
                                        // Define a cell pointer
  tempheap *myHeap = new TempHeap(10000); 
                                        // Create a new heap

  myHeap->roots.setp((CmmObject **)&LocalRoot);  
                                        // Register the pointer as a root

  myHeap->roots.set(&GlobalRoot);           
                                        // Register the cell as a root

  LocalRoot->next = new(myHeap) cell;   
                                        // Allocates some new cells
  GlobalRoot.next = new(myHeap) cell;

  myHeap->collect();                        
                                        // The collector will identify
                                        // any allocated cell, starting
                                        // traversing from cell LocalRoot
                                        // and GlobalRoot

  myHeap->roots.unsetp((CmmObject **)&LocalRoot);
                                        // Deregister the local root.
}
\end{verbatim}

\section{References}
\begin{description}
\item [{[Bartlett 88]}] Joel F. Bartlett ``Compacting garbage collection with
  ambiguous roots'' Tech. Rep.  88/2, DEC Western Research Laboratory, Palo
  Alto, California, February 1988.
\item [{[Bartlett 89]}] Joel F. Bartlett ``Mostly-copying collection picks up
  generations and {\tt C++}'', Tech.  Rep.  TN-12, DEC Western Research
  Laboratory, Palo Alto, California, October 1989.
\item [{[Boehm 88]}] H.-J. Boehm and M. Wiser ``Garbage collection in an
  uncooperative environment'', Software Practice and Experience, 18(9), 1988,
  807-820.
\item [{[Edelson 92]}] D.R. Edelson ``Precompiling {\tt C++} for garbage
  collection'', in {\em Memory Management}, Y. Bekkers and J. Cohen (Eds.),
  Lecture Notes in Computer Science, n. 637, Springer-Verlag, 1992, 299-314.
\item [{[Edelson 92b]}] D.R. Edelson ``A mark-and-sweep collector for {\tt
    C++}'', Proc. of ACM Conference on Principle of Programming Languages,
  1992.
\item [{[Samples 92]}] A.D. Samples ``GC-cooperative {\tt C++}'', Lecture
  Notes in Computer Science, n. 637, Springer-Verlag, 1992, 315-329.
\item [{[Wentworth 90]}] E. P. Wentworth ``Pitfalls of conservative garbage
  collection'', Software Practice and Experience, 20(7), 1990, 719-727.
\item [{[Wilson 92]}] P.R. Wilson ``Uniprocessor garbage collection
  techniques'', in {\em Memory Management}, Y. Bekkers and J. Cohen (Eds.),
  Lecture Notes in Computer Science, n.  637, Springer-Verlag, 1992, 1-42.
\end{description}

\end{document}
back to top