\documentclass [a4paper] {article}
\title{Simple Notes on ``Color'' for ocamlp3l}
\author{Roberto DI COSMO \and LI Zheng}

\begin{document}
\maketitle

Recently, we added a small new feature ``color'' to ocamlp3l. It
provides a simple way of specifying both the relative computing
capability of a physical node and the capability needed by a
virtual node. So more customizable mapping between the physical
nodes and virtual nodes could be done by the programmer of in this
way.

\section{Concept}

A ``color'' is in fact a integer that presents relative computing
capability. A larger ``color'' presents higher capability of
computing. So a ``color'' can also be considered as something like
``rank'' or ``weight''.

It can used both with a physical node denoting the relative
capability it holds and with a virtual node denoting the relative
capability it needs. For the virtual node, if there is physical
node with larger color value available, it will also be OK and
obviously better. So the color for a virtual node denotes
naturally the lowest capability it need.

If not specified with color values, a physical node or a virtual
node is automatical assigned a color value 0. For a physical node,
the default value 0 means lowest rank of capability. While for a
virtual node, the default value 0 means the required physical node
should equal to or higher than lowest rank 0 which in fact means
``do not care'' or say ``map to any physical node is OK ''.

One thing should be emphasized here is that the color doesn't have
any proportionate relation to the real computing capability. It
only means ``high or low'' not ``how much''. So the color now is
only a simple qualitative approach, not a quantitative approach in
which more automation and optimization could be done.

\section{Syntax and Informal Semantics}
\subsection{Color for Physical Nodes}
We specify the colors with the physical nodes in the command of
ocamlp3l executable file like:

\begin{figure}[h]
xxx.par\ \ -p3lroot\ \ ip1:port1\#col1\ \ ip2:port2\#col2\ ...\
ipn:portn\#coln
\end{figure}


As you my see above, we add a small change to the format of each
physical node. The parameter of color (col) which has a prefix
separator \# is added to the end of each physical node, obviously
denoting the relative capability of this physical node. Both port
and color are optional arguments and have their respective default
values.

A actual example is shown below:

\begin{figure}[h]
xxx.par\ \ -p3lroot\ \ 134.157.168.68:2011\#6\ \
134.157.168.22:223\ \ 134.157.168.4\#3

134.157.168.7\ \ pps.jussieu.fr:76\#5\ \ aa.pps.jussieu.fr:678\ \
bb.pps.jussieu.fr\#2

cc.pps.jussieu.fr
\end{figure}


\subsection{Color for Virtual Nodes}
We specify the colors with the virtual nodes in the ocamlp3l code
in this way

\begin{figure}[h]
\centering
\begin{tabular}{|l|}
\hline
seq  $\sim$col:number (f,n)\\
farm $\sim$col:number $\sim$colv:[number1;number2;...number n] (f,n)\\
loop $\sim$col:number (c,f)\\
mapvector $\sim$col:number $\sim$colv:[number1;number2;...number n] (f,n)\\
reducevector $\sim$col:number
$\sim$colv:[number1;number2;...number n]
(f,n)\\
\hline
\end{tabular}
\end{figure}



For there is no virtual nodes or physical nodes produced by pipe
skeleton, there is no need to add color syntax for it. The symbol
$\sim$ is used in OCaml from version 3.04 to present optional
arguments. All color parameters are optional. This provide good
seamless compatibility to former source code.

So what should the semantics of col be like? Should they be
specified to only this current layer or the whole internal
structure?

As we know the skeleton functions farm, loop, mapvector etc.
present in fact a structure of parallelism. For an example, the
farm(f,n) is composed of a farmemitter, a farmcollector and n
composite structures of f. So if the color parameters have effect
only on the current layer which are only the farmemitter and
farmcollector indeed. That's obviously much complex in writting
and not the semantics we really want, although it is still doable.
In many many cases, we don't even care the capability of such
control node like farmemitter or farmcollector.

It is of the same problem if the colors are applied only to the
internal layers such as the f in farm(f,n). Under such
supposition, the colors of farm are in fact applied to f while the
colors of control nodes of farm itself are specified by upper
layer. It unnecessarily adds much complexity and confusion to
programming.

Considering the format in which the color is written, we can find
it is more nature to assume that a color is applied to the whole
structure of the skeleton function it belongs to. In such a way,
the color of farm(f,n) is applied not only the farmemitter,
farmcollector but also the n functions f. For the f could also be
some composite skeleton function, the effect of this color will be
recursively applied to the internal of f.

In common sense, we also assume that the local colors have more
precedence then the external ones. For instance, in
\begin{figure}[h]
farm $\sim$col:2 ( (seq f)$|||$(seq $\sim$col:6 g), 5 )
\end{figure}

The color of farm acts on the farmemitter, farmcollector and also
recursively descend to the internal functions. Not specified local
color, the function (seq f) will inherit the external color value
2 from farm. Specified local color, the (seq $\sim$col:6 g) takes
the value of local color 6 instead of inherited color 2.

Some skeleton functions are endowed with the color list colv. It
provide a way of assigning different colors to internal parallel
structures. For example,

\begin{figure}[h]
mapvector $\sim$col:2 $\sim$colv:[3;4;8;7] (seq f, 4)
\end{figure}

The parameter col acts as before. The mapemitter and mapcollector
are assigned the color value 2. The difference is the colors of
inherited from external mapvector by the four parallel functions
(seq f) are no longer 2 but respective 3,4,8,7.

This is in fact a tricky but powerful mechanics. The internal
parallel functions should be conceptually the same and the
programmer shouldn't and also couldn't tell any difference among
them. But in functional programming, one can send a closure like
data, so it provides a way to execute different functions in
parallel nodes. And if the number of parallel virtual nodes
specified equals to the number of elements in a vector, the object
node of certain element sending is predictable. That's why the
colv parameter is provided.

\section{Algorithm}
We specify the colors of physical nodes in the command line and
the colors of virtual nodes in the expression of ocamlp3l program.
But the problem is how to reasonably map the virtual nodes to the
physical nodes according to colors?

If the numbers of virtual nodes correspondingly equal to numbers
of physical nodes with the same color all the time, there would be
no algorithm needed. But it is not always such a case. Maybe you
don't have enough machines for the virtual nodes of miscellaneous
functions from your ocamlp3l expressions, or maybe the ocamlp3l
expressions is a little complex and it is troublesome to adjust it
to adapt the real situation of machines, then you surely need a
algorithm as the one presented here to help you do some optimized
mapping.

To know more about the current algorithm implementation would
surely help the programmer write more efficient code conveniently.

The current algorithm is a mixture of two approaches.

\subsection{Algorithm I}

The first approach is a kind of exact mapping. After the colors
calculated, each of the virtual nodes will be mapped exactly to a
physical node with the same color.

For example, we have virtual nodes a,b,c,d with color 5, nodes
e,f,g with color 3 and nodes h,i with color 0. Then the a,b,c,d
should be mapped to some physical nodes with color 5, nodes e,f,g
should be mapped to some physical nodes with color 3 and nodes h,i
could be mapped to arbitrary nodes and normally decided by some
reasonable algorithm.

Such approach emphasizes that the programmer should know and
arrange color values of both physical nodes and virtual nodes
well. In the example above,if we specify only one physical node
with color 5, then virtual nodes a,b,c,d will all be mapped to
this hard-working node while some physical node with higher
capability such as of color 7 is probably still free. It is
obviously unreasonable. So the precondition here is the programmer
should arrange the color values well. It endows the programmer
high freedom of controlling the mapping but also more complexity
especially when the expression of ocamlp3l functions are very
complex.

\subsection{Algorithm II}

With the example above, we may think of such question --- why the
physical node with color 7 is still free? Then we may naturally
have such idea --- ordered round robin algorithm.

The algorithm firstly sorts the physical nodes with their colors
in a descending order and then do the same thing to the virtual
nodes. The virtual node with largest color, illustrating it needs
the highest capability, has the most priority to select from the
physical nodes. As it is the node needing highest capability, it
is mapped to the physical node with the largest color. Then it is
the the turn of second virtual node. Although the first physical
nodes has the highest capability, but it has been assigned to the
first virtual node which means parts of its power has been
assigned out, so it is reasonable for the second virtual node
selecting the seconde physical node. So do the following nodes. If
all physical nodes have been assigned once, we may approximately
assume it is again the turn of first physical node. Then we find
that it is in fact a algorithm of ordered round robin.

For example, if the colors list of virtual nodes is
[6;6;6;5;3;2;2;1;1;1;...] and the colors list of physical nodes is
[12;6;2;2;1], then the mapping pattern is like this:

\begin{center}
\begin{tabular}{|c|c|}
\hline virtual&physical\\
\hline
6&12\\
6&6\\
6&2\\
5&2\\
3&1\\
2&12\\
2&6\\
1&2\\
1&2\\
1&1\\
...&...\\
\hline
\end{tabular}
\end{center}

It is reasonable that the virtual nodes with higher need to
capability should have more priority to choose the physical nodes
with large color. But such a algorithm also has its defects.

In this algorithm, the equation between color values of the
virtual nodes and physical nodes becomes nonsensical and the
decision factor is in fact substituted by the relative orders with
colors. Such mechanics brings much uncertainty to the programming.
Some virtual node needing a high-capability physical node with
color of 5 will probably arranged to a low-capability node with a
color 2 which is far from the requirement.

An interrelated point is that a physical node is in fact sorted to
the position after the last node after it is assigned once more,
which implies the precondition that a physical node assigned more
times always has lower capability than a physical node assigned
less. It is not the actual case. The capability of machines varies
much from one another. Some multi-processor workstations have
probably times of capability as a common one. Sometimes we prefer
more virtual nodes mapped to several extremely high-capability
physical nodes. But this can not be done in this algorithm.

Supposing if the color value were proportionate with the real
capability of physical or virtual node, this algorithms can be
easily modified a little to be a much reasonable one.

\subsection{Current Algorithm}
Based on the former two, we proposed a mixture one which avoid the
defects of both. The mechanics is quite simple. For the
programmers, only one concept should be remember ---

\emph{A color value specified to a virtual node indicates the
lowest capability of the physical nodes it would like to be mapped
to.}

The concrete algorithm can be described as the following.

First, the virtual nodes list and physical nodes list are sorted
like in Algorithm II. The virtual nodes with larger color values
have more priority to choose.

When it's the turn of a certain virtual node, it means all nodes
with higher ranks have made their choice and no other residual
nodes have more critical requirement to capability then it. So it
can choose freely based on the instance of the time.

It firstly selects all physical nodes with colors larger than or
equal to its own. According to the elementary concept emphasized
above, each of these nodes satisfies its requirement. So in this
special range, the difference on color values selected nodes
becomes unimportant, at least, much less important than the
numbers of virtual nodes to which they've been assigned.

A reasonable strategy adopted here is to select the physical node
which has been assigned to least virtual nodes. If there are more
than one node of them having a same least occupants, it then
choose among them the one with the largest color value.

So do other virtual nodes until all finished. A special case is if
there doesn't exist any physical node that has large enough color
to map, then the one with the largest color is taken instead. But
this should not be a usual case.

An example is shown below. The virtual nodes list (in format (No.
col)) is
[(1,5);(2,5);(3;5);(4,4);(5,4);(6,3);(7,3);(8,3);(9,3);(10,0);(11,0);(12,0)].
The physical nodes list (in format (No. col)) is
[(1,7);(2,5);(3,5);(4,4);(5,3);(6,1);(7,0)]. The mapping result of
current algorithm is like

\begin{center}
\begin{tabular}{|c|c|}
\hline virtual&physical\\
( No. col )&( No. col )\\
 \hline
(1,5)&(1,7)\\
(2,5)&(2,5)\\
(3,5)&(3,5)\\
(4,4)&(4,4)\\
(5,4)&(1,7)\\
(6,3)&(5,3)\\
(7,3)&(2,5)\\
(8,3)&(3,5)\\
(9,3)&(4,4)\\
(10,0)&(6,1)\\
(11,0)&(7,0)\\
(12,0)&(5,3)\\
 \hline
\end{tabular}
\end{center}

In the result, we can see the current algorithm adopted achieves a
good balance and avoid the defects of the former two. If you are
interested, you may simulate the running of the former two and
compare the results. You would find it is of much difference.

\subsection{Hint}

We managed to propose a reasonable algorithm with good effect as
you've seen above. After all, the ``color'' is basically a
qualitative approach, not a quantitative one. So to be highly
accurate or to be highly automatic is unpractical. If more
quantitative parameters being measured or estimated, more
automatical mapping and automatical optimization could be done.
But it is what should be done in the future.

Although this algorithm is provided with some components of
automatic in mapping, we still suggest the programmers do more
estimation on colors as they could. A sensible way is maybe
roughly estimating and adjusting the approximate instance of some
data parallelism functions such as farm, mapvector etc. as well as
some computation critical ones according to the real situation of
physical nodes. As for others, leave them to the algorithm and
``let it be''.

\section{Compatibility}
The color parameters for both physical nodes in the ocamlp3lrun
command line and virtual nodes in the ocamlp3l program expression
are fully optional. So all former ocamlp3l code can still run
without any problem here.

If exploring the current algorithm above, you'll find that in the
cases without any color values specified, the current algorithm
just equals to the round robin algorithm adopted in the former
version. This means if compiling some former code with current
ocamlp3l with color feature, you'll get the a result with absolute
algorithmic equivalence as before. This is an unexpected benefit.


\end{document}