##### https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git
color_guide.tex
\documentclass{article}
\usepackage{a4wide}
\title{Introduction to color}
\author{Zheng LI}
\begin{document}
\maketitle

In order to specific mapping strategy between virtual processors and physical processors. The concept of \emph{color}, which intend to present the power of processors, is developed. With the color feature, we can map heavy tasks to more powerful or more disengagement machines and keep lighter weight tasks to weaker ones.

\section{syntax}

The color parameters are designed for both virtual processors and physical processors. For a virtual processor, its color value means the power it needs; for a physical processor, its color value means the power it has. The bigger the value is, the more powerful it presents.

\subsection{virtual side}

We specify virtual processors' colors in OcamlP3l programs, more precisely, in the OcamlP3l skeleton expressions. The color parameter is provided with every OcamlP3l skeleton, presenting the power needed for current skeleton.

\begin{figure}[h]
\centering
\begin{tabular}{|l|}
\hline
seq  $\sim$col:v (insidefunction)\\
farm $\sim$col:v $\sim$colv:[v1;v2;...vn] (insideskeleton,n)\\
loop $\sim$col:v (c,insideskeleton)\\
mapvector $\sim$col:v $\sim$colv:[v1;v2;...vn] (insideskeleton,n)\\
reducevector $\sim$col:v $\sim$colv:[v1;v2;...vn] (insideskeleton,n)\\
\hline
\end{tabular}
\end{figure}

Note that:

\begin{itemize}

\item Colors are positive integers. Do not use negatives and 0, they are preserved.

\item All color parameter is optional. Specified with color argument, a skeleton intends to be mapped to physical processors of corresponding power; not specified with color, a skeleton doesn't have special request on mapping (which means it doesn't care which physical processor to be mapped to, i.e. less privilege when mapping).

\item The working range of a specific color is the whole skeleton to which it is affiliated. Every inside skeleton is therefore recursively applied with this color, except a local color is specified which should has more privilege and overlaid the inherent one. So a skeleton without a specific color in syntax doesn't means it is really unspecific on color, because it will follow the default color value of its context, unless the default value of its context is also unspecified.

\item Except the \emph{col} parameter, the \emph{colv} (a color vector of integer list) is especially designed for \emph{farm}, \emph{mapvector} and \emph{reducevector} skeletons. The \emph{colv} provides a way of assigning different colors to the parallel structures inside.
\begin{itemize}
\item If both specified, \emph{colv} has more privilege than \emph{col} on inside skeletons. For example, farm $\sim$col:2 $\sim$colv:[4;5;6;7] (seq(f), 4) corresponds to a skeletal structure with both emitter and collector of color 2 together with four parallel worker separately belonging to color 4,5,6 and 7.
\item If the number of items inside \emph{colv} list is less than n, it means only parts of the inside structures need specific color value. So the remainders will just take the default color values from their context like the emitter and collector.For instance, farm $\sim$col:2 $\sim$colv:[4;5;6;7] (seq(f), 6) corresponds to a skeletal structure with both emitter and collector of color 2 together with six parallel worker separately belonging to color 4,5,6,7,2 and 2.
\end{itemize}
\end{itemize}

\subsection{physical side}
We used to provide the information of physical processors (name or IP, port) in the command line that invokes the parallel execution on the  root node. So now we just add the information of color to each physical processor.

\begin{figure}[h]
\centering
\begin{tabular}{|l|}
\hline
P3lProg.par\ -p3lroot\ -otheroptions\ ip:port\#col\ name:port\#col ...\\
\hline
\end{tabular}
\end{figure}

Note that:
\begin{itemize}
\item Colors are positive integers. Do not use 0 or negatives, they are preserved.
\item Just as port, color is also optional parameter. The physical nodes without specific color will take the default color value 0, which is always less than any specific color value.
\end{itemize}

\section{mapping}

There are two modes of mapping the virtual processors to physical processors --- strict mapping and non-strict mapping. If we launch the root node with -strict'' option, we will use the strict mapping mode, otherwise the non-strict mode is exploited.

\subsection{strict mapping}
The basic concept of strict mapping is --- for all the virtual processors in a OcamlP3L program, we must provide a list of physical processors which could establish a one-to-one correspondence based on equivalent color values. For example, if we have respectly $n_1, n_2,...,n_i,...$ virtual processors of $color_1, color_2,...color_i,...$ in the OcamlP3L program, we must also provide exactly $n_1, n_2, ..., n_i, ...$ physical processors of the same colors when launching it.

Note that,

\begin{itemize}

\item The virtual processors list generated and physical processors list provided should have not only the same length but also the same distribution on colors. So each virtual processor should be able to \textbf{exclusively} map to a physical processor of the same color, and vice versa. If the configurations for virtual side and physical side do not agree, it is programmer's duty to adjust it manually.

\item When mapping, each items in the physical processors list is a basic object, which can be mapped only once.

\item It is quite possible that we need to map several virtual processors (with the same color or not) to a single physical processors, for we could have less physical processors than virtual processors or some physical processor is quite powerful to undertake several tasks in parallel etc. In such cases, we should write the same physical processor for several times in the input list, so they will be considered as several different items (with the same or different colors as needed) when doing mapping, while still work as one physical sever when doing computation.

\end{itemize}

The strict mapping mode requires that the programmer specifics everything about colors and mapping, which is quite complex but also endows the programmer with total control. In the extreme case, the programmer could provides each virtual processor with exclusively different color values and meanwhile do the same with the physical processors, so that the mapping scheme is totally decided.

To counter the complexity of counting colors in strict mapping mode, two new features are added to the graphic semantics.
\begin{itemize}
\item A OcamlP3L program compiled with graphic semantics will show the colors of all virtual nodes in the pictures. A virtual processor of color $n$ is denoted as $Cn$, while a virtual processor without specific color keeps unchanged.
\item A text report on the distribution of color values will be output on screen each time when we launch a OcamlP3L program compiled with graphic semantics. It would be very helpful for the programmer to count and adjust color values for both virtual and physical processors even before the program being compiled and executed in parallel mode.
\end{itemize}

Even in strict mode, there are still some ambiguous factors the programmers should know.
\begin{itemize}

\item Some virtual processors do not have specific colors, which means they do not have special request to the physical processors to be mapped on. So we could have two choice:
\begin{itemize}
\item allow them to be exclusively mapped to any available physical processors (since they alway have a lower priority, they won't affect the mapping of any processors with specific colors), or
\item take a stricter limitation that a virtual processor without specific color must be mapped to a physical processor without a specific color either
\end{itemize}

\item It is possible that the physical processors provided can not only satisfy all virtual processors but also have one or more remainders. So we could have two choice:
\begin{itemize}
\item automaticly ignore the part unneeded, or
\item take a stricter limitation : give information of mistake
\end{itemize}

\end{itemize}

We think either option is reasonable. But we currently choose the more stricter options for both questions to keep a sense of purity. The other scheme has also been tested with just a few lines modification and can be provided on demand easily.

\subsection{non-strict mapping}

The distinguish characters between non-strict mapping and strict mapping:

\begin{itemize}
\item We do not need to provide a physical processors list which has the exact correspondence to the virtual processors no matter on the processors' amount or colors distribution. The algorithm will try to do a optimized mapping automatically.

\item Unlike in the strict mapping mode in which we need to write a same physical processors as $n$ items if it intends to be mapped to $n$ virtual processors, we needn't do like this in non-strict mapping. In non-strict mode, each physical processor in the input list can be mapped for once or several times. So do not write a physical processor as more items unless you really want it be considered so.

\item The algorithm automatically choose to which physical processor a virtual processor should be mapped, according to not only the power of physical processors (specified through color values) but also the already assigned loads on it (how many virtual processors have been mapped to this machine).

\end{itemize}

We won't talk details about the mapping algorithm in this articles. We suggest to use non-strict mapping in the following way --- give corresponding color values between virtual processors of crucial tasks and physical processors of strong powers, so that they can be better fitted, and left other trivial ones unspecified.

\section{capacity}

A new parameter about capacity is added for physical processors. It basically gives a restriction on how many virtual processors could be mapped to current physical processors. Just like port and color parameters, it's also optional. The syntax is

\begin{figure}[h]
\centering
\begin{tabular}{|l|}
\hline
P3lProg.par\ -p3lroot\ -otheroptions\ ip:port\#col\%cap name:port\#col\%cap ...\\
\hline
\end{tabular}
\end{figure}

\begin{itemize}
\item In the strict mapping mode, the capacity parameter decrease the complexity of write a same physical processors as several items in the input list. For example, to map all 50 worker processors of OcamlP3L expression ( farm $\sim$col:2 (seq $\sim$col:6 (f) , 50) ) to one extremely powerful physical server, we could either write

\begin{center}
P3lProg.par -p3lroot -strict ip1\#2 ip2\#2 $\underbrace{ip3\#6\ ip3\#6 ... ip3\#6}_{50}$
\end{center}

or simply

\begin{center}
P3lProg.par -p3lroot -strict\ ip1\#2\ ip2\#2\ ip3\#6\%50
\end{center}

They are absolutely the same, but the latter looks obviously much cleaner. It's very nature that a physical processor without specific capacity argument presents just one item as default.

\item In the non-strict mapping mode, a capacity argument specifies the upper bound of the number of virtual processors that can be mapped to current physical processor. This provides more expressiveness than before. We will see it clearly in the section \ref{sec:example}. A physical processor without specific capacity argument has no such limitation.
\end{itemize}

\section{examples}\label{sec:example}
Compared with former implementation, the strict mapping mode and the capacity parameter are two main features added. They adds obviously stronger ability to specify the mapping mechanics. We'll illustrate the usage of the new features by two simple yet very clear examples, through which we'll see that there are some cases which are very difficult for former version but can be easily handle with the new features. And more important, the strict mapping mode indeed provides an ultimate solution for the programmers to exactly specify what they want.

\subsection{example 1}
Suppose we have the OcamlP3L expression like farm (seq $\sim$col:10 f, 2) $|||$ farm (seq $\sim$col:2 g, 100) in which function $f$ is very heavy computation and $g$ is very very slight task. Meanwhile we have two most powerful machines (P1 and P2) which intend to work \emph{exclusively} for the two $f$-workers, two weaker machines (W1 and W2) to take all the 100 slight $g$-worker tasks and one even weaker machine (W) for all other work such as emitter and collect etc. This is quite difficult to do with previous non-strict mapping algorithms, because we are lack of ways to prevent slight tasks being more or less mapped to the powerful machine. (The color value presents only the relative priority and do not have any numerical meanings, while the number of slight tasks seems too big ...)

But now we have two ways to do that:
\begin{itemize}
\item In the strict mapping mode, we can just provides the physical processors list as

\begin{center}
P3lProg.par -p3lroot -strict P1\#10 P2\#10 $\underbrace{W1\#2 ... W1\#2}_{50}$ $\underbrace{W2\#2 ... W2\#2}_{50}$ W W W W
\end{center}

or simply

\begin{center}
P3lProg.par -p3lroot -strict P1\#10 P2\#10 W1\#2\%50 W2\#2\%50 W\%4
\end{center}

\item In the non-strict mapping mode, new parameter capacity can also help us with the restriction of the exclusive mapping

\begin{center}
P3lProg.par -p3lroot P1\#10\%1 P2\#10\%1 W1\#2\%50 W2\#2\%50 W
\end{center}

\end{itemize}

\subsection{example 2}
Another even simpler example, farm (seq $\sim$col:6 f,50). Suppose we have two extremely strong physical processor to do the 50 workers job and a weak processor(W) to do the rest trivial jobs. The problem is that, although both very powerful, P1 is still faster than P2. If we assign equivalent tasks to them, P2 will finish a little later which is not that economic; while if we assign 30 tasks to P1 and 20 tasks to P2, they will approximately finish in the same time.

In previous implementation, it is difficult for us to control exactly the amounts and proportion of tasks mapped to different physical processors. We can manage to do this by some trick --- rewrite it to
\begin{center}
farm $\sim$colv:[$\underbrace{7;...;7}_{30}$;$\underbrace{5;...;5}_{20}$] (seq f, 50)
\end{center}
and assign P1 with color 7 P2 with color 5, but it doesn't seems that natural, for equivalent tasks has to be unreasonably classified into two categories.

But now we can easily defined it with both strict and non-strict mode

\begin{itemize}

\item Strict mode

\begin{center}
P3lProg.par -p3lroot -strict $\underbrace{P1\#6 ... P1\#6}_{30}$ $\underbrace{P2\#6 ... P2\#6}_{20}$ W W
\end{center}

or simply

\begin{center}
P3lProg.par -p3lroot -strict P1\#6\%30 P2\#6\%20 W\%2
\end{center}

\item Non-strict mode

\begin{center}
P3lProg.par -p3lroot P1\#6\%30 P2\#6\%20 W
\end{center}

\end{itemize}

\section{debug}
Detailed information about mapping is provided in debug mode. Just lunch a parallel OcamlP3L program with a specific physical processors configuration, if -debug'' is enabled, you can find the exact mapping sequence and result.

\end{document}