\documentclass [a4paper] {article} \title{Simple Notes on ``Color'' for ocamlp3l} \author{Roberto DI COSMO \and LI Zheng} \begin{document} \maketitle Recently, we added a small new feature ``color'' to ocamlp3l. It provides a simple way of specifying both the relative computing capability of a physical node and the capability needed by a virtual node. So more customizable mapping between the physical nodes and virtual nodes could be done by the programmer of in this way. \section{Concept} A ``color'' is in fact a integer that presents relative computing capability. A larger ``color'' presents higher capability of computing. So a ``color'' can also be considered as something like ``rank'' or ``weight''. It can used both with a physical node denoting the relative capability it holds and with a virtual node denoting the relative capability it needs. For the virtual node, if there is physical node with larger color value available, it will also be OK and obviously better. So the color for a virtual node denotes naturally the lowest capability it need. If not specified with color values, a physical node or a virtual node is automatical assigned a color value 0. For a physical node, the default value 0 means lowest rank of capability. While for a virtual node, the default value 0 means the required physical node should equal to or higher than lowest rank 0 which in fact means ``do not care'' or say ``map to any physical node is OK ''. One thing should be emphasized here is that the color doesn't have any proportionate relation to the real computing capability. It only means ``high or low'' not ``how much''. So the color now is only a simple qualitative approach, not a quantitative approach in which more automation and optimization could be done. \section{Syntax and Informal Semantics} \subsection{Color for Physical Nodes} We specify the colors with the physical nodes in the command of ocamlp3l executable file like: \begin{figure}[h] xxx.par\ \ -p3lroot\ \ ip1:port1\#col1\ \ ip2:port2\#col2\ ...\ ipn:portn\#coln \end{figure} As you my see above, we add a small change to the format of each physical node. The parameter of color (col) which has a prefix separator \# is added to the end of each physical node, obviously denoting the relative capability of this physical node. Both port and color are optional arguments and have their respective default values. A actual example is shown below: \begin{figure}[h] xxx.par\ \ -p3lroot\ \ 134.157.168.68:2011\#6\ \ 134.157.168.22:223\ \ 134.157.168.4\#3 134.157.168.7\ \ pps.jussieu.fr:76\#5\ \ aa.pps.jussieu.fr:678\ \ bb.pps.jussieu.fr\#2 cc.pps.jussieu.fr \end{figure} \subsection{Color for Virtual Nodes} We specify the colors with the virtual nodes in the ocamlp3l code in this way \begin{figure}[h] \centering \begin{tabular}{|l|} \hline seq $\sim$col:number (f,n)\\ farm $\sim$col:number $\sim$colv:[number1;number2;...number n] (f,n)\\ loop $\sim$col:number (c,f)\\ mapvector $\sim$col:number $\sim$colv:[number1;number2;...number n] (f,n)\\ reducevector $\sim$col:number $\sim$colv:[number1;number2;...number n] (f,n)\\ \hline \end{tabular} \end{figure} For there is no virtual nodes or physical nodes produced by pipe skeleton, there is no need to add color syntax for it. The symbol $\sim$ is used in OCaml from version 3.04 to present optional arguments. All color parameters are optional. This provide good seamless compatibility to former source code. So what should the semantics of col be like? Should they be specified to only this current layer or the whole internal structure? As we know the skeleton functions farm, loop, mapvector etc. present in fact a structure of parallelism. For an example, the farm(f,n) is composed of a farmemitter, a farmcollector and n composite structures of f. So if the color parameters have effect only on the current layer which are only the farmemitter and farmcollector indeed. That's obviously much complex in writting and not the semantics we really want, although it is still doable. In many many cases, we don't even care the capability of such control node like farmemitter or farmcollector. It is of the same problem if the colors are applied only to the internal layers such as the f in farm(f,n). Under such supposition, the colors of farm are in fact applied to f while the colors of control nodes of farm itself are specified by upper layer. It unnecessarily adds much complexity and confusion to programming. Considering the format in which the color is written, we can find it is more nature to assume that a color is applied to the whole structure of the skeleton function it belongs to. In such a way, the color of farm(f,n) is applied not only the farmemitter, farmcollector but also the n functions f. For the f could also be some composite skeleton function, the effect of this color will be recursively applied to the internal of f. In common sense, we also assume that the local colors have more precedence then the external ones. For instance, in \begin{figure}[h] farm $\sim$col:2 ( (seq f)$|||$(seq $\sim$col:6 g), 5 ) \end{figure} The color of farm acts on the farmemitter, farmcollector and also recursively descend to the internal functions. Not specified local color, the function (seq f) will inherit the external color value 2 from farm. Specified local color, the (seq $\sim$col:6 g) takes the value of local color 6 instead of inherited color 2. Some skeleton functions are endowed with the color list colv. It provide a way of assigning different colors to internal parallel structures. For example, \begin{figure}[h] mapvector $\sim$col:2 $\sim$colv:[3;4;8;7] (seq f, 4) \end{figure} The parameter col acts as before. The mapemitter and mapcollector are assigned the color value 2. The difference is the colors of inherited from external mapvector by the four parallel functions (seq f) are no longer 2 but respective 3,4,8,7. This is in fact a tricky but powerful mechanics. The internal parallel functions should be conceptually the same and the programmer shouldn't and also couldn't tell any difference among them. But in functional programming, one can send a closure like data, so it provides a way to execute different functions in parallel nodes. And if the number of parallel virtual nodes specified equals to the number of elements in a vector, the object node of certain element sending is predictable. That's why the colv parameter is provided. \section{Algorithm} We specify the colors of physical nodes in the command line and the colors of virtual nodes in the expression of ocamlp3l program. But the problem is how to reasonably map the virtual nodes to the physical nodes according to colors? If the numbers of virtual nodes correspondingly equal to numbers of physical nodes with the same color all the time, there would be no algorithm needed. But it is not always such a case. Maybe you don't have enough machines for the virtual nodes of miscellaneous functions from your ocamlp3l expressions, or maybe the ocamlp3l expressions is a little complex and it is troublesome to adjust it to adapt the real situation of machines, then you surely need a algorithm as the one presented here to help you do some optimized mapping. To know more about the current algorithm implementation would surely help the programmer write more efficient code conveniently. The current algorithm is a mixture of two approaches. \subsection{Algorithm I} The first approach is a kind of exact mapping. After the colors calculated, each of the virtual nodes will be mapped exactly to a physical node with the same color. For example, we have virtual nodes a,b,c,d with color 5, nodes e,f,g with color 3 and nodes h,i with color 0. Then the a,b,c,d should be mapped to some physical nodes with color 5, nodes e,f,g should be mapped to some physical nodes with color 3 and nodes h,i could be mapped to arbitrary nodes and normally decided by some reasonable algorithm. Such approach emphasizes that the programmer should know and arrange color values of both physical nodes and virtual nodes well. In the example above,if we specify only one physical node with color 5, then virtual nodes a,b,c,d will all be mapped to this hard-working node while some physical node with higher capability such as of color 7 is probably still free. It is obviously unreasonable. So the precondition here is the programmer should arrange the color values well. It endows the programmer high freedom of controlling the mapping but also more complexity especially when the expression of ocamlp3l functions are very complex. \subsection{Algorithm II} With the example above, we may think of such question --- why the physical node with color 7 is still free? Then we may naturally have such idea --- ordered round robin algorithm. The algorithm firstly sorts the physical nodes with their colors in a descending order and then do the same thing to the virtual nodes. The virtual node with largest color, illustrating it needs the highest capability, has the most priority to select from the physical nodes. As it is the node needing highest capability, it is mapped to the physical node with the largest color. Then it is the the turn of second virtual node. Although the first physical nodes has the highest capability, but it has been assigned to the first virtual node which means parts of its power has been assigned out, so it is reasonable for the second virtual node selecting the seconde physical node. So do the following nodes. If all physical nodes have been assigned once, we may approximately assume it is again the turn of first physical node. Then we find that it is in fact a algorithm of ordered round robin. For example, if the colors list of virtual nodes is [6;6;6;5;3;2;2;1;1;1;...] and the colors list of physical nodes is [12;6;2;2;1], then the mapping pattern is like this: \begin{center} \begin{tabular}{|c|c|} \hline virtual&physical\\ \hline 6&12\\ 6&6\\ 6&2\\ 5&2\\ 3&1\\ 2&12\\ 2&6\\ 1&2\\ 1&2\\ 1&1\\ ...&...\\ \hline \end{tabular} \end{center} It is reasonable that the virtual nodes with higher need to capability should have more priority to choose the physical nodes with large color. But such a algorithm also has its defects. In this algorithm, the equation between color values of the virtual nodes and physical nodes becomes nonsensical and the decision factor is in fact substituted by the relative orders with colors. Such mechanics brings much uncertainty to the programming. Some virtual node needing a high-capability physical node with color of 5 will probably arranged to a low-capability node with a color 2 which is far from the requirement. An interrelated point is that a physical node is in fact sorted to the position after the last node after it is assigned once more, which implies the precondition that a physical node assigned more times always has lower capability than a physical node assigned less. It is not the actual case. The capability of machines varies much from one another. Some multi-processor workstations have probably times of capability as a common one. Sometimes we prefer more virtual nodes mapped to several extremely high-capability physical nodes. But this can not be done in this algorithm. Supposing if the color value were proportionate with the real capability of physical or virtual node, this algorithms can be easily modified a little to be a much reasonable one. \subsection{Current Algorithm} Based on the former two, we proposed a mixture one which avoid the defects of both. The mechanics is quite simple. For the programmers, only one concept should be remember --- \emph{A color value specified to a virtual node indicates the lowest capability of the physical nodes it would like to be mapped to.} The concrete algorithm can be described as the following. First, the virtual nodes list and physical nodes list are sorted like in Algorithm II. The virtual nodes with larger color values have more priority to choose. When it's the turn of a certain virtual node, it means all nodes with higher ranks have made their choice and no other residual nodes have more critical requirement to capability then it. So it can choose freely based on the instance of the time. It firstly selects all physical nodes with colors larger than or equal to its own. According to the elementary concept emphasized above, each of these nodes satisfies its requirement. So in this special range, the difference on color values selected nodes becomes unimportant, at least, much less important than the numbers of virtual nodes to which they've been assigned. A reasonable strategy adopted here is to select the physical node which has been assigned to least virtual nodes. If there are more than one node of them having a same least occupants, it then choose among them the one with the largest color value. So do other virtual nodes until all finished. A special case is if there doesn't exist any physical node that has large enough color to map, then the one with the largest color is taken instead. But this should not be a usual case. An example is shown below. The virtual nodes list (in format (No. col)) is [(1,5);(2,5);(3;5);(4,4);(5,4);(6,3);(7,3);(8,3);(9,3);(10,0);(11,0);(12,0)]. The physical nodes list (in format (No. col)) is [(1,7);(2,5);(3,5);(4,4);(5,3);(6,1);(7,0)]. The mapping result of current algorithm is like \begin{center} \begin{tabular}{|c|c|} \hline virtual&physical\\ ( No. col )&( No. col )\\ \hline (1,5)&(1,7)\\ (2,5)&(2,5)\\ (3,5)&(3,5)\\ (4,4)&(4,4)\\ (5,4)&(1,7)\\ (6,3)&(5,3)\\ (7,3)&(2,5)\\ (8,3)&(3,5)\\ (9,3)&(4,4)\\ (10,0)&(6,1)\\ (11,0)&(7,0)\\ (12,0)&(5,3)\\ \hline \end{tabular} \end{center} In the result, we can see the current algorithm adopted achieves a good balance and avoid the defects of the former two. If you are interested, you may simulate the running of the former two and compare the results. You would find it is of much difference. \subsection{Hint} We managed to propose a reasonable algorithm with good effect as you've seen above. After all, the ``color'' is basically a qualitative approach, not a quantitative one. So to be highly accurate or to be highly automatic is unpractical. If more quantitative parameters being measured or estimated, more automatical mapping and automatical optimization could be done. But it is what should be done in the future. Although this algorithm is provided with some components of automatic in mapping, we still suggest the programmers do more estimation on colors as they could. A sensible way is maybe roughly estimating and adjusting the approximate instance of some data parallelism functions such as farm, mapvector etc. as well as some computation critical ones according to the real situation of physical nodes. As for others, leave them to the algorithm and ``let it be''. \section{Compatibility} The color parameters for both physical nodes in the ocamlp3lrun command line and virtual nodes in the ocamlp3l program expression are fully optional. So all former ocamlp3l code can still run without any problem here. If exploring the current algorithm above, you'll find that in the cases without any color values specified, the current algorithm just equals to the round robin algorithm adopted in the former version. This means if compiling some former code with current ocamlp3l with color feature, you'll get the a result with absolute algorithmic equivalence as before. This is an unexpected benefit. \end{document}