\name{snowfall-init}
\alias{snowfall-init}
\alias{sfInit}
\alias{sfStop}
\alias{sfParallel}
\alias{sfCpus}
\alias{sfNodes}
\alias{sfType}
\alias{sfIsRunning}
\alias{sfSocketHosts}
\alias{sfGetCluster}
\alias{sfSession}
\alias{sfSetMaxCPUs}
\title{Initialisation of cluster usage}
\usage{
sfInit( parallel=NULL, cpus=NULL, type=NULL, socketHosts=NULL, restore=NULL,
slaveOutfile=NULL, nostart=FALSE, useRscript=FALSE )
sfStop( nostop=FALSE )
sfParallel()
sfIsRunning()
sfCpus()
sfNodes()
sfGetCluster()
sfType()
sfSession()
sfSocketHosts()
sfSetMaxCPUs( number=32 )
}
\arguments{
\item{parallel}{Logical determining parallel or sequential
execution. If not set, the value given on the command line is used.}
\item{cpus}{Number of CPUs requested for the cluster. If
not set, the value given on the command line is used.}
\item{nostart}{Logical determining whether the basic cluster setup should
be skipped. Needed for nested use of \pkg{snowfall} and for usage
inside packages.}
\item{type}{Type of cluster. One of 'SOCK', 'MPI', 'PVM' or 'NWS'. Default is 'SOCK'.}
\item{socketHosts}{Host list for socket clusters. Only needed in
socket mode (SOCK) and when using more than one machine; if using
only the local machine (localhost), no list is needed.}
\item{restore}{Globally set the restore behavior of
\code{sfClusterApplySR} to the given value.}
\item{slaveOutfile}{Write R slave output to this file. Default: no
output (Unix: \code{/dev/null}, Windows: \code{:nul}). If
using sfCluster this argument has no effect, as slave logs are
configured by sfCluster.}
\item{useRscript}{Change startup behavior (requires snow > 0.3): use shell scripts or an R script for startup (R scripts being the newer variant, but not working with sfCluster).}
\item{nostop}{Same as \code{nostart}, but for stopping.}
\item{number}{Maximum number of CPUs usable.}
}
\description{
Initialisation and organisation code to use \pkg{snowfall}.
}
\details{
\code{sfInit} initialises the usage of the \pkg{snowfall} functions
and, if running in parallel mode, sets up the cluster and
\pkg{snow}. If using the \code{sfCluster} management tool, call
\code{sfInit} without arguments; any arguments given to \code{sfInit}
override the \code{sfCluster} settings. In parallel mode,
\code{sfInit} sets up the cluster by calling \code{makeCluster} from
\pkg{snow}. When used with \code{sfCluster}, initialisation also
includes management of lockfiles. If this function is called more than
once while a cluster is still running, \code{sfStop} is called
automatically.

Note that you should call \code{sfInit} before using any other function
from \pkg{snowfall}, with the only exception of \code{sfSetMaxCPUs}.
If you do not call \code{sfInit} first, the first call to any
\pkg{snowfall} function calls \code{sfInit} without parameters, which
means sequential mode when \pkg{snowfall} is used on its own, or the
settings from \code{sfCluster} when used with \code{sfCluster}.
This also means you cannot check from within your own program whether
\code{sfInit} was called, as any call to a \pkg{snowfall} function
initialises again. Therefore the function \code{sfIsRunning} returns a
logical indicating whether a cluster is running. Please note: it does
not call \code{sfInit}, and it also returns \code{TRUE} if a
previously running cluster was stopped via \code{sfStop} in the
meantime.
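
A minimal guard using \code{sfIsRunning} might look like this (a
sketch; whether initialisation already happened depends on your
program):

\preformatted{
# Only initialise if no cluster has been started yet.
if( !sfIsRunning() )
  sfInit( parallel=FALSE )
}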
If you use \pkg{snowfall} in a package, the argument \code{nostart} is
very handy when the main program uses \pkg{snowfall} as well. If set,
cluster setup is skipped and both parts (package and main program) use
the same cluster.
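
A sketch of this pattern inside a package (\code{myPackageFn} is a
hypothetical function name):

\preformatted{
# Hypothetical package function: skip cluster setup and reuse
# the main program's cluster.
myPackageFn <- function( data ) {
  sfInit( nostart=TRUE )
  sfLapply( data, sqrt )
}
}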
If you call \code{sfInit} more than once in a program without
explicitly calling \code{sfStop}, the running cluster is stopped
automatically. If your R environment lacks the required libraries,
\code{sfInit} automatically switches to sequential mode (with a
warning). Required libraries for parallel usage are \pkg{snow} and,
depending on the argument \code{type}, the library for the chosen
cluster mode (none for socket clusters, \pkg{Rmpi} for MPI clusters,
\pkg{rpvm} for PVM clusters and \pkg{nws} for NetWorkSpaces).
When using socket or NetWorkSpaces clusters, \code{socketHosts} can be
used to specify the hosts on which the workers should run. Basically
this is a list in which each entry is a plain character string with an
IP address or hostname (depending on your DNS settings). For truly
heterogeneous clusters, per-host paths can also be set; see the
corresponding \pkg{snow} documentation for details.
If you do not give a socket list, a list with the required number of
CPUs on your local machine (localhost) is used. This is the easiest
way to use parallel computing on a single machine, such as a laptop.
Note there is a limit on the number of CPUs used in one program (which
can be configured on package installation). The current limit is 32
CPUs. If you need more, call \code{sfSetMaxCPUs}
\emph{before} the first call to \code{sfInit}. The limit is set to
prevent an inadvertent request by a single user from affecting the
cluster as a whole.
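
Raising the limit can be sketched as follows (the counts are
illustrative, not recommendations):

\preformatted{
# Raise the CPU limit before initialisation.
sfSetMaxCPUs( 64 )
sfInit( parallel=TRUE, cpus=48 )
}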
Use \code{slaveOutfile} to define a file for the slave log output. The
file location must be available on all nodes. Beware of choosing a
location on a shared network drive! Under *nix systems, the
directories \code{/tmp} and \code{/var/tmp} are most likely not shared
between the different machines. The default is no output file.
If you are using \code{sfCluster}, this argument has no effect, as the
slave logs are always created at a location of \code{sfCluster}'s
choice (depending on its configuration).
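
For example, logging to a node-local file could look like this (the
path is a hypothetical example):

\preformatted{
# Collect slave output in a node-local log file.
sfInit( parallel=TRUE, cpus=4, slaveOutfile="/tmp/snowfall.log" )
}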
\code{sfStop} stops the cluster. If running in parallel mode, the
LAM/MPI cluster is shut down.
\code{sfParallel}, \code{sfCpus} and \code{sfSession} grant access to
the internal state of the currently used cluster.
All three can be configured via the command line and with
\code{sfCluster} as well, but arguments given to \code{sfInit} always
override command-line values.
The command-line options are \option{--parallel} (an empty option; if
missing, sequential mode is forced), \option{--cpus=X} (for the number
of nodes, where X is a numerical value) and \option{--session=X} (with
X a string).
\code{sfParallel} returns a logical indicating whether the program is
running in parallel/cluster mode or sequentially on a single
processor.
\code{sfCpus} returns the size of the cluster in CPUs (the number of
usable CPUs). In sequential mode \code{sfCpus} returns one.
\code{sfNodes} is a deprecated synonym for \code{sfCpus}.
\code{sfSession} returns a string with the session identification. It
is mainly relevant when used with the \code{sfCluster} tool.
\code{sfGetCluster} returns the \pkg{snow} cluster handler. Use it for
calling \pkg{snow} functions directly.
\code{sfType} returns the type of the current cluster backend (if
any). The value can be SOCK, MPI, PVM or NWS for parallel modes, or
"- sequential -" for sequential execution.
\code{sfSocketHosts} gives the list of hosts currently used for socket
clusters. It returns an empty list if not in socket mode (that is,
\code{sfType() != 'SOCK'}).
\code{sfSetMaxCPUs} enables setting a higher maximum CPU count for
this program. If you need higher limits, call \code{sfSetMaxCPUs}
before \code{sfInit} with the new maximum.
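
Querying the backend state after initialisation can be sketched as
(host list and count are illustrative):

\preformatted{
# Query the backend after initialisation.
sfInit( parallel=TRUE, cpus=2 )
if( sfType() == "SOCK" )
  print( sfSocketHosts() )
sfStop()
}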
}
\keyword{package}
\seealso{
See \pkg{snow} documentation for details on commands:
\code{\link[snow]{snow-cluster}}
}
\examples{
\dontrun{
# Run program in plain sequential mode.
sfInit( parallel=FALSE )
stopifnot( sfParallel() == FALSE )
sfStop()
# Run in parallel mode, overriding any values given on the
# command line.
# Executes via socket cluster with 4 worker processes on
# localhost.
# This is probably the best way to use parallel computing
# on a single machine, like a notebook, if you are not
# using sfCluster.
# Uses a socket cluster (the default), which could also be
# stated explicitly using type="SOCK".
sfInit( parallel=TRUE, cpus=4 )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()
# Run parallel mode (socket) with 4 workers on 3 specific machines.
sfInit( parallel=TRUE, cpus=4, type="SOCK",
socketHosts=c( "biom7", "biom7", "biom11", "biom12" ) )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()
# Hook into MPI cluster.
# Note: you can use any kind of MPI cluster Rmpi supports.
sfInit( parallel=TRUE, cpus=4, type="MPI" )
sfStop()
# Hook into PVM cluster.
sfInit( parallel=TRUE, cpus=4, type="PVM" )
sfStop()
# Run in sfCluster mode: settings are taken from the command line:
# run mode (sequential or parallel), number of nodes and the hosts
# to use.
sfInit()
# Session-ID from sfCluster (or XXXXXXXX as default)
session <- sfSession()
# Calling a snow function: cluster handler needed.
parLapply( sfGetCluster(), 1:10, exp )
# Same using snowfall wrapper, no handler needed.
sfLapply( 1:10, exp )
sfStop()
}
}