Tip revision: 9552333b71b280bfb9fc5f61f43e6fa6dffa9022 authored by Jochen Knaus on 03 October 2008, 00:00 UTC
version 1.60


\title{Initialisation of cluster usage}
sfInit( parallel=NULL, cpus=NULL, type=NULL, socketHosts=NULL, nostart=FALSE )
sfStop( nostop=FALSE )

sfSetMaxCPUs( number=32 )
  \item{parallel}{Logical determining parallel or sequential
    execution. If not set, values from the command line are taken.}
  \item{cpus}{Number of CPUs requested for the cluster. If
    not set, values from the command line are taken.}
  \item{nostart}{Logical determining if the basic cluster setup should
    be skipped. Needed for nested use of \pkg{snowfall} and usage in
    packages.}
  \item{type}{Type of cluster. Can be 'SOCK', 'MPI', 'PVM' or 'NWS'. Default is 'SOCK'.}
  \item{socketHosts}{Host list for socket clusters. Only needed in
    socket mode (SOCK) and only if more than one machine is used;
    if only your local machine (localhost) is used, no list is needed.}
  \item{nostop}{Same as \code{nostart}, but for stopping the cluster.}
  \item{number}{Maximum number of usable CPUs.}
  Initialisation and organisation code to use \pkg{snowfall}.
  \code{sfInit} initialises the usage of the \pkg{snowfall} functions
  and, if running in parallel mode, sets up the cluster and
  \pkg{snow}. If using the
  \code{sfCluster} management tool, call this without arguments. If
  \code{sfInit} is called with arguments, these overwrite
  \code{sfCluster} settings. If running in parallel, \code{sfInit}
  sets up the
  cluster by calling \code{makeCluster} from \pkg{snow}. If used with
  \code{sfCluster}, the initialisation also includes management of
  lockfiles. If this function is called more than once while the current
  cluster is still running, \code{sfStop} is called automatically.

  Note that you should call \code{sfInit} before using any other function
  from \pkg{snowfall}; the only exception is \code{sfSetMaxCPUs}.
  If you do not call \code{sfInit} first, calling any \pkg{snowfall}
  function invokes \code{sfInit} without parameters. This means
  sequential mode if \pkg{snowfall} is used on its own, or the settings
  from \code{sfCluster} if used with \code{sfCluster}.
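A minimal sketch of the recommended call order, assuming \pkg{snowfall} is installed: \code{sfInit} comes first, other \pkg{snowfall} functions follow, and \code{sfStop} ends the session. Sequential mode is chosen so that the sketch runs without \pkg{snow}; the variable names are illustrative only.

```r
library(snowfall)

# Initialise snowfall first; parallel = FALSE forces sequential mode,
# so no cluster libraries are required.
sfInit(parallel = FALSE)

# State functions reflect the chosen mode.
isParallel <- sfParallel()   # FALSE in sequential mode
nCpus <- sfCpus()            # 1 in sequential mode

# In sequential mode the wrapper behaves like base R's lapply.
res <- sfLapply(1:3, function(x) x * 10)

# Shut down at the end of the program.
sfStop()
```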

  If you use \pkg{snowfall} in a package, argument \code{nostart} is very
  handy if the main program uses \pkg{snowfall} as well. If set, cluster
  setup is skipped and both parts (package and main program) use
  the same cluster.
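The following sketch illustrates the package scenario described above, assuming \pkg{snowfall} is installed. \code{pkgSquares} is a hypothetical package function, not part of \pkg{snowfall}; it calls \code{sfInit(nostart = TRUE)} so that it relies on the setup already made by the main program instead of creating a cluster of its own. Sequential mode is used so the sketch runs on any machine.

```r
library(snowfall)

# Hypothetical package function (illustrative name only).
pkgSquares <- function(x) {
  # Skip the basic cluster setup and use the caller's cluster.
  sfInit(nostart = TRUE)
  sfLapply(x, function(v) v^2)
}

# Main program: sets up snowfall itself, then calls the package code.
sfInit(parallel = FALSE)
squares <- pkgSquares(1:4)
sfStop()
```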

  If you call \code{sfInit} more than once in a program without
  explicitly calling \code{sfStop}, the cluster is stopped
  automatically. If your R environment does not provide the required
  libraries, \code{sfInit} automatically switches to sequential mode
  (with a warning). Required libraries for parallel usage are \pkg{snow}
  and, depending on argument \code{type}, the libraries for the
  cluster mode (none for
  socket clusters, \pkg{Rmpi} for MPI clusters, \pkg{rpvm} for
  PVM clusters and \pkg{nws} for NetWorkSpaces).

  If using socket or NetWorkSpaces clusters, \code{socketHosts} can be used
  to specify the hosts on which your workers should run.
  Basically this is a list, where any entry can be a plain character
  string with an IP address or hostname (depending on your DNS settings).
  For truly heterogeneous clusters, paths can also be set for each host.
  Please see the corresponding \pkg{snow} documentation for details.
  If you do not give a host list, a list with the required number
  of CPUs on your local machine (localhost) is used. This is the
  easiest way to use parallel computing on a single machine, like a
  notebook.
  Note that there is a limit on the number of CPUs used in one program
  (which can be configured on package installation). The current limit
  is 32 CPUs. If you need more CPUs, call \code{sfSetMaxCPUs}
  \emph{before} the first call to \code{sfInit}. The limit is set to
  prevent inadvertent requests by single users from affecting the
  cluster as a whole.
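A minimal sketch of the call order for raising the limit, assuming \pkg{snowfall} is installed. The value 64 is an arbitrary example. The parallel request for 40 CPUs is left as a comment because it needs a cluster of that size; the sketch itself initialises in sequential mode.

```r
library(snowfall)

# Raise the per-program CPU limit; this must happen before the first
# call to sfInit (the only snowfall function allowed before it).
sfSetMaxCPUs(number = 64)

# On a suitable cluster, a request above the default limit of 32
# would now be permitted, e.g.:
#   sfInit(parallel = TRUE, cpus = 40)
sfInit(parallel = FALSE)
nCpus <- sfCpus()
sfStop()
```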

  \code{sfStop} stops the cluster. If running in parallel mode, the
  cluster is shut down.
  \code{sfParallel}, \code{sfCpus} and \code{sfSession} grant access to
  the internal state of the currently used cluster.
  All three can be configured via the command line, and especially with
  \code{sfCluster}, but arguments given to \code{sfInit} always
  overwrite values from the command line.
  The command line options are \option{--parallel} (a flag without a
  value; if missing, sequential mode is forced), \option{--cpus=X} (for
  the number of CPUs, where X is a numerical value) and
  \option{--session=X} (with X a string).
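To make the option syntax and the precedence rule concrete, here is a minimal base R sketch. \code{parseSfArgs} and \code{effectiveSetting} are hypothetical helpers written for illustration only; they are not \pkg{snowfall}'s internal implementation and merely mimic the behaviour described above.

```r
# Hypothetical helper (not part of snowfall): read the documented
# options --parallel, --cpus=X and --session=X.
parseSfArgs <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- list(parallel = FALSE, cpus = NULL, session = NULL)
  for (a in args) {
    if (a == "--parallel") {
      # Flag without a value: its presence selects parallel mode.
      opts$parallel <- TRUE
    } else if (startsWith(a, "--cpus=")) {
      opts$cpus <- as.integer(sub("^--cpus=", "", a))
    } else if (startsWith(a, "--session=")) {
      opts$session <- sub("^--session=", "", a)
    }
  }
  opts
}

# Hypothetical helper: a non-NULL argument given to sfInit wins over
# the value taken from the command line.
effectiveSetting <- function(sfInitArg, cmdlineValue) {
  if (!is.null(sfInitArg)) sfInitArg else cmdlineValue
}

# Example: options as they might appear after --args.
opts <- parseSfArgs(c("--parallel", "--cpus=4", "--session=abc"))
cpus <- effectiveSetting(NULL, opts$cpus)  # no sfInit argument: 4
cpus2 <- effectiveSetting(2L, opts$cpus)   # sfInit(cpus = 2) wins: 2
```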

  \code{sfParallel} returns a
  logical indicating whether the program is running in parallel/cluster
  mode or sequentially on a single processor.

  \code{sfCpus} returns the size of the cluster in CPUs
  (equal to the number of usable CPUs). In sequential mode \code{sfCpus}
  returns one. \code{sfNodes} is a deprecated function similar to
  \code{sfCpus}.

  \code{sfSession} returns a string with the
  session identification. It is mainly important if used with the
  \code{sfCluster} tool.
  \code{sfGetCluster} gets the \pkg{snow} cluster handle. Use it to
  call \pkg{snow} functions directly.

  \code{sfType} returns the type of the current cluster backend (if
  any). The value can be SOCK, MPI, PVM or NWS for parallel
  modes or "- sequential -" for sequential execution.

  \code{sfSocketHosts} gives the list of currently used hosts for
  socket clusters. It returns an empty list if not used in socket mode
  (that is, \code{sfType() != 'SOCK'}).
  \code{sfSetMaxCPUs} sets a higher maximum CPU count for this
  program. If you need higher limits, call \code{sfSetMaxCPUs} before
  \code{sfInit} with the new maximum amount.
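A short sketch of querying the backend type and the socket host list, assuming \pkg{snowfall} is installed. In sequential mode the documented values are "- sequential -" for \code{sfType} and an empty list for \code{sfSocketHosts}; the variable names and the reporting branch are illustrative only.

```r
library(snowfall)

sfInit(parallel = FALSE)

backend <- sfType()        # "- sequential -" in sequential mode
hosts <- sfSocketHosts()   # empty unless sfType() == "SOCK"

# Branch on the backend, e.g. to report where work will run.
if (backend == "SOCK") {
  message("Socket workers on: ", paste(unlist(hosts), collapse = ", "))
} else {
  message("Backend: ", backend)
}

sfStop()
```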
See the \pkg{snow} documentation for details on commands:
  # Run program in plain sequential mode.
  sfInit( parallel=FALSE )
  stopifnot( sfParallel() == FALSE )

  # Run in parallel mode, overriding any values given on the
  # command line.
  # Executes via a socket cluster with 4 worker processes on
  # localhost.
  # This is probably the best way to use parallel computing
  # on a single machine, like a notebook, if you are not
  # using sfCluster.
  # Uses a socket cluster (the default), which can also be stated
  # using type="SOCK".
  sfInit( parallel=TRUE, cpus=4 )
  stopifnot( sfCpus() == 4 )
  stopifnot( sfParallel() == TRUE )

  # Run parallel mode (socket) with 4 workers on 3 specific machines.
  sfInit( parallel=TRUE, cpus=4, type="SOCK",
          socketHosts=c( "biom7", "biom7", "biom11", "biom12" ) )
  stopifnot( sfCpus() == 4 )
  stopifnot( sfParallel() == TRUE )

  # Hook into MPI cluster.
  # Note: you can use any kind of MPI cluster Rmpi supports.
  sfInit( parallel=TRUE, cpus=4, type="MPI" )

  # Hook into PVM cluster.
  sfInit( parallel=TRUE, cpus=4, type="PVM" )

  # Run in sfCluster mode: settings are taken from the command line:
  # run mode (sequential or parallel), number of CPUs and the hosts
  # which are used.
  sfInit()

  # Session-ID from sfCluster (or XXXXXXXX as default)
  session <- sfSession()

  # Calling a snow function: cluster handle needed.
  parLapply( sfGetCluster(), 1:10, exp )

  # Same using the snowfall wrapper, no handle needed.
  sfLapply( 1:10, exp )

  # Stop the cluster at the end of the program.
  sfStop()
