A cluster future is a future that uses cluster evaluation, which means that its value is computed and resolved in parallel in another process.

cluster(
  ...,
  persistent = FALSE,
  workers = availableWorkers(),
  gc = FALSE,
  earlySignal = FALSE,
  maxSizeOfObjects = NULL
)

Arguments

persistent

If FALSE, the evaluation environment is cleared of objects prior to the evaluation of the future.

workers

A cluster object, a character vector of host names, a positive numeric scalar, or a function. If a character vector or a numeric scalar, a cluster object is created using makeClusterPSOCK(workers). If a function, it is called without arguments when the future is created and its value is used to configure the workers. The function should return any of the above types.
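
As an illustration of the function form of workers, the following is a minimal sketch, assuming the future package is installed and that two local background R processes can be launched. The helper name n_workers is hypothetical and only serves the example.

```r
library(future)

# Hypothetical helper: returns the number of local workers to use.
# Per the description above, it is called without arguments.
n_workers <- function() 2L

# Pass the function itself, not its value; keep the previous plan
# so that it can be restored afterwards.
old_plan <- plan(cluster, workers = n_workers)

# Number of workers the current plan provides
n <- nbrOfWorkers()

# A trivial future evaluated on one of the workers
f <- future(1 + 1)
v <- value(f)

# Restore the previous plan
plan(old_plan)
```

Passing a function rather than a fixed number can be useful when the set of available workers is only known at run time.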

gc

If TRUE, the garbage collector is run (in the process that evaluated the future) only after the value of the future is collected. Exactly when the values are collected may depend on various factors, such as the number of free workers and whether earlySignal is TRUE (more frequently) or FALSE (less frequently). Some types of futures ignore this argument.

earlySignal

Specifies whether conditions should be signaled as soon as possible or not.

maxSizeOfObjects

The maximum allowed total size, in bytes, of all objects sent to and from the parallel worker. This can help to protect against unexpectedly large data transfers between the parent process and the parallel workers. Such data is often transferred over the network, which sometimes also includes the internet. For instance, if you sit at home and have set up a future backend with workers running remotely at your university or company, then you might want to use this protection to avoid transferring giga- or terabytes of data without noticing. (Default: \(500 \cdot 1024^2\) bytes = 500 MiB, unless overridden by a FutureBackend subclass, or by R option future.globals.maxSize (sic!))
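
The default quoted above can be checked with a few lines of base R. The sketch below also sets the R option future.globals.maxSize to 1 GiB as an illustrative value, and uses utils::object.size() as a rough stand-in for how large an object is. The future package may measure sizes differently, so treat the comparison as an estimate only.

```r
# Default limit from the text: 500 MiB expressed in bytes
default_limit <- 500 * 1024^2
print(default_limit)
#> [1] 524288000

# Raise the limit to 1 GiB (illustrative value) via the R option
# named above, remembering the previous setting
old_opts <- options(future.globals.maxSize = 1024^3)
new_limit <- getOption("future.globals.maxSize")

# Rough size estimate of an object that would be sent to a worker
x <- numeric(1e6)
x_bytes <- as.numeric(utils::object.size(x))
fits <- x_bytes <= new_limit

# Restore the previous option value
options(old_opts)
```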

...

Additional named elements passed to Future().

Value

A ClusterFuture.

Details

This function is not meant to be called directly. Instead, the typical usages are:

# Evaluate futures via a single background R process on the local machine
plan(cluster, workers = 1)

# Evaluate futures via two background R processes on the local machine
plan(cluster, workers = 2)

# Evaluate futures via a single R process on another machine on the
# local area network (LAN)
plan(cluster, workers = "raspberry-pi")

# Evaluate futures via a single R process running on a remote machine
plan(cluster, workers = "pi.example.org")

# Evaluate futures via four R processes, one running on the local machine,
# two running on LAN machine 'n1' and one on a remote machine
plan(cluster, workers = c("localhost", "n1", "n1", "pi.example.org"))
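
One way to confirm that a plan like the ones above evaluates futures in another process is to compare process IDs. This is a minimal sketch, assuming the future package is installed and that one local background R process can be launched.

```r
library(future)

# Use a single background R process on the local machine,
# keeping the previous plan so that it can be restored
old_plan <- plan(cluster, workers = 1)

# Process ID of the main R session
main_pid <- Sys.getpid()

# Process ID as seen from inside the future
f <- future(Sys.getpid())
worker_pid <- value(f)

# The two IDs differ when the future ran in a separate process
different_process <- (worker_pid != main_pid)

# Restore the previous plan
plan(old_plan)
```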

Examples

# \donttest{

## Use cluster futures
cl <- parallel::makeCluster(2, timeout = 60)
plan(cluster, workers = cl)

## A global variable
a <- 0

## Create future (explicitly)
f <- future({
  b <- 3
  c <- 2
  a * b * c
})

## A cluster future is evaluated in a separate process.
## Regardless, changing the value of a global variable will
## not affect the result of the future.
a <- 7
print(a)
#> [1] 7

v <- value(f)
print(v)
#> [1] 0
stopifnot(v == 0)

## CLEANUP
parallel::stopCluster(cl)

# }