1 Why memshare?

Most parallel R workflows duplicate large objects into every worker process. That wastes RAM and time. memshare stores big objects once in shared memory and lets workers attach to them as ordinary R vectors/matrices via ALTREP views. You get:

  • minimal memory use (one in-RAM copy),
  • no serialization of big objects to workers,
  • drop‑in apply/lapply-style APIs that manage sharing for you.

This vignette is a quick, practical guide, for technical details we refer to [Thrun and Märte, 2025]


2 Install

install.packages("memshare")         # CRAN
# remotes::install_github("yourname/memshare")  # dev

Requirements: R ≥ 4.0, C++17 toolchain.


3 5‑minute tour

3.1 1) Column-wise work on a matrix (memApply)

library(memshare)

set.seed(1)
n <- 10000; p <- 2000
X <- matrix(rnorm(n * p), n, p)   # numeric/double matrix
y <- rnorm(n)

# Correlate each column with y, in parallel, without copying X to workers
res <- memApply(
  X = X, MARGIN = 2,
  FUN = function(v, y) cor(v, y),
  VARS = list(y = y)           # shared side data
)
str(res)

What happened?
X and y were placed in shared memory; workers received views (ALTREP) instead of copies. Each worker extracted the i-th column as v, ran FUN(v, y), and returned a result. All views were released automatically at the end.

3.2 2) List workloads (memLapply)

list_length <- 1000
d <- 200
L <- lapply(1:list_length, function(i) matrix(rnorm(d * d), d, d))
w <- rnorm(d)

ans <- memLapply(L, function(el, w) el %*% w, VARS = list(w = w))
length(ans); dim(ans[[1]])

3.3 3) Low-level control (register / retrieve / release)

ns <- "demo"
X  <- matrix(rnorm(1e6), 1000, 1000)
registerVariables(ns, list(X = X))

vw <- retrieveViews(ns, "X")
mean(vw$X[ , 1])
releaseViews(ns, "X")

releaseVariables(ns, "X")

4 Concepts that matter

  • Namespace: a string key that identifies a shared-memory context (e.g., "demo").
  • Pages: the actual shared-memory buffers owned by a session.
  • Views: ALTREP wrappers that let R treat shared-memory buffers like normal objects.

Unload the package (or release views/variables) to clean up. Memory is freed once no views remain.


5 Common patterns

5.1 Feature map over columns (fast and memory-light)

score <- function(v, a, b) sum((v - a)^2) / (1 + b)  # any column-wise work
ns <- "scores"
a <- rnorm(n); b <- runif(1)

out <- memApply(X = X, MARGIN = 2, FUN = score, VARS = list(a = a, b = b), NAMESPACE = ns)

5.2 Multiple passes on the same data

Reuse the same namespace to avoid re-registering large objects.

ns <- "reuse"
registerVariables(ns, list(X = X))
pass1 <- memApply("X", 2, function(v) sd(v), NAMESPACE = ns)
pass2 <- memApply("X", 2, function(v) mean(v), NAMESPACE = ns)
releaseVariables(ns, "X")

6 Tips and best practices

  • FUN’s first argument must be the vector/list element (v for memApply, el for memLapply).
    Any extra shared variables in VARS must use exactly the same names in FUN’s signature.
  • Matrices/vectors must be basic numeric (double) without S3 class attributes (ALTREP expects raw storage).
  • If you provide your own cluster, you can still use clusterExport for small copied objects; big ones belong in VARS.
  • Free memory promptly: releaseViews() in workers (handled automatically by memApply/memLapply), and releaseVariables() in the master when done.
  • Detaching the package removes handles and clears shared variables unless another R process still holds a view.
  • Keep write access simple (read-mostly is safest). If multiple workers write to the same region, coordinate externally.

7 Troubleshooting

  • “Unknown input format for X/VARS”: ensure X is a numeric matrix (double) or a character name of a registered object; VARS is either a named list (to register) or character vector of existing names.
  • Memory not freed: check viewList() in workers; any remaining views prevent releaseVariables() from reclaiming memory.
  • Anonymous functions and namespaces: if NAMESPACE is missing and FUN is an inline lambda, the default namespace is "unnamed". Prefer explicit NAMESPACE in production.

8 Essentials

  • registerVariables(namespace, variableList) — put objects into shared memory.
  • retrieveViews(namespace, variableNames) — get ALTREP views (workers).
  • releaseViews(namespace, variableNames) — release worker views.
  • releaseVariables(namespace, variableNames) — free objects (master).
  • memApply(X, MARGIN, FUN, NAMESPACE = NULL, VARS = NULL, MAX.CORES = NULL) — matrix apply with shared memory.
  • memLapply(X, FUN, NAMESPACE = NULL, VARS = NULL, MAX.CORES = NULL) — list apply with shared memory.

9 References

[Thrun and Märte, 2025] Thrun, M.C., Märte, J.: Memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE, The R Journal, in revision, 2025.