Memory Management in Parallel Workers¶
Problem with memory_limit='auto'¶
With Dask’s default memory_limit='auto', total system RAM is divided equally
among workers (e.g. 83.79 GiB / 12 workers = 6.98 GiB each).
CASA’s C++ imaging engine (casatools) allocates roughly 5 GiB per worker during gridding in our initial test case (see the Reference Test Case below). Dask classifies these allocations as “unmanaged memory” because they happen inside native code, outside Python’s heap.
When a worker’s process memory reaches 80% of its limit, Dask pauses the worker – it stops accepting new tasks. Because the C++ memory cannot be freed by Dask (only completing the CASA task releases it), the pause/resume cycle causes most workers to stall, leaving only a handful active at any given time.
Under sustained memory pressure, the Dask TCP stream between workers and the scheduler can also become corrupted, producing bogus frame-length headers:
Unable to allocate 7.27 EiB for an array with shape (8387229930220700999,)
This is not a real allocation request – it is a deserialization of garbage bytes from a corrupted TCP frame.
Solution¶
Dask’s per-worker memory management is disabled by setting memory_limit='0'
in DaskClusterManager. This is appropriate for CASA workloads because:
All heavy memory comes from C++ casatools – Dask cannot free it.
Pausing workers does not reduce memory – the C++ allocation persists until the task finishes.
Concurrency is already bounded by the
as_completedpattern inParallelCubeImager, which keeps at mostnworkerstasks in-flight.
Relevant Code¶
File |
Role |
|---|---|
|
|
|
|
|
Post-task |
When to Re-enable Memory Limits¶
On a shared cluster (e.g. via dask-jobqueue) where other jobs compete for
RAM, a non-zero memory_limit can prevent worker processes from being
OOM-killed by the OS. Pass it explicitly:
cluster = DaskClusterManager(nworkers=8, memory_limit='16GiB')
For dedicated machines, memory_limit='0' (the default) gives the best
throughput.
Reference Test Case¶
The memory behaviour described above was observed with the following ALMA
Band 6 cube-imaging job (scripts/test1.py):
Parameter |
Value |
|---|---|
Target |
IRC+10216 |
Measurement Set |
|
Spectral window |
25 |
Antennas |
40 |
Image size |
8000 x 8000 pixels |
Cell size |
0.0046 arcsec |
Spectral mode |
cube |
Channels |
7677 |
Start frequency |
267.5866 GHz |
Channel width |
0.2441 MHz |
Deconvolver |
Hogbom |
Weighting |
Robust 0.5 |
Auto-masking |
auto-multithresh |
niter |
50 000 |
Threshold |
2.0 mJy |
Parallel workers |
12 |
|
1 (one channel per task) |
On a machine with 83.79 GiB total RAM and 12 workers, each worker consumed
approximately 5 GiB of unmanaged (C++) memory during gridding, exceeding
Dask’s default 80% pause threshold of 5.6 GiB per worker (6.98 GiB limit).
Setting memory_limit='0' eliminated worker pausing and the associated TCP
corruption errors.