Utilities API

Memory Estimation

Heuristic RAM estimator for parallel CASA imaging workers.

CASA’s C++ imaging engine (casatools) allocates multiple image-sized buffers during gridding that Python and Dask cannot track or free. This module provides a rough estimate of peak RAM usage so that users can choose an appropriate nworkers for their system.

Memory model

During active imaging of a single sub-cube, CASA keeps approximately the following buffers resident (per channel):

Buffer	Dtype	Bytes/pixel
Complex visibility grid	complex64	8
Weight grid	complex64	8
FFT workspace (in + out)	complex64	16
Residual image	float32	4
Model image	float32	4
PSF image	float32	4
Weight image (sumwt)	float32	4
Primary beam (PB)	float32	4
Mask	float32	4
Temporary / bookkeeping	mixed	~20

This sums to roughly 76 bytes per pixel per channel for a standard gridder with deconvolver='hogbom' and Stokes I.
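The per-buffer figures in the table can be tallied directly. The snippet below is only an arithmetic check of that table; the dictionary name and its keys are illustrative and are not part of pclean.

```python
# Bytes per pixel per channel, copied from the buffer table above.
BUFFER_BYTES_PER_PIXEL = {
    "complex visibility grid": 8,
    "weight grid": 8,
    "FFT workspace (in + out)": 16,
    "residual image": 4,
    "model image": 4,
    "PSF image": 4,
    "weight image (sumwt)": 4,
    "primary beam (PB)": 4,
    "mask": 4,
    "temporary / bookkeeping (approximate)": 20,
}

total_bytes_per_pixel = sum(BUFFER_BYTES_PER_PIXEL.values())
print(total_bytes_per_pixel)  # 76
```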

Scaling factors (multiplicative):

Mosaic gridder — each pointing requires a convolution function (CF) table; memory scales with the number of fields and CF support size. A 1.5x–3x multiplier over standard is typical.
MTMFS deconvolver — internal Hessian products scale as nterms squared.
Multi-channel sub-cubes — linear in nchan_per_task.

Calibration

The 76 B/pix/chan constant was calibrated against an ALMA Band 6 cube-imaging run (IRC+10216, 8000 x 8000, 40 antennas, 449 280 rows, gridder='standard', deconvolver='hogbom'), where each worker consumed ~4.9 GB (4.9 × 10^9 bytes) of C++ memory with 1 channel per task.

4.9 × 10^9 B / (8000 * 8000 pix * 1 chan) ≈ 76.6 B/pix/chan, in line with the 76 B/pix/chan total of the buffer table

The MS row count (nrows) contributes negligibly — visibilities are processed in row chunks that occupy a few MB, dwarfed by the multi-GiB image grids. It is included only as a minor additive term.

pclean.utils.memory_estimate.BYTES_PER_PIXEL_STANDARD: float = 76.0

Bytes per pixel per channel for the standard gridder (Stokes I, hogbom).

pclean.utils.memory_estimate.WORKER_BASE_OVERHEAD_GIB: float = 0.7

Python + Dask worker process baseline overhead (GiB).

pclean.utils.memory_estimate.estimate_worker_memory_gib(imsize, nchan_per_task=1, gridder='standard', deconvolver='hogbom', nterms=1, nfields=1)

Estimate peak RAM (GiB) consumed by a single worker.

Parameters:

imsize (Sequence[int] | int) – Image dimensions in pixels. A scalar is treated as a square.
nchan_per_task (int) – Number of channels each worker images (cube_chunksize).
gridder (str) – Gridder name (standard, mosaic, wproject, etc.).
deconvolver (str) – Deconvolver name. mtmfs triggers the nterms multiplier.
nterms (int) – Number of Taylor terms (only relevant for mtmfs).
nfields (int) – Number of mosaic pointings (used to scale mosaic overhead).

Returns:

Estimated peak memory in GiB.

Return type:

float

Examples:

>>> estimate_worker_memory_gib(imsize=8000, nchan_per_task=1)
5.22...
>>> estimate_worker_memory_gib(imsize=[1280, 1024], gridder='mosaic',
...                            deconvolver='mtmfs', nterms=2)
5.08...
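The first example above can be approximated by hand from the two documented constants. This is a minimal sketch that assumes the standard/hogbom case is simply pixels × channels × BYTES_PER_PIXEL_STANDARD plus the worker baseline. The helper name standard_worker_estimate_gib is hypothetical, and the mosaic and mtmfs multipliers are deliberately not modeled because their exact form is not stated here.

```python
BYTES_PER_PIXEL_STANDARD = 76.0  # documented constant (B/pix/chan)
WORKER_BASE_OVERHEAD_GIB = 0.7   # documented constant (GiB)
GIB = 1024 ** 3                  # bytes per GiB


def standard_worker_estimate_gib(nx: int, ny: int, nchan_per_task: int = 1) -> float:
    """Image buffers plus process baseline, standard gridder and hogbom only."""
    image_bytes = BYTES_PER_PIXEL_STANDARD * nx * ny * nchan_per_task
    return image_bytes / GIB + WORKER_BASE_OVERHEAD_GIB


estimate = standard_worker_estimate_gib(8000, 8000)
print(f"{estimate:.4f}")  # 5.2300
```

The unrounded value is about 5.22995 GiB, which agrees with the leading digits 5.22... shown for imsize=8000.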

pclean.utils.memory_estimate.estimate_peak_ram_gib(nworkers, imsize, nchan_per_task=1, gridder='standard', deconvolver='hogbom', nterms=1, nfields=1)

Estimate peak system RAM (GiB) for nworkers concurrent tasks.

Parameters:

nworkers (int) – Number of concurrent Dask workers.
imsize (Sequence[int] | int) – Forwarded to estimate_worker_memory_gib().
nchan_per_task (int) – Forwarded to estimate_worker_memory_gib().
gridder (str) – Forwarded to estimate_worker_memory_gib().
deconvolver (str) – Forwarded to estimate_worker_memory_gib().
nterms (int) – Forwarded to estimate_worker_memory_gib().
nfields (int) – Forwarded to estimate_worker_memory_gib().

Returns:

Estimated total peak RAM in GiB.

Return type:

float

pclean.utils.memory_estimate.recommend_nworkers(available_ram_gib=None, imsize=4096, nchan_per_task=1, gridder='standard', deconvolver='hogbom', nterms=1, nfields=1, ram_safety_factor=0.85)

Suggest the maximum number of workers that fit in available RAM.

Parameters:

available_ram_gib (float | None) – Total system RAM in GiB. None reads from the OS.
imsize (Sequence[int] | int) – Forwarded to estimate_worker_memory_gib().
nchan_per_task (int) – Forwarded to estimate_worker_memory_gib().
gridder (str) – Forwarded to estimate_worker_memory_gib().
deconvolver (str) – Forwarded to estimate_worker_memory_gib().
nterms (int) – Forwarded to estimate_worker_memory_gib().
nfields (int) – Forwarded to estimate_worker_memory_gib().
ram_safety_factor (float) – Fraction of available RAM to target (default 0.85 = 85%).

Returns:

Recommended number of workers (at least 1).

Return type:

int
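One plausible way to arrive at such a recommendation is to divide the RAM budget by the per-worker estimate and round down. The sketch below assumes exactly that rule; the function name and the per_worker_gib argument are hypothetical, and the real implementation may differ in detail (for example in how it reads system RAM when available_ram_gib is None).

```python
import math


def sketch_recommend_nworkers(available_ram_gib: float,
                              per_worker_gib: float,
                              ram_safety_factor: float = 0.85) -> int:
    """Largest worker count whose combined estimate fits in the RAM budget."""
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    budget_gib = available_ram_gib * ram_safety_factor
    # Never recommend fewer than one worker, even if one does not fit.
    return max(1, math.floor(budget_gib / per_worker_gib))


print(sketch_recommend_nworkers(64.0, 5.23))  # 10
print(sketch_recommend_nworkers(4.0, 5.23))   # 1
```

With 64 GiB of RAM and a 0.85 safety factor the budget is 54.4 GiB, which holds ten workers of about 5.23 GiB each.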

Partitioning

Data and image partitioning utilities.

Uses casatools.synthesisutils to divide data for continuum (row-based) and cube (frequency-based) parallelism, and also provides pure-Python fallback partitioners.

pclean.utils.partition.partition_continuum(config, nparts)

Partition data by visibility rows for parallel continuum imaging.

Uses synthesisutils.contdatapartition() to split each MS across nparts workers. Each returned dict is a CASA-native parameter bundle with selection narrowed to its row chunk and a unique partial image name.

Parameters:

config (PcleanConfig) – Full imaging configuration.
nparts (int) – Number of partitions.

Returns:

One CASA-native bundle (dict) per worker.

Return type:

list[dict]

pclean.utils.partition.partition_cube(config, nparts)

Partition the output cube by frequency channels for parallel cube imaging.

Uses synthesisutils.cubedataimagepartition() when possible, falling back to an even-split heuristic.

Parameters:

config (PcleanConfig) – Full imaging configuration.
nparts (int) – Number of partitions.

Returns:

One PcleanConfig per worker, covering a non-overlapping range of output channels.

Return type:

list[PcleanConfig]
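An even-split fallback of the kind mentioned above could look like the following. This is an illustrative sketch only: the function name and the (start, nchan) return format are assumptions, and it operates on plain integers rather than on PcleanConfig objects.

```python
def even_split_channels(nchan: int, nparts: int) -> list[tuple[int, int]]:
    """Split nchan output channels into contiguous (start, nchan) ranges."""
    if nchan < 1 or nparts < 1:
        raise ValueError("nchan and nparts must be positive")
    nparts = min(nparts, nchan)  # never create empty partitions
    base, extra = divmod(nchan, nparts)
    ranges = []
    start = 0
    for i in range(nparts):
        # The first `extra` partitions take one additional channel.
        width = base + (1 if i < extra else 0)
        ranges.append((start, width))
        start += width
    return ranges


print(even_split_channels(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

The ranges are contiguous and non-overlapping, and their widths differ by at most one channel.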

pclean.utils.partition.partial_image_name(base, part_index)

Return the partial-image path for a given partition index.

Parameters:

base (str)
part_index (int)

Return type:

str

Image Concatenation

Image concatenation utilities.

After parallel cube imaging each worker produces a sub-cube. This module concatenates them into the final output cube, mirroring the ia.imageconcat() call used in CASA’s parallel cube imager.

Three concatenation modes are supported (via concat_mode in ClusterConfig):

paged (mode='paged', the default of concat_images() and concat_subcubes()): Pixel data are physically copied into a new self-contained CASA image. Slower, but the result is fully independent of the subcubes after completion.
virtual (mode='nomovevirtual'): The output image is a lightweight reference catalog that points at the original subcube files. Near-instant but requires the subcubes to stay on disk (keep_subcubes=True).
movevirtual (mode='movevirtual'): The subcube directories are renamed (moved) into the output image. Near-instant on the same filesystem; the subcubes are consumed in the process.

When concat_mode='auto' (the default), the mode is derived from keep_subcubes: True → virtual, False → paged.
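The mapping from concat_mode and keep_subcubes to the CASA mode string can be written out as a small lookup. The helper below is a hypothetical illustration of the rules described above, not the function pclean actually uses.

```python
def resolve_concat_mode(concat_mode: str, keep_subcubes: bool) -> str:
    """Translate a concat_mode setting into an ia.imageconcat() mode string."""
    if concat_mode == "auto":
        # Virtual concatenation is only safe if the subcubes stay on disk.
        return "nomovevirtual" if keep_subcubes else "paged"
    explicit = {
        "paged": "paged",
        "virtual": "nomovevirtual",
        "movevirtual": "movevirtual",
    }
    if concat_mode not in explicit:
        raise ValueError(f"unknown concat_mode: {concat_mode!r}")
    return explicit[concat_mode]


print(resolve_concat_mode("auto", keep_subcubes=True))   # nomovevirtual
print(resolve_concat_mode("auto", keep_subcubes=False))  # paged
```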

When multiple extensions need concatenating (e.g. .image, .residual, .psf, …), a ProcessPoolExecutor (spawn start method) is used for paged mode so that each subprocess gets its own casacore TableCache and there is no shared C++ state between workers. Virtual modes are run sequentially because they are near-instant and write shared catalog metadata.

pclean.utils.image_concat.concat_images(outimage, inimages, axis=-1, relax=True, overwrite=True, mode='paged')

Concatenate a list of CASA images along axis.

Parameters:

outimage (str) – Path for the output concatenated image.
inimages (list[str]) – Ordered list of input sub-images.
axis (int) – Axis to concatenate along (default -1, the spectral axis).
relax (bool) – Relax axis checks.
overwrite (bool) – Overwrite outimage if it exists.
mode (str) – CASA imageconcat mode. 'paged' (default) physically copies data. 'nomovevirtual' creates a reference catalog (near-instant, but requires input images to stay on disk). 'movevirtual' creates a virtual concatenation by moving subcube directories into the output image.

Return type:

None

pclean.utils.image_concat.concat_subcubes(base_imagename, nparts, extensions=None, mode='paged', max_workers=4, virtual=None, _pool_cls=None)

Concatenate all standard image products from numbered sub-cubes.

Products include .image, .residual, .psf, etc.

The mode parameter is forwarded directly to ia.imageconcat():

'paged' — pixel data are physically copied (default, always safe). Extensions are concatenated in parallel via ProcessPoolExecutor (spawn context) so each subprocess owns an independent casacore TableCache — true I/O parallelism, no shared C++ state.
'nomovevirtual' — lightweight reference catalog, near-instant but subcube files must remain on disk. Run sequentially because virtual-catalog metadata is shared across calls.
'movevirtual' — renames subcubes into the output directory (near-instant, subcubes are consumed). Also run sequentially.

Deprecated: The virtual parameter is deprecated; pass mode explicitly.

Parameters:

base_imagename (str) – The original imagename (without .subcube.N).
nparts (int) – Number of sub-cubes.
extensions (list[str] | None) – Image extensions to concatenate. Defaults to a standard set.
mode (str) – CASA imageconcat mode string.
max_workers (int) – Maximum parallel concatenation workers (paged mode only).
virtual (bool | None) – Deprecated. True maps to mode='nomovevirtual'.

Return type:

None
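The paged-mode strategy described above, one spawned process per extension, can be sketched with the standard library. Several details here are assumptions: the helper names are hypothetical, the subcube naming <base>.subcube.<N><ext> with zero-based N is inferred from the base_imagename description, and the worker assumes that ia.imageconcat() returns an image tool that should be closed. Because the pool uses the spawn start method, call concat_extensions_paged from under an `if __name__ == "__main__":` guard when running it as a script.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def subcube_paths(base_imagename: str, nparts: int, ext: str) -> list[str]:
    """Assumed naming scheme: <base>.subcube.<N><ext>, with N starting at 0."""
    return [f"{base_imagename}.subcube.{i}{ext}" for i in range(nparts)]


def _concat_one_extension(outimage: str, inimages: list[str]) -> str:
    # Import inside the worker so every spawned process loads its own
    # casatools (and therefore its own casacore table cache).
    from casatools import image

    ia = image()
    try:
        concatenated = ia.imageconcat(outfile=outimage, infiles=inimages,
                                      axis=-1, relax=True, overwrite=True,
                                      mode="paged")
        concatenated.done()  # assumed: the return value is an image tool
    finally:
        ia.done()
    return outimage


def concat_extensions_paged(base_imagename: str, nparts: int,
                            extensions: list[str], max_workers: int = 4) -> list[str]:
    """Concatenate each extension in its own spawned subprocess."""
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        futures = [
            pool.submit(_concat_one_extension, base_imagename + ext,
                        subcube_paths(base_imagename, nparts, ext))
            for ext in extensions
        ]
        return [f.result() for f in futures]
```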

ADIOS2 Checks

Quick diagnostic to verify Adios2StMan availability in the current casatools build.

class pclean.utils.check_adios2.CasatoolsInfo(version='unknown', origin='unknown', conda_build_string='', adios2_supported=False, details=<factory>)

Bases: object

Summary of the casatools installation.

Parameters:

version (str)
origin (str)
conda_build_string (str)
adios2_supported (bool)
details (dict[str, str])

version: str = 'unknown'

origin: str = 'unknown'

conda_build_string: str = ''

adios2_supported: bool = False

details: dict[str, str]

pclean.utils.check_adios2.get_casatools_info()

Detect casatools version and whether it was installed via conda or pip.

Inspects conda-meta records first (definitive for conda installs), then falls back to importlib.metadata / pip provenance checks.

Returns:

A populated CasatoolsInfo dataclass.

Return type:

CasatoolsInfo
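The importlib.metadata fallback mentioned above can be illustrated with a short helper. This sketch covers only the version lookup; the function name is hypothetical, and the conda-meta inspection and pip provenance checks are not reproduced here.

```python
from importlib import metadata


def installed_version(dist_name: str = "casatools") -> str:
    """Return the installed distribution version, or 'unknown' if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Mirrors the 'unknown' default of CasatoolsInfo.version.
        return "unknown"


print(installed_version())
```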

pclean.utils.check_adios2.check_adios2_support(*, cleanup=True)

Create a throwaway CASA table with Adios2StMan and report whether it succeeds.

This attempts to bind a single float column to the Adios2StMan storage manager. If the underlying casacore was not compiled with ADIOS2 support (i.e. the nompi variant), a RuntimeError about an unknown storage manager is raised.

Parameters:

cleanup (bool) – Remove the temporary table directory after the check.

Returns:

True if Adios2StMan is available, False otherwise.

Return type:

bool

pclean.utils.check_adios2.ms_uses_adios2(ms_path)

Check whether any column in the given MS is managed by Adios2StMan.

Opens the table read-only, inspects getdminfo(), and returns True if at least one data-manager entry has TYPE == 'Adios2StMan'.

Parameters:

ms_path (str) – Path to a MeasurementSet directory.

Returns:

True if the MS contains ADIOS2-managed columns.

Return type:

bool
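The test applied to the getdminfo() result can be shown on a plain dictionary. The helper name and the sample record below are illustrative; a real dminfo record comes from the casatools table tool and carries more fields per entry.

```python
def dminfo_uses_adios2(dminfo: dict) -> bool:
    """True if any data-manager entry has TYPE == 'Adios2StMan'."""
    return any(entry.get("TYPE") == "Adios2StMan" for entry in dminfo.values())


# Hand-written sample in the shape of a getdminfo() result.
sample_dminfo = {
    "*1": {"TYPE": "StandardStMan", "NAME": "SSMVar", "COLUMNS": ["ANTENNA1"]},
    "*2": {"TYPE": "Adios2StMan", "NAME": "Adios2", "COLUMNS": ["DATA"]},
}

print(dminfo_uses_adios2(sample_dminfo))  # True
```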

pclean.utils.check_adios2.force_omp_single_thread()

Force the OpenMP runtime to use exactly 1 thread.

General thread-safety precaution for ADIOS2-backed storage managers. CASA gridding internals can launch OpenMP tasks that concurrently access the MS; limiting to a single thread avoids potential data races in the ADIOS2 engine.

os.environ['OMP_NUM_THREADS'] alone is insufficient because libgomp reads the variable only once (at the first OpenMP call, typically during import casatools). This helper therefore also calls omp_set_num_threads(1) via ctypes to override the cached value immediately.

Return type:

None
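The two-step approach described above (environment variable plus a direct runtime call) might be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, it locates libgomp through ctypes.util.find_library, and it returns False instead of raising when the library cannot be found or loaded, which may not match the real helper.

```python
import ctypes
import ctypes.util
import os


def sketch_force_omp_single_thread() -> bool:
    """Set OMP_NUM_THREADS=1 and, if libgomp is reachable, override it at runtime."""
    # Covers OpenMP runtimes that have not been initialised yet.
    os.environ["OMP_NUM_THREADS"] = "1"

    libname = ctypes.util.find_library("gomp")
    if libname is None:
        return False
    try:
        libgomp = ctypes.CDLL(libname)
        # Overrides the value libgomp cached at its first OpenMP call.
        libgomp.omp_set_num_threads(ctypes.c_int(1))
    except (OSError, AttributeError):
        return False
    return True


applied = sketch_force_omp_single_thread()
print(applied)
```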

ADIOS2 Conversion

Convert a MeasurementSet to use the Adios2StMan storage manager.

pclean.utils.convert_adios2.convert_ms_to_adios2(input_ms, output_ms, *, target_columns=('DATA', 'CORRECTED_DATA', 'MODEL_DATA', 'FLAG', 'WEIGHT', 'SIGMA'), overwrite=False, engine_type='BP4', engine_params=None, adios2_xml=None, taql=None)

Copy a MeasurementSet, rebinding heavy columns to Adios2StMan.

The function reads the existing dminfo from input_ms, replaces the storage-manager type for every manager that handles one of the target_columns, and performs a deep valuecopy so that the bulk data is physically rewritten through the ADIOS2 C++ backend.

Sub-tables (ANTENNA, FIELD, SPECTRAL_WINDOW, etc.) are left on their default storage managers because their I/O footprint is negligible.

Note

Adios2StMan requires the copy to happen in a single Table::deepCopy pass. Manual row-level approaches (addrows + putcol, or copyrows) are not supported because the ADIOS2 engine needs cell shapes established through casacore’s internal copy path and does not allow reopening a table for append.

Casacore’s C++ deepCopy streams data row-by-row, but the ADIOS2 BP engine accumulates all Put() data within a single step — EndStep() / Close() run only in the Adios2StMan destructor. Use engine_params to control buffer sizing.

The default ADIOS2 engine (usually BP5) ignores MaxBufferSize; it only honours BufferChunkSize. This function defaults to BP4 and passes the engine type via the ENGINETYPE dminfo SPEC field so casacore’s Adios2StMan::makeObject sets the correct engine before opening.

Parameters:

input_ms (str) – Path to the source MeasurementSet.
output_ms (str) – Destination path for the ADIOS2-backed copy.
target_columns (tuple[str, ...] | list[str]) – Column names to rebind to Adios2StMan.
overwrite (bool) – If True, remove output_ms if it already exists.
engine_type (str) – ADIOS2 engine type. 'BP4' is recommended because BP4 respects MaxBufferSize and flushes to disk when the buffer exceeds that cap. BP5 uses a different allocation model (see BufferChunkSize).
engine_params (dict[str, str] | None) –
ADIOS2 engine parameters. Useful keys:
- MaxBufferSize — triggers flush when exceeded (BP4 only, e.g. '2Gb').
- InitialBufferSize — starting allocation (BP4).
- BufferGrowthFactor — growth multiplier (BP4).
- BufferChunkSize — per-chunk size (BP5).
adios2_xml (str | None) – Path to a user-supplied ADIOS2 XML config file. If provided, engine_type and engine_params are ignored.
taql (str | None) – Optional TaQL WHERE clause to select a subset of rows before copying (e.g. 'DATA_DESC_ID IN [0]'). When set, tb.query(taql) is used as the copy source, so only matching rows are written. Sub-tables are copied as-is.

Returns:

The output_ms path on success.

Raises:

FileNotFoundError – If input_ms does not exist.
FileExistsError – If output_ms exists and overwrite is False.
RuntimeError – If Adios2StMan is not available in the current build.

Return type:

str
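The dminfo rewrite at the heart of the conversion can be illustrated on a plain dictionary. The sketch below assumes that a manager's SPEC is replaced wholesale by a record holding only ENGINETYPE; the function name is hypothetical, and how engine_params or adios2_xml are encoded is not shown because it is not specified here.

```python
import copy


def rebind_to_adios2(dminfo: dict, target_columns, engine_type: str = "BP4") -> dict:
    """Return a copy of dminfo with managers of target columns set to Adios2StMan."""
    targets = set(target_columns)
    rebound = copy.deepcopy(dminfo)  # leave the caller's record untouched
    for entry in rebound.values():
        # A manager is rebound if it handles at least one target column.
        if targets.intersection(entry.get("COLUMNS", [])):
            entry["TYPE"] = "Adios2StMan"
            entry["SPEC"] = {"ENGINETYPE": engine_type}
    return rebound


source_dminfo = {
    "*1": {"TYPE": "StandardStMan", "NAME": "SSMVar",
           "COLUMNS": ["ANTENNA1", "ANTENNA2"], "SPEC": {}},
    "*2": {"TYPE": "TiledShapeStMan", "NAME": "TiledData",
           "COLUMNS": ["DATA"], "SPEC": {}},
}
new_dminfo = rebind_to_adios2(source_dminfo, ("DATA", "FLAG"))
print(new_dminfo["*2"]["TYPE"])  # Adios2StMan
```

Note that rebinding works per manager, so any non-target column sharing a manager with a target column moves to Adios2StMan as well.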

pclean.utils.convert_adios2.split_and_convert_ms_to_adios2(input_ms, output_dir, *, target_columns=('DATA', 'CORRECTED_DATA', 'MODEL_DATA', 'FLAG', 'WEIGHT', 'SIGMA'), overwrite=False, engine_type='BP4', engine_params=None, adios2_xml=None)

Select rows by SPW and convert each subset to Adios2StMan in one pass.

This implements Workaround 3 from the Adios2StMan debug notes. For each SPW the function builds a TaQL DATA_DESC_ID IN [...] clause and passes it to convert_ms_to_adios2 via the taql parameter. The row selection and ADIOS2 rebinding happen in a single deepCopy — no intermediate MS is written.

Sub-tables (SPECTRAL_WINDOW, DATA_DESCRIPTION, etc.) are copied as-is and therefore still contain entries for all SPWs. This is cosmetic; the imager only accesses rows present in the main table.

Parameters:

input_ms (str) – Path to the source MeasurementSet.
output_dir (str) – Directory under which per-SPW ADIOS2 datasets are written (<output_dir>/<basename>_spw<N>.ms).
target_columns (tuple[str, ...] | list[str]) – Columns to rebind to Adios2StMan.
overwrite (bool) – If True, remove existing outputs.
engine_type (str) – ADIOS2 engine type (forwarded to convert_ms_to_adios2).
engine_params (dict[str, str] | None) – ADIOS2 engine parameters.
adios2_xml (str | None) – Path to a user-supplied ADIOS2 XML config.

Returns:

List of output ADIOS2-backed MS paths.

Return type:

list[str]
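The per-SPW TaQL clauses and output paths can be planned without touching the data. This sketch assumes the caller already has a mapping from SPW id to its DATA_DESC_ID values (in a real MS this comes from the DATA_DESCRIPTION sub-table) and that <basename> is the input name with a trailing .ms removed; the function name is hypothetical.

```python
import os


def plan_spw_conversions(input_ms: str, output_dir: str,
                         ddids_by_spw: dict) -> list:
    """Return (output_ms, taql) pairs, one per SPW, ordered by SPW id."""
    stem = os.path.basename(os.path.normpath(input_ms))
    if stem.endswith(".ms"):
        stem = stem[:-3]  # assumed: the basename excludes the extension
    plan = []
    for spw, ddids in sorted(ddids_by_spw.items()):
        id_list = ", ".join(str(d) for d in sorted(ddids))
        taql = f"DATA_DESC_ID IN [{id_list}]"
        output_ms = os.path.join(output_dir, f"{stem}_spw{spw}.ms")
        plan.append((output_ms, taql))
    return plan


spw_plan = plan_spw_conversions("/data/obs.ms", "/out", {1: [2], 0: [0, 1]})
for output_ms, taql in spw_plan:
    print(output_ms, "<-", taql)
```

Each pair could then be passed to convert_ms_to_adios2 as output_ms and taql.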