Configuration Architecture

This document describes the current configuration system in pclean.

Overview

pclean uses a single-source-of-truth configuration built on pydantic v2 BaseModel classes. The top-level PcleanConfig groups all parameters into nine logical sub-models and provides:

  • Built-in defaults matching CASA tclean behaviour.

  • YAML file I/O for reproducible imaging runs.

  • Layered composition (base + overlay merging).

  • Backward-compatible flat-kwargs interface for the pclean() function.

  • to_casa_*() bridge methods that translate user-facing parameters into the CASA-native dicts consumed by the C++ synthesis tools.

Sub-model Hierarchy

PcleanConfig
├── selection:      SelectionConfig      # vis, field, spw, timerange, ...
├── image:          ImageConfig          # imagename, imsize, cell, specmode, nchan, ...
├── grid:           GridConfig           # gridder, facets, wprojplanes, pblimit, ...
├── weight:         WeightConfig         # weighting, robust, noise, uvtaper, ...
├── deconvolution:  DeconvolutionConfig  # deconvolver, scales, masking params, ...
├── iteration:      IterationConfig      # niter, gain, threshold, nmajor, ...
├── normalization:  NormConfig           # pblimit, normtype, psfcutoff
├── misc:           MiscConfig           # restart, savemodel, calcres, calcpsf
└── cluster:        ClusterConfig        # parallel, nworkers, type, ...
                    └── slurm: SlurmConfig   # queue, account, walltime, ...

All fields carry typed defaults. Constructing PcleanConfig() with no arguments produces a valid configuration equivalent to tclean defaults.

Four Ways to Build a Config

1. Direct Python Construction

from pclean.config import PcleanConfig, ImageConfig, IterationConfig

config = PcleanConfig(
    image=ImageConfig(imagename='test', imsize=[512, 512], cell='0.5arcsec'),
    iteration=IterationConfig(niter=500, threshold='1mJy'),
)

2. YAML File

config = PcleanConfig.from_yaml('my_config.yaml')

A YAML file mirrors the sub-model hierarchy:

selection:
  vis: my_data.ms
  field: '0'
image:
  imagename: output
  imsize: [1024, 1024]
  cell: 0.5arcsec
  specmode: cube
  nchan: 128
iteration:
  niter: 1000
cluster:
  parallel: true
  nworkers: 8

Only fields that differ from the defaults need to be specified.

3. Layered Composition (Merge)

base = PcleanConfig.from_yaml('defaults.yaml')
overlay = PcleanConfig.from_yaml('my_overrides.yaml')
config = PcleanConfig.merge(base, overlay)

Later configs win. Merging is deep — nested sub-model fields are merged recursively rather than replaced wholesale.

4. Flat Keyword Arguments (Backward Compat)

The pclean() function still accepts the 80+ flat keywords familiar from CASA tclean. Internally it calls:

config = PcleanConfig.from_flat_kwargs(vis='my.ms', imagename='test', ...)

which routes each keyword into the correct sub-model. When a --pconfig YAML file is also provided, the flat kwargs override the file values.

Merge Order

When multiple sources are provided the merge priority is (highest wins):

  1. Explicit keyword arguments / CLI flags

  2. --pconfig YAML file

  3. --preset (later --preset flags override earlier ones)

  4. Built-in pydantic defaults

Presets

Named presets live under src/pclean/configs/presets/ as YAML files (e.g. vlass.yaml) and are bundled inside the wheel via tool.setuptools.package-data. They can be loaded via:

from pclean.config import load_preset
config = load_preset('vlass')

or from the CLI:

python -m pclean --preset vlass --selection.vis my.ms --image.imagename out

Presets set only the fields relevant to the observing programme; everything else falls through to the built-in defaults.

CLI Integration

The python -m pclean CLI supports three config-related flags:

Flag

Purpose

--pconfig FILE

Load a YAML config as the base

--preset NAME

Load a named preset as the base

--dump-config FILE

Write the resolved config to YAML and exit

Dot-notation overrides on the command line (e.g. --cluster.nworkers 16) are merged on top.

CASA Bridge Methods

CASA’s C++ synthesis tools (synthesisimager, synthesisdeconvolver, synthesisnormalizer, iterbotsink) expect parameter dicts with CASA-internal field names and conventions that differ from the user-facing API. PcleanConfig provides bridge methods that perform these translations:

Method

Produces

Notable Translations

to_casa_selpars()

{'ms0': {...}, 'ms1': {...}}

timerangetimestr, uvrangeuvdist, observationobs, intentstate

to_casa_impars()

{'0': {...}}

Ensures imsize/cell are length-2 lists; injects deconvolver and restart cross-refs

to_casa_gridpars()

{'0': {...}}

Injects imagename, deconvolver, interpolation cross-refs

to_casa_weightpars()

{...}

weightingtype + rmode (briggs→norm, briggsabs→abs, briggsbwtaper→bwtaper+fracbw); mosweightmultifield; perchanweightdensityusecubebriggs

to_casa_decpars()

{'0': {...}}

Injects fullsummary from iteration config

to_casa_normpars()

{'0': {...}}

nterms = dec.nterms if mtmfs else 1; injects imagename, specmode

to_casa_iterpars()

{...}

gainloopgain; builds allimages sub-record with multiterm flag

to_casa_miscpars()

{...}

Passes restart, calcres, calcpsf

to_casa_bundle()

Full dict of all above

Serializable payload for continuum-parallel workers

Bridge methods live on PcleanConfig (not on sub-models) because many CASA translations cross sub-model boundaries — for example, imagename appears in impars, gridpars, normpars, and iterpars.

Convenience Properties

PcleanConfig exposes frequently accessed values as properties to avoid repetitive sub-model traversal:

Property

Returns

specmode

image.specmode

imagename

image.imagename

parallel

cluster.parallel

niter

iteration.niter

is_cube

True if specmode in ('cube', 'cubedata', 'cubesource')

is_mfs

True if specmode is 'mfs'

nfields

Always 1 (single-field imaging)

nms

Number of measurement sets in selection.vis

Data Flow Through Engines

pclean(vis=..., **kw)          # user entry point
  │
  ▼
PcleanConfig.from_flat_kwargs()  # build config
  │
  ├─ parallel=False ──► SerialImager(config)
  │                       └── calls to_casa_*() once in __init__
  │                       └── passes dicts to C++ tools
  │
  ├─ parallel=True, cube ──► ParallelCubeImager(config)
  │    │                       └── partition_cube(config) → list[PcleanConfig]
  │    │                       └── submit config.model_dump() to Dask workers
  │    ▼
  │    Workers: PcleanConfig.model_validate(dict) → SerialImager(config)
  │
  └─ parallel=True, mfs ──► ParallelContinuumImager(config)
       │                       └── partition_continuum(config) → list[dict]  (CASA bundles)
       │                       └── submit bundles to Dask actors
       ▼
       Workers: receive CASA bundle dicts, use directly with synthesisimager
       Coordinator: uses config.to_casa_normpars/decpars/iterpars for normalizer/deconvolver/iterbot

Why Two Serialization Strategies?

  • Cube workers run a full SerialImager (imaging + deconvolution), so they need the complete hierarchical config to call all to_casa_*() methods. The config is serialized via model_dump() and reconstructed on the worker with model_validate().

  • Continuum workers only run synthesisimager (gridding). They receive pre-translated CASA bundle dicts from PcleanConfig.to_casa_bundle(). This avoids a round-trip through pydantic on the worker side and naturally handles the fact that synthesisutils.contdatapartition() returns CASA-native selpars that do not cleanly map back to user-facing config fields.

Partitioning

Function

Input

Output

Strategy

partition_cube(config, nparts)

PcleanConfig

list[PcleanConfig]

Uses config.make_subcube_config() to create per-subcube configs with adjusted start, nchan, and imagename

partition_continuum(config, nparts)

PcleanConfig

list[dict]

Calls synthesisutils.contdatapartition(), deep-copies config.to_casa_bundle(), overrides selpars and imagename per partition

File Inventory

File

Role

src/pclean/config.py

Sub-model definitions, PcleanConfig, YAML I/O, merge, flat-kwargs bridge, all to_casa_*() methods, make_subcube_config()

src/pclean/pclean.py

pclean() entry point; builds PcleanConfig, dispatches to engines

src/pclean/__main__.py

CLI; --pconfig, --preset, --dump-config

src/pclean/imaging/serial_imager.py

Accepts PcleanConfig; pre-computes CASA dicts in __init__

src/pclean/parallel/worker_tasks.py

Dask worker functions; cube tasks accept config dicts, continuum tasks accept CASA bundles

src/pclean/parallel/cube_parallel.py

Cube engine; accepts PcleanConfig, serializes subcube configs

src/pclean/parallel/continuum_parallel.py

Continuum engine; accepts PcleanConfig, distributes CASA bundles

src/pclean/utils/partition.py

partition_cube() and partition_continuum()

src/pclean/configs/defaults.yaml

Auto-generated reference YAML with all built-in default values

src/pclean/configs/presets/vlass.yaml

VLASS continuum imaging preset

src/pclean/params.py

Deprecated legacy PcleanParams (retained for backward compat)

Defaults Reference

The canonical defaults are the pydantic Field defaults on each sub-model class in config.py. The file src/pclean/configs/defaults.yaml is a machine-generated snapshot and should be regenerated after changing any default:

pixi run -e dev gen-defaults