# Parallelization Modes ## Cube Mode (`specmode='cube'`) ```mermaid flowchart LR subgraph Coord["Coordinator"] direction TB A[pclean] --> B[Partition
channels] B --> C[Submit] G[Gather] --> H[Concat
subcubes] H --> I[Final cube] end subgraph Workers["Dask Workers (embarrassingly parallel)"] direction TB subgraph W0["Worker 0 · ch 0-23"] direction LR W0a[setup] --> W0b[PSF] --> W0c[PB] --> W0d[Major 1] --> W0e[Converge?] --> W0f[Mask] --> W0g[Minor] --> W0h[Major 2] --> W0i[Done] end subgraph W1["Worker 1 · ch 24-47"] direction LR W1a[setup] --> W1b[PSF → PB → Major → Minor → Done] end subgraph W2["Worker 2 · ch 48-70"] direction LR W2a[setup] --> W2b[PSF → PB → Major → Minor → Done] end subgraph W3["Worker 3 · ch 71-93"] direction LR W3a[setup] --> W3b[PSF → PB → Major → Minor → Done] end subgraph W4["Worker 4 · ch 94-116"] direction LR W4a[setup] --> W4b[PSF → PB → Major → Minor → Done] end end C --> W0a C --> W1a C --> W2a C --> W3a C --> W4a W0i --> G W1b --> G W2b --> G W3b --> G W4b --> G style Coord fill:#e1f5fe style Workers fill:#c8e6c9 style W0 fill:#a5d6a7 style W1 fill:#a5d6a7 style W2 fill:#a5d6a7 style W3 fill:#a5d6a7 style W4 fill:#a5d6a7 ``` ## Continuum Mode (`specmode='mfs'`) ```mermaid flowchart LR subgraph Init["Setup"] direction TB A[pclean] --> B[Partition
rows] --> C[Create
actors] end subgraph PSF["PSF (parallel)"] direction TB P0[Worker 0] ~~~ P1[Worker 1] ~~~ P2[Worker N] end subgraph PB["PB (parallel)"] direction TB PB0[Worker 0] ~~~ PB1[Worker 1] ~~~ PB2[Worker N] end subgraph Maj1["Major Cycle (parallel)"] direction TB M0[Worker 0
grid] ~~~ M1[Worker 1
grid] ~~~ M2[Worker N
grid] end subgraph Loop["Iteration Loop (coordinator)"] direction TB POST[Gather +
Normalize] --> MASK[setupMask] MASK --> CONV{Converged?} CONV -->|No| MINOR[Minor cycle
serial deconv] MINOR --> PRE[Scatter
model] PRE --> MAJ2 CONV -->|Yes| RESTORE[Restore +
PBcor] end subgraph MAJ2["Next Major (parallel)"] direction TB M20[Worker 0] ~~~ M21[Worker 1] ~~~ M22[Worker N] end C --> PSF PSF --> NORM1[Normalize
PSF] NORM1 --> PB PB --> NORM2[Normalize
PB] NORM2 --> Maj1 Maj1 --> POST MAJ2 --> POST style Init fill:#e1f5fe style PSF fill:#c8e6c9 style PB fill:#c8e6c9 style Maj1 fill:#c8e6c9 style MAJ2 fill:#c8e6c9 style Loop fill:#fff9c4 style MINOR fill:#ffecb3 ``` ## Key Differences | Aspect | Cube | Continuum (MFS) | |--------|------|-----------------| | **What's parallel** | Entire pipeline per subcube | Only gridding/degridding (major cycle) | | **Minor cycle** | Parallel (per subcube) | Serial on coordinator | | **Communication** | None (embarrassingly parallel) | Gather/scatter each major cycle | | **Partition axis** | Frequency channels | Visibility rows | | **Final assembly** | `imageconcat` of subcubes | Normalizer gathers partial images | ## Known Limitations ### `weighting='briggsbwtaper'` in Parallel Cube Mode The `briggsbwtaper` weighting scheme (CAS-13021) requires the **fractional bandwidth** of the full cube: ``` fracBW = 2 * (maxFreq - minFreq) / (maxFreq + minFreq) ``` In parallel cube mode each Dask worker images an independent sub-cube (often a single channel), so the C++ auto-computation of `fracBW` from the sub-cube's spectral axis would produce `0.0` and fail. #### Fix The `fracBW` parameter needs to be exposed through the casatools Python binding (`synthesisimager.setweighting(fracbw=...)`), then pclean can pre-computes it from the full cube `start`/`width`/`nchan` before dispatching to workers. Each worker receives the correct full-bandwidth `fracBW` scalar alongside its independent per-channel Briggs density grid. **Requirements:** - casatools must be rebuilt from the patched XML and C++ sources - `start` and `width` must be specified as frequency quantities (e.g. `"100GHz"`) so the pre-computation can resolve them. If they are not parseable, `fracBW` falls back to `0.0` (auto-compute), which will still fail for single-channel sub-cubes. #### Fallback workaround Use `weighting='briggs'` (with `perchanweightdensity=True`, the default), which computes per-channel Briggs weights independently — this is compatible with per-channel parallelization but does not offer the improved imaging fidelity of off-axis sources for wide-bandwidth cubes. ```python pclean( ... weighting="briggs", # not "briggsbwtaper" robust=0.5, perchanweightdensity=True, parallel=True, cube_chunksize=1, ) ``` #### CASA `tclean` reference `tclean` itself also restricts `briggsbwtaper`: - Requires `perchanweightdensity=True` - Requires `specmode='cube'` (not `'mfs'` or `'cont'`) - Requires `npixels=0` See `task_tclean.py` lines 218–236 in casa6. ### Cube Gridding Must Stay Enabled for nchan=1 Subcubes In upstream CASA the C++ default is `doingCubeGridding_p = True`. For `specmode='cube'` the C++ guard condition is: ```cpp if ((itsMaxShape[3] > 1 || mode.contains("cube")) && doingCubeGridding_p) ``` This means cube-mode images **always** take the `CubeMajorCycleAlgorithm` path — even with `nchan=1` — because `mode.contains("cube")` is true. The `CubeMajorCycleAlgorithm` runs a different gridding code path than the non-cube `runMajorCycle`, and also handles all PSF/residual normalization internally (gatherpsfweight, dividepsfbyweight, etc.). **pclean must NOT call `setcubegridding(False)` for single-channel subcubes.** Disabling cube gridding switches to the non-cube `runMajorCycle` path, producing fundamentally different gridded visibilities and causing residual flux errors of 3×–21× compared to the result from the same data imaged with cube gridding enabled. This was verified empirically (2026-03-11) by comparing serial pclean (nchan=1, cube gridding disabled) against tclean (nchan=1, default cube gridding enabled). This applies regardless of whether the MS is backed by standard CTDS or ADIOS2 storage managers. Cube gridding is always left at the C++ default (enabled). The only place pclean calls `setcubegridding(False)` is in `partition._resolve_frequency_grid()`, which creates a throwaway tiny-image synthesisimager purely to resolve the spectral coordinate grid — not for science imaging.