# ALMA Cube Imaging: pclean vs tclean Comparison (v1)

Source reports: `alma_pclean_perf_v1.md` / `alma_tclean_perf_v1.md`

> **Caveat**: The pclean run used `niter=50000` (same as tclean) but **no
> Hogbom iterations were actually executed** (confirmed: no
> `executeminorcycle` calls appear in the log). The most likely cause is
> that `auto-multithresh` generated an empty mask on the very first
> `has_converged()` check (logged: *"Peak residual within mask : 0"*),
> causing `iterbotsink.cleanComplete()` to signal early convergence before
> any CLEAN components could be subtracted. The run output is therefore
> equivalent to `niter=0` (PSF + residual + restore only), but the root
> cause is a convergence-state issue, not a deliberate `niter=0` setting.
> tclean was also killed before any minor cycle; both runs produced
> PSF-only output. The comparison is valid for the imaging and memory
> characteristics but not for deconvolution quality.

---

## 1. Run Outcome

| | tclean | pclean |
|---|---|---|
| Completed | **No** (OOM-killed) | **Yes** |
| Wall time (before kill / total) | ~10h 24m (killed) | **13h 47m 33s** |
| Last phase reached | `initMinorCycle` (automask setup) | cleanup |
| Kill signal | SIGKILL (OOM, rank 1) | - |

**Takeaway**: tclean ran for 10h 24m and failed before executing a single
CLEAN iteration. pclean completed the equivalent output (PSF + residual) in
13h 47m.

---

## 2. Memory

| Metric | tclean | pclean | Difference |
|---|---|---|---|
| Peak RSS | **154.2 GB** (06:59:18) | **58.6 GB** | pclean −62% |
| Peak virtual | **171.2 GB** | **81.6 GB** | pclean −52% |
| Peak MMap RSS | **146.5 GB** | **52.4 GB** | pclean −64% |
| Swap used | 0 MB | ~5 MB | - |
| Page cache (start) | 122.1 GB | 122 GB | same |
| Page cache (end) | **7.4 GB** (−115 GB) | **15.4 GB** (−107 GB) | tclean exhausts more |

**Takeaway**: pclean uses **~62% less peak RSS** (58.6 vs 154.2 GB).
Each Dask worker loads one channel at a time (~5–6 GB), while tclean's 11
CASA MPI ranks hold all 1000 planes collectively from the start of
`makePSF`. This difference is what separates a successful run from an OOM
kill.

> **Note**: tclean's peak of 154.2 GB occurred at 06:59:18, just 9 minutes into
> `makePSF`, as all weight grids for 1000 channels were allocated upfront.
> The commonly cited value of 107.8 GB was the *post-kill cleanup* sample
> (17:20:01), not the run peak.

---

## 3. Phase Timings

| Phase | tclean | pclean |
|---|---|---|
| Startup / setup | ~13 s | ~2 s |
| **Total parallel imaging** | **4h 7m 35s** (makePSF only¹) | **10h 25m 17s** (full pipeline²) |
| Major cycle 1 | 6h 16m 24s (partial, killed) | N/A (0 iterations executed³) |
| Subcube concat | N/A (monolithic) | **3h 20m** (24% of total) |
| Cleanup | ~5 min (post-kill) | ~1m 42s |

¹ tclean's 4h 7m 35s is precisely timed from CASA log markers
(`INFO makePSF` → `INFO executeMajorCycle`).

² pclean's 10h 25m 17s is the **total Dask imaging wall time**
(06:49:56 → 17:15:13) covering `setup`, `make_psf`, `make_pb`,
`run_major_cycle` (initial residual), and `restore`, all executed serially
within each worker. Per-step breakdown was not logged in the v1 run.
Sub-step timing has since been added to `SerialImager.run()` (INFO lines:
`make_psf: Xs`, `make_pb: Xs`, etc.) and will be available in future runs.

³ `niter=50000` was passed but `executeminorcycle` never appears in the log.
The auto-multithresh mask was empty on the first `has_converged()` call
(*"Peak residual within mask : 0"*), which likely caused `cleanComplete()`
to set an early-stop state in the `iterbotsink`. After `update_mask()` the
mask became non-zero (~70–75 mJy peaks), but no Hogbom iterations were
dispatched.

**Fixed (post-v1)**: `SerialImager.run()` now calls `update_mask()` before
the first `has_converged()` check so that `initminorcycle()` sees a
non-empty mask and `cleanComplete()` correctly returns `False`.

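
The ordering problem described in footnote ³ can be illustrated with a toy model. This is only a sketch: `ToyImager`, its fields, and the halving of the residual per cycle are hypothetical stand-ins invented for illustration, not the pclean or CASA implementation. The one behaviour it tries to mirror is that a convergence check against an empty mask latches an early-stop state, so later mask updates never lead to a minor cycle.

```python
from dataclasses import dataclass


@dataclass
class ToyImager:
    """Hypothetical model of one worker's clean loop (not the real API)."""

    residual_peak: float = 0.072  # Jy; assumed value in the logged ~70-75 mJy range
    mask_empty: bool = True       # automask has not been built yet
    complete: bool = False        # latched early-stop state
    minor_cycles: int = 0

    def update_mask(self) -> None:
        # Stand-in for automasking: afterwards the mask covers the peak.
        self.mask_empty = False

    def peak_within_mask(self) -> float:
        # An empty mask reports a peak of 0, as in the logged message.
        return 0.0 if self.mask_empty else self.residual_peak

    def has_converged(self, threshold: float) -> bool:
        # Once the stop state is set it stays set (assumed latching behaviour).
        if self.peak_within_mask() <= threshold:
            self.complete = True
        return self.complete

    def run(self, niter: int, threshold: float, mask_first: bool) -> int:
        if mask_first:
            self.update_mask()           # post-v1 order: mask before first check
        converged = self.has_converged(threshold)
        self.update_mask()               # v1 order: mask built only after the check
        while not converged and self.minor_cycles < niter:
            self.residual_peak *= 0.5    # pretend each minor cycle halves the peak
            self.minor_cycles += 1
            converged = self.has_converged(threshold)
        return self.minor_cycles


v1_cycles = ToyImager().run(niter=50000, threshold=0.01, mask_first=False)
fixed_cycles = ToyImager().run(niter=50000, threshold=0.01, mask_first=True)
print(f"v1 order: {v1_cycles} minor cycles; mask-first order: {fixed_cycles}")
```

In the v1 ordering the toy executes zero cycles even though the mask is non-empty by the time the loop would start, which matches the symptom in the log. With the mask built first, the loop runs until the peak drops below the threshold.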

**Takeaway**: A direct makePSF-vs-makePSF comparison is not possible from
the v1 pclean log. tclean grids all 1000 channels at once across 11 MPI
ranks; pclean processes one channel per worker serially, and the 10h 25m
bound includes all per-channel overhead beyond just PSF computation.
pclean's 3h 20m concat overhead is a known bottleneck addressed via
`concat_mode` (see §6 of `alma_pclean_perf_v1.md`).

---

## 4. CPU

| Metric | tclean | pclean |
|---|---|---|
| Peak CPU | ~1105% | ~1095% |
| Imaging phase | ~1000–1105% (11 ranks) | ~1000–1095% (10 workers) |
| Concat phase | N/A | ~160% (serial, I/O-bound) |

**Takeaway**: Peak CPU is nearly identical: both saturate ~10–11 cores.
pclean's concat phase drops to ~160% because `ia.imageconcat()` is
single-threaded and disk-throughput limited.

---

## 5. I/O

| Metric | tclean | pclean |
|---|---|---|
| Total reads | **2.33 TB** | **6.71 TB** (×2.9 more) |
| Total writes | **8.24 TB** | **9.83 TB** |
| Final output size | **1.48 TB** (partial, killed) | **1.42 TB** (complete, 7 ext.) |
| Write amplification | **~5.6×** | **~6.9×** |

**Takeaway**: pclean reads nearly 3× more data than tclean. The extra
~4.4 TB of reads comes from the concat phase: each extension requires
reading all 1000 subcube inputs (~2.3 TB total) to write the merged output.
tclean avoids this because it writes a single monolithic image directly,
but pays for it in memory.

pclean's higher write amplification (~6.9× vs ~5.6×) reflects the same
physical-copy concat: subcube pixels are read and rewritten into the merged
image. The `virtual`/`movevirtual` concat modes eliminate this cost.

---

## 6. Parallelism Model

| | tclean | pclean |
|---|---|---|
| Framework | CASA MPI (`parallel=True`) | Dask `LocalCluster` |
| Worker count | 11 CASA ranks | 10 Dask workers |
| Total NProc (psrecord) | 25 (includes MPI infrastructure) | 13 (main + workers + scheduler) |
| Memory model | All channels live across all ranks | One channel per worker at a time |
| Concat | None (monolithic output) | Serial extension loop (addressed) |

---

## 7. Summary

| | tclean | pclean |
|---|---|---|
| **Completed** | **No** | **Yes** |
| **Peak RSS** | **154.2 GB** | **58.6 GB** |
| **makePSF speed** | **Faster** (4h 7m, confirmed) | Unknown (10h 25m is full imaging pipeline) |
| **Total wall time** | N/A (killed at 10h 24m) | 13h 47m (complete) |
| **OOM risk** | Fatal at 128 GB RAM | None |
| **Scalability** | Limited by $O(\text{nchan})$ memory | $O(1)$ memory per worker |

pclean trades PSF speed for memory safety. For a 1000-channel, 8000×8000
cube on a 128 GB node, tclean is simply not viable regardless of tuning.
pclean's concat overhead (the remaining gap) is addressed by `concat_mode`.

---

```{toctree}
:caption: Detailed Reports
:maxdepth: 1

alma_pclean_perf_v1
alma_tclean_perf_v1
```
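
The derived percentages and ratios quoted in §2, §3 and §5 can be re-derived from the raw table values. The following is a minimal sketch for checking that arithmetic; the helper names `pct_reduction` and `hms` are hypothetical and belong to neither tool, and the inputs are simply the numbers copied from the tables in this report.

```python
def pct_reduction(baseline: float, value: float) -> int:
    """Percentage by which `value` is below `baseline`, rounded to an integer."""
    return round(100 * (1 - value / baseline))


def hms(hours: int, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h/m/s duration or clock time to seconds."""
    return 3600 * hours + 60 * minutes + seconds


# §2 Memory: pclean reduction relative to tclean (GB values from the table).
rss_cut = pct_reduction(154.2, 58.6)
virt_cut = pct_reduction(171.2, 81.6)
mmap_cut = pct_reduction(146.5, 52.4)

# §3 Phase timings: Dask imaging window and concat share of total wall time.
imaging_window = hms(17, 15, 13) - hms(6, 49, 56)
concat_share = round(100 * hms(3, 20) / hms(13, 47, 33))

# §5 I/O: write amplification = total writes / final output size (TB).
tclean_amp = round(8.24 / 1.48, 1)
pclean_amp = round(9.83 / 1.42, 1)
read_ratio = round(6.71 / 2.33, 1)
extra_reads_tb = round(6.71 - 2.33, 1)

print(f"RSS -{rss_cut}%, virtual -{virt_cut}%, MMap RSS -{mmap_cut}%")
print(f"imaging window {imaging_window} s, concat {concat_share}% of wall time")
print(f"write amplification {tclean_amp}x vs {pclean_amp}x, "
      f"reads x{read_ratio} (+{extra_reads_tb} TB)")
```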