# pclean SLURM Submit Guide ## Overview `pclean submit` is the recommended way to run parallel cube imaging on HPC clusters. It uses a **two-layer SLURM job stack**: one lightweight **coordinator** job that hosts the Dask scheduler, and N **worker** jobs spawned automatically via `dask-jobqueue.SLURMCluster`. ``` ┌───────────────────────────────────────────────┐ │ Coordinator (SLURM job) │ │ • Hosts Dask scheduler │ │ • Partitions channels into sub-cubes │ │ • Dispatches tasks to workers │ │ • Concatenates sub-cube results │ │ │ │ sbatch → dask-jobqueue spawns: │ │ ┌────────────┐ ┌────────────┐ ┌───────────┐ │ │ │ Worker 0 │ │ Worker 1 │ │ Worker N │ │ │ │ (SLURM job)│ │ (SLURM job)│ │(SLURM job)│ │ │ │ sub-cube 0 │ │ sub-cube 1 │ │sub-cube N │ │ │ └────────────┘ └────────────┘ └───────────┘ │ └───────────────────────────────────────────────┘ ``` Each worker images one sub-cube independently (embarrassingly parallel, no inter-worker communication). The coordinator partitions the frequency axis, submits Dask futures, waits for completion, and concatenates the sub-cube images into the final cube. --- ## Quick Start ### 1. Write a YAML config A minimal SLURM-enabled config needs `cluster.type: slurm` and the `slurm:` / `submit:` sections. See [test_alma_pclean_v4.yaml](../scripts/test_alma_pclean_v4.yaml) for a complete example. ### 2. Submit ```bash pclean submit scripts/test_alma_pclean_v4.yaml ``` Output: ``` 2026-03-12 05:24:08 INFO pclean.config Loading config from scripts/test_alma_pclean_v4.yaml 2026-03-12 05:24:08 INFO pclean.parallel.submit Wrote sbatch script to .../submit.sh 2026-03-12 05:24:08 INFO pclean.parallel.submit Submitted coordinator job 1631 Submitted coordinator job: 1631 ``` ### 3. Monitor ```bash squeue --me ``` ``` JOBID PARTITION USER STATE TIME NAME 1631 queue rxue RUNNING 0:13 test_alma_pclean_v4-coordinator 1632 queue rxue RUNNING 0:10 test_alma_pclean_v4-1 1633 queue rxue RUNNING 0:10 test_alma_pclean_v4-2 ... 1647 queue rxue RUNNING 0:09 test_alma_pclean_v4-16 ``` The coordinator job appears first (job 1631). Worker jobs are auto-numbered (jobs 1632–1647, one per `nworkers`). ### 4. Override workdir ```bash pclean submit scripts/test_alma_pclean_v4.yaml \ --workdir /scratch/cubeimaging/run_01 ``` --- ## YAML Configuration Reference ### `cluster.slurm:` — Worker SLURM Settings These control each **worker** SLURM job (passed to `dask-jobqueue.SLURMCluster`): | Field | Default | Purpose | |-------|---------|---------| | `queue` | `null` | SLURM partition (`--partition`) | | `account` | `null` | SLURM account (`--account`) | | `walltime` | `04:00:00` | Per-worker wall time (`--time`) | | `job_mem` | `20GB` | Per-worker memory (`--mem`); must cover CASA RSS peak | | `cores_per_job` | `1` | CPUs per worker; 1 is sufficient (CASA imaging is single-threaded) | | `job_name` | `null` | Worker job name base (auto-suffixed: `-0`, `-1`, …) | | `job_extra_directives` | `[]` | Extra `#SBATCH` lines for workers | | `python` | `null` | Python executable path on compute nodes | | `local_directory` | `null` | Scratch directory on compute nodes | | `log_directory` | `logs` | Directory for worker stdout/stderr | | `job_script_prologue` | `[]` | Shell commands before worker starts (e.g. `module load`) | ### `cluster.submit:` — Coordinator Job Settings These control the **coordinator** sbatch script: | Field | Default | Purpose | |-------|---------|---------| | `workdir` | `null` | Working directory for imaging output | | `pixi_project_dir` | `null` | Root of pclean pixi project (auto-detected from config path) | | `pixi_env` | `forge` | Pixi environment to activate in the sbatch script | | `coordinator_mem` | `8G` | Memory for the coordinator (`--mem`) | | `coordinator_cpus` | `2` | CPUs for the coordinator (`--cpus-per-task`) | | `coordinator_walltime` | `24:00:00` | Wall time for the coordinator (`--time`) | | `coordinator_job_name` | `pclean-coordinator` | Job name (appears in `squeue`) | | `extra_sbatch` | `[]` | Additional `#SBATCH` directives | | `log_dir` | `null` | Log directory (fallback: `pixi_project_dir/logs`) | | `psrecord` | `true` | Wrap run in `psrecord` for resource profiling | ### `cluster:` — General Settings | Field | Default | Purpose | |-------|---------|---------| | `type` | `local` | Cluster backend: `local`, `slurm`, or `address` | | `parallel` | `false` | Enable parallel imaging | | `nworkers` | `null` | Number of workers (= SLURM jobs in slurm mode) | | `cube_chunksize` | `-1` | Channels per sub-cube (`-1` = auto: `nchan / nworkers`) | | `concat_mode` | `auto` | Sub-cube concatenation: `auto`, `paged`, `virtual`, or `movevirtual` | | `keep_subcubes` | `false` | Keep individual sub-cube images after concatenation | --- ## Example Config Walkthrough From [test_alma_pclean_v4.yaml](../scripts/test_alma_pclean_v4.yaml): ```yaml cluster: parallel: true nworkers: 16 # 16 SLURM worker jobs cube_chunksize: -1 # auto: 1000 ch / 16 = ~63 ch per worker concat_mode: paged keep_subcubes: false type: slurm # use SLURM backend slurm: queue: queue # SLURM partition name job_mem: 2GB # each worker gets 2 GB (128×128 images are small) job_name: test_alma_pclean_v4 python: /path/to/.pixi/envs/forge/bin/python submit: workdir: /path/to/output/ # images written here pixi_project_dir: /path/to/pclean/ # for pixi shell-hook activation pixi_env: forge coordinator_mem: 4G coordinator_cpus: 2 coordinator_walltime: '24:00:00' coordinator_job_name: test_alma_pclean_v4-coordinator psrecord: true ``` The output cube (IRC+10216, SPW 25, 1000 channels at 128×128): ![IRC+10216 SPW 25 cube](figures/irc-10216-spw25.png) ### Sizing Guidelines | Resource | Guideline | |----------|-----------| | `job_mem` | Must fit CASA's peak RSS for one sub-cube. Use `pclean --estimate-memory` or see [memory_estimation.md](memory_estimation.md). Small images (128²) need ~2 GB; large mosaics (8000²) may need 40+ GB. | | `coordinator_mem` | The coordinator does no imaging — 4–8 GB is usually sufficient for Dask bookkeeping and final concatenation. | | `nworkers` | Ideally `nchan` is evenly divisible. More workers = smaller sub-cubes = faster but more SLURM overhead. | | `walltime` | Set generously; each worker images `nchan/nworkers` channels. A 128² × 63-channel sub-cube with `niter=0` typically finishes in minutes. | --- ## What `pclean submit` Does Internally 1. **Loads config** from the YAML file. 2. **Generates sbatch script** (`/submit.sh`) with coordinator SLURM directives, pixi environment activation, and `python -m pclean` invocation (optionally wrapped in `psrecord`). 3. **Calls `sbatch`** to submit the coordinator job. 4. **Coordinator runs** on the allocated node: - Creates a `dask_jobqueue.SLURMCluster` with the `slurm:` settings. - Calls `cluster.scale(jobs=nworkers)` — each worker is a separate SLURM batch job. - Partitions the frequency axis into sub-cubes. - Dispatches sub-cube imaging tasks via Dask futures. - Waits for completion, concatenates results. - `SLURMCluster.close()` cancels all worker jobs. --- ## Cleanup If the coordinator is killed abnormally (e.g. OOM, walltime), orphan worker jobs may remain. Clean them up with: ```bash scancel --name=test_alma_pclean_v4 # cancel all workers by job_name scancel # cancel the coordinator itself ``` --- ## Related Documentation | Document | Topic | |----------|-------| | [parallelization.md](parallelization.md) | Cube vs. continuum parallel architecture, known limitations | | [code_structure.md](code_structure.md) | Module-level overview and design decisions | | [memory_estimation.md](memory_estimation.md) | RAM estimation heuristics for worker sizing | | [memory_management.md](memory_management.md) | Why Dask memory limits are disabled (`memory_limit='0'`) | | [dev_guide.md](dev_guide.md) | Development setup, testing, CI | | [config_current.md](config_current.md) | Full configuration reference | | [defaults.yaml](../src/pclean/configs/defaults.yaml) | Auto-generated default values for all config fields |