Converting a MeasurementSet to ADIOS2¶
convert_ms_to_adios2 rewrites the heavy data columns of a
MeasurementSet (MS) so they are stored through the Adios2StMan storage
manager instead of the default StandardStMan / TiledShapeStMan.
Why a conversion step is needed¶
Even when Adios2StMan is compiled into the casatools build (see
check_adios2.md), CASA will not use it automatically.
The storage manager for each column is determined at table-creation time
and recorded in the table’s dminfo. To benchmark the ADIOS2 I/O path
one must explicitly rewrite the MS with the new manager before
running pclean.
How it works¶
1. The source MS is opened and its dminfo dictionary is read.
2. For every data manager that handles one of the target columns, the column is moved out of that manager and consolidated into a single new Adios2StMan entry.
3. A deep copy with valuecopy=True is performed. This forces the casacore Table Data System to physically read every cell through the old manager and rewrite it through the ADIOS2 C++ backend, rather than simply copying the underlying files. The C++ deepCopy streams data row-by-row internally, so it does not load the full table into Python memory.
4. Sub-tables (ANTENNA, FIELD, SPECTRAL_WINDOW, etc.) are left untouched because their I/O footprint is negligible.
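The dminfo rewrite in the second step can be illustrated with plain dictionaries. The sketch below is not the code in pclean. It assumes the record layout that casatools returns from tb.getdminfo() (entries keyed '*1', '*2', ... with TYPE, NAME, SEQNR, SPEC and COLUMNS fields). The helper name rebind_columns and its choice of ValueError are hypothetical.

```python
def rebind_columns(dminfo, target_columns, spec=None):
    """Return a new dminfo with the target columns moved to one Adios2StMan entry.

    Hypothetical helper for illustration; not part of pclean.
    """
    targets = set(target_columns)
    kept_entries = []
    moved = []
    for entry in dminfo.values():
        columns = [str(c) for c in entry['COLUMNS']]
        moved.extend(c for c in columns if c in targets)
        remaining = [c for c in columns if c not in targets]
        if remaining:  # drop managers that no longer own any column
            kept = dict(entry)  # shallow copy so the input is not modified
            kept['COLUMNS'] = remaining
            kept_entries.append(kept)
    if not moved:
        raise ValueError('none of the target columns exist in this dminfo')
    kept_entries.append({
        'TYPE': 'Adios2StMan',
        'NAME': 'Adios2StMan',
        'SPEC': dict(spec or {}),
        'COLUMNS': moved,
    })
    # Relabel the entries in the '*1', '*2', ... style used by getdminfo().
    new_dminfo = {}
    for seqnr, entry in enumerate(kept_entries):
        entry['SEQNR'] = seqnr
        new_dminfo[f'*{seqnr + 1}'] = entry
    return new_dminfo


# Made-up dminfo with two managers; real MeasurementSets have more.
example = {
    '*1': {'TYPE': 'StandardStMan', 'NAME': 'SSM', 'SEQNR': 0, 'SPEC': {},
           'COLUMNS': ['TIME', 'WEIGHT', 'SIGMA']},
    '*2': {'TYPE': 'TiledShapeStMan', 'NAME': 'TiledDATA', 'SEQNR': 1, 'SPEC': {},
           'COLUMNS': ['DATA']},
}
result = rebind_columns(example, ['DATA', 'WEIGHT', 'SIGMA', 'MODEL_DATA'])
for key, entry in result.items():
    print(key, entry['TYPE'], entry['COLUMNS'])
```

In a real conversion the resulting dictionary would presumably be handed to the deep copy (with casatools, the dminfo argument of tb.copy together with deep=True and valuecopy=True), which is the step that actually moves the data.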
Command-line usage¶
python -m pclean.utils.convert_adios2 input.ms output_adios2.ms
Options¶
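| Flag | Description |
|---|---|
| --overwrite | Remove the output MS if it already exists. |
| --columns | Columns to rebind (default: see below). |
| … | ADIOS2 engine type (default: BP4). |
| --max-buffer-size | ADIOS2 write-buffer cap (BP4 only, e.g. 2Gb). |
| --adios2-xml | User-supplied ADIOS2 XML config (overrides …). |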
Default target columns:
DATA CORRECTED_DATA MODEL_DATA FLAG WEIGHT SIGMA
Examples¶
Rewrite all default columns:
python -m pclean.utils.convert_adios2 uid___A002.ms uid___A002_adios2.ms
Overwrite an existing output and rebind only DATA and FLAG:
python -m pclean.utils.convert_adios2 uid___A002.ms uid___A002_adios2.ms \
--overwrite --columns DATA FLAG
Python API¶
from pclean.utils.convert_adios2 import convert_ms_to_adios2
convert_ms_to_adios2('input.ms', 'input_adios2.ms')
Parameters¶
| Name | Type | Default | Description |
|---|---|---|---|
| … | … | (required) | Path to the source MeasurementSet. |
| output_ms | … | (required) | Destination path for the ADIOS2-backed copy. |
| … | … | See default list above | Columns to rebind to Adios2StMan. |
| … | … | … | Remove the output MS if it already exists. |
| … | … | BP4 | ADIOS2 engine type (see buffer notes below). |
| engine_params | … | … | ADIOS2 engine parameters (see below). |
| … | … | … | Path to user-supplied ADIOS2 XML config file. |
Return value¶
The output_ms path on success.
Exceptions¶
Exception |
When |
|---|---|
|
|
|
|
|
None of the |
Prerequisites¶
The openmpi variant of casatools must be installed for Adios2StMan
to be available. Verify with:
python -m pclean.utils.check_adios2
See check_adios2.md for details.
Appendix: Adios2StMan copy constraints and memory¶
Problem¶
For large MeasurementSets (multi-GB visibility columns), the conversion can consume significant memory. Reducing peak memory through row-level chunking was investigated but is not feasible with Adios2StMan.
Approaches tried and why they fail¶
Approach |
Result |
|---|---|
|
SIGABRT — |
Incremental |
Same |
|
First chunk succeeds; subsequent |
Why tb.copy() is the only working path¶
Adios2StMan requires cell shapes to be established through casacore’s
internal Table::deepCopy code path. Manual row-level writes bypass
this path, and the ADIOS2 engine does not support reopening a table for
append after the initial write session is closed.
Casacore’s C++ deepCopy with valuecopy=True already streams data
row-by-row internally — it does not load the entire table into
Python memory. The peak memory footprint during conversion is dominated
by ADIOS2’s internal write buffers rather than Python-side data.
Controlling ADIOS2 write-buffer memory¶
ADIOS2 does not expose an environment variable for buffer control.
The ADIOS2 BP engine accumulates all Put() data within a single step
— EndStep() / Close() only run in the Adios2StMan destructor, after
the entire deepCopy completes. Without explicit configuration the
write buffers grow proportionally to the table size.
Critically, the default engine matters:
| Engine | MaxBufferSize | BufferChunkSize | Notes |
|---|---|---|---|
| BP4 | respected; triggers intermediate flush to disk | N/A | Recommended for memory control. |
| BP5 | ignored | respected | Default in recent ADIOS2 builds. |
Because Adios2StMan’s C++ constructor picks the ADIOS2 default engine
(usually BP5) when no engine type is specified, passing
MaxBufferSize via ENGINEPARAMS in the dminfo SPEC alone was
ineffective — BP5 ignored it entirely.
An XML config file approach was also attempted (writing a temporary
file and passing its path via the XMLFILE dminfo SPEC field), but
ADIOS2’s XML parser crashed with std::invalid_argument: stoul in
some builds — so that path was abandoned.
convert_ms_to_adios2 now:
1. Defaults to BP4 (which respects MaxBufferSize).
2. Sets ENGINETYPE and ENGINEPARAMS directly in the dminfo SPEC record. Casacore’s Adios2StMan::makeObject reads these fields and calls IO::SetEngine() / IO::SetParameters() via the C++ API, bypassing the XML parser entirely.
# Cap write buffers at 2 GB with explicit BP4 engine
python -m pclean.utils.convert_adios2 input.ms output.ms \
--max-buffer-size 2Gb
# Python API
convert_ms_to_adios2(
'input.ms', 'output.ms',
engine_params={'MaxBufferSize': '2Gb'},
)
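The SPEC record described above can be sketched as a small dictionary builder. The field names ENGINETYPE and ENGINEPARAMS come from the text. The helper name build_adios2_spec and its warning behaviour are illustrative assumptions rather than the pclean implementation.

```python
import warnings


def build_adios2_spec(engine_type='BP4', engine_params=None):
    """Build a dminfo SPEC record for an Adios2StMan entry (illustrative only)."""
    params = {str(k): str(v) for k, v in (engine_params or {}).items()}
    if 'MaxBufferSize' in params and engine_type.upper() != 'BP4':
        # Per the engine table above, only BP4 honours MaxBufferSize.
        warnings.warn(
            f'MaxBufferSize is ignored by {engine_type}; use BP4 to cap buffers'
        )
    return {'ENGINETYPE': engine_type, 'ENGINEPARAMS': params}


spec = build_adios2_spec(engine_params={'MaxBufferSize': '2Gb'})
print(spec)
```

Warning instead of raising keeps the sketch permissive: a BP5 run with an ineffective MaxBufferSize still proceeds, but the mismatch that made the earlier attempt silently ineffective becomes visible.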
For full control, supply a custom XML:
Caveat: ADIOS2’s XML parser crashed with std::invalid_argument: stoul in some builds. Test the XML file on a small MS before running a large conversion. The ENGINETYPE / ENGINEPARAMS approach (default) avoids the XML parser entirely and is preferred.
<?xml version="1.0"?>
<adios-config>
  <io name="Adios2StMan">
    <engine type="BP4">
      <parameter key="MaxBufferSize" value="2Gb"/>
      <parameter key="InitialBufferSize" value="256Mb"/>
    </engine>
  </io>
</adios-config>
python -m pclean.utils.convert_adios2 input.ms output.ms \
--adios2-xml my_adios2.xml
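A cheap pre-check before the small-MS test is to confirm that the file is well-formed and contains the expected io block. The sketch below uses only Python's standard library. It does not exercise ADIOS2's own parser, so it cannot catch the stoul crash itself. The helper name read_adios2_xml is hypothetical; the sample uses the key/value attribute form shown in the ADIOS2 documentation, and the helper also accepts a value given as element text.

```python
import xml.etree.ElementTree as ET


def read_adios2_xml(xml_text, io_name='Adios2StMan'):
    """Return (engine_type, params) for the named <io> block, or raise ValueError."""
    root = ET.fromstring(xml_text)
    if root.tag != 'adios-config':
        raise ValueError(f'unexpected root element: {root.tag}')
    for io in root.findall('io'):
        if io.get('name') != io_name:
            continue
        engine = io.find('engine')
        if engine is None:
            raise ValueError(f'<io name="{io_name}"> has no <engine> element')
        # Prefer the value attribute; fall back to the element text.
        params = {
            p.get('key'): p.get('value', (p.text or '').strip())
            for p in engine.findall('parameter')
        }
        return engine.get('type'), params
    raise ValueError(f'no <io name="{io_name}"> block found')


config = """<?xml version="1.0"?>
<adios-config>
  <io name="Adios2StMan">
    <engine type="BP4">
      <parameter key="MaxBufferSize" value="2Gb"/>
      <parameter key="InitialBufferSize" value="256Mb"/>
    </engine>
  </io>
</adios-config>"""

engine_type, params = read_adios2_xml(config)
print(engine_type, params)
```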
Useful BP engine parameters for memory control:
| Parameter | Default (BP4 / BP5) | Description |
|---|---|---|
| MaxBufferSize | unlimited | Flush to disk when exceeded (BP4 only). |
| InitialBufferSize | 16 KB / 128 MB | Starting allocation size. |
| BufferGrowthFactor | 1.05 / n/a | Growth multiplier (BP4). |
| BufferChunkSize | n/a / 128 MB | Per-chunk allocation (BP5). |
Workaround for very large datasets¶
If the ADIOS2 buffer memory is a concern for extremely large datasets,
split the MS into smaller partitions first (e.g. with
casatasks.mstransform or casatasks.partition), convert each
partition individually, then concatenate the results.
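The per-partition step of this workaround can be sketched as a small loop. The function name convert_partitions, the output naming scheme, and the example paths are assumptions made for illustration. The converter is passed in as a callable so that the sketch runs without CASA; in practice it would be convert_ms_to_adios2.

```python
from pathlib import Path


def convert_partitions(partition_paths, out_dir, convert):
    """Convert each partition and return the converted paths in input order.

    `convert` is any callable taking (input_ms, output_ms), for example
    convert_ms_to_adios2. Hypothetical helper, not part of pclean.
    """
    out_dir = Path(out_dir)
    converted = []
    for part in partition_paths:
        part = Path(part)
        target = out_dir / f'{part.stem}_adios2.ms'
        convert(str(part), str(target))
        converted.append(str(target))
    return converted


# Stand-in converter that only records what it was asked to do.
calls = []
outputs = convert_partitions(
    ['parts/scan1.ms', 'parts/scan2.ms'],
    'converted',
    convert=lambda src, dst: calls.append((src, dst)),
)
print(outputs)
```

The returned list is what the final concatenation step would consume.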