Skip to content

API Reference

The public API is three objects: create_pyramid builds a write plan, Pyramid holds it, and ZarrLayerVarConfig carries optional visualization hints. CoarseningMethod is the Literal["mean", "max", "min", "sum"] alias accepted by create_pyramid(method=...).

topozarr.coarsen.create_pyramid

create_pyramid(
    ds: Dataset,
    levels: int | None = None,
    *,
    factors: list[int] | None = None,
    x_dim: str = "x",
    y_dim: str = "y",
    method: CoarseningMethod = "mean",
    target_chunk_bytes: int = DEFAULT_CHUNK_BYTES,
    chunks_per_shard: ChunksPerShard | None = DEFAULT_CHUNKS_PER_SHARD,
    layer_hints: dict[str, ZarrLayerVarConfig] | None = None
) -> Pyramid

Build a multiscale Zarr pyramid plan from a georeferenced Dataset.

Exactly one of levels / factors must be given.

Parameters:

  • ds (Dataset) –

    Source dataset. Must have a CRS assigned via ds.proj.assign_crs.

  • levels (int | None, default: None ) –

    Total number of resolution levels, including the original. Level 0 is the original resolution; each subsequent level coarsens by 2× per spatial dimension (cumulative factors [1, 2, 4, ...]).

  • factors (list[int] | None, default: None ) –

    Explicit cumulative downsample factors per level, e.g. [1, 4, 16] for a sparse 4×-spaced pyramid. Must start at 1, be strictly increasing, and have each entry integer-divide the next. Mutually exclusive with levels.

  • x_dim (str, default: 'x' ) –

    Name of the x (longitude / easting) dimension.

  • y_dim (str, default: 'y' ) –

    Name of the y (latitude / northing) dimension.

  • method (CoarseningMethod, default: 'mean' ) –

    Spatial aggregation method for coarsening.

  • target_chunk_bytes (int, default: DEFAULT_CHUNK_BYTES ) –

    Target uncompressed size per chunk (default ~500 KB).

  • chunks_per_shard (ChunksPerShard | None, default: DEFAULT_CHUNKS_PER_SHARD ) –

    Number of chunks per shard along each spatial dimension (e.g. 4 → 4×4 = 16 chunks per shard, ~8 MB). Must be a power of 2 in the range 1–32. Pass None to disable sharding.

  • layer_hints (dict[str, ZarrLayerVarConfig] | None, default: None ) –

    Optional per-variable colormap / color-range hints written into the zarr-layer root metadata key.

Returns:

  • Pyramid

    A Pyramid write plan; call

  • Pyramid

    pyramid.write(store) to compute and write all levels.

Raises:

  • ValueError

    If ds has no CRS, chunks_per_shard is not a power of 2 in the range 1–32, or a spatial variable has more than 4 dimensions (topozarr-core kernel limit).

Examples:

import xarray as xr
import xproj  # registers the .proj accessor
from topozarr import create_pyramid

ds = xr.tutorial.open_dataset("air_temperature").drop_encoding()
ds = ds.proj.assign_crs(spatial_ref="EPSG:4326")

pyramid = create_pyramid(ds, levels=2, x_dim="lon", y_dim="lat")
pyramid.write("pyramid.zarr")

# sparse pyramid: native, 4x, 16x (skips the costly 2x level)
sparse = create_pyramid(ds, factors=[1, 4, 16], x_dim="lon", y_dim="lat")
sparse.write("sparse.zarr")

topozarr.pyramid.Pyramid dataclass

Pyramid(
    source: Dataset,
    level_templates: dict[int, Dataset],
    encoding: dict[str, Any],
    attrs: dict[str, Any],
    x_dim: str,
    y_dim: str,
    method: CoarseningMethod,
    factors: list[int] = list(),
    fill_values: dict[str, float | int | None] = dict(),
)

A write plan for a multiscale Zarr pyramid, returned by create_pyramid.

Attributes:

  • source (Dataset) –

    The original (level 0) dataset.

  • level_templates (dict[int, Dataset]) –

    Per-level datasets carrying real coordinates and attrs; spatial data variables are zero-cost placeholders with the correct shape/dtype (their data is computed during write).

  • encoding (dict[str, Any]) –

    Nested dict {path: {var: {"chunks": ..., "shards": ...}}}.

  • attrs (dict[str, Any]) –

    Root group metadata (multiscales / proj: / spatial: / zarr-layer).

as_datatree

as_datatree() -> DataTree

Return a lazy DataTree with all pyramid levels coarsened via xarray.

Each level is produced by chaining xarray.coarsen operations on the source dataset. If the source is Dask-backed, the returned tree is fully lazy — suitable for writing on a Dask distributed cluster or with icechunk. Use self.encoding (already shaped for DataTree.to_zarr) to apply the recommended chunks and shards:

dt = pyramid.as_datatree()
dt.to_zarr(store, zarr_format=3, consolidated=False,
           encoding=pyramid.encoding)

write

write(
    store: Any,
    *,
    mode: str = "w",
    max_workers: int | None = None,
    levels: list[int] | None = None,
    max_region_bytes: int = DEFAULT_MAX_REGION_BYTES,
    progress: bool = False,
    stats: bool = False,
    keep_levels_in_memory: bool | None = None,
    io: Literal["python", "rust"] = "python"
) -> dict[str, Any] | None

Compute and write pyramid levels to a Zarr store.

Level 0 is streamed region by region from the source dataset; each subsequent level is block-reduced from the previously written level, streaming shard-sized regions through the Rust kernel on a thread pool. Levels are written sequentially (each reads the previous one); variables within a level are processed in parallel on a shared pool. For bounded memory on large stores, open the source lazily (e.g. xr.open_zarr(store, chunks=None)).

Parameters:

  • store (Any) –

    Anything zarr-python accepts — a local path, ObjectStore, or icechunk session store.

  • mode (str, default: 'w' ) –

    Zarr open mode for the root group. Use "a" when writing a subset of levels so the root group and any pre-existing levels are preserved.

  • max_workers (int | None, default: None ) –

    Thread pool size for region processing. None derives a default from the CPU count and available memory (peak memory is roughly max_workers * 5 * region_bytes).

  • levels (list[int] | None, default: None ) –

    Subset of levels to write (e.g. [1, 2]). Defaults to all levels.

  • max_region_bytes (int, default: DEFAULT_MAX_REGION_BYTES ) –

    Memory budget per level-0 copy region. Regions are widened to cover whole source chunks when that fits the budget, so each source chunk is read once.

  • progress (bool, default: False ) –

    Show a tqdm progress bar over written regions (requires tqdm).

  • stats (bool, default: False ) –

    Collect and return per-level timing stats: region shapes, worker count, wall time, and cumulative per-region read/reduce/write seconds (summed across threads).

    With level pipelining active (keep_levels_in_memory=True or auto-enabled), level N's reduce_s captures fused-reduce time (reducing level-N blocks into the level-N+1 buffer) rather than the reduce of level N itself (which is zero when reading from memory). read_s = block_s - reduce_s remains the pure source-read time at every level.

  • keep_levels_in_memory (bool | None, default: None ) –

    Control level pipelining. None (default) auto-enables fusion when the higher levels fit in half the available RAM after accounting for the worker region budget. True forces fusion and raises MemoryError if the budget is exceeded. False disables fusion and always re-reads from the store.

  • io (Literal['python', 'rust'], default: 'python' ) –

    "python" (default) writes everything through zarr-python. "rust" encodes and stores spatial-variable regions natively in the topozarr-core kernel (a bundled Rust extension, no extra install) -- one shared connection pool, no per-region trip through zarr-python's sync bridge. Metadata, coords, and non-spatial variables still go through zarr-python. Supports local paths, s3:// URLs, LocalStore, and obstore-backed ObjectStore targets. This is unrelated to the optional zarrs codec pipeline (a separate zarr-python codec backend).

Examples: Write all levels to a local store:

```python
pyramid.write("pyramid.zarr")
```

Rewrite the coarsened levels, preserving level 0:

```python
pyramid.write("pyramid.zarr", mode="a", levels=[1, 2])
```

topozarr.metadata.ZarrLayerVarConfig dataclass

ZarrLayerVarConfig(
    clim: list[float] | None = None, colormap: str | None = None
)

Per-variable visualization hints for zarr-layer.

Attributes:

  • clim (list[float] | None) –

    Color range as [min, max].

  • colormap (str | None) –

    Colormap name (e.g. "blues").

Examples:

create_pyramid(
    ds,
    levels=2,
    layer_hints={"air": ZarrLayerVarConfig(colormap="blues", clim=[230, 310])},
)

Engine

Lower-level streaming drivers used by Pyramid.write; useful when writing custom pipelines on top of the kernel.

topozarr.engine.downsample_level

downsample_level(
    src: Array,
    dst: Array,
    *,
    stride: tuple[int, ...],
    method: str,
    fill_value: float | int | None = None,
    skipna: bool = True,
    max_workers: int | None = None,
    executor: ThreadPoolExecutor | None = None,
    on_region: Callable[[], None] | None = None,
    timer: RegionTimer | None = None,
    skip_empty: bool = True,
    write_region: Callable[[Region, ndarray], None] | None = None
) -> list[Future[None]]

Block-reduce src into dst by integer stride per axis.

Streams shard-sized output regions through topozarr_core.block_reduce with reads/writes via zarr-python. Matches xarray.coarsen(boundary="trim") shape and skipna semantics. Arrays are limited to 4 dimensions (kernel limit).

topozarr.engine.copy_array

copy_array(
    values: Any,
    dst: Array,
    *,
    source_chunks: tuple[int, ...] | None = None,
    max_region_bytes: int = DEFAULT_MAX_REGION_BYTES,
    max_workers: int | None = None,
    executor: ThreadPoolExecutor | None = None,
    on_region: Callable[[], None] | None = None,
    on_block: Callable[[Region, ndarray], None] | None = None,
    timer: RegionTimer | None = None,
    skip_empty: bool = True,
    write_region: Callable[[Region, ndarray], None] | None = None
) -> list[Future[None]]

Write a region-indexable array into dst region by region.

values may be a numpy array or any lazy array supporting tuple-of-slices indexing (e.g. xr.Variable backed by zarr/icechunk); each region is materialized individually, keeping peak memory at max_workers x region_size. Pass source_chunks to widen regions to the source chunk grid so each source chunk is decoded once.

For in-memory values (np.ndarray) the contiguity copy is skipped: slicing returns a (possibly strided) view, and both the reduce kernel and zarr setitem accept strided input, so copying would only waste memory.

topozarr.engine.default_max_workers

default_max_workers(region_bytes: int) -> int

Thread count bounded by CPU count and available memory.

Peak memory is roughly workers * 5 * region_bytes, so workers are capped at half the available RAM divided by that per-region footprint.