API Reference
The public API is three objects: create_pyramid builds a write plan, Pyramid holds it, and ZarrLayerVarConfig carries optional visualization hints. CoarseningMethod is the Literal["mean", "max", "min", "sum"] alias accepted by create_pyramid(method=...).
topozarr.coarsen.create_pyramid
create_pyramid(
ds: Dataset,
levels: int | None = None,
*,
factors: list[int] | None = None,
x_dim: str = "x",
y_dim: str = "y",
method: CoarseningMethod = "mean",
target_chunk_bytes: int = DEFAULT_CHUNK_BYTES,
chunks_per_shard: ChunksPerShard | None = DEFAULT_CHUNKS_PER_SHARD,
layer_hints: dict[str, ZarrLayerVarConfig] | None = None
) -> Pyramid
Build a multiscale Zarr pyramid plan from a georeferenced Dataset.
Exactly one of levels / factors must be given.
Parameters:
-
ds(Dataset) –Source dataset. Must have a CRS assigned via
ds.proj.assign_crs. -
levels(int | None, default:None) –Total number of resolution levels, including the original. Level
0is the original resolution; each subsequent level coarsens by 2× per spatial dimension (cumulative factors[1, 2, 4, ...]). -
factors(list[int] | None, default:None) –Explicit cumulative downsample factors per level, e.g.
[1, 4, 16]for a sparse 4×-spaced pyramid. Must start at 1, be strictly increasing, and have each entry integer-divide the next. Mutually exclusive withlevels. -
x_dim(str, default:'x') –Name of the x (longitude / easting) dimension.
-
y_dim(str, default:'y') –Name of the y (latitude / northing) dimension.
-
method(CoarseningMethod, default:'mean') –Spatial aggregation method for coarsening.
-
target_chunk_bytes(int, default:DEFAULT_CHUNK_BYTES) –Target uncompressed size per chunk (default ~500 KB).
-
chunks_per_shard(ChunksPerShard | None, default:DEFAULT_CHUNKS_PER_SHARD) –Number of chunks per shard along each spatial dimension (e.g.
4→ 4×4 = 16 chunks per shard, ~8 MB). Must be a power of 2 in the range 1–32. PassNoneto disable sharding. -
layer_hints(dict[str, ZarrLayerVarConfig] | None, default:None) –Optional per-variable colormap / color-range hints written into the
zarr-layerroot metadata key.
Returns:
-
Pyramid–A Pyramid write plan; call
-
Pyramid–pyramid.write(store)to compute and write all levels.
Raises:
-
ValueError–If
dshas no CRS,chunks_per_shardis not a power of 2 in the range 1–32, or a spatial variable has more than 4 dimensions (topozarr-core kernel limit).
Examples:
import xarray as xr
import xproj # registers the .proj accessor
from topozarr import create_pyramid
ds = xr.tutorial.open_dataset("air_temperature").drop_encoding()
ds = ds.proj.assign_crs(spatial_ref="EPSG:4326")
pyramid = create_pyramid(ds, levels=2, x_dim="lon", y_dim="lat")
pyramid.write("pyramid.zarr")
# sparse pyramid: native, 4x, 16x (skips the costly 2x level)
sparse = create_pyramid(ds, factors=[1, 4, 16], x_dim="lon", y_dim="lat")
sparse.write("sparse.zarr")
topozarr.pyramid.Pyramid
dataclass
Pyramid(
source: Dataset,
level_templates: dict[int, Dataset],
encoding: dict[str, Any],
attrs: dict[str, Any],
x_dim: str,
y_dim: str,
method: CoarseningMethod,
factors: list[int] = list(),
fill_values: dict[str, float | int | None] = dict(),
)
A write plan for a multiscale Zarr pyramid, returned by create_pyramid.
Attributes:
-
source(Dataset) –The original (level 0) dataset.
-
level_templates(dict[int, Dataset]) –Per-level datasets carrying real coordinates and attrs; spatial data variables are zero-cost placeholders with the correct shape/dtype (their data is computed during write).
-
encoding(dict[str, Any]) –Nested dict
{path: {var: {"chunks": ..., "shards": ...}}}. -
attrs(dict[str, Any]) –Root group metadata (multiscales / proj: / spatial: / zarr-layer).
as_datatree
Return a lazy DataTree with all pyramid levels coarsened via xarray.
Each level is produced by chaining xarray.coarsen operations on the
source dataset. If the source is Dask-backed, the returned tree is fully
lazy — suitable for writing on a Dask distributed cluster or with
icechunk. Use self.encoding (already shaped for DataTree.to_zarr)
to apply the recommended chunks and shards:
write
write(
store: Any,
*,
mode: str = "w",
max_workers: int | None = None,
levels: list[int] | None = None,
max_region_bytes: int = DEFAULT_MAX_REGION_BYTES,
progress: bool = False,
stats: bool = False,
keep_levels_in_memory: bool | None = None,
io: Literal["python", "rust"] = "python"
) -> dict[str, Any] | None
Compute and write pyramid levels to a Zarr store.
Level 0 is streamed region by region from the source dataset; each
subsequent level is block-reduced from the previously written level,
streaming shard-sized regions through the Rust kernel on a thread
pool. Levels are written sequentially (each reads the previous one);
variables within a level are processed in parallel on a shared pool.
For bounded memory on large stores, open the source lazily (e.g.
xr.open_zarr(store, chunks=None)).
Parameters:
-
store(Any) –Anything zarr-python accepts — a local path,
ObjectStore, or icechunk session store. -
mode(str, default:'w') –Zarr open mode for the root group. Use
"a"when writing a subset of levels so the root group and any pre-existing levels are preserved. -
max_workers(int | None, default:None) –Thread pool size for region processing.
Nonederives a default from the CPU count and available memory (peak memory is roughlymax_workers * 5 * region_bytes). -
levels(list[int] | None, default:None) –Subset of levels to write (e.g.
[1, 2]). Defaults to all levels. -
max_region_bytes(int, default:DEFAULT_MAX_REGION_BYTES) –Memory budget per level-0 copy region. Regions are widened to cover whole source chunks when that fits the budget, so each source chunk is read once.
-
progress(bool, default:False) –Show a tqdm progress bar over written regions (requires
tqdm). -
stats(bool, default:False) –Collect and return per-level timing stats: region shapes, worker count, wall time, and cumulative per-region read/reduce/write seconds (summed across threads).
With level pipelining active (
keep_levels_in_memory=Trueor auto-enabled), level N'sreduce_scaptures fused-reduce time (reducing level-N blocks into the level-N+1 buffer) rather than the reduce of level N itself (which is zero when reading from memory).read_s = block_s - reduce_sremains the pure source-read time at every level. -
keep_levels_in_memory(bool | None, default:None) –Control level pipelining.
None(default) auto-enables fusion when the higher levels fit in half the available RAM after accounting for the worker region budget.Trueforces fusion and raisesMemoryErrorif the budget is exceeded.Falsedisables fusion and always re-reads from the store. -
io(Literal['python', 'rust'], default:'python') –"python"(default) writes everything through zarr-python."rust"encodes and stores spatial-variable regions natively in thetopozarr-corekernel (a bundled Rust extension, no extra install) -- one shared connection pool, no per-region trip through zarr-python's sync bridge. Metadata, coords, and non-spatial variables still go through zarr-python. Supports local paths,s3://URLs,LocalStore, and obstore-backedObjectStoretargets. This is unrelated to the optionalzarrscodec pipeline (a separate zarr-python codec backend).
Examples: Write all levels to a local store:
```python
pyramid.write("pyramid.zarr")
```
Rewrite the coarsened levels, preserving level 0:
```python
pyramid.write("pyramid.zarr", mode="a", levels=[1, 2])
```
topozarr.metadata.ZarrLayerVarConfig
dataclass
Engine
Lower-level streaming drivers used by Pyramid.write; useful when writing custom pipelines on top of the kernel.
topozarr.engine.downsample_level
downsample_level(
src: Array,
dst: Array,
*,
stride: tuple[int, ...],
method: str,
fill_value: float | int | None = None,
skipna: bool = True,
max_workers: int | None = None,
executor: ThreadPoolExecutor | None = None,
on_region: Callable[[], None] | None = None,
timer: RegionTimer | None = None,
skip_empty: bool = True,
write_region: Callable[[Region, ndarray], None] | None = None
) -> list[Future[None]]
Block-reduce src into dst by integer stride per axis.
Streams shard-sized output regions through topozarr_core.block_reduce
with reads/writes via zarr-python. Matches
xarray.coarsen(boundary="trim") shape and skipna semantics.
Arrays are limited to 4 dimensions (kernel limit).
topozarr.engine.copy_array
copy_array(
values: Any,
dst: Array,
*,
source_chunks: tuple[int, ...] | None = None,
max_region_bytes: int = DEFAULT_MAX_REGION_BYTES,
max_workers: int | None = None,
executor: ThreadPoolExecutor | None = None,
on_region: Callable[[], None] | None = None,
on_block: Callable[[Region, ndarray], None] | None = None,
timer: RegionTimer | None = None,
skip_empty: bool = True,
write_region: Callable[[Region, ndarray], None] | None = None
) -> list[Future[None]]
Write a region-indexable array into dst region by region.
values may be a numpy array or any lazy array supporting
tuple-of-slices indexing (e.g. xr.Variable backed by zarr/icechunk);
each region is materialized individually, keeping peak memory at
max_workers x region_size. Pass source_chunks to widen regions to
the source chunk grid so each source chunk is decoded once.
For in-memory values (np.ndarray) the contiguity copy is skipped:
slicing returns a (possibly strided) view, and both the reduce kernel and
zarr setitem accept strided input, so copying would only waste memory.