API reference¶
This page provides a structured, auto-generated reference for the OCR Python package using mkdocstrings. Each section links to the corresponding module(s) and surfaces docstrings, type hints, and signatures.
Package overview¶
High-level package entry points and public exports.
ocr ¶
Core modules¶
Configuration¶
Configuration models for storage, chunking, Coiled, and processing settings.
ocr.config ¶
Classes¶
ChunkingConfig ¶
Bases: BaseSettings
Attributes¶
extent_as_tuple_5070 (cached property) ¶
Get the extent in the EPSG:5070 projection as a tuple (xmin, xmax, ymin, ymax).
valid_region_ids (cached property) ¶
valid_region_ids: list
Generate valid region IDs by checking which regions contain non-null data.
Returns:
- list – List of valid region IDs (e.g., 'y1_x3', 'y2_x4', etc.)
Functions¶
bbox_from_wgs84 ¶
chunk_id_to_slice ¶
Convert a chunk ID (iy, ix) to the corresponding array slices.
Parameters:
- chunk_id (tuple) – The chunk identifier as a tuple (iy, ix), where iy is the index along the y-dimension and ix is the index along the x-dimension.
Returns:
chunks_to_slices ¶
get_chunk_mapping ¶
Returns a dict of region_ids and their corresponding chunk_indexes.
Returns:
- chunk_mapping (dict) – Dictionary with region IDs as keys and the corresponding chunk indexes (iy, ix) as values.
get_chunks_for_bbox ¶
Find all chunks that intersect with the given bounding box.
Parameters:
- bbox (BoundingBox or tuple) – Bounding box to check for intersection. If a tuple, the format is (minx, miny, maxx, maxy).
Returns:
- list of tuples – List of (iy, ix) tuples identifying the intersecting chunks.
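Example
A minimal sketch, assuming ChunkingConfig can be instantiated from defaults/environment; the bounding box values are hypothetical and assumed to be in the grid's coordinate reference system:
>>> from ocr.config import ChunkingConfig
>>> chunking = ChunkingConfig()
>>> # Hypothetical bounding box as (minx, miny, maxx, maxy)
>>> intersecting = chunking.get_chunks_for_bbox((-2.0e6, 1.5e6, -1.9e6, 1.6e6))
>>> intersecting  # list of (iy, ix) tuples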
index_to_coords ¶
plot_all_chunks ¶
plot_all_chunks(color_by_size: bool = False) -> None
Plot all data chunks across the entire CONUS with their indices as labels.
Parameters:
- color_by_size (bool, default: False) – If True, color chunks based on their size (useful for identifying irregularities).
region_id_chunk_lookup ¶
region_id_slice_lookup ¶
Given a region_id (e.g., 'y5_x14'), return the corresponding x, y slices, e.g., (slice(np.int64(30000), np.int64(36000), None), slice(np.int64(85500), np.int64(90000), None)).
Parameters:
- region_id (str) – The region_id for chunk_id lookup.
Returns:
region_id_to_latlon_slices ¶
Get latitude and longitude slices from region_id
Returns (lat_slice, lon_slice) where lat_slice.start < lat_slice.stop and lon_slice.start < lon_slice.stop (lower-left origin, lat ascending).
visualize_chunks_on_conus ¶
visualize_chunks_on_conus(
chunks: list[tuple[int, int]] | None = None,
color_by_size: bool = False,
highlight_chunks: list[tuple[int, int]] | None = None,
include_all_chunks: bool = False,
) -> None
Visualize specified chunks on a CONUS map.
Parameters:
- chunks (list of tuples, default: None) – List of (iy, ix) tuples specifying chunks to visualize. If None, all chunks are shown.
- color_by_size (bool, default: False) – If True, color chunks based on their size.
- highlight_chunks (list of tuples, default: None) – List of (iy, ix) tuples specifying chunks to highlight.
- include_all_chunks (bool, default: False) – If True, show all chunks in the background with low opacity.
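Example
A minimal sketch, assuming a default ChunkingConfig; the chunk indices are illustrative:
>>> from ocr.config import ChunkingConfig
>>> chunking = ChunkingConfig()
>>> # Show two hypothetical chunks, highlighting one, with all chunks faded in the background
>>> chunking.visualize_chunks_on_conus(
...     chunks=[(10, 2), (10, 3)],
...     highlight_chunks=[(10, 2)],
...     include_all_chunks=True,
... )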
IcechunkConfig ¶
Bases: BaseSettings
Configuration for icechunk processing.
Attributes¶
Functions¶
commit_messages_ancestry ¶
Get the commit messages ancestry for the icechunk repository.
insert_region_uncooperative ¶
model_post_init ¶
Post-initialization to set up prefixes and URIs based on environment.
pretty_paths ¶
Pretty print key IcechunkConfig paths and URIs.
This version touches cached properties (e.g., uri, storage) to surface real configuration and types.
processed_regions ¶
Get a list of region IDs that have already been processed.
repo_and_session ¶
Open an icechunk repository and return the session.
OCRConfig ¶
Bases: BaseSettings
Configuration settings for OCR processing.
Functions¶
pretty_paths ¶
Pretty print key OCRConfig paths and URIs.
This method intentionally touches cached properties that create directories (e.g., via mkdir) so you can verify real locations.
resolve_region_ids ¶
resolve_region_ids(
provided_region_ids: set[str], *, allow_all_processed: bool = False
) -> RegionIDStatus
Validate provided region IDs against valid + processed sets.
Parameters:
- provided_region_ids (set[str]) – The set of region IDs to validate.
- allow_all_processed (bool, default: False) – If True, don't raise an error when all regions are already processed. This is useful for production reruns where you want to regenerate vector outputs even if icechunk regions are complete. Default is False.
Returns:
- RegionIDStatus – Status object with validation results.
Raises:
- ValueError – If no valid unprocessed region IDs remain and allow_all_processed is False.
select_region_ids ¶
select_region_ids(
region_ids: list[str] | None,
*,
all_region_ids: bool = False,
allow_all_processed: bool = False,
) -> RegionIDStatus
Helper to pick the effective set of region IDs (all or user-provided) and return the validated status object.
Parameters:
- region_ids (list[str] | None) – User-provided region IDs to process.
- all_region_ids (bool, default: False) – If True, use all valid region IDs instead of user-provided ones. Default is False.
- allow_all_processed (bool, default: False) – If True, don't raise an error when all regions are already processed. Passed through to resolve_region_ids. Default is False.
Returns:
- RegionIDStatus – Status object with validation results.
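Example
A minimal sketch, assuming an OCRConfig can be built from the environment; the region IDs are illustrative:
>>> from ocr.config import OCRConfig
>>> config = OCRConfig()
>>> # Validate a user-provided list of region IDs
>>> status = config.select_region_ids(['y10_x2', 'y10_x3'])
>>> # Or take every valid region ID, tolerating already-processed regions
>>> status = config.select_region_ids(None, all_region_ids=True, allow_all_processed=True)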
PyramidConfig ¶
VectorConfig ¶
Bases: BaseSettings
Configuration for vector data processing.
Attributes¶
block_summary_stats_uri (cached property) ¶
URI for the block summary statistics file.
counties_summary_stats_uri (cached property) ¶
URI for the counties summary statistics file.
tracts_summary_stats_uri (cached property) ¶
URI for the tracts summary statistics file.
Functions¶
model_post_init ¶
Post-initialization to set up prefixes and URIs based on environment.
pretty_paths ¶
Pretty print key VectorConfig paths and URIs.
This method intentionally touches cached properties that create directories (e.g., via mkdir) so you can verify real locations.
upath_delete ¶
Use UPath to handle deletion in a cloud-agnostic way
Functions¶
Type definitions¶
Strongly typed enums for environment, platform, and risk types.
Data access¶
Datasets¶
Dataset and Catalog abstractions for Zarr and GeoParquet on S3/local storage.
ocr.datasets ¶
Classes¶
Catalog ¶
Bases: BaseModel
Base class for datasets catalog.
Functions¶
get_dataset ¶
get_dataset(
name: str,
version: str | None = None,
*,
case_sensitive: bool = True,
latest: bool = False,
) -> Dataset
Get a dataset by name and optionally version.
Parameters:
- name (str) – Name of the dataset to retrieve.
- version (str, default: None) – Specific version of the dataset. If not provided, returns the dataset if only one version exists, or raises an error if multiple versions exist, unless latest=True.
- case_sensitive (bool, default: True) – Whether to match dataset names case-sensitively.
- latest (bool, default: False) – If True and version=None, returns the latest version instead of raising an error when multiple versions exist.
Returns:
- Dataset – The matched dataset.
Raises:
- ValueError – If multiple versions exist and version is not specified (and latest=False).
- KeyError – If no matching dataset is found.
Examples:
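A minimal sketch, assuming an existing Catalog instance named catalog (how the catalog is constructed is not shown here); the dataset name and version mirror the query_geoparquet example below:
>>> ds = catalog.get_dataset('conus-overture-buildings', version='v2025-03-19.1')
>>> # If several versions exist, take the most recent one instead of raising
>>> ds = catalog.get_dataset('conus-overture-buildings', latest=True)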
Dataset ¶
Bases: BaseModel
Base class for datasets.
Functions¶
query_geoparquet ¶
query_geoparquet(
query: str | None = None, *, install_extensions: bool = True
) -> DuckDBPyRelation
Query a geoparquet file using DuckDB.
Parameters:
- query (str, default: None) – SQL query to execute. If not provided, returns all data.
- install_extensions (bool, default: True) – Whether to install and load the spatial and httpfs extensions.
Returns:
- DuckDBPyRelation – Result of the DuckDB query.
Raises:
- ValueError – If the dataset is not in 'geoparquet' format.
Example
Example of querying buildings with a converted geometry column:
>>> buildings = catalog.get_dataset('conus-overture-buildings', 'v2025-03-19.1')
>>> result = buildings.query_geoparquet("""
...     SELECT
...         id,
...         roof_material,
...         geometry
...     FROM read_parquet('{s3_path}')
...     WHERE roof_material = 'concrete'
... """)
Then convert to a GeoDataFrame:
>>> gdf = buildings.to_geopandas("""
...     SELECT
...         id,
...         roof_material,
...         geometry
...     FROM read_parquet('{s3_path}')
...     WHERE roof_material = 'concrete'
... """)
to_geopandas ¶
to_geopandas(
query: str | None = None,
geometry_column='geometry',
crs: str = 'EPSG:4326',
target_crs: str | None = None,
**kwargs,
) -> GeoDataFrame
Convert query results to a GeoPandas GeoDataFrame.
Parameters:
- query (str, default: None) – SQL query to execute. If not provided, returns all data.
- geometry_column (str, default: 'geometry') – The name of the geometry column in the query result.
- crs (str, default: 'EPSG:4326') – The coordinate reference system to use for the geometries.
- target_crs (str, default: None) – The target coordinate reference system to convert the geometries to.
- **kwargs (dict, default: {}) – Additional keyword arguments passed to query_geoparquet.
Returns:
- GeoDataFrame – A GeoPandas GeoDataFrame containing the queried data with geometries.
Raises:
- ValueError – If the dataset is not in 'geoparquet' format or if the geometry column is not found.
Example
Example of converting buildings to a GeoPandas GeoDataFrame - no need for ST_AsText():
>>> buildings = catalog.get_dataset('conus-overture-buildings', 'v2025-03-19.1')
>>> gdf = buildings.to_geopandas("""
...     SELECT
...         id,
...         roof_material,
...         geometry
...     FROM read_parquet('{s3_path}')
...     WHERE roof_material = 'concrete'
... """)
>>> gdf.head()
to_xarray ¶
to_xarray(
*,
is_icechunk: bool | None = None,
xarray_open_kwargs: dict | None = None,
xarray_storage_options: dict | None = None,
) -> Dataset
Convert the dataset to an xarray.Dataset.
Parameters:
- is_icechunk (bool | None, default: None) – Whether to use icechunk to access the data. If True, only try using icechunk. If None, try icechunk first and fall back to direct S3 access if it fails. If False, only use direct S3 access.
- xarray_open_kwargs (dict, default: None) – Additional keyword arguments to pass to xarray.open_dataset.
- xarray_storage_options (dict, default: None) – Storage options for S3 access when not using icechunk.
Returns:
- Dataset – The opened dataset.
Raises:
- ValueError – If the dataset is not in 'zarr' format.
- FileNotFoundError – If the dataset cannot be found or accessed.
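Example
A minimal sketch, assuming a zarr-format catalog entry; the dataset name is hypothetical:
>>> entry = catalog.get_dataset('conus404-hourly')  # hypothetical zarr dataset name
>>> ds = entry.to_xarray(
...     is_icechunk=None,  # try icechunk first, fall back to direct S3 access
...     xarray_open_kwargs={'chunks': {}},
... )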
CONUS404 helpers¶
Load CONUS404 variables, compute relative humidity, wind rotation and diagnostics. Geographic selection utilities (point/bbox) with CRS-aware transforms.
ocr.conus404 ¶
Functions¶
compute_relative_humidity ¶
compute_wind_speed_and_direction ¶
Derive hourly wind speed (m/s) and direction (degrees, direction the wind blows from) using xclim.
Parameters:
- u10 (DataArray) – U component of wind at 10 m (m/s).
- v10 (DataArray) – V component of wind at 10 m (m/s).
Returns:
- wind_ds (Dataset) – Dataset containing wind speed ('sfcWind') and wind direction ('sfcWindfromdir').
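Example
A minimal sketch with synthetic 10 m wind components; the real pipeline passes CONUS404 DataArrays, and the units attributes are included because xclim expects CF-style units:
>>> import numpy as np
>>> import xarray as xr
>>> from ocr.conus404 import compute_wind_speed_and_direction
>>> u10 = xr.DataArray(np.random.uniform(-5, 5, 24), dims='time', attrs={'units': 'm s-1'})
>>> v10 = xr.DataArray(np.random.uniform(-5, 5, 24), dims='time', attrs={'units': 'm s-1'})
>>> wind_ds = compute_wind_speed_and_direction(u10, v10)
>>> wind_ds[['sfcWind', 'sfcWindfromdir']]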
load_conus404 ¶
Utilities¶
General utilities¶
Helpers for DuckDB (extension loading, S3 secrets), vector sampling, and file transfer.
ocr.utils ¶
Functions¶
apply_s3_creds ¶
Register AWS credentials as a DuckDB SECRET on the given connection.
Parameters:
- region (str, default: 'us-west-2') – AWS region used for S3 access.
- con (DuckDBPyConnection | None, default: None) – Connection to apply credentials to. If None, uses duckdb's default connection (duckdb.sql), preserving prior behavior.
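Example
A minimal sketch, assuming AWS credentials are already available in the environment:
>>> import duckdb
>>> from ocr.utils import apply_s3_creds, install_load_extensions
>>> con = duckdb.connect()
>>> install_load_extensions(con=con)  # load aws/httpfs/spatial first
>>> apply_s3_creds(region='us-west-2', con=con)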
bbox_tuple_from_xarray_extent ¶
bbox_tuple_from_xarray_extent(
ds: Dataset, x_name: str = 'x', y_name: str = 'y'
) -> tuple[float, float, float, float]
Creates a bounding box from an Xarray Dataset extent.
Parameters:
- ds (Dataset) – Input Xarray Dataset.
- x_name (str, default: 'x') – Name of the x coordinate, by default 'x'.
- y_name (str, default: 'y') – Name of the y coordinate, by default 'y'.
Returns:
- tuple – Bounding box tuple in the form (x_min, y_min, x_max, y_max).
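Example
A minimal sketch with a tiny synthetic dataset:
>>> import numpy as np
>>> import xarray as xr
>>> from ocr.utils import bbox_tuple_from_xarray_extent
>>> ds = xr.Dataset(coords={'x': np.linspace(-120.0, -119.0, 11), 'y': np.linspace(35.0, 36.0, 11)})
>>> bbox_tuple_from_xarray_extent(ds)  # -> (x_min, y_min, x_max, y_max)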
copy_or_upload ¶
copy_or_upload(
src: UPath,
dest: UPath,
overwrite: bool = True,
chunk_size: int = 16 * 1024 * 1024,
) -> None
Copy a single file from src to dest using UPath/fsspec.
- Uses server-side copy if available on the same filesystem (e.g., s3 -> s3).
- Falls back to streaming copy otherwise.
- Creates destination parent directories when supported.
Parameters:
- src (UPath) – Source UPath.
- dest (UPath) – Destination UPath (file path; if pointing to a directory-like path, src.name is appended).
- overwrite (bool, default: True) – If False, raises if dest exists.
- chunk_size (int, default: 16 * 1024 * 1024) – Buffer size for streaming copies.
Returns:
- None
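Example
A minimal sketch copying a local file to S3; both paths are hypothetical:
>>> from upath import UPath
>>> from ocr.utils import copy_or_upload
>>> src = UPath('outputs/buildings.parquet')              # hypothetical local file
>>> dest = UPath('s3://my-bucket/ocr/buildings.parquet')  # hypothetical destination
>>> copy_or_upload(src, dest, overwrite=True)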
extract_points ¶
extract_points(gdf: GeoDataFrame, da: DataArray) -> DataArray
Extract/sample points from a GeoDataFrame to an Xarray DataArray.
Parameters:
- gdf (GeoDataFrame) – Input geopandas GeoDataFrame. Geometry should be points.
- da (DataArray) – Input Xarray DataArray.
Returns:
- DataArray – DataArray with geometry sampled.
Notes
UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
The relatively small size of a building footprint should account for a very small shift in the centroid when calculating from EPSG:4326 vs EPSG:5070.
TODO: Should/can this be a DataArray for typing
geo_sel ¶
geo_sel(
ds: Dataset,
*,
lon: float | None = None,
lat: float | None = None,
bbox: tuple[float, float, float, float] | None = None,
method: str = 'nearest',
tolerance: float | None = None,
crs_wkt: str | None = None,
)
Geographic selection helper.
Exactly one of:
- (lon AND lat)
- (lons AND lats)
- bbox=(west, south, east, north)
Parameters:
- ds (Dataset) – Input dataset with x, y coordinates and a valid 'crs' variable with WKT.
- lon (float, default: None) – Longitude of the point to select, by default None.
- lat (float, default: None) – Latitude of the point to select, by default None.
- bbox (tuple, default: None) – Bounding box to select (west, south, east, north), by default None.
- method (str, default: 'nearest') – Method to use for point selection, by default 'nearest'.
- tolerance (float, default: None) – Tolerance (in units of the dataset's CRS) for point selection, by default None.
- crs_wkt (str, default: None) – WKT string for the dataset's CRS. If None, attempts to read from ds.crs.attrs['crs_wkt'].
Returns:
- Dataset – Single point: time dimension only. Multiple points: adds a 'point' dimension. BBox: retains the y, x subset.
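Example
A minimal sketch, assuming ds is a dataset with x/y coordinates and a 'crs' variable carrying WKT (for example, one opened via Dataset.to_xarray); the coordinates are illustrative:
>>> from ocr.utils import geo_sel
>>> # Nearest grid cell to a single point
>>> point = geo_sel(ds, lon=-105.27, lat=40.01)
>>> # Subset to a bounding box (west, south, east, north)
>>> box = geo_sel(ds, bbox=(-106.0, 39.5, -104.5, 40.5))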
get_temp_dir ¶
get_temp_dir() -> Path | None
Get optimal temporary directory path for the current environment.
Returns the current working directory if running in /scratch (e.g., on Coiled clusters), otherwise returns None to use the system default temp directory.
On Coiled clusters, /scratch is bind-mounted directly to the NVMe disk, avoiding Docker overlay filesystem overhead and providing better I/O performance and more available space compared to /tmp which sits on the Docker overlay.
Returns:
- Path | None – Current working directory if in /scratch, None otherwise (uses system default).
Examples:
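A minimal sketch of routing scratch files through the optimal location; passing dir=None simply falls back to the system default:
>>> import tempfile
>>> from ocr.utils import get_temp_dir
>>> with tempfile.TemporaryDirectory(dir=get_temp_dir()) as tmpdir:
...     ...  # write scratch files under tmpdir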
install_load_extensions ¶
install_load_extensions(
aws: bool = True,
spatial: bool = True,
httpfs: bool = True,
con: Any | None = None,
) -> None
Install and load DuckDB extensions.
Parameters:
- aws (bool, default: True) – Install and load the AWS extension, by default True.
- spatial (bool, default: True) – Install and load the SPATIAL extension, by default True.
- httpfs (bool, default: True) – Install and load the HTTPFS extension, by default True.
- con (DuckDBPyConnection | None, default: None) – Connection to apply extensions to. If None, uses duckdb's default.
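Example
A minimal sketch using duckdb's default connection (no con passed):
>>> import duckdb
>>> from ocr.utils import install_load_extensions
>>> install_load_extensions(aws=True, spatial=True, httpfs=True)
>>> duckdb.sql('SELECT 1')  # the default connection now has the extensions loaded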
Testing utilities¶
Snapshot testing extensions for xarray and GeoPandas.
ocr.testing ¶
Classes¶
GeoDataFrameSnapshotExtension ¶
Bases: SingleFileSnapshotExtension
Snapshot extension for GeoPandas GeoDataFrames stored as parquet.
Supports both local and remote (S3) storage via environment variable configuration:
- SNAPSHOT_STORAGE_PATH: Base path for snapshots (local or s3://bucket/path). Default: s3://carbonplan-scratch/snapshots (configured in tests/conftest.py).
Examples:
# Use default S3 storage (no env var needed)
pytest tests/test_snapshot.py --snapshot-update
# Override with local storage
SNAPSHOT_STORAGE_PATH=tests/__snapshots__ pytest tests/
# Override with different S3 bucket
SNAPSHOT_STORAGE_PATH=s3://my-bucket/snapshots pytest tests/
Functions¶
diff_lines ¶
Generate diff lines for test output.
dirname (classmethod) ¶
dirname(*, test_location: PyTestLocation) -> str
Return the directory for storing snapshots.
get_location (classmethod) ¶
get_location(*, test_location: PyTestLocation, index: SnapshotIndex = 0) -> str
Get the full snapshot location path.
Override to properly handle S3 paths using upath instead of os.path.join.
get_snapshot_name (classmethod) ¶
get_snapshot_name(
    *, test_location: PyTestLocation, index: SnapshotIndex = 0
) -> str
Generate snapshot name based on test name.
Sanitizes the test name to replace problematic characters (e.g., brackets from parametrized tests) with underscores for valid file paths.
matches ¶
Check if serialized data matches snapshot using GeoDataFrame comparison.
read_snapshot_data_from_location ¶
read_snapshot_data_from_location(
*, snapshot_location: str, snapshot_name: str, session_id: str
) -> GeoDataFrame | None
Read parquet snapshot from disk.
serialize ¶
Validate that data is a GeoDataFrame. Returns the data unchanged.
write_snapshot_collection (classmethod) ¶
Write snapshot collection to parquet format (local or remote).
XarraySnapshotExtension ¶
Bases: SingleFileSnapshotExtension
Snapshot extension for xarray DataArrays and Datasets stored as zarr.
Supports both local and remote (S3) storage via environment variable configuration:
- SNAPSHOT_STORAGE_PATH: Base path for snapshots (local or s3://bucket/path). Default: s3://carbonplan-scratch/snapshots (configured in tests/conftest.py).
Examples:
# Use default S3 storage (no env var needed)
pytest tests/test_snapshot.py --snapshot-update
# Override with local storage
SNAPSHOT_STORAGE_PATH=tests/__snapshots__ pytest tests/
# Override with different S3 bucket
SNAPSHOT_STORAGE_PATH=s3://my-bucket/snapshots pytest tests/
Functions¶
diff_lines ¶
Generate diff lines for test output.
dirname (classmethod) ¶
dirname(*, test_location: PyTestLocation) -> str
Return the directory for storing snapshots.
get_location (classmethod) ¶
get_location(*, test_location: PyTestLocation, index: SnapshotIndex = 0) -> str
Get the full snapshot location path.
Override to properly handle S3 paths using upath instead of os.path.join.
get_snapshot_name (classmethod) ¶
get_snapshot_name(
    *, test_location: PyTestLocation, index: SnapshotIndex = 0
) -> str
Generate snapshot name based on test name.
Sanitizes the test name to replace problematic characters (e.g., brackets from parametrized tests) with underscores for valid file paths.
matches ¶
Check if serialized data matches snapshot using approximate comparison.
Uses assert_allclose instead of assert_equal to handle platform-specific numerical differences from OpenCV and scipy operations between macOS and Linux.
read_snapshot_data_from_location ¶
read_snapshot_data_from_location(
*, snapshot_location: str, snapshot_name: str, session_id: str
) -> Dataset | None
Read zarr snapshot from disk.
serialize ¶
Convert DataArray to Dataset for consistent zarr storage. Returns the data unchanged.
write_snapshot_collection (classmethod) ¶
Write snapshot collection to zarr format (local or remote).
Risk analysis¶
Fire risk¶
Core fire/wind risk utilities used by the pipeline (kernels, wind classification, risk composition).
ocr.risks.fire ¶
Functions¶
apply_wind_directional_convolution ¶
apply_wind_directional_convolution(
da: DataArray,
iterations: int = 3,
kernel_size: float = 81.0,
circle_diameter: float = 35.0,
) -> Dataset
Apply a directional convolution to a DataArray.
Parameters:
- da (DataArray) – The DataArray to apply the convolution to.
- iterations (int, default: 3) – The number of iterations to apply the convolution, by default 3.
- kernel_size (float, default: 81.0) – The size of the kernel, by default 81.0.
- circle_diameter (float, default: 35.0) – The diameter of the circle, by default 35.0.
Returns:
- ds (Dataset) – The Dataset with the directional convolution applied.
calculate_wind_adjusted_risk ¶
Calculate wind-adjusted fire risk using climate run and wildfire risk datasets.
Parameters:
- x_slice (slice) – Slice object for selecting the longitude range.
- y_slice (slice) – Slice object for selecting the latitude range.
- buffer (float, default: 0.15) – Buffer size in degrees to add around the region for edge-effect handling (default 0.15). For 30m EPSG:4326 data, 0.15 degrees ≈ 16.7 km ≈ 540 pixels. This buffer ensures neighborhood operations (convolution, Gaussian smoothing) have adequate context at boundaries.
Returns:
- fire_risk (Dataset) – Dataset containing wind-adjusted fire risk variables.
classify_wind_directions ¶
Classify wind directions into 8 cardinal directions (0-7). The classification is:
- 0: North (337.5-22.5)
- 1: Northeast (22.5-67.5)
- 2: East (67.5-112.5)
- 3: Southeast (112.5-157.5)
- 4: South (157.5-202.5)
- 5: Southwest (202.5-247.5)
- 6: West (247.5-292.5)
- 7: Northwest (292.5-337.5)
Parameters:
- wind_direction_ds (DataArray) – DataArray containing wind direction in degrees (0-360).
Returns:
- result (DataArray) – DataArray with wind directions classified as integers 0-7.
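Example
A minimal sketch with synthetic wind directions:
>>> import numpy as np
>>> import xarray as xr
>>> from ocr.risks.fire import classify_wind_directions
>>> direction = xr.DataArray(np.array([350.0, 45.0, 180.0, 270.0]), dims='time')
>>> classify_wind_directions(direction)  # -> 0 (N), 1 (NE), 4 (S), 6 (W)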
compute_modal_wind_direction ¶
compute_wind_direction_distribution ¶
compute_wind_direction_distribution(
direction: DataArray, fire_weather_mask: DataArray
) -> Dataset
Compute the wind direction distribution during fire weather conditions.
Parameters:
- direction (DataArray) – Wind direction in degrees (0-360).
- fire_weather_mask (DataArray) – Boolean mask indicating fire weather conditions.
Returns:
- wind_direction_hist (Dataset) – Wind direction histogram during fire weather conditions.
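Example
A minimal sketch with synthetic hourly directions and a random fire-weather mask; in the pipeline the mask would flag fire-weather hours:
>>> import numpy as np
>>> import xarray as xr
>>> from ocr.risks.fire import compute_wind_direction_distribution
>>> direction = xr.DataArray(np.random.uniform(0, 360, 1000), dims='time')
>>> fire_weather_mask = xr.DataArray(np.random.rand(1000) > 0.9, dims='time')
>>> hist = compute_wind_direction_distribution(direction, fire_weather_mask)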
create_weighted_composite_bp_map ¶
create_weighted_composite_bp_map(
bp: Dataset,
wind_direction_distribution: DataArray,
*,
distribution_direction_dim: str = 'wind_direction',
weight_sum_tolerance: float = 1e-05,
) -> DataArray
Create a weighted composite burn probability map using wind direction distribution.
Parameters:
- bp (Dataset) – Dataset containing 9 directional burn probability layers with variables named ['N','NE','E','SE','S','SW','W','NW','circular'], produced by apply_wind_directional_convolution.
- wind_direction_distribution (DataArray) – Probability distribution over 8 cardinal directions with dimension 'wind_direction' and length 8, matching direction labels ['N','NE','E','SE','S','SW','W','NW'] (order must align). Values should sum to 1 where fire-weather hours exist; may be all 0 where none exist.
- distribution_direction_dim (str, default: 'wind_direction') – Name of the dimension in wind_direction_distribution that holds the direction labels, by default 'wind_direction'.
- weight_sum_tolerance (float, default: 1e-05) – Tolerance for deviation from 1.0 in the sum of weights, by default 1e-05.
Returns:
- weighted (DataArray) – Weighted composite burn probability with the same spatial dims as the inputs. Name: 'wind_weighted_bp'. Missing (all-zero) distributions yield NaN.
create_wind_informed_burn_probability ¶
create_wind_informed_burn_probability(
wind_direction_distribution_30m_4326: DataArray, riley_270m_5070: Dataset
) -> DataArray
Create wind-informed burn probability dataset by applying directional convolution and creating a weighted composite burn probability map.
Parameters:
- wind_direction_distribution_30m_4326 (DataArray) – Wind direction distribution data at 30m resolution in EPSG:4326 projection.
- riley_270m_5070 (DataArray) – Riley et al. (2011) burn probability data at 270m resolution in EPSG:5070 projection.
Returns:
- smoothed_final_bp (DataArray) – Smoothed wind-informed burn probability data at 30m resolution in EPSG:4326 projection.
direction_histogram ¶
fosberg_fire_weather_index ¶
Calculate the Fosberg Fire Weather Index (FFWI) from relative humidity, temperature, and wind speed. Formulation taken from wikifire.wsl.ch/tiki-indexb1d5.html?page=Fosberg+fire+weather+index&structure=Fire. hurs, T2, and sfcWind are arrays.
Parameters:
- hurs (DataArray) – Relative humidity in percentage (0-100).
- T2 (DataArray) – Temperature.
- sfcWind (DataArray) – Wind speed in meters per second.
Returns:
- DataArray – Fosberg Fire Weather Index (FFWI).
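Example
A minimal sketch with synthetic inputs; the expected temperature units are not documented above, so the values are purely illustrative:
>>> import numpy as np
>>> import xarray as xr
>>> from ocr.risks.fire import fosberg_fire_weather_index
>>> hurs = xr.DataArray(np.array([15.0, 40.0, 80.0]), dims='time')      # relative humidity (%)
>>> T2 = xr.DataArray(np.array([30.0, 25.0, 20.0]), dims='time')        # temperature (illustrative)
>>> sfcWind = xr.DataArray(np.array([8.0, 4.0, 2.0]), dims='time')      # wind speed (m/s)
>>> ffwi = fosberg_fire_weather_index(hurs, T2, sfcWind)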
generate_weights ¶
generate_weights(
method: Literal['skewed', 'circular_focal_mean'] = 'skewed',
kernel_size: float = 81.0,
circle_diameter: float = 35.0,
) -> ndarray
Generate a 2D array of weights for a circular kernel.
Parameters:
- method (str, default: 'skewed') – The method to use for generating weights. Options are 'skewed' or 'circular_focal_mean'. 'skewed' generates an elliptical kernel to simulate wind directionality; 'circular_focal_mean' generates a circular kernel. By default 'skewed'.
- kernel_size (float, default: 81.0) – The size of the kernel, by default 81.0.
- circle_diameter (float, default: 35.0) – The diameter of the circle, by default 35.0.
Returns:
- weights (ndarray) – A 2D array of weights for the circular kernel.
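Example
A minimal sketch comparing the two kernel options:
>>> from ocr.risks.fire import generate_weights
>>> skewed = generate_weights(method='skewed', kernel_size=81.0, circle_diameter=35.0)
>>> circular = generate_weights(method='circular_focal_mean')
>>> skewed.shape  # 2D weight array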
generate_wind_directional_kernels ¶
generate_wind_directional_kernels(
kernel_size: float = 81.0, circle_diameter: float = 35.0
) -> dict[str, ndarray]
Internal pipeline modules¶
Internal API
These modules are used internally by the pipeline and are not intended for direct public consumption. They are documented here for completeness and advanced use cases.
Batch managers¶
Orchestration backends for local and Coiled execution.
ocr.deploy.managers ¶
Classes¶
AbstractBatchManager ¶
CoiledBatchManager ¶
Bases: AbstractBatchManager
Coiled batch manager for managing batch jobs.
Functions¶
submit_job ¶
wait_for_completion ¶
wait_for_completion(exit_on_failure: bool = False)
Wait for all tracked jobs to complete.
Parameters:
- exit_on_failure (bool, default: False) – If True, raise an Exception immediately when a job failure is detected.
Returns:
- completed, failed (tuple[set[str], set[str]]) – A tuple of (completed_job_ids, failed_job_ids). If exit_on_failure is True and a failure is encountered, the method raises before returning.
LocalBatchManager ¶
Bases: AbstractBatchManager
Local batch manager for running jobs locally using subprocess.
Functions¶
model_post_init ¶
Initialize the thread pool executor after model creation.
submit_job ¶
wait_for_completion ¶
wait_for_completion(exit_on_failure: bool = False)
Wait for all tracked jobs to complete.
Parameters:
- exit_on_failure (bool, default: False) – If True, raise an Exception immediately when a job failure is detected.
Returns:
- completed, failed (tuple[set[str], set[str]]) – A tuple of (completed_job_ids, failed_job_ids). If exit_on_failure is True and a failure is encountered, the method raises before returning.
CLI application¶
Command-line interface exposed as the ocr command. For detailed usage and options, see the tutorials section.
ocr¶
Run OCR deployment pipeline on Coiled
Usage:
Options:
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or
customize the installation.
--help Show this message and exit.
Subcommands
- aggregate-region-risk-summary-stats: Generate time-horizon based statistical summaries for county and tract level PMTiles creation
- create-building-pmtiles: Create PMTiles from the consolidated geoparquet file.
- create-pyramid: Create Pyramid
- create-regional-pmtiles: Create PMTiles for regional risk statistics (counties and tracts).
- ingest-data: Ingest and process input datasets
- partition-buildings: Partition buildings geoparquet by state and county FIPS codes.
- process-region: Calculate and write risk for a given region to Icechunk CONUS template.
- run: Run the OCR deployment pipeline. This will process regions, aggregate geoparquet files, and create PMTiles layers for the specified risk type.
- write-aggregated-region-analysis-files: Write aggregated statistical summaries for each region (county and tract).
ocr aggregate-region-risk-summary-stats¶
Generate time-horizon based statistical summaries for county and tract level PMTiles creation
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: c8g.16xlarge]
--help Show this message and exit.
ocr create-building-pmtiles¶
Create PMTiles from the consolidated geoparquet file.
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: c8g.8xlarge]
--disk-size INTEGER Disk size in GB (Coiled only). [default:
250]
--help Show this message and exit.
ocr create-pyramid¶
Create Pyramid
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: m8g.16xlarge]
--help Show this message and exit.
ocr create-regional-pmtiles¶
Create PMTiles for regional risk statistics (counties and tracts).
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: c8g.8xlarge]
--disk-size INTEGER Disk size in GB (Coiled only). [default:
250]
--help Show this message and exit.
ocr ingest-data¶
Ingest and process input datasets
Usage:
Options:
Subcommands
- download: Download raw source data for a dataset.
- list-datasets: List all available datasets that can be ingested.
- process: Process downloaded data and upload to S3/Icechunk.
- run-all: Run the complete pipeline: download, process, and cleanup.
ocr ingest-data download¶
Download raw source data for a dataset.
Usage:
Options:
DATASET Name of the dataset to download [required]
--dry-run Preview operations without executing
--debug Enable debug logging
--help Show this message and exit.
ocr ingest-data list-datasets¶
List all available datasets that can be ingested.
Usage:
Options:
ocr ingest-data process¶
Process downloaded data and upload to S3/Icechunk.
Usage:
Options:
DATASET Name of the dataset to process [required]
--dry-run Preview operations without executing
--use-coiled Use Coiled for distributed processing
--software TEXT Software environment to use (required if
--use-coiled is set)
--debug Enable debug logging
--overture-data-type TEXT For overture-maps: which data to process
(buildings, addresses, or both) [default:
both]
--census-geography-type TEXT For census-tiger: which geography to process
(blocks, tracts, counties, or all) [default:
all]
--census-subset-states TEXT For census-tiger: subset of states to process
(e.g., California Oregon)
--help Show this message and exit.
ocr ingest-data run-all¶
Run the complete pipeline: download, process, and cleanup.
Usage:
Options:
DATASET Name of the dataset to process [required]
--dry-run Preview operations without executing
--use-coiled Use Coiled for distributed processing
--debug Enable debug logging
--overture-data-type TEXT For overture-maps: which data to process
(buildings, addresses, or both) [default:
both]
--census-geography-type TEXT For census-tiger: which geography to process
(blocks, tracts, counties, or all) [default:
all]
--census-subset-states TEXT For census-tiger: subset of states to process
(e.g., California Oregon)
--help Show this message and exit.
ocr partition-buildings¶
Partition buildings geoparquet by state and county FIPS codes.
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: c8g.12xlarge]
--help Show this message and exit.
ocr process-region¶
Calculate and write risk for a given region to Icechunk CONUS template.
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
REGION_ID Region ID to process, e.g., y10_x2
[required]
-t, --risk-type [fire] Type of risk to calculate [default: fire]
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
--init-repo Initialize Icechunk repository (if not
already initialized).
--help Show this message and exit.
ocr run¶
Run the OCR deployment pipeline. This will process regions, aggregate geoparquet files, and create PMTiles layers for the specified risk type.
Usage:
Options:
-e, --env-file PATH Path to the environment variables file.
These will be used to set up the
OCRConfiguration
-r, --region-id TEXT Region IDs to process, e.g., y10_x2
--all-region-ids Process all valid region IDs
-t, --risk-type [fire] Type of risk to calculate [default: fire]
--write-regional-stats Write aggregated statistical summaries for
each region (one file per region type with
stats like averages, medians, percentiles,
and histograms)
--create-pyramid Create ndpyramid / multiscale zarr for web-
visualization
-p, --platform [coiled|local] Platform to run the pipeline on [default:
local]
--wipe Wipe the icechunk and vector data storages
before running the pipeline
--dispatch-platform [coiled|local]
If set, schedule this run command on the
specified platform instead of running
inline.
--vm-type TEXT VM type override for dispatch-platform
(Coiled only).
--process-retries INTEGER RANGE
Number of times to retry failed process-
region tasks (Coiled only). 0 disables
retries. [default: 2; x>=0]
--help Show this message and exit.
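Example (illustrative; the env file and region IDs are hypothetical, and -r is assumed to be repeatable for multiple regions):
# Process two regions locally using settings from a .env file
ocr run -e .env -r y10_x2 -r y10_x3 -t fire -p local
# Process all valid regions on Coiled, also writing regional stats and a web pyramid
ocr run -e .env --all-region-ids -p coiled --write-regional-stats --create-pyramid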
ocr write-aggregated-region-analysis-files¶
Write aggregated statistical summaries for each region (county and tract).
Creates one file per region type containing aggregated statistics for ALL regions, including building counts, average/median risk values, percentiles (p90, p95, p99), and histograms. Outputs in geoparquet, geojson, and csv formats.
Usage:
Options:
-e, --env-file PATH Path to the environment variables file. These
will be used to set up the OCRConfiguration
-p, --platform [coiled|local] If set, schedule this command on the
specified platform instead of running inline.
--vm-type TEXT Coiled VM type override (Coiled only).
[default: r8g.4xlarge]
--help Show this message and exit.