Comparing raster data for regions of interest¶

In this notebook we compare our 30m resolution rasters of risk with those from the Wildfire Risk to Communities (WRC) project. We visualize regions where the two datasets strongly differ, as well as regions of historical significance (e.g. previous urban conflagrations). These comparisons showcase examples of why the datasets differ and explain patterns of how those differences propagate into CONUS-wide variations.

In [2]:

Copied!





import geopandas as gpd
import icechunk
import matplotlib.pyplot as plt
import xarray as xr
from shapely import Point
import geopandas as gpd
import icechunk
import matplotlib.pyplot as plt
import xarray as xr
from shapely import Point

Load in the raster level data.

In [3]:

Copied!





version = '0.12.0'
storage = icechunk.s3_storage(
    bucket='carbonplan-ocr',
    prefix=f'output/fire-risk/tensor/production/v{version}/ocr.icechunk',
    from_env=True,
)
repo = icechunk.Repository.open(storage)
session = repo.readonly_session('main')

ds = xr.open_dataset(session.store, engine='zarr', chunks={})
ds
version = '0.12.0'
storage = icechunk.s3_storage(
    bucket='carbonplan-ocr',
    prefix=f'output/fire-risk/tensor/production/v{version}/ocr.icechunk',
    from_env=True,
)
repo = icechunk.Repository.open(storage)
session = repo.readonly_session('main')

ds = xr.open_dataset(session.store, engine='zarr', chunks={})
ds

Out[3]:

<xarray.Dataset> Size: 652GB
Dimensions:                     (latitude: 97579, longitude: 208881)
Coordinates:
  * latitude                    (latitude) float64 781kB 22.43 22.43 ... 52.48
  * longitude                   (longitude) float64 2MB -128.4 -128.4 ... -64.05
Data variables:
    USFS_RPS                    (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    burn_probability_2011       (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    burn_probability_2047       (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    burn_probability_usfs_2011  (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    burn_probability_usfs_2047  (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    wind_risk_2047              (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    conditional_risk_usfs       (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>
    wind_risk_2011              (latitude, longitude) float32 82GB dask.array<chunksize=(6000, 4500), meta=np.ndarray>

In [ ]:

Copied!

tracts_dict = {}
for statistic in ['corr_low', 'low_bias', 'high_bias']:
    tracts_dict[statistic] = gpd.read_file(f'{statistic}_tracts_{version}.shp', index_col=0)
tracts_dict = {}
for statistic in ['corr_low', 'low_bias', 'high_bias']:
    tracts_dict[statistic] = gpd.read_file(f'{statistic}_tracts_{version}.shp', index_col=0)

In [4]:

Copied!

states = gpd.read_file('s3://carbonplan-risks/shapefiles/cb_2018_us_state_20m.zip')
states = states[~(states['STUSPS'].isin(['AK', 'PR', 'HI']))]
states = gpd.read_file('s3://carbonplan-risks/shapefiles/cb_2018_us_state_20m.zip')
states = states[~(states['STUSPS'].isin(['AK', 'PR', 'HI']))]

In [5]:

Copied!

var_name_dict = {'corr_low': 'low correlation', 'high_bias': 'high bias', 'low_bias': 'low bias'}
var_name_dict = {'corr_low': 'low correlation', 'high_bias': 'high bias', 'low_bias': 'low bias'}

In [6]:

Copied!





def plot_regions_two_datasets(
    ds,
    bounding_box,
    title,
    ds1_title='CarbonPlan',
    ds2_title='Scott (2024)',
    points_to_plot=None,
    vlims=None,
):
    fig, axarr = plt.subplots(ncols=3, figsize=(15, 5))
    if vlims is not None:
        ax = ds.wind_risk_2011.plot(ax=axarr[0], vmin=vlims[0], vmax=vlims[1])
    else:
        ax = ds.wind_risk_2011.plot(ax=axarr[0])
    axarr[0].set_title(ds1_title)
    ds.USFS_RPS.plot(ax=axarr[1], vmin=ax.get_clim()[0], vmax=ax.get_clim()[1])
    axarr[1].set_title(ds2_title)
    states.plot(ax=axarr[2], color='white', edgecolor='black')
    if points_to_plot is None:
        gpd.GeoSeries(Point(bounding_box[1].start, bounding_box[0].start)).set_crs(
            'EPSG:4326'
        ).centroid.plot(ax=axarr[2])
    else:
        points_to_plot.centroid.plot(ax=axarr[2])
    plt.suptitle(title)
    plt.tight_layout()
def plot_regions_two_datasets(
    ds,
    bounding_box,
    title,
    ds1_title='CarbonPlan',
    ds2_title='Scott (2024)',
    points_to_plot=None,
    vlims=None,
):
    fig, axarr = plt.subplots(ncols=3, figsize=(15, 5))
    if vlims is not None:
        ax = ds.wind_risk_2011.plot(ax=axarr[0], vmin=vlims[0], vmax=vlims[1])
    else:
        ax = ds.wind_risk_2011.plot(ax=axarr[0])
    axarr[0].set_title(ds1_title)
    ds.USFS_RPS.plot(ax=axarr[1], vmin=ax.get_clim()[0], vmax=ax.get_clim()[1])
    axarr[1].set_title(ds2_title)
    states.plot(ax=axarr[2], color='white', edgecolor='black')
    if points_to_plot is None:
        gpd.GeoSeries(Point(bounding_box[1].start, bounding_box[0].start)).set_crs(
            'EPSG:4326'
        ).centroid.plot(ax=axarr[2])
    else:
        points_to_plot.centroid.plot(ax=axarr[2])
    plt.suptitle(title)
    plt.tight_layout()

Areas where datasets are poorly correlated¶

By plotting the census tracts with the lowest correlations we see the two main causes of low agreement:

The influence of wind, in which we preferentially spread risk in some places while WRC uniformly spreads it. We'll call this "wind effect". This can either increase or decrease the risk values depending on whether the wind is smearing the burn probability preferentially away from or toward developed areas.
The WRC project's underlying burn probability maps decrease burn probability in many developed areas. This is visible in maps where Scott has more detailed spatial structure in contrast to our blurrier maps. We'll call this the "development effect". This only contributes to a high bias in our data with respect to Scott (2024).

In [7]:

Copied!





for statistic in ['corr_low']:
    for index in tracts_dict['corr_low'].index:
        bounds = tracts_dict['corr_low'].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict['corr_low'].loc[[index]].centroid,
        )
for statistic in ['corr_low']:
    for index in tracts_dict['corr_low'].index:
        bounds = tracts_dict['corr_low'].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict['corr_low'].loc[[index]].centroid,
        )

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 for statistic in ['corr_low']:
----> 2     for index in tracts_dict['corr_low'].index:
      3         bounds = tracts_dict['corr_low'].loc[[index]].bounds
      4         bounding_box = [
      5             slice(bounds.loc[index].maxy, bounds.loc[index].miny),
      6             slice(bounds.loc[index].minx, bounds.loc[index].maxx),
      7         ]

NameError: name 'tracts_dict' is not defined

Areas where we are high-biased with respect to Scott et al. (2024)¶

As with the plots of areas of low correlation above, we see strong high biases in places where both the wind effect and the development effect are at play. The third row below shows an example of where spatial artifacts from the underlying conditional risk to potential structures (CRPS). This is because our higher burn probability (elevated due to the wind effect), makes them visible in our data while they are washed out in the Scott et al. (2024) data.

In [9]:

Copied!





for statistic in ['high_bias']:
    for index in tracts_dict[statistic].index:
        bounds = tracts_dict[statistic].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict[statistic].loc[[index]].centroid,
        )
for statistic in ['high_bias']:
    for index in tracts_dict[statistic].index:
        bounds = tracts_dict[statistic].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict[statistic].loc[[index]].centroid,
        )

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 2
      1 for statistic in ['high_bias']:
----> 2     for index in tracts_dict[statistic].index:
      3         bounds = tracts_dict[statistic].loc[[index]].bounds
      4         bounding_box = [
      5             slice(bounds.loc[index].maxy, bounds.loc[index].miny),
      6             slice(bounds.loc[index].minx, bounds.loc[index].maxx),
      7         ]

NameError: name 'tracts_dict' is not defined

Areas where we are low-biased with respect to Scott et al. (2024)¶

The develoment effect does not contribute to places where we are low-biased with respect to the Scott et al. (2024) data. Low-biased areas are places where our risk maps depict lower risk values because we preferentially spread burn probability away from developed areas.

In [10]:

Copied!





for statistic in ['low_bias']:
    for index in tracts_dict[statistic].index:
        bounds = tracts_dict[statistic].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict[statistic].loc[[index]].centroid,
        )
for statistic in ['low_bias']:
    for index in tracts_dict[statistic].index:
        bounds = tracts_dict[statistic].loc[[index]].bounds
        bounding_box = [
            slice(bounds.loc[index].maxy, bounds.loc[index].miny),
            slice(bounds.loc[index].minx, bounds.loc[index].maxx),
        ]
        plot_regions_two_datasets(
            ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
            bounding_box,
            f'Area where CarbonPlan data has {var_name_dict[statistic]} w.r.t. Scott (2024)',
            tracts_dict[statistic].loc[[index]].centroid,
        )

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 2
      1 for statistic in ['low_bias']:
----> 2     for index in tracts_dict[statistic].index:
      3         bounds = tracts_dict[statistic].loc[[index]].bounds
      4         bounding_box = [
      5             slice(bounds.loc[index].maxy, bounds.loc[index].miny),
      6             slice(bounds.loc[index].minx, bounds.loc[index].maxx),
      7         ]

NameError: name 'tracts_dict' is not defined

In [13]:

Copied!

bounding_box
bounding_box

Out[13]:

[slice(34.22, 34.15, None), slice(-118.18, -118.09, None)]

In [12]:

Copied!

ds.sel(latitude=bounding_box[0], longitude=bounding_box[1])
ds.sel(latitude=bounding_box[0], longitude=bounding_box[1])

Out[12]:

<xarray.Dataset> Size: 2kB
Dimensions:                     (latitude: 0, longitude: 292)
Coordinates:
  * latitude                    (latitude) float64 0B 
  * longitude                   (longitude) float64 2kB -118.2 -118.2 ... -118.1
Data variables:
    USFS_RPS                    (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    burn_probability_2011       (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    burn_probability_2047       (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    burn_probability_usfs_2011  (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    burn_probability_usfs_2047  (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    wind_risk_2047              (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    conditional_risk_usfs       (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>
    wind_risk_2011              (latitude, longitude) float32 0B dask.array<chunksize=(0, 292), meta=np.ndarray>

Areas with historic fires¶

We look at three regions with historical urban conflagrations and we show how we have spread risk preferentially toward developed areas in those regions.

In [15]:

Copied!





bounding_box = [slice(34.15, 34.22), slice(-118.18, -118.09)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Altadena, CA (Eaton Fire)',
    vlims=[0, 2.5],
)
bounding_box = [slice(34.15, 34.22), slice(-118.18, -118.09)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Altadena, CA (Eaton Fire)',
    vlims=[0, 2.5],
)

/tmp/ipykernel_299/3748588936.py:22: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  ).centroid.plot(ax=axarr[2])

No description has been provided for this image

In [16]:

Copied!





bounding_box = [slice(39.9, 40), slice(-105.2, -105.1)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Superior, CO (Marshall Fire)',
    # vlims=[0,2.5]
)
bounding_box = [slice(39.9, 40), slice(-105.2, -105.1)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Superior, CO (Marshall Fire)',
    # vlims=[0,2.5]
)

/tmp/ipykernel_299/3748588936.py:22: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  ).centroid.plot(ax=axarr[2])

In [17]:

Copied!





bounding_box = [slice(39.7, 39.85), slice(-121.65, -121.52)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Paradise, CA (Camp Fire)',
    # vlims=[0,2.5]
)
bounding_box = [slice(39.7, 39.85), slice(-121.65, -121.52)]
plot_regions_two_datasets(
    ds=ds.sel(latitude=bounding_box[0], longitude=bounding_box[1]),
    bounding_box=bounding_box,
    title='Paradise, CA (Camp Fire)',
    # vlims=[0,2.5]
)

/tmp/ipykernel_299/3748588936.py:22: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  ).centroid.plot(ax=axarr[2])

In [ ]: