Interactive online version:

Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles#

twCRPS can be used to evaluate ensembles while emphasising extremes or a range of important thresholds. Sometimes CRPS may mask performance around extremes and it may be desirable to use a threshold weighted score such as twCRPS. If you haven’t done so, we recommend working through the CRPS for forecasts expressed as CDFs and CRPS for ensemble forecasts tutorials before working through this tutorial. Note that the crps_cdf function has threshold weighting built into it, but the crps_for_ensemble does not. This tutorial will demonstrate how to calculate a threshold weighted CRPS for ensembles.

Overview and graphical illustration of twCRPS#

We will begin by illustrating how the twCRPS for ensembles is calculated, first by defining the equation and then with a graphical representation. The emperical twCRPS for ensembles is defined as

\[\text{twCRPS}(F_{\text{ens}}, y; v) = \frac{1}{M} \sum_{m=1}^{M} \left| v(x_m) - v(y) \right| - \frac{1}{2M^2} \sum_{m=1}^{M} \sum_{j=1}^{M} \left| v(x_m) - v(x_j) \right|\]

where - \(M\) is the number of ensemble members, - \(y\) is the observation, - \(x_m\) is the forecast of a single ensemble member, - \(x_j\) is also the forecast of a single ensemble member, and - \(v\) is a ‘chaining function’.

The chaining function \(v\) is an anti-derivative of the threshold weight function \(w\), which is a non-negative function that assigns a weight to each threshold value.

For example, if we wanted to assign a threshold weight of 1 to thresholds above threshold \(t\) and a threshold weight of 0 to thresholds below \(t\), our threshold weight function would be \(w(x) = \mathbb{1}{(x > t)}\), where \(\mathbb{1}\) is the indicator function which returns a value of 1 if the condition is true and 0 otherwise. A chaining function would then be \(v(x) = \text{max}(x, t)\).

Let’s now visualise what the weighting and chaining functions if we were to set \(t=25\) for weighting an upper tail. Let’s assume that we are working with rainfall data with mm as the units.

[1]:

import plotly.graph_objects as go
import numpy as np
from plotly.subplots import make_subplots

x = np.linspace(0, 100, 400)
t = 25

# Calculate w(x)
w_x = np.where(x > t, 1, 0)
# Calculate v(x)
v_x = np.maximum(x, t)

fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("Weight function: w(x) = 1(x>t)", "Chaining function: v(x) = max(x, t)"),
    vertical_spacing=0.15,
)
fig.add_trace(go.Scatter(x=x, y=w_x, mode="lines", name="w(x) = 1(x>t)", showlegend=False), row=1, col=1)
fig.add_trace(go.Scatter(x=x, y=v_x, mode="lines", name="v(x) = max(x, t)", showlegend=False), row=2, col=1)

fig.update_layout(
    xaxis_title="x (mm)",
    yaxis_title="w",
    xaxis2_title="x (mm)",
    yaxis2_title="v",
    showlegend=True,
    width=600,
    height=600,
    margin=dict(l=20, r=20, t=40, b=60),
)
fig.show()

Now let’s produce an illustration of what the difference between CRPS and twCRPS would look like.

We will visualise the CDF of an ensemble with 50 ensemble members against an observation of 50mm

[2]:

np.random.seed(100)
# Produce CDF from ensemble
data = np.random.uniform(0, 70, 50)
sorted_data = np.sort(data)
cdf = np.arange(1, len(sorted_data) + 1) / len(sorted_data)

# Round the value closest to 25 to exactly 25 to make plot line up nicely
index_closest_to_25 = np.argmin(np.abs(sorted_data - 25))
sorted_data[index_closest_to_25] = 25

# Tidy up the beginning of the CDF
sorted_data = np.insert(sorted_data, 0, 0)
cdf = np.insert(cdf, 0, 0)

# Create the observation
x = np.linspace(0, 71, 4000)
obs_cdf = np.where(x > 50, 1, 0)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=sorted_data[np.where(sorted_data >= 25)],
        y=cdf[np.where(sorted_data >= 25)],
        mode="lines",
        name="Forecast CDF",
        line_shape="hv",
        showlegend=False,
    )
)
fig.add_trace(
    go.Scatter(
        x=x[np.where(x >= 25)],
        y=obs_cdf[np.where(x >= 25)],
        mode="lines",
        name="Observed CDF",
        fill="tonexty",
        fillcolor="rgba(176,224,230, 0.7)",
        showlegend=False,
    )
)
fig.add_trace(
    go.Scatter(x=sorted_data, y=cdf, mode="lines", name="Forecast CDF", line_shape="hv", line=dict(color="blue"))
)
fig.add_trace(go.Scatter(x=x, y=obs_cdf, mode="lines", name="Observed CDF", line=dict(color="red")))

fig.update_layout(
    title="Graphical illustration of twCRPS",
    yaxis_title="Probability of non-exceedance",
    xaxis_title="x (mm)",
    showlegend=True,
    width=600,
    height=600,
    margin=dict(l=20, r=20, t=40, b=60),
    legend=dict(x=0.01, y=0.99, xanchor="left", yanchor="top"),
)


fig.show()

The shaded area between the two lines equals the twCRPS for \(w(x) = \mathbb{1}{(x > 25)}\) and \(v(x) = \text{max}(x, 25)\). It measures predictive performance for decision thresholds above 25mm.

Things to try next#

The threshold arg in tail_tw_crps_for_ensemble and the lower_threshold and upper_threshold args in interval_tw_crps_for_ensemble can all take xr.DataArrays instead of floats. Try making the threshold vary with climatology.
Try your own chaining function based on your own weights in tw_crps_for_ensemble.
Explore the impact of setting method="fair" to calculate the fair version of the twCRPS.
Try setting tail="lower" in tail_tw_crps_for_ensemble to emphasise lower extremes.
Compare the ECMWF IFS ensemble against Neural GCM model using the unweighted CRPS (i.e., crps_for_ensemble)
Calculate the confidence that one model is better than another based on the twCRPS

Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles

Contents

Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles#

Overview and graphical illustration of twCRPS#

Calculating twCRPS with scores#

Example 1. 850hPa wind speed - focusing on gales or stronger#

Example 2. 850hPa temperatures - focusing on an interval of decision thresholds#

Example 3. Custom weighting function#

Things to try next#

Further reading#