Interactive online version:

Continuous Ranked Probability Score (CRPS) for Ensembles#

General information about the CRPS, and how to use crps_cdf with a forecast expressed as a cumulative distribution function (CDF), is available in the tutorial “Continuous Ranked Probability Score”. We recommend you work through that tutorial before looking at the crps_for_ensembles and this tutorial.

In weather forecasting it is common to run an ensemble of deterministic models. Each ensemble member provides a deterministic forecast. The idea is that the combined ensemble forecasts can be used to estimate a probability density function (PDF), and hence a CDF.

The crps_for_ensemble function interprets ensemble forecast input as a CDF in one of two ways and calculates the CRPS as demonstrated below.

A CDF is a function of the probability of non-exceedence. It is relevant to real-valued parameters such as temperature, rainfall amount, wind speed etc.

[5]:

from scores.probability import crps_for_ensemble
import matplotlib.pyplot as plt

import numpy
import xarray

Using the `crps_for_ensemble` function#

[6]:

# Uncomment the following line and run to read the documentation regarding the crps_for_ensemble function in scores.
# crps_for_ensemble?

For the purpose of this tutorial, we will create a set of 10 equally likely forecasts of temperature which we will compare to a single verifying observation.

[7]:

# In this example, each ensemble member is given a name 0 to 9
ensemble_forecast = xarray.DataArray(
    [[1.2, 2.0, 2.7, 2.9, 3.0, 3.0, 3.1, 3.2, 3.8, 5.0]],
    dims=["time", "ensemble_member"],
    coords={"time": [0], "ensemble_member": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
)

# The observation is assumed to be 4.5
obs_array = xarray.DataArray(4.5, dims=["time"], coords={"time": [0]})

[8]:

# The ensemble forecast can be converted to a CDF
# When converted in a naive way we get the empirical CDF illustrated here. The larger the ensemble the smoother and more sensible this will be.
# The plot below also shows the CDF corresponding to the observation. The area squared corresponds to the CRPS.
fcst_thresholds = numpy.linspace(0, 7, 700)
empirical_cdf = xarray.DataArray(
    coords={"temperature": fcst_thresholds},
    data=[0] * 120 + [0.1] * 80 + [0.2] * 70 + [0.3] * 20 + [0.7] * 30 + [0.8] * 60 + [0.9] * 120 + [1] * 200,
)
observed_cdf = numpy.heaviside(fcst_thresholds - 4.5, 1)
plt.figure(figsize=(4, 3))
plt.plot(fcst_thresholds, observed_cdf, label="Observation")
plt.plot(fcst_thresholds, empirical_cdf, label="Forecast")
plt.fill_between(fcst_thresholds, empirical_cdf, observed_cdf, color="gray", label="area for CRPS")
plt.title("Empirical CDF based on Ensemble")
plt.xlabel("Temperature")
plt.ylabel("Probability")
plt.legend(loc="upper left")

[8]:

<matplotlib.legend.Legend at 0x10b68e810>

../_images/tutorials_CRPS_for_Ensembles_8_1.png

[9]:

# Specifying method='ecdf' assumes the empirical CDF
crps_for_ensemble(ensemble_forecast, obs_array, ensemble_member_dim="ensemble_member", method="ecdf").round(3)

[9]:

<xarray.DataArray ()>
array(1.111)

[10]:

# Specifying method='fair' gives an approximated CRPS assuming that the ensemble values can be interpreted as a random sample from an underlying predictive distribution
crps_for_ensemble(ensemble_forecast, obs_array, ensemble_member_dim="ensemble_member", method="fair").round(3)

[10]:

<xarray.DataArray ()>
array(1.056)

Now, let’s calculate the CRPS for an ensemble that produces forecasts for multiple timesteps

[15]:

temp_fcst = numpy.array([1.2, 2.0, 2.7, 2.9, 3.0, 3.0, 3.1, 3.2, 3.8, 5.0])

ensemble_forecast2 = xarray.DataArray(
    data=[temp_fcst, temp_fcst * 2, temp_fcst * 3],
    dims=["time", "ensemble_member"],
    coords={"time": [0, 1, 2], "ensemble_member": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
)

obs_array2 = xarray.DataArray([4.5, 8, 12], dims=["time"], coords={"time": [0, 1, 2]})

crps_for_ensemble(ensemble_forecast2, obs_array2, ensemble_member_dim="ensemble_member", preserve_dims="time").round(3)

[15]:

<xarray.DataArray (time: 3)>
array([1.111, 1.422, 2.133])
Coordinates:
  * time     (time) int64 0 1 2

Things to try next#

Use crps_for_ensemble with some real forecasts with extra dimensions in space and/or time.
Read about the ‘fair’ method in C. Ferro (2014), “Fair scores for ensemble forecasts”, Q J R Meteorol Soc 140(683):1917-1923. https://doi.org/10.1002/qj.2270
Explore our twCRPS for ensemble forecasts tutorial

Continuous Ranked Probability Score (CRPS) for Ensembles

Contents

Continuous Ranked Probability Score (CRPS) for Ensembles#

Using the crps_for_ensemble function#

Things to try next#

Using the `crps_for_ensemble` function#