Brier score
The Brier score is the most commonly used verification metric for evaluating probabilistic forecasts of a binary outcome, such as a “chance of rainfall” forecast.
Probabilistic forecasts of binary events are expressed as values between 0 and 1, and observations are exactly 0 (event did not occur) or 1 (event occurred).
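The metric is then calculated the same way as the mean squared error (MSE): it is the mean of the squared differences between the forecast probabilities and the observations. The Brier score is a strictly proper scoring rule and is negatively oriented (lower values are better): a perfect score is 0 and the worst possible score is 1.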
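As a minimal sketch of that calculation, the snippet below computes the score by hand in plain Python. The helper name brier_score_manual and the sample values are hypothetical and are not part of the scores package; the library function also handles dimensions, weights and missing data, which this sketch does not.

```python
def brier_score_manual(forecasts, observations):
    """Mean squared difference between forecast probabilities and binary outcomes.

    Hypothetical helper for illustration only; not the scores implementation.
    """
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast/observation pair is required")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return sum(squared_errors) / len(squared_errors)


# Four forecast probabilities and the matching binary outcomes (made-up values).
forecasts = [0.9, 0.2, 0.7, 0.1]
observations = [1, 0, 0, 0]
score = brier_score_manual(forecasts, observations)
print(f"Brier score = {score:.4f}")

# The two extremes: forecasts that match every outcome exactly score 0,
# and forecasts that are confidently wrong every time score 1.
perfect = brier_score_manual([1.0, 0.0], [1, 0])
worst = brier_score_manual([0.0, 1.0], [1, 0])
```

Here the squared errors are 0.01, 0.04, 0.49 and 0.01, so the mean is 0.1375; the confident miss on the third forecast dominates the score.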
[1]:
from scores.probability import brier_score
from scipy.stats import beta, binom
import numpy as np
import xarray as xr
[2]:
# To learn more about the implementation of the Brier score, uncomment the following line
# help(brier_score)
We generate two synthetic forecasts. By design, fcst1 is a good forecast, while fcst2 is a poor forecast. We measure the difference in skill by calculating and comparing their Brier scores.
[3]:
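# fcst1 is drawn from a Beta(2, 1) distribution
fcst1 = beta.rvs(2, 1, size=1000)
# Observations are Bernoulli draws with event probability fcst1, so fcst1 is reliable by construction
obs = binom.rvs(1, fcst1)
# fcst2 is drawn from a Beta(0.5, 1) distribution, independently of obs
fcst2 = beta.rvs(0.5, 1, size=1000)
# Wrap everything in xarray DataArrays that share a "time" dimension
fcst1 = xr.DataArray(data=fcst1, dims="time", coords={"time": np.arange(0, 1000)})
fcst2 = xr.DataArray(data=fcst2, dims="time", coords={"time": np.arange(0, 1000)})
obs = xr.DataArray(data=obs, dims="time", coords={"time": np.arange(0, 1000)})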
[4]:
brier_fcst1 = brier_score(fcst1, obs)
brier_fcst2 = brier_score(fcst2, obs)
print(f"Brier score for fcst1 = {brier_fcst1.item():.2f}")
print(f"Brier score for fcst2 = {brier_fcst2.item():.2f}")
Brier score for fcst1 = 0.16
Brier score for fcst2 = 0.43
As expected, fcst1 has the lower Brier score, which quantifies how much better it is than fcst2.
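To put scores like these in context, it can help to compare them with uninformative baselines. The sketch below uses a small set of made-up observations and two hypothetical constant forecasts: one that always issues 0.5, and one that always issues the observed base rate. All names here (brier_score_manual, coin_flip, climatology) are illustrative and do not come from the scores package.

```python
def brier_score_manual(forecasts, observations):
    """Mean squared error between probabilities and 0/1 outcomes (illustrative helper)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / len(forecasts)


# Made-up binary observations: the event occurs in 5 of 8 cases.
observations = [1, 0, 1, 1, 0, 1, 0, 1]

# Baseline 1: always forecast 0.5. Every squared error is 0.25,
# so the score is 0.25 whatever the observations are.
coin_flip = [0.5] * len(observations)
coin_flip_score = brier_score_manual(coin_flip, observations)

# Baseline 2: always forecast the observed base rate p = 5/8.
# For a constant forecast equal to p, the score works out to p * (1 - p).
base_rate = sum(observations) / len(observations)
climatology = [base_rate] * len(observations)
climatology_score = brier_score_manual(climatology, observations)

print(f"always 0.5:       {coin_flip_score:.4f}")
print(f"always base rate: {climatology_score:.4f}")
```

Because a constant forecast of 0.5 always scores exactly 0.25, a score well above 0.25, such as the one for fcst2, indicates a forecast that does worse than that uninformative baseline.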
Notes
If you are using the Brier score on large data with Dask, consider setting the check_args argument to False in brier_score.
In the future, the Brier score components calculation will be added.
You may be interested in working through the Murphy Diagram tutorial, which allows you to break down the performance of the Brier score based on each threshold probability.