Interactive online version:

Rank Histogram#

A rank histogram is a graphical tool for evaluating the reliability of ensemble forecasts, by plotting the relative frequency of an observation’s rank against the sorted values of an ensemble forecast. If an ensemble of forecasts is probabilistically calibrated, then the observations should be statistically indistinguishable from random draws from among the ensemble members, and consequently the rank histogram should be flat.

The tutorial on the probability integral transform (PIT) demonstrates similar and additional tools for assessing calibration, also drawing on the synthetic case studies of this tutorial.

Rank histograms for discontinuous distributions#

The scores implementation of the rank histogram also correctly handles the case where the underlying distributions are discontinuous. With discontinuous distributions, the observation can often match the value of multiple ensemble members, and its rank is then not unique. In such cases, scores.probability.rank_histogram distributes the relative frequency contribution evenly over all possible ranks in the tie.

A classic example of this situation is when forecasting rainfall at a particular location. During a dry spell, a good forecast will have many ensemble members forecasting 0 millimetres of precipitation, and the corresponding observation will also often be 0 mm.

We replicate this kind of situtation by taking the previous example and clipping the forecasts and observations to 0 whenever they have negative values.

[4]:

fcst_clipped = fcst.clip(min=0)
obs_clipped = obs.clip(min = 0)
rank_hist_clipped = rank_histogram(fcst_clipped, obs_clipped, "ens_member")

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)

for fcst_label, i in zip(list(rank_hist.data_vars), range(6)):
    row, col = int(i / 3), i % 3
    axes[row, col].bar(rank_hist_clipped['rank'].values, rank_hist_clipped[fcst_label].values, width=1)
    axes[row, col].axhline(y=1/(n_member+1), color='k', linestyle='--')
    axes[row, col].set_title(fcst_label)

plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.2, hspace=0.3)
fig.text(0.5, 0.01, 'rank', ha='center')
fig.text(-0.01, 0.5, 'relative frequency', va='center', rotation='vertical')

fig.show()

../_images/tutorials_Rank_Histogram_6_0.png

As before, we see that the calibrated forecast systems have flat histograms, even though in many forecast cases the rank of the observation is not unique.

Flat rank histograms do not imply that a forecast system is well-calibrated.#

Hamill (2001) showed that a conditionally mis-calibrated forecast system can have flat histograms. That is, while a calibrated system has an expected flat rank histogram, a flat rank histogram does not imply that the forecast system is well-calibrated. The following is inspired by Hamill’s example and also illustrates how weighting works.

[ ]:

# generate observations for three locations, all from the standard normal distribution N(0,1)
obs = xr.DataArray(
    data=norm.rvs(loc=0, scale=1, size=(3, n_time)),
    dims=['location', 'time'],
    coords={'location': ['city', 'hill', 'valley'], 'time': range(n_time)}
)

# forecasts for the city are under-predictions N(-0.5, 1)
fcst_city = xr.DataArray(
    data=norm.rvs(loc=-0.5, scale=1, size=(n_time, n_member, 1)),
    dims=['time', 'ens_member', 'location'],
    coords={'time': range(n_time), 'ens_member': range(n_member), 'location': ['city']}
)
# forecasts for the hill are over-predictions N(0.5, 1))
fcst_hill = xr.DataArray(
    data=norm.rvs(loc=0.5, scale=1, size=(n_time, n_member, 1)),
    dims=['time', 'ens_member', 'location'],
    coords={'time': range(n_time), 'ens_member': range(n_member), 'location': ['hill']}
)
# forecasts for the valley are over-dispersed N(0, 1.3**2)
fcst_valley = xr.DataArray(
    data=norm.rvs(loc=0, scale=1.3, size=(n_time, n_member, 1)),
    dims=['time', 'ens_member', 'location'],
    coords={'time': range(n_time), 'ens_member': range(n_member), 'location': ['valley']}
)
# combine forecasts into one data array
fcst = xr.concat([fcst_city, fcst_hill, fcst_valley], 'location')

# include some weights - we'll start with equal weighting on city, hill and valley forecasts
weights = xr.DataArray(data=[1, 1, 1], dims=['location'], coords={'location': ['city', 'hill', 'valley']})
rank_hist_weighted = rank_histogram(fcst, obs, "ens_member", reduce_dims='all', weights=weights)

# we now plot the rank histogram for all forecasts combined
fig, axes = plt.subplots(1, 1, figsize=(3, 2))
plt.bar(rank_hist_weighted['rank'].values, rank_hist_weighted.values, width=1)
plt.axhline(y=1/(n_member+1), color='k', linestyle='--')
fig.show()

# We get an almost flat histogram

../_images/tutorials_Rank_Histogram_8_0.png

[6]:

# However, we can expose conditional miscalibration by putting a higher weight on some locations and a lower weight on others.

# For example, here is the rank histogram with a high weight on city forecasts:
weights = xr.DataArray(data=[10, 1, 1], dims=['location'], coords={'location': ['city', 'hill', 'valley']})
rank_hist_weighted = rank_histogram(fcst, obs, "ens_member", reduce_dims='all', weights=weights)

fig, axes = plt.subplots(1, 1, figsize=(3, 2))
plt.bar(rank_hist_weighted['rank'].values, rank_hist_weighted.values, width=1)
plt.axhline(y=1/(n_member+1), color='k', linestyle='--')
fig.show()

# the histogram shows a classic under-prediction signature

../_images/tutorials_Rank_Histogram_9_0.png

[7]:

# repeat, but with a high weight on valley forecasts
weights = xr.DataArray(data=[1, 1, 10], dims=['location'], coords={'location': ['city', 'hill', 'valley']})
rank_hist_weighted = rank_histogram(fcst, obs, "ens_member", reduce_dims='all', weights=weights)

fig, axes = plt.subplots(1, 1, figsize=(3, 2))
plt.bar(rank_hist_weighted['rank'].values, rank_hist_weighted.values, width=1)
plt.axhline(y=1/(n_member+1), color='k', linestyle='--')
fig.show()

# the histogram shows a classic over-dispersed signature

../_images/tutorials_Rank_Histogram_10_0.png

Relationship to the Probability Integral Transform (PIT)#

The rank histogram is very closely related to the probabily integral transform (PIT) and PIT histograms.

The scores.probability.Pit implementation of PIT in scores has several advantages over scores.probability.rank_histogram, as it includes:

objective measures of dispersion and bias
an object measure of deviation from ‘calibration’
values for PIT-uniform probability plots, which is another visual tool for diagnosing mis-calibration
forecast cases where there are NaN values among some of the ensemble members by treating such cases as a smaller ensemble (whereas rank_histogram ignores all such cases, since the number of ranks cannot vary by forecast case)

However, one possible downside of the scores.probability.Pit implementation is that it treats an ensemble of forecasts as an empirical cumulative distribution function (ECDF). An ensemble may be well-calibrated, but when interpreted as an ECDF it may not be. The rank histogram avoids this problem.

See the tutorial on PIT for further information.

Things to try next#

Can you start with an over-dispersed forecast at one location and an under-dispersed forecast at another location and, by choosing appropriate weightings, produce a combined ranked histogram that is flat?

References#

Hamill, T. M. (2001). Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3), 550-560. https://doi.org/10.1175/1520-0493(2001)129%3C0550:IORHFV%3E2.0.CO;2

Rank Histogram

Contents

Rank Histogram#

Some characteristic shapes of rank histograms#

Rank histograms for discontinuous distributions#

Flat rank histograms do not imply that a forecast system is well-calibrated.#

Relationship to the Probability Integral Transform (PIT)#

Things to try next#

References#