API Documentation#
scores.continuous#
Import the functions from the implementations into the public API
- scores.continuous.additive_bias(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the additive bias which is also sometimes called the mean error.
It is defined as
\[\text{Additive bias} =\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i) \text{where } x = \text{the forecast, and } y = \text{the observation}\]See “Mean error” section at https://jwgfvr.github.io/forecastverification/index.html#meanerror for more information.
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the additive bias. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the additive bias. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See thescoresweighting tutorial for more information on how to use weights.
- Returns:
An xarray object with the additive bias of a forecast.
- Return type:
DataArray | Dataset
References
- scores.continuous.mean_error(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the mean error which is also sometimes called the additive bias.
It is defined as
\[\text{mean error} =\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i) \text{where } x = \text{the forecast, and } y = \text{the observation}\]See “Mean error” section at https://jwgfvr.github.io/forecastverification/index.html#meanerror for more information.
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the mean error. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the mean error. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See thescoresweighting tutorial for more information on how to use weights.
- Returns:
An xarray object with the mean error of a forecast.
- Return type:
DataArray | Dataset
References
- scores.continuous.mae(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, is_angular=False)#
Calculates the mean absolute error from forecast and observed data.
A detailed explanation is on https://en.wikipedia.org/wiki/Mean_absolute_error
\[\frac{1}{n} \sum_{i=1}^n | \text{forecast}_i - \text{observed}_i |\]- Parameters:
fcst (DataArray | Dataset | Series) – Forecast or predicted variables in xarray or pandas.
obs (DataArray | Dataset | Series) – Observed variables in xarray or pandas.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating MAE. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating MAE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.is_angular (bool) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
By default an xarray DataArray containing a single floating point number representing the mean absolute error for the supplied data. All dimensions will be reduced.
Alternatively, an xarray structure with dimensions preserved as appropriate containing the score along reduced dimensions
- Raises:
ValueError – If fcst and obs are not xarray objects and weights is not None.
- Return type:
DataArray | Dataset | Series
- scores.continuous.mse(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, is_angular=False)#
Calculates the mean squared error from forecast and observed data.
See “Mean squared error” section at https://jwgfvr.github.io/forecastverification/index.html#MSE for more information.
\[\frac{1}{n} \sum_{i=1}^n (\text{forecast}_i - \text{observed}_i)^2\]- Parameters:
fcst (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]) – Forecast or predicted variables in xarray or pandas.
obs (Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]) – Observed variables in xarray or pandas.
reduce_dims (Union[str, Iterable[str]) – Optionally specify which dimensions to reduce when calculating MSE. All other dimensions will be preserved.
preserve_dims (Union[str, Iterable[str]) – Optionally specify which dimensions to preserve when calculating MSE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the squared error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See thescoresweighting tutorial for more information on how to use weights.is_angular (bool | None) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
- An object containing
a single floating point number representing the mean absolute error for the supplied data. All dimensions will be reduced. Otherwise: Returns an object representing the mean squared error, reduced along the relevant dimensions and weighted appropriately.
- Return type:
Union[xr.Dataset, xr.DataArray, pd.Dataframe, pd.Series]
- Raises:
ValueError – If fcst and obs are not xarray objects and weights is not None.
References
- scores.continuous.quantile_score(fcst, obs, alpha, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates a score that targets alpha-quantiles. Use with alpha = 0.5 for forecasts of the median. Use with alpha = 0.9 for forecasts of the 90th percentile.
- Parameters:
fcst (DataArray | Dataset) – array of forecasts
obs (DataArray | Dataset) – array of observations
alpha (float) – A value between 0 and 1 (exclusive)
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the quantile score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating quantile score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the quantile score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
A DataArray with values being the mean generalised piecewise linear (GPL) scoring function, with the dimensions specified in dims. If dims is None, the returned DataArray will have only one element, the overall mean GPL score.
- Raises:
ValueError – if alpha is not between 0 and 1.
- Return type:
DataArray | Dataset
Notes
\[\begin{split} gpl(x) = \begin{cases}\alpha * (-x) & x \leq 0\\ (1-\alpha) x & x > 0\end{cases}\end{split}\]- where:
\(\alpha\) is the targeted quantile.
\(x\) is the difference, fcst - obs
References
T. Gneiting, “Making and evaluating point forecasts”, J. Amer. Stat. Assoc., Vol. 106 No. 494 (June 2011), pp. 754–755, Theorem 9
- scores.continuous.rmse(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, is_angular=False)#
Calculate the Root Mean Squared Error
A detailed explanation is on https://en.wikipedia.org/wiki/Root-mean-square_deviation
\[\sqrt{\frac{1}{n} \sum_{i=1}^n (\text{forecast}_i - \text{observed}_i)^2}\]- Parameters:
fcst (DataArray | Dataset | Series) – Forecast or predicted variables in xarray or pandas.
obs (DataArray | Dataset | Series) – Observed variables in xarray or pandas.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating RMSE. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating RMSE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.is_angular (bool) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
- An object containing
a single floating point number representing the root mean squared error for the supplied data. All dimensions will be reduced. Otherwise: Returns an object representing the root mean squared error, reduced along the relevant dimensions and weighted appropriately.
- Raises:
ValueError – If fcst and obs are not xarray objects and weights is not None.
- Return type:
DataArray | Dataset | Series
- scores.continuous.murphy_score(fcst, obs, thetas, *, functional, alpha, huber_a=None, decomposition=False, reduce_dims=None, preserve_dims=None)#
Returns the mean elementary score (Ehm et. al. 2016), also known as Murphy score, evaluated at decision thresholds specified by thetas. Optionally returns a decomposition of the score in terms of penalties for over- and under-forecasting.
Select
functional="quantile"andalpha=0.5for the median functional. Selectfunctional="expectile"andalpha=0.5for the mean (i.e., expectation) functional.Consider using
murphy_thetas()to generate thetas. If utilising dask, you may want to storethetasas a netCDF on disk and pass it in as an xarray object. Otherwise, very large objects may be created whenfcst,obsandthetasare broadcast together.- Parameters:
fcst (DataArray) – Forecast numerical values.
obs (DataArray) – Observed numerical values.
thetas (Sequence[float] | DataArray) – Theta thresholds.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
alpha (float) – Risk parameter (i.e. quantile or expectile level) for the functional. Must be between 0 and 1.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
decomposition (bool) – True to return penalty values due to under- and over-fcst as well as the total score, False to return total score only.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Murphy score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Murphy score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the FIRM score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.
- Returns:
An xr.Dataset with dimensions based on the
preserve_dimsorreduce_dimsarg as well as a “theta” dimension with values matchingthetasinput.If
decompositionis False, the dataset’s variables will contain 1 element “total”.If
decompositionis True, in addition to “total”, it will have “underforecast” and “overforecast” data_vars.- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
alphais not strictly between 0 and 1.ValueError – If
huber_ais not strictly greater than 0.
- Return type:
Dataset
References
- For mean elementary score definitions, see:
Theorem 1 of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Theorem 5.3 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.continuous.murphy_thetas(forecasts, obs, functional, *, huber_a=None, left_limit_delta=None)#
Return the decision thresholds (theta values) at which to evaluate elementary scores for the construction of Murphy diagrams.
This function may generate a very large number of theta thresholds if the forecast and obs data arrays are large and their values have high precision, which may lead to long computational times for Murphy diagrams. To reduce the number of thetas, users can first round forecast and obs values to an appropriate resolution.
- Parameters:
forecasts (list[DataArray]) – Forecast values, one array per source.
obs (DataArray) – Observed values.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
left_limit_delta (float | None) – Approximation of the left hand limit, used for “huber” and “expectile” functionals. Must be greater than or equal to 0. None will be treated as 0. Ideally, left_limit_delta should be small relative to the fcst and obs precision, and not greater than that precision.
- Returns:
theta thresholds to be used to compute murphy scores.
- Return type:
List[float]
- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
huber_ais not strictly greater than 0.ValueError – If
left_limit_deltais not greater than or equal to 0.
Notes
For theta values at which to evaluate elementary scores, see
Corollary 2 (p.521) of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Corollary 5.6 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.continuous.flip_flop_index(data: DataArray, sampling_dim: str, *, is_angular: bool = False, **selections: Iterable[int]) Dataset#
- scores.continuous.flip_flop_index(data: DataArray, sampling_dim: str, *, is_angular: bool = False, **selections: None) DataArray
Calculates the Flip-Flop Index along the dimensions sampling_dim.
- Parameters:
data – Data from which to draw subsets.
sampling_dim – The name of the dimension along which to calculate the Flip-Flop Index.
is_angular – specifies whether data is directional data (e.g. wind direction).
**selections – Additional keyword arguments specify subsets to draw from the dimension sampling_dim of the supplied data before calculation of the Flip_Flop Index. e.g. days123=[1, 2, 3]
- Returns:
An xarray.DataArray, the Flip-Flop Index by collapsing the dimension sampling_dim.
If selections are supplied: An xarray.Dataset. Each data variable is a supplied key-word argument, and corresponds to selecting the values specified from sampling_dim of data. The Flip-Flop Index is calculated for each of these selections.
- Return type:
If selections are not supplied
Notes
\[\text{{Flip-Flop Index}} = \frac{{1}}{{N-2}} \left [ \left(\sum\limits_{{i=1}}^{{N-1}}|x_i - x_{{i+1}}|\right) - \left(\max_{{j}}\{{x_j\}} - \min_{{j}}\{{x_j\}}\right) \right ]\]Where \(N\) is the number of data points, and \(x_i\) is the \(i^{{\text{{th}}}}\) data point.
Examples
>>> data = xr.DataArray([50, 20, 40, 80], coords={{'lead_day': [1, 2, 3, 4]}})
>>> flip_flop_index(data, 'lead_day') <xarray.DataArray ()> array(15.0) Attributes: sampling_dim: lead_day
>>> flip_flop_index(data, 'lead_day', days123=[1, 2, 3], all_days=[1, 2, 3, 4]) <xarray.Dataset> Dimensions: () Coordinates: *empty* Data variables: days123 float64 20.0 all_days float64 15.0 Attributes: selections: {{'days123': [1, 2, 3], 'all_days': [1, 2, 3, 4]}} sampling_dim: lead_day
- scores.continuous.flip_flop_index_proportion_exceeding(data, sampling_dim, thresholds, *, is_angular=False, preserve_dims=None, reduce_dims=None, **selections)#
Calculates the Flip-Flop Index and returns the proportion exceeding (or equal to) each of the supplied thresholds.
- Parameters:
data (DataArray) – Data from which to draw subsets.
sampling_dim (str) – The name of the dimension along which to calculate
thresholds (Iterable) – The proportion of Flip-Flop Index results equal to or exceeding these thresholds will be calculated. the Flip-Flop Index.
is_angular (bool) – specifies whether data is directional data (e.g. wind direction).
reduce_dims (Iterable[Hashable] | None) – Dimensions to reduce.
preserve_dims (Iterable[Hashable] | None) – Dimensions to preserve.
**selections (Iterable[int]) – Additional keyword arguments specify subsets to draw from the dimension sampling_dim of the supplied data before calculation of the Flip_Flop Index. e.g. days123=[1, 2, 3]
- Returns:
If selections are not supplied - An xarray.DataArray with dimensions dims + ‘threshold’. The DataArray is the proportion of the Flip-Flop Index calculated by collapsing dimension sampling_dim exceeding or equal to thresholds.
If selections are supplied - An xarray.Dataset with dimensions dims + ‘threshold’. There is a data variable for each keyword in selections, and corresponds to the Flip-Flop Index proportion exceeding for the subset of data specified by the keyword values.
Examples
>>> data = xr.DataArray( ... [[50, 20, 40, 80], [10, 50, 10, 100], [0, 30, 20, 50]], ... dims=['station_number', 'lead_day'], ... coords=[[10001, 10002, 10003], [1, 2, 3, 4]] ... )
>>> flip_flop_index_proportion_exceeding(data, 'lead_day', [20]) <xarray.DataArray (threshold: 1)> array([ 0.33333333]) Coordinates: * threshold (threshold) int64 20 Attributes: sampling_dim: lead_day
>>> flip_flop_index_proportion_exceeding( ... data, 'lead_day', [20], days123=[1, 2, 3], all_days=[1, 2, 3, 4] ... ) <xarray.Dataset> Dimensions: (threshold: 1) Coordinates: * threshold (threshold) int64 20 Data variables: days123 (threshold) float64 0.6667 all_days (threshold) float64 0.3333 Attributes: selections: {{'days123': [1, 2, 3], 'all_days': [1, 2, 3, 4]}} sampling_dim: lead_day
See also
- scores.continuous.correlation.pearsonr(fcst, obs, *, reduce_dims=None, preserve_dims=None)#
Calculates the Pearson’s correlation coefficient between two xarray DataArrays
\[\rho = \frac{\sum_{i=1}^{n}{(x_i - \bar{x})(y_i - \bar{y})}} {\sqrt{\sum_{i=1}^{n}{(x_i-\bar{x})^2}\sum_{i=1}^{n}{(y_i - \bar{y})^2}}}\]- where:
\(\rho\) = Pearson’s correlation coefficient
\(x_i\) = the values of x in a sample (i.e. forecast values)
\(\bar{x}\) = the mean value of the forecast sample
\(y_i\) = the values of y in a sample (i.e. observed values)
\(\bar{y}\) = the mean value of the observed sample value
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Pearson’s correlation coefficient. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Pearson’s correlation coefficient. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
- Returns:
An xarray object with Pearson’s correlation coefficient values
- Raises:
ValueError – If a user tries to preserve all dimensions a ValueError will be raised.
TypeError – If the input types are not xarray DataArrays or Datasets.
ValueError – If the input Datasets do not have the same data variables.
- Return type:
DataArray | Dataset
Note
This function isn’t set up to take weights.
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.continuous.correlation.correlation_impl import pearsonr >>> # Create example forecast and observation DataArrays >>> fcst = xr.DataArray(np.random.rand(10, 5), dims=("time", "location")) >>> obs = xr.DataArray(np.random.rand(10, 5), dims=("time", "location")) >>> # Calculate Pearson's correlation coefficient >>> result = pearsonr(fcst, obs, reduce_dims="time") >>> print(result)
- scores.continuous.correlation.spearmanr(fcst, obs, *, reduce_dims=None, preserve_dims=None)#
Calculates the Spearman’s rank correlation coefficient between two xarray objects.
\[r_s = \rho\big(R(x), R(y)\big)\]- where:
\(\rho\) is the Pearson correlation coefficient.
\(R\) is the ranking operator.
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Spearman’s rank correlation coefficient. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Spearman’s rank correlation coefficient. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved.
- Returns:
An xarray object with Spearman’s rank correlation coefficient values.
- Raises:
ValueError – If a user tries to preserve all dimensions ValueError will be raised.
TypeError – If the input types are not xarray DataArrays or Datasets.
ValueError – If the input Datasets do not have the same data variables.
- Return type:
DataArray | Dataset
Note
This function isn’t set up to take weights.
- Reference:
Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1), 72–101. https://doi.org/10.2307/1412159
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.continuous.correlation.correlation_impl import spearmanr >>> # Create example forecast and observation DataArrays >>> fcst = xr.DataArray(np.random.rand(10, 5), dims=("time", "location")) >>> obs = xr.DataArray(np.random.rand(10, 5), dims=("time", "location")) >>> # Calculate Spearman's rank correlation coefficient >>> result = spearmanr(fcst, obs, reduce_dims="time") >>> print(result)
- scores.continuous.multiplicative_bias(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the multiplicative bias.
Most suited for forecasts that have a lower bound at 0 such as wind speed. Will return a np.inf where the mean of obs across the dims to be reduced is 0. It is defined as
\[\text{{Multiplicative bias}} = \frac{\frac{1}{N}\sum_{i=1}^{N}x_i}{\frac{1}{N}\sum_{i=1}^{N}y_i} \text{where } x = \text{the forecast, and } y = \text{the observation}\]See “(Multiplicative) bias” section at https://jwgfvr.github.io/forecastverification/index.html#multiplicative_bias for more information.
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the multiplicative bias. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the multiplicative bias. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See thescoresweighting tutorial for more information on how to use weights.
- Returns:
An xarray object with the multiplicative bias of a forecast.
- Return type:
DataArray | Dataset
References
- scores.continuous.pbias(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the percent bias, which is the ratio of the additive bias to the mean observed value, multiplied by 100.
Percent bias is used for evaluating and comparing forecast accuracy across stations or datasets with varying magnitudes. By expressing the error as a percentage of the observed value, it allows for standardised comparisons, enabling assessment of forecast performance regardless of the absolute scale of values. Like
scores.continuous.multiplicative_bias(),pbiaswill return anp.infwhere the mean ofobsacross the dims to be reduced is 0. It is defined as\[\text{Percent bias} = 100 \cdot \frac{\sum_{i=1}^{N}(x_i - y_i)}{\sum_{i=1}^{N} y_i}\]- where:
\(x_i\) = the values of x in a sample (i.e. forecast values)
\(y_i\) = the values of y in a sample (i.e. observed values)
See “pbias” section at https://search.r-project.org/CRAN/refmans/hydroGOF/html/pbias.html for more information
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the percent bias. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the percent bias. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, NaN values in weights can be replaced by
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
An xarray object with the percent bias of a forecast.
- Return type:
DataArray | Dataset
References
Sorooshian, S., Duan, Q., & Gupta, V. K. (1993). Calibration of rainfall-runoff models: Application of global optimization to the Sacramento Soil Moisture Accounting Model. Water Resources Research, 29(4), 1185-1194. https://doi.org/10.1029/92WR02617
Alfieri, L., Pappenberger, F., Wetterhall, F., Haiden, T., Richardson, D., & Salamon, P. (2014). Evaluation of ensemble streamflow predictions in Europe. Journal of Hydrology, 517, 913-922. https://doi.org/10.1016/j.jhydrol.2014.06.035
Dawson, C. W., Abrahart, R. J., & See, L. M. (2007). HydroTest: A web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environmental Modelling and Software, 22(7), 1034-1052. https://doi.org/10.1016/j.envsoft.2006.06.008
Moriasi, D. N., Arnold, J. G., Van Liew, M. W., Bingner, R. L., Harmel, R. D., & Veith, T. L. (2007). Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE, 50(3), 885-900. https://doi.org/10.13031/2013.23153
- scores.continuous.percent_within_x(fcst, obs, threshold, *, reduce_dims=None, preserve_dims=None, is_angular=False, decimals=None, is_inclusive=True)#
Computes the proportion of forecasts within a specified absolute tolerance of the observations.
This score calculates the percentage of forecast values that are within a specified threshold (plus an optional tolerance) of the observed values. This metric is particularly useful when evaluating how often a forecast falls within an acceptable error band, regardless of direction.
The general formulations are:
\[\text{Percent within X (exclusive)} = 100 \cdot \frac{\sum_{i=1}^{N} \mathbf{1}\left(|x_i - y_i| < \tau\right)} {\sum_{i=1}^{N} \mathbf{1}_{\text{valid}}}\]\[\text{Percent within X (inclusive)} = 100 \cdot \frac{\sum_{i=1}^{N} \mathbf{1}\left(|x_i - y_i| \leq \tau\right)} {\sum_{i=1}^{N} \mathbf{1}_{\text{valid}}}\]- where:
\(x_i\) is the forecast value at index \(i\)
\(y_i\) is the observed value at index \(i\)
\(\tau\) is the absolute error threshold
\(\mathbf{1}(\cdot)\) is the indicator function
\(\mathbf{1}_{\text{valid}}\) is 1 where both \(x_i\) and \(y_i\) are not missing (NaN), 0 otherwise
- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables.
obs (DataArray | Dataset) – Observed variables.
threshold (float) – The main threshold to test closeness against.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the percent within X . All other dimensions will be preserved. Only one of reduce_dims and preserve_dims can be specified.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the percent within X. All other dimensions will be reduced. Only one of reduce_dims and preserve_dims can be specified.
is_angular (bool | None) – If True, uses angular distance in degrees.
decimals (int | None) – A way to avoid floating-point precision issues. Absolute errors are rounded to this many digits before threshold calculations.
is_inclusive (bool | None) – Whether to treat the condition as inclusive (<=) or exclusive (<).
- Returns:
Percent of forecasts within tolerance for each preserved dimension.
- Return type:
DataArray | Dataset
Important
This metric can incentivise hedging (such as introducing biases to maximise one’s score). See the tutorial for more information.
Note
The result is bounded in [0, 100].
NaNs in forecasts or observations are excluded.
If total valid forecast-observation pairs are zero, output is NaN.
Examples
>>> import numpy as np >>> import xarray as xr >>> from scores.standard_impl import percent_within_x >>> obs_raw = np.array( ... [ ... [[1,2,3], [4,5,6]], ... [[3,2,1], [6,5,4]], ... [[3,2,5], [2,2,6]], ... [[5,2,3], [4,-1,4]], ... ] ... ... ) # dimension lengths: x=4, y=2, t=3 >>> obs = xr.DataArray(obs_raw, dims=["x", "y", "t"]) >>> fcst = obs * 1.2 + 0.1 # add some synthetic bias and variance
>>> # Example 1: >>> # percent of forecasts with less than or equal to 0.5 absolute error >>> # reduce over t - time - should produce an xy-grid (4 by 2) >>> percent_within_x(fcst=fcst, obs=obs, threshold=0.5, is_inclusive=True, reduce_dims=["t"]) <xarray.DataArray (x: 4, y: 2)> Size: 64B array([[66.66666667, 0. ], [66.66666667, 0. ], [33.33333333, 66.66666667], [33.33333333, 33.33333333]]) Dimensions without coordinates: x, y
>>> # Example 2: >>> # percent of forecasts with less than or equal to 0.5 absolute error >>> # reduce over (x, y) - space - should be a t-vector (3 by 1) >>> percent_within_x(fcst=fcst, obs=obs, threshold=0.5, is_inclusive=True, reduce_dims=["x","y"]) <xarray.DataArray (t: 3)> Size: 24B array([25., 75., 12.5]) Dimensions without coordinates: t
>>> # Example 3: >>> # percent of forecasts with less than 0.5 absolute error (is_inclusive=False) >>> # reduce over (x, y) - space - should be a t-vector (3 by 1) >>> percent_within_x(fcst=fcst, obs=obs, threshold=0.5, is_inclusive=False, reduce_dims=["x","y"]) <xarray.DataArray (t: 3)> Size: 24B array([12.5., 12.5., 12.5]) Dimensions without coordinates: t
>>> # Example 4: >>> # Controlling floating-point precision issues >>> np.set_printoptions(precision=17) # make floating-point precision issues visible >>> obs = xr.DataArray([0.1 + 0.2], dims=["t"]) >>> print(f'obs {obs.values}') obs [0.30000000000000004] >>> fcst = obs + 0.3 >>> print(f'fcst {fcst.values}') fcst [0.6000000000000001] >>> unrounded = percent_within_x(fcst=fcst, obs=obs, threshold=0.3, is_inclusive=True, decimals=20) >>> print(f'incorrectly creating a penalty: {unrounded.values}') incorrectly ignoring a success: 0.0 >>> rounded = percent_within_x(fcst=fcst, obs=obs, threshold=0.3, is_inclusive=True, decimals=5) >>> print(f'correctly recognising a success: {rounded.values}') correctly recognising a success: 100.0
- scores.continuous.kge(fcst, obs, *, reduce_dims=None, preserve_dims=None, scaling_factors=None, include_components=False)#
Calculate the Kling-Gupta Efficiency (KGE) between observed and simulated (or forecast) values.
KGE is a performance metric that decomposes the error into three components: correlation, variability, and bias. It is computed as:
\[\text{KGE} = 1 - \sqrt{\left[s_\rho \cdot (\rho - 1)\right]^2 + \left[s_\alpha \cdot (\alpha - 1)\right]^2 + \left[s_\beta \cdot (\beta - 1)\right]^2}\]\[\alpha = \frac{\sigma_x}{\sigma_y}\]\[\beta = \frac{\mu_x}{\mu_y}\]- where:
\(\rho\) = Pearson’s correlation coefficient between observed and forecast values as defined in
scores.continuous.correlation.pearsonr()\(\alpha\) is the ratio of the standard deviations (variability ratio)
\(\beta\) is the ratio of the means (bias)
\(x\) and \(y\) are forecast and observed values, respectively
\(\mu_x\) and \(\mu_y\) are the means of forecast and observed values, respectively
\(\sigma_x\) and \(\sigma_y\) are the standard deviations of forecast and observed values, respectively
\(s_\rho\), \(s_\alpha\) and \(s_\beta\) are the scaling factors for the correlation coefficient \(\rho\), the variability term \(\alpha\) and the bias term \(\beta\)
- Parameters:
fcst (DataArray) – Forecast or predicted variables.
obs (DataArray) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the KGE. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the KGE. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be all NaN with the same shape/dimensionality as the forecast because the standard deviation is zero for a single point.
scaling_factors (list[float] | ndarray | None) – A 3-element vector or list describing the weights for each term in the KGE. Defined by: scaling_factors = [\(s_\rho\), \(s_\alpha\), \(s_\beta\)] to apply to the correlation term \(\rho\), the variability term \(\alpha\) and the bias term \(\beta\) respectively. Defaults to (1.0, 1.0, 1.0). (See equation 10 in Gupta et al. (2009) for definitions of them).
include_components (bool | False) – If True, the function also returns the individual terms contributing to the KGE score.
- Returns:
The Kling-Gupta Efficiency (KGE) score as an xarray DataArray.
If
include_componentsis True, the function returnsxarray.Datasetkge_s with the following variables:kge: The KGE score.
rho: The Pearson correlation coefficient.
alpha: The variability ratio.
beta: The bias term.
- Return type:
DataArray | Dataset
Notes
Statistics are calculated only from values for which both observations and simulations are not null values.
This function isn’t set up to take weights.
Currently this function is working only on xr.DataArray.
When preserve_dims is set to ‘all’, the function returns NaN, similar to the Pearson correlation coefficient calculation for a single data point because the standard deviation is zero for a single point.
References
Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modeling. Journal of Hydrology, 377(1-2), 80-91. https://doi.org/10.1016/j.jhydrol.2009.08.003.
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323-4331. https://doi.org/10.5194/hess-23-4323-2019.
Examples
>>> kge_s = kge(forecasts, obs,preserve_dims='lat') # if data is of dimension {lat,time}, kge value is computed across the time dimension >>> kge_s = kge(forecasts, obs,reduce_dims="time") # if data is of dimension {lat,time}, reduce_dims="time" is same as preserve_dims='lat' >>> kge_s = kge(forecasts, obs, include_components=True) # kge_s is dataset of all three components and kge value itself >>> kge_s_weighted = kge(forecasts, obs, scaling_factors=(0.5, 1.0, 2.0)) # with scaling factors
- scores.continuous.nse(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, is_angular=False)#
The Nash-Sutcliffe model Efficiency coefficient (NSE) is primarily used in hydrology to assess the skill of model predictions (of e.g. “discharge”).
While NSE is often calculated over observations and model predictions in the time dimension, it is actually a fairly generic statistical measure that determines the relative magnitude of the residual variance (“noise”) compared to the measured data variance (“information”) (Nash and Sutcliffe, 1970). Incidentally, it is (inversely) related to the signal-to-noise ratio (SNR).
The general formulation of NSE is as follows:
\[\text{NSE} = 1 - \frac{\sum_i{(O_i - S_i)^2}}{\sum_i{(O_i - \bar{O})^2}}\]- where:
\(i\) is a generic “indexer” representing the set of datapoints along the dimensions being reduced e.g. time (\(t\)) or xy-coordinates (\((x, y)\)). The latter represents reduction over two dimensions as an example.
\(O_i\) is the observation at index \(i\).
\(S_i\) is the “forecast” or model simulation at index \(i\).
\(\bar{O}\) is the mean observation of the set of indexed samples as specified by
reduce_dimsandpreserve_dims.
- Parameters:
fcst (DataArray | Dataset) – “Forecast” or predicted variables.
obs (DataArray | Dataset) – Observed variables.
reduce_dims (Iterable[Hashable] | None) – dimensions to reduce when calculating the NSE. (i.e. NSE will be calculated using datapoints along these dimensions as samples, the other dimensions will be preserved).
preserve_dims (Iterable[Hashable] | None) – dimensions to preserve. Mutually exclusive to
reduce_dims. All dimensions not specified here will be reduced as described inreduce_dims. Note:preserve_dims="all"is not supported for NSE. See notes below.weights (DataArray | Dataset | None) – Optional weighting to apply to the NSE computation. Typically weights are applied over the time dimension but can vary by location as well. Weights must be non-negative and specified for each data point (i.e. the user must not assume broadcasting will handle appropriate assignment of weights for this score).
is_angular (bool) – specifies whether
fcstandobsare angular data (e.g. wind direction). If True, a different function is used to calculate the difference betweenfcstandobs, which accounts for circularity. Angularfcstandobsdata should be in degrees rather than radians.
- Returns:
NSE score for each preserved dimension
xr.Dataset: iffcst,obsand optionallyweightsare all datasets.xr.DataArray: ditto above - where inputs are all dataarraysSee comments below for more information on mixed xarray data types (which this score does not handle) and type isomorphism.
- Raises:
DimensionError – If any dimension checks fail.
KeyError – If no dimensions are being reduced - NSE requries at least 1 dimension to be reduced to compute the observation variance.
IndexError – If the dimensions being reduced only have 1 value - not sufficient for computation of obs variance.
UserWarning – If weights are negative, or invalid (e.g. all zeros or all NaNs).
UserWarning – If attempting to divide by 0. The computation will still succeed but produce
np.nan(numerator is also 0) or-np.infwhere divide by zero would occur.Exception – Any other errors or warnings not otherwise listed due to calculations associated with utility functions such as
gather_dimensions.RuntimeError – If something went wrong with the underlying implementation. The user will be prompted to report this as a github issue.
- Return type:
DataArray | Dataset
- Supplementary details:
Nash-Sutcliffe efficiencies range from -Inf to 1. Essentially, the closer to 1, the more accurate the model is.
NSE = 1, corresponds to a perfect match of the model to the obs.
NSE = 0, indicates that the model is as accurate as the mean obs.
-Inf < NSE < 0, indicates that the mean obs is better predictor than the model.
The optional
weightsargument can additionally be used to perform a weighted NSE (wNSE). Although, this is a generic set of weights, and it is the user’s responsiblility to define them appropriately. Typically this is the observation itself (Hundecha, Y., & Bárdossy, A., 2004).weightsmust be non-negative. Therefore, the observations must ideally also be non-negative (or formulated appropriately) if used as weights.While
is_angularis typically not used for this score, NSE is generic enough that it _could_ be used in wider context, and hence is kept as an option. It is defaulted toFalseas that’s the typical use-case.
Important
This score does not allow mixed xarray data structures as inputs. Either provide all
xr.DataArrayor allxr.Datasetexclusively, for thefcst,obsand (optionally)weightsarguments.This is an intentionally imposed constraint to make sure the inner computations are simple to check and deterministic. See tips below for more information.
Warning
Operations between dataarrays are not guarenteed to preserve names. If the user is working with dataarrays, it is assumed that preserving names is not a major requirement. If a user needs the name preserved, they should explicitly convert all data array inputs to datasets using
xr.DataArray.to_dataset(...), and verify that the naming is retained appropriately before calling the score.For operations where ONLY
xr.DataArrayinputs are used, the returned score will have its name forced to the name of this score i.e. “NSE”, for simplicity.See tips below for more information.
Note
For Hydrology in particular \(i = t\), the reduced dimension is usually the time dimension. However, in order to keep things generic, this function does not explicitly mandate a time dimension be provided. Instead it requires it has a hard requirement that at least one dimension is being reduced from either a specification of
reduce_dimsorpreserve_dims(mutually exclusive).The reason is that the observation variance cannot be non-zero if nothing is being reduced.
As a side-effect of the above requirement,
preserve_dims="all"is not allowed and will naturally throw an error.Note
Divide by zero is allowed - to accomodate scenarios where all obs entries in the group being reduced is constant (0 obs variance).
While these may cause divide by zero errors, they should not halt execution of computations for other valid coordinates - so a warning is issued instead to prompt the user to double check the data.
It may also be that divide by zero is unavoidable - in which case we still want to return the correctly calculated values. To this end, this is how
numpyresolves divide by zero:# note: psuedocode n / 0 = NaN : if n == 0 (represented by np.nan) = -Inf : if n == 0 (represented by -np.inf)
Tip
When dealing with dask arrays dask, no computation will happen until
.compute(...)is called on the returned score.Tip
Work with datasets where possible with NSE, or for any score that supports datasets for that matter. Datasets maintain structural integrity better than their dataarray counterparts and also are compatible with higher order types like
xr.DataTree.Operations between datasets are more predictable than operations with mixed types. Data arrays on the other hand may ignore names and broadcast liberally even when names do not match, and this may not be consistent depending on the operation. This may or may not be the intented behaviour the user expects. Operations between only dataarrays are fine as long as preserving names is not mandatory.
Examples
>>> import numpy as np >>> import xarray as xr >>> from scores.continuous import nse >>> obs_raw = np.array( ... [ ... [[1,2,3], [4,5,6]], ... [[3,2,1], [6,5,4]], ... [[3,2,5], [2,2,6]], ... [[5,2,3], [4,-1,4]], ... ] ... ... ) # dimension lengths: x=4, y=2, t=3 >>> obs = xr.DataArray(obs_raw, dims=["x", "y", "t"]) >>> fcst = obs * 1.2 + 0.1 # add some synthetic bias and variance >>> # Example 1: >>> # reduce over t - time - should produce a xy-grid (4 by 2) >>> nse(obs, fcst, reduce_dims=["t"]) <xarray.DataArray (x: 4, y: 2)> Size: 64B array([[ 0.71180556, -0.28819444], [ 0.71180556, -0.28819444], [ 0.70982143, 0.85742188], [ 0.70982143, 0.93208333]]) Dimensions without coordinates: x, y >>> # Example 2: >>> # reduce over (x, y) - space - should be a t-vector (3 by 1) >>> nse(obs, fcst, reduce_dims=["x", "y"]) <xarray.DataArray (t: 3)> Size: 24B array([0.77469136, 0.90123457, 0.74722222]) Dimensions without coordinates: t
References
Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I — A discussion of principles. In Journal of Hydrology (Vol. 10, Issue 3, pp. 282– 290). Elsevier BV. https://doi.org/10.1016/0022-1694%2870%2990255-6
Hundecha, Y., & Bárdossy, A. (2004). Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. Journal of Hydrology, 292(1-4), 281-295. https://doi.org/10.1016/j.jhydrol.2004.01.002
- scores.continuous.isotonic_fit(fcst, obs, *, weight=None, functional='mean', bootstraps=None, quantile_level=None, solver=None, confidence_level=0.9, min_non_nan=1, report_bootstrap_results=False)#
Calculates an isotonic regression fit to observations given forecasts as the explanatory values. Optionally returns confidence bands on the fit via bootstrapping. Forecasts and observations are scalar-valued (i.e, integers, floats or NaN).
If forecasts target the mean or a quantile functional, then the user can select that functional. Otherwise the user needs to supply a solver function
of one variable (if no weights are supplied), or
of two variables (if weights are supplied).
The solver function is used to find an optimal regression fit in each block, subject to a non-decreasing monotonicity constraint, using the pool adjacent violators (PAV) algorithm.
An example of a solver function of one variable is obs -> np.mean(obs) An example of a solver function of two variables is (obs, weight) -> np.average(obs, weights=weight).
Ties (where one forecast value has more than one observation value) are handled as follows. Forecasts are sorted in ascending order, and observations sorted to match. Where ties occur, observations are sorted in descending order within each subset of ties. Isotonic regression is then performed on this sorted observation sequence.
This implementation uses sklearn.isotonic.IsotonicRegression when functional=”mean”.
- Parameters:
fcst (ndarray | DataArray) – np.array or xr.DataArray of forecast (or explanatory) values. Values must be float, integer or NaN.
obs (ndarray | DataArray) – np.array or xr.DataArray of corresponding observation (or response) values. Must be the same shape as fcst. Values must be float, integer or NaN.
weight (ndarray | DataArray | None) – positive weights to apply to each forecast-observation pair. Must be the same shape as fcst, or None (which is equivalent to applying equal weights).
functional (Literal['mean', 'quantile'] | None) – Functional that the forecasts are targeting. Either “mean” or “quantile” or None. If None then solver must be supplied. The current implementation for “quantile” does not accept weights. If weighted quantile regression is desired then the user should should supply an appropriate solver.
bootstraps (int | None) – the number of bootstrap samples to perform for calculating the regression confidence band. Set to None if a confidence band is not required.
quantile_level (float | None) – the level of the quantile functional if functional=’quantile’. Must be strictly between 0 and 1.
solver (Callable[[ndarray, ndarray], float] | Callable[[ndarray], float] | None) – function that accepts 1D numpy array of observations and returns a float. Function values give the regression fit for each block as determined by the PAV algorithm. If weights are supplied, the function must have a second argument that accepts 1D numpy array of weights of same length as the observations.
confidence_level (float | None) – Confidence level of the confidence band, strictly between 0 and 1. For example, a confidence level of 0.5 will calculate the confidence band by computing the 25th and 75th percentiles of the bootstrapped samples.
min_non_nan (int | None) – The minimum number of non-NaN bootstrapped values to calculate confidence bands at a particular forecast value.
report_bootstrap_results (bool | None) – This specifies to whether keep the bootstrapped regression values in return dictionary. Default is False to keep the output result small.
- Returns:
- “unique_fcst_sorted”: 1D numpy array of remaining forecast values sorted in
ascending order, after any NaNs from fcst, obs and weight are removed, and only unique values are kept to keep the output size reasonably small.
”fcst_counts”: 1D numpy array of forecast counts for unique values of forecast sorted
- ”regression_values”: 1D numpy array of regression values corresponding to
”unique_fcst_sorted” values.
- ”regression_func”: function that returns the regression fit based on linear
interpolation of (“fcst_sorted”, “regression_values”), for any supplied argument (1D numpy array) of potential forecast values.
- ”bootstrap_results”: in the case of report_bootstrap_results=True, 2D numpy
array of bootstrapped regression values is included in return dictionary. Each row gives the interpolated regression values from a particular bootstrapped sample, evaluated at “fcst_sorted” values. If m is the number of bootstrap samples and n = len(fcst_sorted) then it is has shape (m, n). We emphasise that this array corresponds to fcst_sorted not unique_fcst_sorted.
- ”confidence_band_lower_values”: values of lower confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_upper_values”: values of upper confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_lower_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_lower_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_upper_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_upper_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_levels”: tuple giving the quantile levels used to calculate the
confidence band.
- Return type:
Dictionary with the following keys
- Raises:
ValueError – if fcst and obs are np.arrays and don’t have the same shape.
ValueError – if fcst and weight are np.arrays and don’t have the same shape.
ValueError – if fcst and obs are xarray.DataArrays and don’t have the same dimensions.
ValueError – if fcst and weight are xarray.DataArrays and don’t have the same dimensions.
ValueError – if any entries of fcst, obs or weight are not an integer, float or NaN.
ValueError – if there are no fcst and obs pairs after NaNs are removed.
ValueError – if functional is not one of “mean”, “quantile” or None.
ValueError – if quantile_level is not strictly between 0 and 1 when functional=”quantile”.
ValueError – if weight is not None when functional=”quantile”.
ValueError – if functional and solver are both None.
ValueError – if not exactly one of functional and solver is None.
ValueError – if bootstraps is not a positive integer.
ValueError – if confidence_level is not strictly between 0 and 1.
- Note: This function only keeps the unique values of fcst_sorted to keep the volume of
the return dictionary small. The forecast counts is also included, so users can it to create forecast histogram (usually displayed in the reliability diagrams).
- References
de Leeuw, Hornik and Mair. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods”, Journal of Statistical Software, 2009.
Dimitriadis, Gneiting and Jordan. “Stable reliability diagrams for probabilistic classifiers”, PNAS, Vol. 118 No. 8, 2020. Available at https://www.pnas.org/doi/10.1073/pnas.2016191118
Jordan, Mühlemann, and Ziegel. “Optimal solutions to the isotonic regression problem”, 2020 (version 2), available on arxiv at https://arxiv.org/abs/1904.04761
- scores.continuous.consistent_expectile_score(fcst, obs, alpha, phi, phi_prime, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the score using a scoring function that is consistent for the alpha-expectile functional, based on a supplied convex function phi. See Geniting (2011), or Equation (10) from Taggart (2022).
\[\begin{split}S(x, y) = \begin{cases} (1 - \alpha)(\phi(y) - \phi(x) - \phi'(x)(y-x)), & y < x \\ \alpha(\phi(y) - \phi(x) - \phi'(x)(y-x)), & x \leq y \end{cases}\end{split}\]- where
\(x\) is the forecast
\(y\) is the observation
\(\alpha\) is the expectile level
\(\phi\) is a convex function of a single variable
\(\phi'\) is the subderivative of \(\phi\)
\(S(x,y)\) is the score.
Note that if \(\phi\) is differentiable then \(\phi'\) is its derivative.
The score is negatively oriented with \(0 \leq S(x,y) < \infty\).
We strongly recommend that you work through the tutorial https://scores.readthedocs.io/en/stable/tutorials/Consistent_Scores.html to help you understand how to use the consistent scoring functions in
scores.- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
alpha (float) – expectile level. Must be strictly between 0 and 1.
phi (Callable[[DataArray], DataArray]) – a convex function on the real numbers, accepting a single array like argument.
phi_prime (Callable[[DataArray], DataArray]) – a subderivative of phi, accepting a single array like argument.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the consistent expectile score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the consistent quantile score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the consistent quantile score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
array of (mean) scores that is consistent for alpha-expectile functional, with the dimensions specified by dims. If dims is None, the returned DataArray will have only one entry, the overall mean score.
- Raises:
ValueError – if alpha is not strictly between 0 and 1.
- Return type:
DataArray
References
Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494), 746–762. https://doi.org/10.1198/jasa.2011.r10138
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306–320. https://doi.org/10.1002/qj.4206
- scores.continuous.consistent_quantile_score(fcst, obs, alpha, g, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the score that is consistent for the alpha-quantile functional, based on nondecreasing function g. See Gneiting (2011), or Equation (8) from Taggart (2022).
\[\begin{split}S(x, y) = \begin{cases} (1 - \alpha)(g(x) - g(y)), & y < x \\ \alpha(g(y) - g(x)), & x \leq y \end{cases}\end{split}\]- where
\(x\) is the forecast
\(y\) is the observation
\(\alpha\) is the quantile level
\(g\) is a nondecreasing function of a single variable
\(S(x,y)\) is the score.
The score is negatively oriented with \(0 \leq S(x,y) < \infty\).
We strongly recommend that you work through the tutorial https://scores.readthedocs.io/en/stable/tutorials/Consistent_Scores.html to help you understand how to use the consistent scoring functions in
scores.- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
alpha (float) – quantile level. Must be strictly between 0 and 1.
g (Callable[[DataArray], DataArray]) – nondecreasing function on the real numbers, accepting a single array like argument.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the consistent quantile score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the consistent quantile score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the consistent quantile score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
array of (mean) scores that are consistent for alpha-quantile functional, with the dimensions specified by dims. If dims is None, the returned DataArray will have only one entry, the overall mean score.
- Raises:
ValueError – if alpha is not strictly between 0 and 1.
- Return type:
DataArray
References
Gneiting, T. (2011). Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494), 746–762. https://doi.org/10.1198/jasa.2011.r10138
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306–320. https://doi.org/10.1002/qj.4206
- scores.continuous.consistent_huber_score(fcst, obs, huber_param, phi, phi_prime, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the score that is consistent for the Huber mean functional with tuning parameter tuning_param, based on convex function phi. See Taggart (2022a), or Equation (11) from Taggart (2022b). See Taggart (2022b), end of Section 3.4, for the standard formula.
\[S(x,y) = \frac{1}{2}(\phi(y)-\phi(\kappa_v (x-y) + y) + \kappa_v (x-y)\phi'(x))\]- where
\(\kappa_v(x) = \mathrm{max}(-v, \mathrm{min}(x, v))\) is the “capping function”
\(v\) is the Huber parameter
\(x\) is the forecast
\(y\) is the observation
\(\phi\) is a convex function of a single variable
\(\phi'\) is the subderivative of \(\phi\)
\(S(x,y)\) is the score.
The score is negatively oriented with \(0 \leq S(x,y) < \infty\).
We strongly recommend that you work through the tutorial https://scores.readthedocs.io/en/stable/tutorials/Consistent_Scores.html to help you understand how to use the consistent scoring functions in
scores.- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
huber_param (float) – Huber mean tuning parameter. This corresponds to the transition point between linear and quadratic loss for Huber loss. Must be positive.
phi (Callable[[DataArray], DataArray]) – a convex function on the real numbers, accepting a single array like argument.
phi_prime (Callable[[DataArray], DataArray]) – a subderivative of phi, accepting a single array like argument.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the consistent Huber score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the consistent Huber score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the consistent Huber score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
array of (mean) scores that is consistent for Huber mean functional, with the dimensions specified by dims. If dims is None, the returned DataArray will have only one entry, the overall mean score.
- Raises:
ValueError – if huber_param <= 0.
- Return type:
DataArray
References
Taggart, R. J. (2022a). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-ejs1957
Taggart, R. (2022b). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306–320. https://doi.org/10.1002/qj.4206
- scores.continuous.tw_quantile_score(fcst, obs, alpha, interval_where_one, *, interval_where_positive=None, reduce_dims=None, preserve_dims=None, weights=None)#
Returns the threshold weighted quantile score.
For more flexible threshold weighting schemes, see
scores.continuous.consistent_quantile_score().- Two types of threshold weighting are supported: rectangular and trapezoidal.
To specify a rectangular threshold weight, set
interval_where_positive=Noneand setinterval_where_oneto be the interval where the threshold weight is 1. For example, ifinterval_where_one=(0, 10)then a threshold weight of 1 is applied to decision thresholds satisfying 0 <= threshold < 10, and a threshold weight of 0 is applied otherwise. Interval endpoints can be-numpy.infornumpy.inf.To specify a trapezoidal threshold weight, specify
interval_where_positiveandinterval_where_oneusing desired endpoints. For example, ifinterval_where_positive=(-2, 10)andinterval_where_one=(2, 4)then a threshold weight of 1 is applied to decision thresholds satisfying 2 <= threshold < 4. The threshold weight increases linearly from 0 to 1 on the interval [-2, 2) and decreases linearly from 1 to 0 on the interval [4, 10], and is 0 otherwise. Interval endpoints can only be infinite if the correspondinginterval_where_oneendpoint is infinite. End points ofinterval_where_positiveandinterval_where_onemust differ except when the endpoints are infinite.
- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
alpha (float) – the quantile level. Must be strictly between 0 and 1.
interval_where_one (Tuple[int | float | DataArray, int | float | DataArray]) – endpoints of the interval where the threshold weights are 1. Must be increasing. Infinite endpoints are permissible. By supplying a tuple of arrays, endpoints can vary with dimension.
interval_where_positive (Tuple[int | float | DataArray, int | float | DataArray] | None) – endpoints of the interval where the threshold weights are positive. Must be increasing. Infinite endpoints are only permissible when the corresponding
interval_where_oneendpoint is infinite. By supplying a tuple of arrays, endpoints can vary with dimension.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the threshold_weighted_quantile_score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the threshold_weighted_quantile_score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the threshold_weighted_quantile_score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
xarray data array of the threshold weighted quantile error
- Raises:
ValueError – if
interval_where_oneis not length 2.ValueError – if
interval_where_oneis not increasing.ValueError – if
alphais not in the open interval (0,1).ValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneandinterval_where_positivedo not specify a valid trapezoidal weight.
- Return type:
DataArray
References
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306-320. https://doi.org/10.1002/qj.4206
- scores.continuous.tw_absolute_error(fcst, obs, interval_where_one, *, interval_where_positive=None, reduce_dims=None, preserve_dims=None, weights=None)#
Returns the threshold weighted absolute error.
For more flexible threshold weighting schemes, see
scores.continuous.consistent_quantile_score().- Two types of threshold weighting are supported: rectangular and trapezoidal.
To specify a rectangular threshold weight, set
interval_where_positive=Noneand setinterval_where_oneto be the interval where the threshold weight is 1. For example, ifinterval_where_one=(0, 10)then a threshold weight of 1 is applied to decision thresholds satisfying 0 <= threshold < 10, and a threshold weight of 0 is applied otherwise. Interval endpoints can be-numpy.infornumpy.inf.To specify a trapezoidal threshold weight, specify
interval_where_positiveandinterval_where_oneusing desired endpoints. For example, ifinterval_where_positive=(-2, 10)andinterval_where_one=(2, 4)then a threshold weight of 1 is applied to decision thresholds satisfying 2 <= threshold < 4. The threshold weight increases linearly from 0 to 1 on the interval [-2, 2) and decreases linearly from 1 to 0 on the interval [4, 10], and is 0 otherwise. Interval endpoints can only be infinite if the correspondinginterval_where_oneendpoint is infinite. End points ofinterval_where_positiveandinterval_where_onemust differ except when the endpoints are infinite.
- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
interval_where_one (Tuple[int | float | DataArray, int | float | DataArray]) – endpoints of the interval where the threshold weights are 1. Must be increasing. Infinite endpoints are permissible. By supplying a tuple of arrays, endpoints can vary with dimension.
interval_where_positive (Tuple[int | float | DataArray, int | float | DataArray] | None) – endpoints of the interval where the threshold weights are positive. Must be increasing. Infinite endpoints are only permissible when the corresponding
interval_where_oneendpoint is infinite. By supplying a tuple of arrays, endpoints can vary with dimension.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the threshold_weighted_absolute_error. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the threshold_weighted_absolute_error. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the threshold_weighted_absolute_error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
xarray data array of the threshold weighted absolute error
- Raises:
ValueError – if
interval_where_oneis not length 2.ValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneandinterval_where_positivedo not specify a valid trapezoidal weight.
- Return type:
DataArray
References
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306-320. https://doi.org/10.1002/qj.4206
- scores.continuous.tw_squared_error(fcst, obs, interval_where_one, *, interval_where_positive=None, reduce_dims=None, preserve_dims=None, weights=None)#
Returns the the threshold weighted squared error.
For more flexible threshold weighting schemes, see
scores.continuous.consistent_expectile_score().- Two types of threshold weighting are supported: rectangular and trapezoidal.
To specify a rectangular threshold weight, set
interval_where_positive=Noneand setinterval_where_oneto be the interval where the threshold weight is 1. For example, ifinterval_where_one=(0, 10)then a threshold weight of 1 is applied to decision thresholds satisfying 0 <= threshold < 10, and a threshold weight of 0 is applied otherwise. Interval endpoints can be-numpy.infornumpy.inf.To specify a trapezoidal threshold weight, specify
interval_where_positiveandinterval_where_oneusing desired endpoints. For example, ifinterval_where_positive=(-2, 10)andinterval_where_one=(2, 4)then a threshold weight of 1 is applied to decision thresholds satisfying 2 <= threshold < 4. The threshold weight increases linearly from 0 to 1 on the interval [-2, 2) and decreases linearly from 1 to 0 on the interval [4, 10], and is 0 otherwise. Interval endpoints can only be infinite if the correspondinginterval_where_oneendpoint is infinite. End points ofinterval_where_positiveandinterval_where_onemust differ except when the endpoints are infinite.
- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
interval_where_one (Tuple[int | float | DataArray, int | float | DataArray]) – endpoints of the interval where the threshold weights are 1. Must be increasing. Infinite endpoints are permissible. By supplying a tuple of arrays, endpoints can vary with dimension.
interval_where_positive (Tuple[int | float | DataArray, int | float | DataArray] | None) – endpoints of the interval where the threshold weights are positive. Must be increasing. Infinite endpoints are only permissible when the corresponding
interval_where_oneendpoint is infinite. By supplying a tuple of arrays, endpoints can vary with dimension.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the threshold_weighted_square_error. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the threshold_weighted_squared_error. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the threshold_weighted_squared_error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
xarray data array of the threshold weighted squared error
- Raises:
ValueError – if
interval_where_oneis not length 2.ValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneandinterval_where_positivedo not specify a valid trapezoidal weight.
- Return type:
DataArray
- Reference:
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306-320. https://doi.org/10.1002/qj.4206
- scores.continuous.tw_huber_loss(fcst, obs, huber_param, interval_where_one, *, interval_where_positive=None, reduce_dims=None, preserve_dims=None, weights=None)#
Returns the threshold weighted Huber loss.
For more flexible threshold weighting schemes, see
scores.continuous.consistent_huber_score().- Two types of threshold weighting are supported: rectangular and trapezoidal.
To specify a rectangular threshold weight, set
interval_where_positive=Noneand setinterval_where_oneto be the interval where the threshold weight is 1. For example, ifinterval_where_one=(0, 10)then a threshold weight of 1 is applied to decision thresholds satisfying 0 <= threshold < 10, and a threshold weight of 0 is applied otherwise. Interval endpoints can be-numpy.infornumpy.inf.To specify a trapezoidal threshold weight, specify
interval_where_positiveandinterval_where_oneusing desired endpoints. For example, ifinterval_where_positive=(-2, 10)andinterval_where_one=(2, 4)then a threshold weight of 1 is applied to decision thresholds satisfying 2 <= threshold < 4. The threshold weight increases linearly from 0 to 1 on the interval [-2, 2) and decreases linearly from 1 to 0 on the interval [4, 10], and is 0 otherwise. Interval endpoints can only be infinite if the correspondinginterval_where_oneendpoint is infinite. End points ofinterval_where_positiveandinterval_where_onemust differ except when the endpoints are infinite.
- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
huber_param (float) – the Huber transition parameter.
interval_where_one (Tuple[int | float | DataArray, int | float | DataArray]) – endpoints of the interval where the threshold weights are 1. Must be increasing. Infinite endpoints are permissible. By supplying a tuple of arrays, endpoints can vary with dimension.
interval_where_positive (Tuple[int | float | DataArray, int | float | DataArray] | None) – endpoints of the interval where the threshold weights are positive. Must be increasing. Infinite endpoints are only permissible when the corresponding
interval_where_oneendpoint is infinite. By supplying a tuple of arrays, endpoints can vary with dimension.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the threshold_weighted_expectile_score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the threshold_weighted_expectile_score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the threshold_weighted_expectile_score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
xarray data array of the threshold weighted expectile error
- Raises:
ValueError – if
interval_where_oneis not length 2.ValueError – if
interval_where_oneis not increasing.ValueError – if
alphais not in the open interval (0,1).ValueError – if
huber_paramis not positiveValueError – if
interval_where_oneis not increasing.ValueError – if
interval_where_oneandinterval_where_positivedo not specify a valid trapezoidal threshold weight.
- Return type:
DataArray
References
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306-320. https://doi.org/10.1002/qj.4206
- scores.continuous.tw_expectile_score(fcst, obs, alpha, interval_where_one, *, interval_where_positive=None, reduce_dims=None, preserve_dims=None, weights=None)#
Returns the threshold weighted expectile score.
For more flexible threshold weighting schemes, see
scores.continuous.consistent_expectile_score().- Two types of threshold weighting are supported: rectangular and trapezoidal.
To specify a rectangular threshold weight, set
interval_where_positive=Noneand setinterval_where_oneto be the interval where the threshold weight is 1. For example, ifinterval_where_one=(0, 10)then a threshold weight of 1 is applied to decision thresholds satisfying 0 <= threshold < 10, and a threshold weight of 0 is applied otherwise. Interval endpoints can be-numpy.infornumpy.inf.To specify a trapezoidal threshold weight, specify
interval_where_positiveandinterval_where_oneusing desired endpoints. For example, ifinterval_where_positive=(-2, 10)andinterval_where_one=(2, 4)then a threshold weight of 1 is applied to decision thresholds satisfying 2 <= threshold < 4. The threshold weight increases linearly from 0 to 1 on the interval [-2, 2) and decreases linearly from 1 to 0 on the interval [4, 10], and is 0 otherwise. Interval endpoints can only be infinite if the correspondinginterval_where_oneendpoint is infinite. End points ofinterval_where_positiveandinterval_where_onemust differ except when the endpoints are infinite.
- Parameters:
fcst (DataArray) – array of forecast values.
obs (DataArray) – array of corresponding observation values.
alpha (float) – expectile level. Must be strictly between 0 and 1.
interval_where_one (Tuple[int | float | DataArray, int | float | DataArray]) – endpoints of the interval where the threshold weights are 1. Must be increasing. Infinite endpoints are permissible. By supplying a tuple of arrays, endpoints can vary with dimension.
interval_where_positive (Tuple[int | float | DataArray, int | float | DataArray] | None) – endpoints of the interval where the threshold weights are positive. Must be increasing. Infinite endpoints are only permissible when the corresponding
interval_where_oneendpoint is infinite. By supplying a tuple of arrays, endpoints can vary with dimension.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the threshold_weighted_expectile_score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the threshold_weighted_expectile_score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the threshold_weighted_expectile_score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
xarray data array of the threshold weighted expectile error
- Raises:
ValueError – if
interval_where_oneis not length 2.ValueError – if
interval_where_oneis not increasing.ValueError – if
alphais not in the open interval (0,1).ValueError – if
interval_where_oneandinterval_where_positivedo not specify a valid trapezoidal weight.
- Return type:
DataArray
References
Taggart, R. (2022). Evaluation of point forecasts for extreme events using consistent scoring functions. Quarterly Journal of the Royal Meteorological Society, 148(742), 306-320. https://doi.org/10.1002/qj.4206
- scores.continuous.quantile_interval_score(fcst_lower_qtile, fcst_upper_qtile, obs, lower_qtile_level, upper_qtile_level, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the quantile interval score for interval forecasts. This score penalises the interval’s width and whether the observation value lies outside the interval.
\[\text{quantile interval score} = \underbrace{(q_{u} - q_{l})}_{\text{interval width penalty}} + \underbrace{\frac{1}{\alpha_l} \cdot (q_{l} - y) \cdot \mathbb{1}(y < q_{l})}_{\text{over-prediction penalty}} + \underbrace{\frac{1}{1 - \alpha_u} \cdot (y - q_{u}) \cdot \mathbb{1}(y > q_{u})}_{\text{under-prediction penalty}}\]- where
\(q_u\) is the forecast at the upper quantile
\(q_l\) is the forecast at the lower quantile
\(\alpha_u\) is the upper quantile level
\(\alpha_l\) is the lower quantile level
\(y\) is the observation
\(\mathbb{1}(condition)\) is an indicator function that is 1 when the condition is true, and 0 otherwise.
- Parameters:
fcst_lower_qtile (DataArray) – array of forecast values at the lower quantile.
fcst_upper_qtile (DataArray) – array of forecast values at the upper quantile.
obs (DataArray) – array of observations.
lower_qtile_level (float) – Quantile level between 0 and 1 (exclusive) for
fcst_lower_qtile.upper_qtile_level (float) – Quantile level between 0 and 1 (exclusive) for
fcst_upper_qtile.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the quantile interval score. All other dimensions will be preserved. As a special case, “all” will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the quantile interval score. All other dimensions will be reduced. As a special case, “all” will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the quantile score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
A Dataset with the dimensions specified in
dims. The dataset will have the following data variables:interval_width_penalty: the interval width penalty contribution of the quantile interval score
overprediction_penalty: the over-prediction penalty contribution of the quantile interval score
underprediction_penalty: the under-prediction penalty contribution of the quantile interval score
total: sum of all penalties
- Raises:
ValueError – If not (0 < lower_qtile_level < upper_qtile_level < 1).
ValueError – If (fcst_lower_qtile > fcst_upper_qtile).any().
- Return type:
Dataset
References
Winkler, R. L. (1972). A Decision-Theoretic Approach to Interval Estimation. Journal of the American Statistical Association, 67(337), 187. https://doi.org/10.2307/2284720
Examples
Calculate the quantile interval score for forecast intervals with lower and upper quantile levels of 0.1 and 0.6, respectively.
>>> import numpy as np >>> import xarray as xr >>> from scores.continuous import quantile_interval_score >>> fcst_lower_level = xr.DataArray(np.random.uniform(10, 15, size=(30, 15)), dims=['time', 'station']) >>> fcst_upper_level = xr.DataArray(np.random.uniform(20, 25, size=(30, 15)), dims=['time', 'station']) >>> obs = xr.DataArray(np.random.uniform(8, 27,size=(30, 15)), dims=['time', 'station']) >>> quantile_interval_score(fcst_lower_level, fcst_upper_level, obs, 0.1, 0.6)
- scores.continuous.interval_score(fcst_lower_qtile, fcst_upper_qtile, obs, interval_range, *, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the interval score for interval forecasts. This function calls the
scores.continuous.quantile_interval_score()function to calculate the interval score for cases where the quantile level range is symmetric.\[\text{interval score} = \underbrace{(q_{u} - q_{l})}_{\text{interval width penalty}} + \underbrace{\frac{2}{\alpha} \cdot (q_{l} - y) \cdot \mathbb{1}(y < q_{l})}_{\text{over-prediction penalty}} + \underbrace{\frac{2}{\alpha} \cdot (y - q_{u}) \cdot \mathbb{1}(y < q_{u})}_{\text{under-prediction penalty}}\]- where
\(q_u\) is the forecast at the upper quantile
\(q_l\) is the forecast at the lower quantile
\(\alpha\) is the confidence level that is equal to \(1 - \text{interval_range}\)
\(y\) is the observation
\(\mathbb{1}(condition)\) is an indicator function that is 1 when the condition is true, and 0 otherwise.
- Parameters:
fcst_lower_qtile (DataArray) – array of forecast values at the lower quantile.
fcst_upper_qtile (DataArray) – array of forecast values at the upper quantile.
obs (DataArray) – array of observations.
interval_range (float) – Range (length) of interval (e.g., 0.9 for 90% confidence level, which will result in lower quantile of 0.05 and upper quantile of 0.95). Must be strictly between 0 and 1.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the interval score. All other dimensions will be preserved. As a special case, “all” will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the interval score. All other dimensions will be reduced. As a special case, “all” will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the quantile score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
A Dataset with the dimensions specified in
dims. The dataset will have the following data variables:interval_width_penalty: the interval width penalty contribution of the interval score
overprediction_penalty: the over-prediction penalty contribution of the interval score
underprediction_penalty: the under-prediction penalty contribution of the interval score
total: sum of all penalties
As can be seen in the interval score equation, the lower and upper quantile levels are derived from the interval range:
lower_qtile_level = (1 - interval_range) / 2andupper_qtile_level = (1 + interval_range) / 2.- Raises:
ValueError – If not 0 < interval_range < 1.
ValueError – If (fcst_lower_qtile > fcst_upper_qtile).any().
- Return type:
Dataset
References
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378. Section 6.2. https://doi.org/10.1198/016214506000001437
Examples
Calculate the interval score for forecast intervals with an interval range of 0.5 (i.e., lower and upper quantile levels are 0.25 and 0.75, respectively).
>>> import numpy as np >>> import xarray as xr >>> from scores.continuous import interval_score >>> fcst_lower_level = xr.DataArray(np.random.uniform(10, 15, size=(30, 15)), dims=['time', 'station']) >>> fcst_upper_level = xr.DataArray(np.random.uniform(20, 25, size=(30, 15)), dims=['time', 'station']) >>> obs = xr.DataArray(np.random.uniform(8, 27,size=(30, 15)), dims=['time', 'station']) >>> interval_score(fcst_lower_level, fcst_upper_level, obs, 0.5)
- scores.continuous.qq(fcst, obs, *, quantiles, interpolation_method='linear', data_source_dim='data_source', reduce_dims=None, preserve_dims=None)#
Calculate the quantiles of empirical forecast and observation data for a quantile-quantile (Q-Q) plot.
Forecast and observation data are broadcast and NaN values are matched before calculating the quantiles. This function calculates the quantiles for the user to plot using their preferred plotting library.
Currently this function is implemented for working with raw forecast and observation data rather than a theoretical probability distribution.
- Parameters:
fcst (DataArray | Dataset) – Forecast data.
obs (DataArray | Dataset) – Observation data.
quantiles (Iterable[float]) – Quantiles to compute, e.g., [0.1, 0.5, 0.9].
interpolation_method (str) – Interpolation method for estimating the quantile value, default is ‘linear’. Valid options include ‘inverted_cdf’, ‘averaged_inverted_cdf’, ‘closest_observation’, ‘interpolated_inverted_cdf’, ‘hazen’, ‘weibull’, ‘linear’, ‘median_unbiased’, ‘normal_unbiased’, ‘lower’, ‘higher’, ‘midpoint’, and ‘nearest’. See
numpy.quantile()for more information on interpolation method options.data_source_dim (str) – Name of the new dimension to be created for the data source, default is ‘data_source’. This dimension will contain two values: ‘fcst’ and ‘obs’, representing the forecast and observation data, respectively.
reduce_dims (Iterable[Hashable] | None) – Dimensions to reduce over when calculating quantiles.
preserve_dims (Iterable[Hashable] | None) – Dimensions to preserve when calculating quantiles.
- Returns:
An xarray object with a new dimension called “data_source”.
- Raises:
ValueError – If the interpolation method is invalid.
ValueError – if quantiles are not between 0 and 1.
ValueError – If a dimension in the input data has the same name as data_source_dim
ValueError – If fcst and obs are Datasets with different data variables.
ValueError – If a user tries to preserve all dimensions.
TypeError – If fcst and obs are not both xarray DataArrays or Datasets.
- Return type:
DataArray | Dataset
References
Déqué, M. (2011). Deterministic forecasts of continuous variables. In I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner’s guide in atmospheric science (2nd ed., pp. 77–94). Wiley. https://doi.org/10.1002/9781119960003.ch5
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.plotdata import qq >>> fcst = xr.DataArray(np.random.rand(100), dims='time') >>> obs = xr.DataArray(np.random.rand(100), dims='time') >>> result = qq(fcst, obs, quantiles=[0.1, 0.5, 0.9])
scores.probability#
- scores.probability.crps_cdf(fcst, obs, *, threshold_dim='threshold', threshold_weight=None, additional_thresholds=None, propagate_nans=True, fcst_fill_method='linear', threshold_weight_fill_method='forward', integration_method='exact', reduce_dims=None, preserve_dims=None, weights=None, include_components=False)#
Calculates the CRPS probabilistic metric given CDF input.
Calculates the continuous ranked probability score (CRPS), or the mean CRPS over specified dimensions, given forecasts in the form of predictive cumulative distribution functions (CDFs). Can also calculate threshold-weighted versions of the CRPS by supplying a threshold_weight.
Predictive CDFs here are described by an indexed set of values rather than by closed forumulae. As a result, the resolution or number of points at which the CDF is realised has an impact on the calculation of areas under and over the curve to obtain the CRPS result. The term ‘threshold’ is used to describe the dimension which is used as an index for predictive CDF values. Various techniques are used to interpolate CDF values between indexed thresholds.
- Given:
a predictive CDF fcst indexed at thresholds by variable x
an observation in CDF form obs_cdf (i.e., \(\text{obs_cdf}(x) = 0\) if \(x < \text{obs}\) and 1 if \(x >= \text{obs}\))
a threshold_weight array indexed by variable x
- The threshold-weighted CRPS is given by:
\(twCRPS = \int_{-\infty}^{\infty}{[\text{threshold_weight}(x) \times (\text{fcst}(x) - \text{obs_cdf}(x))^2]\text{d}x}\), over all thresholds x.
The usual CRPS is the threshold-weighted CRPS with \(\text{threshold_weight}(x) = 1\) for all x.
- This can be decomposed into an over-forecast penalty:
\(\int_{-\infty}^{\infty}{[\text{threshold_weight}(x) \times \text{fcst}(x) - \text{obs_cdf}(x))^2]\text{d}x}\), over all thresholds x where x >= obs
- and an under-forecast penalty:
\(\int_{-\infty}^{\infty}{[\text{threshold_weight}(x) \times \text{(fcst}(x) - \text{obs_cdf}(x)^2]\text{d}x}\), over all thresholds x where x <= obs.
To obtain the components of the CRPS score, set
include_componentstoTrue.Note that there are several ways to decompose the CRPS and this decomposition differs from the one used in the
scores.probability.crps_for_ensemble()function.Note that the function crps_cdf is designed so that the obs argument contains actual observed values. crps_cdf will convert obs into CDF form in order to calculate the CRPS.
- To calculate CRPS, integration is applied over the set of thresholds x taken from:
fcst[threshold_dim].values,
obs.values.
threshold_weight[threshold_dim].values if applicable.
additional_thresholds if applicable.
(with NaN values excluded)
- There are two methods of integration:
“exact” gives the exact integral under that assumption that that fcst is continuous and piecewise linear between its specified values, and that threshold_weight (if supplied) is piecewise constant and right-continuous between its specified values.
“trapz” simply uses a trapezoidal rule using the specified values, and so is an approximation of the CRPS. To get an accurate approximation, the density of threshold values can be increased by supplying additional_thresholds.
Both methods of calculating CRPS may require adding additional values to the threshold_dim dimension in fcst and (if supplied) threshold_weight. fcst_fill_method and weight_fill_method specify how fcst and weight are to be filled at these additional points.
The function crps_cdf calculates the CRPS using forecast CDF values ‘as is’. No checks are performed to ensure that CDF values in fcst are nondecreasing along threshold_dim. Checks are conducted on fcst and threshold_weight (if applicable) to ensure that coordinates are increasing along threshold_dim.
- Parameters:
fcst (DataArray) – array of forecast CDFs, with the threshold dimension given by threshold_dim.
obs (DataArray) – array of observations, not in CDF form.
threshold_dim (str) – name of the dimension in fcst that indexes the thresholds.
threshold_weight (DataArray | None) – weight to be applied along threshold_dim to calculate threshold-weighted CRPS. Must contain threshold_dim as a dimension, and may also include other dimensions from fcst. If weight=None, a weight of 1 is applied everywhere, which gives the usual CRPS.
additional_thresholds (Iterable[float] | None) – additional thresholds values to add to fcst and (if applicable) threshold_weight prior to calculating CRPS.
propagate_nans (bool) – If True, propagate NaN values along threshold_dim in fcst and threshold_weight prior to calculating CRPS. This will result in CRPS being NaN for these cases. If False, NaN values in fcst and weight will be replaced, wherever possible, with non-NaN values using the fill method specified by fcst_fill_method and threshold_weight_fill_method.
fcst_fill_method (Literal['linear', 'step', 'forward', 'backward']) –
how to fill values in fcst when NaNs have been introduced (by including additional thresholds) or are specified to be removed (by setting propagate_nans=False). Select one of:
”linear”: use linear interpolation, then replace any leading or trailing NaNs using linear extrapolation. Afterwards, all values are clipped to the closed interval [0, 1].
”step”: apply forward filling, then replace any leading NaNs with 0.
”forward”: first apply forward filling, then remove any leading NaNs by back filling.
”backward”: first apply back filling, then remove any trailing NaNs by forward filling.
(In most cases, “linear” is likely the appropriate choice.)
threshold_weight_fill_method (Literal['linear', 'step', 'forward', 'backward']) – how to fill values in threshold_weight when NaNs have been introduced (by including additional thresholds) or are specified to be removed (by setting propagate_nans=False). Select one of “linear”, “step”, “forward” or “backward”. If the weight function is continuous, “linear” is probably the best choice. If it is an increasing step function, “forward” may be best.
integration_method (str) – one of “exact” or “trapz”.
preserve_dims (Tuple[str]) – dimensions to preserve in the output. All other dimensions are collapsed by taking the mean.
reduce_dims (Tuple[str]) – dimensions to reduce in the output by taking the mean. All other dimensions are preserved.
weights – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.include_components (bool) – if True, include the under and over forecast components of the score in the returned dataset.
- Returns:
- The following are the produced Dataset variables:
”total” the total CRPS.
”underforecast_penalty”: the under-forecast penalty contribution of the CRPS.
”overforecast_penalty”: the over-forecast penalty contribution of the CRPS.
- Return type:
xr.Dataset
- Raises:
ValueError – if threshold_dim is not a dimension of fcst.
ValueError – if threshold_dim is not a dimension of threshold_weight when threshold_weight is not None.
ValueError – if threshold_dim is a dimension of obs.
ValueError – if dimensions of obs are not also dimensions of fcst.
ValueError – if dimensions of threshold_weight are not also dimensions of fcst when threshold_weight is not None.
ValueError – if dims is not a subset of dimensions of fcst.
ValueError – if fcst_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if weight_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if fcst[threshold_dim] has less than 2 values.
ValueError – if coordinates in fcst[threshold_dim] are not increasing.
ValueError – if threshold_weight is not None and coordinates in threshold_weight[threshold_dim] are not increasing.
ValueError – if threshold_weight has negative values.
References
Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. Management Science, 22(10), 1087–1095. https://doi.org/10.1287/mnsc.22.10.1087
Gneiting, T., & Ranjan, R. (2011). Comparing Density Forecasts Using Threshold- and Quantile-Weighted Scoring Rules. Journal of Business & Economic Statistics, 29(3), 411–422. https://doi.org/10.1198/jbes.2010.08110
- scores.probability.adjust_fcst_for_crps(fcst, threshold_dim, obs, *, decreasing_tolerance=0, additional_thresholds=None, fcst_fill_method='linear', integration_method='exact')#
This function takes a forecast cumulative distribution functions (CDF) fcst.
If fcst is not decreasing outside of specified tolerance, it returns fcst.
- Otherwise, the CDF envelope for fcst is computed, and the CDF from among:
fcst,
the upper envelope, and
the lower envelope
that has the higher (i.e. worse) CRPS is returned. In the event of a tie, preference is given in the order fcst then upper.
See scores.probability.functions.cdf_envelope for details about the CDF envelope.
The use case for this is when, either due to rounding or poor forecast process, the forecast CDF fcst fails to be nondecreasing. Rather than simply replacing fcst with NaN, adjust_fcst_for_crps returns a CDF for which CRPS can be calculated, but possibly with a predictive performance cost as measured by CRPS.
Whether a CDF is decreasing outside specified tolerance is determined as follows. For each CDF in fcst, the sum of incremental decreases along the threshold dimension is calculated. For example, if the CDF values are
[0, 0.4, 0.3, 0.9, 0.88, 1]
then the sum of incremental decreases is -0.12. This CDF decreases outside specified tolerance if 0.12 > decreasing_tolerance.
- The adjusted array of forecast CDFs is determined as follows:
any NaN values in fcst are propagated along threshold_dim so that in each case the entire CDF is NaN;
any CDFs in fcst that are decreasing within specified tolerance are unchanged;
any CDFs in fcst that are decreasing outside specified tolerance are replaced with whichever of the upper or lower CDF envelope gives the highest CRPS, unless the original values give a higher CRPS in which case original values are kept.
See scores.probability.functions.cdf_envelope for a description of the ‘CDF envelope’.
If propagating NaNs is not desired, the user may first fill NaNs in fcst using scores.probability.functions.fill_cdf.
The CRPS for each forecast case is calculated using crps, with a weight of 1.
- Parameters:
fcst (DataArray) – DataArray of CDF values with threshold dimension threshold_dim.
threshold_dim (str) – name of the threshold dimension in fcst.
obs (DataArray) – DataArray of observations.
decreasing_tolerance (float) – nonnegative tolerance value.
additional_thresholds (Iterable[float] | None) – optional additional thresholds passed on to crps when calculating CRPS.
fcst_fill_method (Literal['linear', 'step', 'forward', 'backward']) – fcst fill method passed on to crps when calculating CRPS.
integration_method (Literal['exact', 'trapz']) – integration method passed on to crps when calculating CRPS.
- Returns:
An xarray DataArray of possibly adjusted forecast CDFs, where adjustments are made to penalise CDFs that decrease outside tolerance.
- Raises:
ValueError – If threshold_dim is not a dimension of fcst.
ValueError – If decreasing_tolerance is negative.
- Return type:
DataArray
- scores.probability.crps_step_threshold_weight(step_points, threshold_dim, *, threshold_values=None, steppoints_in_thresholds=True, steppoint_precision=0, weight_upper=True)#
Generates an array of weights based on DataArray step points.
Creates an array of threshold weights, which can be used to calculate threshold-weighted CRPS, based on a step function. Applies a weight of 1 when step_point >= threshold, and a weight of 0 otherwise. Zeros and ones in the output weight function can be reversed by setting weight_upper=False.
- Parameters:
step_points (xr.DataArray) – points at which the weight function changes value from 0 to 1.
threshold_dim (str) – name of the threshold dimension in the returned array weight function.
threshold_values (str) – thresholds at which to calculate weights.
steppoints_in_thresholds (bool) – include step_points among the threshold_dim values.
steppoint_precision (float) – precision at which to round step_points prior to calculating the weight function. Select 0 for no rounding.
weight_upper (bool) – If true, returns a weight of 1 if step_point >= threshold, and a weight of 0 otherwise. If false, returns a weight of 0 if step_point >= threshold, and a weight of 1 otherwise.
- Returns:
Zeros and ones with the dimensions in step_points and an additional threshold_dim dimension.
- Return type:
(xr.DataArray)
- scores.probability.crps_cdf_brier_decomposition(fcst, obs, *, threshold_dim='threshold', additional_thresholds=None, fcst_fill_method='linear', reduce_dims=None, preserve_dims=None)#
Given an array fcst of predictive CDF values indexed along threshold_dim, and an array obs of observations, calculates the mean Brier score for each index along threshold_dim. Since the mean CRPS is the integral of the mean Brier score over all thresholds, this gives a threshold decomposition of the mean CRPS.
If any there are any NaNs along the threshold dimension of fcst, then NaNs are propagated along this dimension prior to calculating the decomposition. If propagating NaNs is not desired, the user may first fill NaNs in fcst using scores.probability.functions.fill_cdf.
- Parameters:
fcst (xr.DataArray) – DataArray of CDF values with threshold dimension threshold_dim.
obs (xr.DataArray) – DataArray of observations, not in CDF form.
threshold_dim (str) – name of the threshold dimension in fcst.
additional_thresholds (Optional[Iterable[float]]) – additional thresholds at which to calculate the mean Brier score.
fcst_fill_method (Literal["linear", "step", "forward", "backward"]) –
How to fill NaN values in fcst that arise from new user-supplied thresholds or thresholds derived from observations.
”linear”: use linear interpolation, and if needed also extrapolate linearly. Clip to 0 and 1. Needs at least two non-NaN values for interpolation, so returns NaNs where this condition fails.
”step”: use forward filling then set remaining leading NaNs to 0. Produces a step function CDF (i.e. piecewise constant).
”forward”: use forward filling then fill any remaining leading NaNs with backward filling.
”backward”: use backward filling then fill any remaining trailing NaNs with forward filling.
dims – dimensions to preserve in the output. The dimension threshold_dim is always preserved, even if not specified here.
reduce_dims (Iterable[str] | None)
preserve_dims (Iterable[str] | None)
- Returns:
“total_penalty”: the mean Brier score for each threshold.
- ”underforecast_penalty”: the mean of the underforecast penalties for the Brier score. For a particular
forecast case, this component equals 0 if the event didn’t occur and the Brier score if it did.
- ”overforecast_penalty”: the mean of the overforecast penalties for the Brier score. For a particular
forecast case, this component equals 0 if the event did occur and the Brier score if it did not.
- Return type:
An xarray Dataset with data_vars
- Raises:
ValueError – if threshold_dim is not a dimension of fcst.
ValueError – if threshold_dim is a dimension of obs.
ValueError – if dimensions of obs are not also among the dimensions of fcst.
ValueError – if dimensions in dims is not among the dimensions of fcst.
ValueError – if fcst_fill_method is not one of ‘linear’, ‘step’, ‘forward’ or ‘backward’.
ValueError – if coordinates in fcst[threshold_dim] are not increasing.
- scores.probability.crps_for_ensemble(fcst, obs, ensemble_member_dim, *, method='ecdf', reduce_dims=None, preserve_dims=None, weights=None, include_components=False)#
Calculates the continuous ranked probability score (CRPS) given an ensemble of forecasts. An ensemble of forecasts can also be thought of as a random sample from the predictive distribution.
Given an observation y, and ensemble member values \(x_i\) (for \(1 \leq i \leq M\)), the CRPS is calculated by the formula
\[CRPS(x_i, x_j, y) = \frac{\sum_{i=1}^{M}(|x_i - y|)}{M} - \frac{\sum_{i=1}^{M}\sum_{j=1}^{M}(|x_i - x_j|)}{2K}\]where the first sum is iterated over \(1 \leq i \leq M\) and the second sum is iterated over \(1 \leq i \leq M\) and \(1 \leq j \leq M\).
- The value of the constant K in this formula depends on the method:
If method=”ecdf” then \(K = M ^ 2\). In this case the CRPS value returned is the exact CRPS value for the empirical cumulative distribution function constructed using the ensemble values.
If method=”fair” then \(K = M(M - 1)\). In this case the CRPS value returned is the approximated CRPS where the ensemble values can be interpreted as a random sample from the underlying predictive distribution. This interpretation stems from the formula \(\text{CRPS}(F, y) = \mathbb{E}(|X - y|) - \frac{1}{2}\mathbb{E}(|X - X'|)\), where X and X’ are independent samples of the predictive distribution F, y is the observation (possibly unknown) and E denotes the expectation. This choice of K gives an unbiased estimate for the second expectation.
When the include_components flag is set to True, the CRPS components are calculated as
\[CRPS(x_i, x_j, y) = O(x_i, y) + U(x_i, y) - S(x_i, x_j)\]- where
\(O(x_i, y) = \frac{\sum_{i=1}^{M} ((x_i - y) \mathbb{1}{\{x_i > y\}})}{M}\) which is the overforecast penalty.
\(U(x_i, y) = \frac{\sum_{i=1}^{M} ((y - x_i) \mathbb{1}{\{x_i < y\}})}{M}\) which is the underforecast penalty.
\(S(x_i, x_j) = \frac{\sum_{i=1}^{M}\sum_{j=1}^{M}(|x_i - x_j|)}{2K}\) which is the forecast spread term.
Note that there are several ways to decompose the CRPS and this decomposition differs from the one used in the
scores_probability.crps_cdf()function.- Parameters:
fcst (DataArray | Dataset) – Forecast data. Must have a dimension
ensemble_member_dim.obs (DataArray | Dataset) – Observation data.
ensemble_member_dim (str) – the dimension that specifies the ensemble member or the sample from the predictive distribution.
method (Literal['ecdf', 'fair']) – Either “ecdf” or “fair”.
reduce_dims (Sequence[str] | None) – Dimensions to reduce. Can be “all” to reduce all dimensions.
preserve_dims (Sequence[str] | None) – Dimensions to preserve. Can be “all” to preserve all dimensions.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.include_components (bool | None) – If True, returns the CRPS with underforecast and overforecast penalties, as well as the forecast spread term (see description above).
- Returns:
xarray object of (weighted mean) CRPS values.
- Raises:
ValueError – when method is not one of “ecdf” or “fair”.
- Return type:
DataArray | Dataset
See also
scores.probability.crps_cdf()scores.probability.tw_crps_for_ensemble()scores.probability.tail_tw_crps_for_ensemble()References
Ferro (2014), “Fair scores for ensemble forecasts”, Quarterly Journal of the Royal Meteorol Society, 140(683):1917-1923. https://doi.org/10.1002/qj.2270
Gneiting T and A. Raftery (2007), “Strictly proper scoring rules, prediction, and estimation”, Journal of the American Statistical Association, 102(477):359-378. https://doi.org/10.1198/016214506000001437
Zamo and P. Naveau (2018), “Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts”, Mathematical Geosciences 50:209-234, https://doi.org/10.1007/s11004-017-9709-7
- scores.probability.tw_crps_for_ensemble(fcst, obs, ensemble_member_dim, chaining_func, *, chaining_func_kwargs=None, method='ecdf', reduce_dims=None, preserve_dims=None, weights=None, include_components=False)#
Calculates the threshold weighted continuous ranked probability score (twCRPS) given ensemble input using a chaining function
chaining_func(see below). An ensemble of forecasts can also be thought of as a random sample from the predictive distribution.The twCRPS is calculated by the formula
\[\text{twCRPS}(F, y) = \mathbb{E}_F \left| v(X) - v(y) \right| - \frac{1}{2} \mathbb{E}_F \left| v(X) - v(X') \right|,\]where \(X\) and \(X'\) are independent samples of the predictive distribution \(F\), \(y\) is the observation, and \(v\) is a ‘chaining function’.
The chaining function \(v\) is the antiderivative of the threshold weight function \(w\), which is a non-negative function that assigns a weight to each threshold value. For example, if we wanted to assign a threshold weight of 1 to thresholds above threshold \(t\) and a threshold weight of 0 to thresholds below \(t\), our threshold weight function would be \(w(x) = \mathbb{1}{(x > t)}\), where \(\mathbb{1}\) is the indicator function which returns a value of 1 if the condition is true and 0 otherwise. A chaining function would then be \(v(x) = \text{max}(x, t)\).
- There are currently two methods available for calculating the twCRPS: “ecdf” and “fair”.
If method=”ecdf” then the twCRPS value returned is the exact twCRPS value for the empirical cumulative distribution function constructed using the ensemble values.
If method=”fair” then the twCRPS value returned is the approximated twCRPS where the ensemble values can be interpreted as a random sample from the underlying predictive distribution. See https://doi.org/10.1002/qj.2270 for more details on the fair CRPS which are relevant for the fair twCRPS.
The ensemble representation of the empirical twCRPS is
\[\text{twCRPS}(F_{\text{ens}}, y; v) = \frac{1}{M} \sum_{m=1}^{M} \left| v(x_m) - v(y) \right| - \frac{1}{2M^2} \sum_{m=1}^{M} \sum_{j=1}^{M} \left| v(x_m) - v(x_j) \right|,\]where \(M\) is the number of ensemble members.
While the ensemble representation of the fair twCRPS is
\[\text{twCRPS}(F_{\text{ens}}, y; v) = \frac{1}{M} \sum_{m=1}^{M} \left| v(x_m) - v(y) \right| - \frac{1}{2M(M - 1)} \sum_{m=1}^{M} \sum_{j=1}^{M} \left| v(x_m) - v(x_j) \right|.\]- Parameters:
fcst (DataArray | Dataset) – Forecast data. Must have a dimension
ensemble_member_dim.obs (DataArray | Dataset) – Observation data.
ensemble_member_dim (str) – the dimension that specifies the ensemble member or the sample from the predictive distribution.
chaining_func (Callable[[DataArray | Dataset], DataArray | Dataset]) – the chaining function.
chaining_func_kwargs (dict[str, Any] | None) – keyword arguments for the chaining function.
method (Literal['ecdf', 'fair']) – Either “ecdf” for the empirical twCRPS or “fair” for the Fair twCRPS.
reduce_dims (Sequence[str] | None) – Dimensions to reduce. Can be “all” to reduce all dimensions.
preserve_dims (Sequence[str] | None) – Dimensions to preserve. Can be “all” to preserve all dimensions.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.include_components (bool | None) – If True, returns the twCRPS with underforecast and overforecast penalties, as well as the forecast spread term. See
scores.probability.crps_for_ensemble()for more details on the decomposition.
- Returns:
xarray object of twCRPS values.
- Raises:
ValueError – when
methodis not one of “ecdf” or “fair”.- Return type:
DataArray
Notes
Chaining functions can be created to vary the weights across given dimensions such as varying the weights by climatological values.
References
Allen, S., Ginsbourger, D., & Ziegel, J. (2023). Evaluating forecasts for high-impact events using transformed kernel scores. SIAM/ASA Journal on Uncertainty Quantification, 11(3), 906-940. https://doi.org/10.1137/22M1532184.
Allen, S. (2024). Weighted scoringRules: Emphasizing Particular Outcomes When Evaluating Probabilistic Forecasts. Journal of Statistical Software, 110(8), 1-26. https://doi.org/10.18637/jss.v110.i08
See also
scores.probability.crps_for_ensemble()scores.probability.tail_tw_crps_for_ensemble()scores.probability.interval_tw_crps_for_ensemble()scores.probability.crps_cdf()Examples
Calculate the twCRPS for an ensemble of forecasts where the chaining function is derived from a weight function that assigns a weight of 1 to thresholds above 0.5 and a weight of 0 to thresholds below 0.5.
>>> import numpy as np >>> import xarray as xr >>> from scores.probability import tw_crps_for_ensemble >>> fcst = xr.DataArray(np.random.rand(10, 10), dims=['time', 'ensemble']) >>> obs = xr.DataArray(np.random.rand(10), dims=['time']) >>> tw_crps_for_ensemble(fcst, obs, 'ensemble', lambda x: np.maximum(x, 0.5))
- scores.probability.tail_tw_crps_for_ensemble(fcst, obs, ensemble_member_dim, threshold, *, tail='upper', method='ecdf', reduce_dims=None, preserve_dims=None, weights=None, include_components=False)#
Calculates the threshold weighted continuous ranked probability score (twCRPS) weighted for a tail of the distribution from ensemble input.
A threshold weight of 1 is assigned for values of the tail and a threshold weight of 0 otherwise. The threshold value of where the tail begins is specified by the
thresholdargument. Thetailargument specifies whether the tail is the upper or lower tail. For example, if we only care about values above 40 degrees C, we can setthreshold=40andtail="upper".For more flexible weighting options and the relevant equations, see the
scores.probability.tw_crps_for_ensemble()function.- Parameters:
fcst (DataArray | Dataset) – Forecast data. Must have a dimension
ensemble_member_dim.obs (DataArray | Dataset) – Observation data.
ensemble_member_dim (str) – the dimension that specifies the ensemble member or the sample from the predictive distribution.
threshold (DataArray | Dataset | float) – the threshold value for where the tail begins. It can either be a float for a single threshold or an xarray object if the threshold varies across dimensions (e.g., climatological values).
tail (Literal['upper', 'lower']) – the tail of the distribution to weight. Either “upper” or “lower”.
method (Literal['ecdf', 'fair']) – Either “ecdf” or “fair”. See
scores.probability.tw_crps_for_ensemble()for more details.reduce_dims (Sequence[str] | None) – Dimensions to reduce. Can be “all” to reduce all dimensions.
preserve_dims (Sequence[str] | None) – Dimensions to preserve. Can be “all” to preserve all dimensions.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.include_components (bool | None) – If True, returns the twCRPS with underforecast and overforecast penalties, as well as the forecast spread term. See
scores.probability.crps_for_ensemble()for more details on the decomposition.
- Returns:
xarray object of twCRPS values that has been weighted based on the tail.
- Raises:
ValueError – when
tailis not one of “upper” or “lower”.ValueError – when
methodis not one of “ecdf” or “fair”.
- Return type:
DataArray | Dataset
References
Allen, S., Ginsbourger, D., & Ziegel, J. (2023). Evaluating forecasts for high-impact events using transformed kernel scores. SIAM/ASA Journal on Uncertainty Quantification, 11(3), 906-940. https://doi.org/10.1137/22M1532184.
Allen, S. (2024). Weighted scoringRules: Emphasizing Particular Outcomes When Evaluating Probabilistic Forecasts. Journal of Statistical Software, 110(8), 1-26. https://doi.org/10.18637/jss.v110.i08
See also
scores.probability.tw_crps_for_ensemble()scores.probability.interval_tw_crps_for_ensemble()scores.probability.crps_for_ensemble()scores.probability.crps_cdf()Examples
Calculate the twCRPS for an ensemble where we assign a threshold weight of 1 to thresholds above 0.5 and a threshold weight of 0 to thresholds below 0.5.
>>> import numpy as np >>> import xarray as xr >>> from scores.probability import tail_tw_crps_for_ensemble >>> fcst = xr.DataArray(np.random.rand(10, 10), dims=['time', 'ensemble']) >>> obs = xr.DataArray(np.random.rand(10), dims=['time']) >>> tail_tw_crps_for_ensemble(fcst, obs, 'ensemble', 0.5, tail='upper')
- scores.probability.interval_tw_crps_for_ensemble(fcst, obs, ensemble_member_dim, lower_threshold, upper_threshold, *, method='ecdf', reduce_dims=None, preserve_dims=None, weights=None, include_components=False)#
Calculates the threshold weighted continuous ranked probability score (twCRPS) for ensemble forecasts where the threshold weight is 1 on a specified interval and 0 otherwise.
The threshold values that define the bounds of the interval are given by the
lower_thresholdandupper_thresholdarguments. For example, if we only want to foucs on the temperatures between -10 and -20 degrees C where aircraft icing is most likely, we can setlower_threshold=-20andupper_threshold=-10.For more flexible weighting options and the relevant equations, see the
scores.probability.tw_crps_for_ensemble()function.- Parameters:
fcst (DataArray | Dataset) – Forecast data. Must have a dimension
ensemble_member_dim.obs (DataArray | Dataset) – Observation data.
ensemble_member_dim (str) – the dimension that specifies the ensemble member or the sample from the predictive distribution.
lower_threshold (DataArray | float) – the threshold value for where the interval begins. It can either be a float for a single threshold or an xarray object if the threshold varies across dimensions (e.g., climatological values).
upper_threshold (DataArray | float) – the threshold value for where the interval ends. It can either be a float for a single threshold or an xarray object if the threshold varies across dimensions (e.g., climatological values).
method (Literal['ecdf', 'fair']) – Either “ecdf” or “fair”. See
scores.probability.tw_crps_for_ensemble()for more details.reduce_dims (Sequence[str] | None) – Dimensions to reduce. Can be “all” to reduce all dimensions.
preserve_dims (Sequence[str] | None) – Dimensions to preserve. Can be “all” to preserve all dimensions.
weights (DataArray | Dataset | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.include_components (bool | None) – If True, returns the twCRPS with underforecast and overforecast penalties, as well as the forecast spread term. See
scores.probability.crps_for_ensemble()for more details on the decomposition.
- Returns:
xarray object of twCRPS values where the threshold weighting is based on an interval.
- Raises:
ValueError – when
lower_thresholdis not less thanupper_threshold.ValueError – when
methodis not one of “ecdf” or “fair”.
- Return type:
DataArray | Dataset
References
Allen, S., Ginsbourger, D., & Ziegel, J. (2023). Evaluating forecasts for high-impact events using transformed kernel scores. SIAM/ASA Journal on Uncertainty Quantification, 11(3), 906-940. https://doi.org/10.1137/22M1532184.
Allen, S. (2024). Weighted scoringRules: Emphasizing Particular Outcomes When Evaluating Probabilistic Forecasts. Journal of Statistical Software, 110(8), 1-26. https://doi.org/10.18637/jss.v110.i08
See also
scores.probability.tw_crps_for_ensemble()scores.probability.tail_tw_crps_for_ensemble()scores.probability.crps_for_ensemble()scores.probability.crps_cdf()Examples
Calculate the twCRPS for an ensemble where we assign a threshold weight of 1 to thresholds between -20 and -10 and a threshold weight of 0 to thresholds outside that interval.
>>> import numpy as np >>> import xarray as xr >>> from scores.probability import interval_tw_crps_for_ensemble >>> fcst = xr.DataArray(np.random.uniform(-40, 20, size=(30, 15)), dims=['time', 'ensemble']) >>> obs = xr.DataArray(np.random.uniform(-40, 20, size=30), dims=['time']) >>> interval_tw_crps_for_ensemble(fcst, obs, 'ensemble', -20, 10)
- scores.probability.murphy_score(fcst, obs, thetas, *, functional, alpha, huber_a=None, decomposition=False, reduce_dims=None, preserve_dims=None)#
Returns the mean elementary score (Ehm et. al. 2016), also known as Murphy score, evaluated at decision thresholds specified by thetas. Optionally returns a decomposition of the score in terms of penalties for over- and under-forecasting.
Select
functional="quantile"andalpha=0.5for the median functional. Selectfunctional="expectile"andalpha=0.5for the mean (i.e., expectation) functional.Consider using
murphy_thetas()to generate thetas. If utilising dask, you may want to storethetasas a netCDF on disk and pass it in as an xarray object. Otherwise, very large objects may be created whenfcst,obsandthetasare broadcast together.- Parameters:
fcst (DataArray) – Forecast numerical values.
obs (DataArray) – Observed numerical values.
thetas (Sequence[float] | DataArray) – Theta thresholds.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
alpha (float) – Risk parameter (i.e. quantile or expectile level) for the functional. Must be between 0 and 1.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
decomposition (bool) – True to return penalty values due to under- and over-fcst as well as the total score, False to return total score only.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Murphy score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Murphy score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the FIRM score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.
- Returns:
An xr.Dataset with dimensions based on the
preserve_dimsorreduce_dimsarg as well as a “theta” dimension with values matchingthetasinput.If
decompositionis False, the dataset’s variables will contain 1 element “total”.If
decompositionis True, in addition to “total”, it will have “underforecast” and “overforecast” data_vars.- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
alphais not strictly between 0 and 1.ValueError – If
huber_ais not strictly greater than 0.
- Return type:
Dataset
References
- For mean elementary score definitions, see:
Theorem 1 of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Theorem 5.3 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.probability.murphy_thetas(forecasts, obs, functional, *, huber_a=None, left_limit_delta=None)#
Return the decision thresholds (theta values) at which to evaluate elementary scores for the construction of Murphy diagrams.
This function may generate a very large number of theta thresholds if the forecast and obs data arrays are large and their values have high precision, which may lead to long computational times for Murphy diagrams. To reduce the number of thetas, users can first round forecast and obs values to an appropriate resolution.
- Parameters:
forecasts (list[DataArray]) – Forecast values, one array per source.
obs (DataArray) – Observed values.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
left_limit_delta (float | None) – Approximation of the left hand limit, used for “huber” and “expectile” functionals. Must be greater than or equal to 0. None will be treated as 0. Ideally, left_limit_delta should be small relative to the fcst and obs precision, and not greater than that precision.
- Returns:
theta thresholds to be used to compute murphy scores.
- Return type:
List[float]
- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
huber_ais not strictly greater than 0.ValueError – If
left_limit_deltais not greater than or equal to 0.
Notes
For theta values at which to evaluate elementary scores, see
Corollary 2 (p.521) of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Corollary 5.6 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.probability.roc_curve_data(fcst, obs, thresholds='auto', *, reduce_dims=None, preserve_dims=None, weights=None, check_args=True)#
Calculates data required for plotting a Receiver (Relative) Operating Characteristic (ROC) curve, including the area under the curve (AUC). The ROC curve is used as a way to measure the discrimination ability of a particular forecast.
The AUC is the probability that the forecast probability of a random event is higher than the forecast probability of a random non-event.
- Parameters:
fcst (DataArray) – An array of probabilistic forecasts for a binary event in the range [0, 1].
obs (DataArray) – An array of binary values where 1 is an event and 0 is a non-event.
thresholds (str | Iterable[float]) – By default, when
thresholds = "auto", the ROC thresholds are automatically generated. Otherwise, you can supply an iterable of floats with monotonic increasing values between 0 and 1, which are the thresholds at and above which to convert the probabilistic forecast to a value of 1 (an ‘event’). If there are many unique forecast values, this can lead to a very large number of automatically generated thresholds. If performance is slow, consider supplying thresholds manually as an iterable of floats. Np.inf is added automatically to the end of the thresholds to ensure that the full ROC curve is produced. Similarly, if 0 is not included, it will be added automatically to ensure the full ROC curve is produced.reduce_dims (Sequence[str] | None) – Optionally specify which dimensions to reduce when calculating the ROC curve data. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Sequence[str] | None) – Optionally specify which dimensions to preserve when calculating ROC curve data. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the values will be the ROC curve at each point (i.e. single-value comparison against observed) for each threshold, and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.check_args (bool) – Checks if
obsdata only contains values in the set {0, 1, np.nan}. You may want to skip this check if you are sure about your input data and want to improve the performance when working with dask.
- Returns:
‘POD’ (the probability of detection)
’POFD’ (the probability of false detection)
’AUC’ (the area under the ROC curve)
PODandPOFDhave dimensionsdims+ ‘threshold’, whileAUChas dimensionsdims.- Return type:
An xarray.Dataset with data variables
- Raises:
ValueError – if
fcstcontains values outside of the range [0, 1].ValueError – if
obscontains non-nan values not in the set {0, 1}.ValueError – if ‘threshold’ is a dimension in
fcst.ValueError – if values in thresholds are not monotonic increasing or are outside the range [0, 1].
ValueError – if
thresholdsis a string that is not “auto”.ValueError – if
thresholdsis an empty iterable.
Warning
If the number of automatically generated thresholds is very large (>1000), a warning is raised suggesting that the user supply thresholds manually as an iterable of floats if performance is slow.
Notes
If
thresholdsis an iterable of floats, the probabilisticfcstis converted to a deterministic forecast for each threshold inthresholds. If a value infcstis greater than or equal to the threshold, then it is converted into a ‘forecast event’ (fcst = 1), and a ‘forecast non-event’ (fcst = 0) otherwise. The probability of detection (POD) and probability of false detection (POFD) are calculated for the converted forecast. From the POD and POFD data, the area under the ROC curve is calculated. An additional threshold ofnp.infis added to the end ofthresholdsso that it always has a value when POD=0 and POFD=0.If
threshold="auto"which is the default, then the thresholds used are the ordered, unique forecast values.Ideally concave ROC curves should be generated rather than traditional ROC curves.
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.probability import roc_curve_data >>> fcst = xr.DataArray(np.random.rand(3, 4), dims=["time", "location"]) >>> obs = xr.DataArray(np.random.randint(0, 2, size=(3, 4)), dims=["time", "location"]) >>> result = roc_curve_data(fcst, obs)
- scores.probability.brier_score(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, check_args=True)#
Calculates the Brier score for forecast and observed data.
\[\text{brier score} = \frac{1}{n} \sum_{i=1}^n (\text{forecast}_i - \text{observed}_i)^2\]If you want to speed up performance with Dask and are confident of your input data, or if you want observations to take values between 0 and 1, set
check_argstoFalse.- Parameters:
fcst (DataArray | Dataset) – Forecast or predicted variables in xarray.
obs (DataArray | Dataset) – Observed variables in xarray.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Brier score. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Brier score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.check_args (bool) – will perform some tests on the data if set to True.
- Raises:
ValueError – if
fcstcontains non-nan values outside of the range [0, 1]ValueError – if
obscontains non-nan values not in the set {0, 1}
- Return type:
DataArray | Dataset
References
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1-3. https://doi.org/fp62r6
- scores.probability.brier_score_for_ensemble(fcst, obs, ensemble_member_dim, event_thresholds, *, reduce_dims=None, preserve_dims=None, weights=None, fair_correction=True, event_threshold_operator=<built-in function ge>, threshold_dim='threshold')#
Calculates the Brier score for an ensemble forecast for the provided thresholds. By default, the fair Brier score is calculated, which is a modified version of the Brier score that applies a correction related to the number of ensemble members and the number of ensemble members that predict the event.
Given an event defined by
event_threshold, the fair Brier score for ensembles is defined as\[s_{i,y} = \left(\frac{i}{m} - y\right)^2 - \frac{i(m-i)}{m^2(m-1)}\]Where:
\(i\) is the number of ensemble members that predict the event
\(m\) is the total number of ensemble members
\(y\) is the binary observation (0 if the event was not observed or 1 otherwise)
When the fair correction is not applied (i.e.,
fair_correction=False), the Brier score is calculated as:\[s_{i,y} = \left(\frac{i}{m} - y\right)^2\]The fair correction will only be applied at coordinates where the number of ensemble members is greater than 1 and
fair_correction=True. When thefair_correctionis set toTrueand the number of ensemble members is 1, the fair correction will not be applied to avoid division by zero. If you want to set the minimum number of ensemble members required to calculate the Brier score, we recommend preprocessing your data to remove data points with fewer than the minimum number of ensemble members that you want.- Parameters:
fcst (DataArray | Dataset) – Real valued ensemble forecasts in xarray. The proportion of ensemble members
function. (above a threshold is calculated within this)
obs (DataArray | Dataset) – Real valued observations in xarray. These values are converted to binary values in this function based on the supplied event thresholds and operators.
ensemble_member_dim (str) – The dimension name of the ensemble member.
event_thresholds (Real | Sequence[Real]) – The threshold(s) that define what is considered an event. If multiple thresholds are provided, they should be monotonically increasing with no NaNs.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Brier score. All other dimensions will be preserved.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the absolute error at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.fair_correction (bool) – Whether or not to apply the fair Brier score correction. Default is True.
event_threshold_operator (Callable) – The mode to use to define an event relative to a supplied threshold. Default is
operator.ge, which means that an event occurs if the observation is greater than or equal to the threshold. Similarly, an event will have been considered to be forecast by an ensemble member if the forecast is greater than or equal to the threshold. Passing inoperator.ltwill give the same results as it is mathematically equivalent in this case. Alternatively, you can provideoperator.gtwhich means that an event occurs if the observation is strictly greater than the threshold. Again, an event for an ensemble member will have been considered to be forecast if the forecast is greater than the threshold. Using``operator.le`` will give the same results asoperator.gtas it is mathematically equivalent in this case.threshold_dim (str) – The name of the threshold dimension in the output. Default is ‘threshold’. It must not exist as a dimension in the forecast or observation data.
- Returns:
The Brier score for the ensemble forecast.
- Raises:
ValueError – if values in
event_thresholdsare not monotonically increasing.ValueError – if
fcstcontains the dimension provided inthreshold_dim.ValueError – if
obscontains the dimension provided inthreshold_dim.ValueError – if
weightscontains the dimension provided inthreshold_dim.ValueError – if
ensemble_member_dimis not infcstdimensions.
- Return type:
DataArray | Dataset
References
- Ferro, C. A. T. (2013). Fair scores for ensemble forecasts. Quarterly
Journal of the Royal Meteorological Society, 140(683), 1917–1923. https://doi.org/10.1002/qj.2270
See also
Examples
Calculate the Brier score for an ensemble forecast for a single threshold:
>>> import numpy as np >>> import xarray as xr >>> from scores.probability import brier_score_for_ensemble >>> fcst = 10 * xr.DataArray(np.random.rand(10, 10), dims=['time', 'ensemble']) >>> obs = 10 * xr.DataArray(np.random.rand(10), dims=['time']) >>> brier_score_for_ensemble(fcst, obs, ensemble_member_dim='ensemble', thresholds=0.5)
Calculate the Brier score for an ensemble forecast for multiple thresholds: >>> thresholds = [0.1, 5, 9] >>> brier_score_for_ensemble(fcst, obs, ensemble_member_dim=’ensemble’, thresholds=thresholds)
- scores.probability.isotonic_fit(fcst, obs, *, weight=None, functional='mean', bootstraps=None, quantile_level=None, solver=None, confidence_level=0.9, min_non_nan=1, report_bootstrap_results=False)#
Calculates an isotonic regression fit to observations given forecasts as the explanatory values. Optionally returns confidence bands on the fit via bootstrapping. Forecasts and observations are scalar-valued (i.e, integers, floats or NaN).
If forecasts target the mean or a quantile functional, then the user can select that functional. Otherwise the user needs to supply a solver function
of one variable (if no weights are supplied), or
of two variables (if weights are supplied).
The solver function is used to find an optimal regression fit in each block, subject to a non-decreasing monotonicity constraint, using the pool adjacent violators (PAV) algorithm.
An example of a solver function of one variable is obs -> np.mean(obs) An example of a solver function of two variables is (obs, weight) -> np.average(obs, weights=weight).
Ties (where one forecast value has more than one observation value) are handled as follows. Forecasts are sorted in ascending order, and observations sorted to match. Where ties occur, observations are sorted in descending order within each subset of ties. Isotonic regression is then performed on this sorted observation sequence.
This implementation uses sklearn.isotonic.IsotonicRegression when functional=”mean”.
- Parameters:
fcst (ndarray | DataArray) – np.array or xr.DataArray of forecast (or explanatory) values. Values must be float, integer or NaN.
obs (ndarray | DataArray) – np.array or xr.DataArray of corresponding observation (or response) values. Must be the same shape as fcst. Values must be float, integer or NaN.
weight (ndarray | DataArray | None) – positive weights to apply to each forecast-observation pair. Must be the same shape as fcst, or None (which is equivalent to applying equal weights).
functional (Literal['mean', 'quantile'] | None) – Functional that the forecasts are targeting. Either “mean” or “quantile” or None. If None then solver must be supplied. The current implementation for “quantile” does not accept weights. If weighted quantile regression is desired then the user should should supply an appropriate solver.
bootstraps (int | None) – the number of bootstrap samples to perform for calculating the regression confidence band. Set to None if a confidence band is not required.
quantile_level (float | None) – the level of the quantile functional if functional=’quantile’. Must be strictly between 0 and 1.
solver (Callable[[ndarray, ndarray], float] | Callable[[ndarray], float] | None) – function that accepts 1D numpy array of observations and returns a float. Function values give the regression fit for each block as determined by the PAV algorithm. If weights are supplied, the function must have a second argument that accepts 1D numpy array of weights of same length as the observations.
confidence_level (float | None) – Confidence level of the confidence band, strictly between 0 and 1. For example, a confidence level of 0.5 will calculate the confidence band by computing the 25th and 75th percentiles of the bootstrapped samples.
min_non_nan (int | None) – The minimum number of non-NaN bootstrapped values to calculate confidence bands at a particular forecast value.
report_bootstrap_results (bool | None) – This specifies to whether keep the bootstrapped regression values in return dictionary. Default is False to keep the output result small.
- Returns:
- “unique_fcst_sorted”: 1D numpy array of remaining forecast values sorted in
ascending order, after any NaNs from fcst, obs and weight are removed, and only unique values are kept to keep the output size reasonably small.
”fcst_counts”: 1D numpy array of forecast counts for unique values of forecast sorted
- ”regression_values”: 1D numpy array of regression values corresponding to
”unique_fcst_sorted” values.
- ”regression_func”: function that returns the regression fit based on linear
interpolation of (“fcst_sorted”, “regression_values”), for any supplied argument (1D numpy array) of potential forecast values.
- ”bootstrap_results”: in the case of report_bootstrap_results=True, 2D numpy
array of bootstrapped regression values is included in return dictionary. Each row gives the interpolated regression values from a particular bootstrapped sample, evaluated at “fcst_sorted” values. If m is the number of bootstrap samples and n = len(fcst_sorted) then it is has shape (m, n). We emphasise that this array corresponds to fcst_sorted not unique_fcst_sorted.
- ”confidence_band_lower_values”: values of lower confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_upper_values”: values of upper confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_lower_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_lower_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_upper_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_upper_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_levels”: tuple giving the quantile levels used to calculate the
confidence band.
- Return type:
Dictionary with the following keys
- Raises:
ValueError – if fcst and obs are np.arrays and don’t have the same shape.
ValueError – if fcst and weight are np.arrays and don’t have the same shape.
ValueError – if fcst and obs are xarray.DataArrays and don’t have the same dimensions.
ValueError – if fcst and weight are xarray.DataArrays and don’t have the same dimensions.
ValueError – if any entries of fcst, obs or weight are not an integer, float or NaN.
ValueError – if there are no fcst and obs pairs after NaNs are removed.
ValueError – if functional is not one of “mean”, “quantile” or None.
ValueError – if quantile_level is not strictly between 0 and 1 when functional=”quantile”.
ValueError – if weight is not None when functional=”quantile”.
ValueError – if functional and solver are both None.
ValueError – if not exactly one of functional and solver is None.
ValueError – if bootstraps is not a positive integer.
ValueError – if confidence_level is not strictly between 0 and 1.
- Note: This function only keeps the unique values of fcst_sorted to keep the volume of
the return dictionary small. The forecast counts is also included, so users can it to create forecast histogram (usually displayed in the reliability diagrams).
- References
de Leeuw, Hornik and Mair. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods”, Journal of Statistical Software, 2009.
Dimitriadis, Gneiting and Jordan. “Stable reliability diagrams for probabilistic classifiers”, PNAS, Vol. 118 No. 8, 2020. Available at https://www.pnas.org/doi/10.1073/pnas.2016191118
Jordan, Mühlemann, and Ziegel. “Optimal solutions to the isotonic regression problem”, 2020 (version 2), available on arxiv at https://arxiv.org/abs/1904.04761
scores.categorical#
- scores.categorical.firm(fcst, obs, risk_parameter, categorical_thresholds, threshold_weights, *, discount_distance=0, reduce_dims=None, preserve_dims=None, weights=None, threshold_assignment='lower', include_components=False)#
Calculates the FIxed Risk Multicategorical (FIRM) score including the underforecast and overforecast penalties.
categorical_thresholds and threshold_weights must be the same length.
- Parameters:
fcst (DataArray | Dataset) – An array of real-valued forecasts that we want to treat categorically.
obs (DataArray | Dataset) – An array of real-valued observations that we want to treat categorically.
risk_parameter (float) – Risk parameter (alpha) for the FIRM score. The value must satisfy 0 < risk_parameter < 1.
categorical_thresholds (Sequence[float] | Sequence[DataArray]) – Category thresholds (thetas) to delineate the categories. A sequence of xr.DataArrays may be supplied to allow for different thresholds at each coordinate (e.g., thresholds determined by climatology).
threshold_weights (Sequence[float | DataArray]) – Weights that specify the relative importance of forecasting on the correct side of each category threshold. Either a positive float can be supplied for each categorical threshold or an xr.DataArray (with no negative values) can be provided for each categorical threshold as long as its dims are a subset of obs dims. NaN values are allowed in the xr.DataArray. For each NaN value at a given coordinate, the FIRM score will be NaN at that coordinate, before dims are collapsed.
discount_distance (float | None) – An optional discounting distance parameter which satisfies discount_distance >= 0 such that the cost of misses and false alarms are discounted whenever the observation is within distance discount_distance of the forecast category. A value of 0 will not apply any discounting.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the FIRM score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating FIRM. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the FIRM score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.threshold_assignment (str | None) – Specifies whether the intervals defining the categories are left or right closed. That is whether the decision threshold is included in the upper (left closed) or lower (right closed) category. Defaults to “lower”.
include_components (bool) – If True, then the function returns the FIRM score along with the overforecast and underforecast penalties as a new dimension called “components”. If False (default), then only the FIRM score is returned.
- Returns:
An xarray object with the FIRM score. If include_components is true, then a new dimension called “components” is created that contains the following coordinates:
firm_score: A score for a single category for each coord based on the FIRM framework.
overforecast_penalty: Penalty for False Alarms.
underforecast_penalty: Penalty for Misses.
- Raises:
ValueError – if len(categorical_thresholds) < 1.
ValueError – if categorical_thresholds and threshold_weights lengths are not equal.
ValueError – if risk_parameter <= 0 or >= 1.
ValueError – if any values in threshold_weights are <= 0.
ValueError – if discount_distance is not None and < 0.
scores.utils.DimensionError – if threshold_weights is a list of xr.DataArrays and if the dimensions of these xr.DataArrays is not a subset of the obs dims.
ValueError – if threshold_assignment is not “upper” or “lower”.
ValueError – if the dimension name “components” is used in fcst or obs.
TypeError – If fcst and obs are not both xarray DataArrays or Datasets.
- Return type:
DataArray | Dataset
Note
Setting discount distance to None or 0, will mean that no discounting is applied. This means that errors will be penalised strictly categorically.
Setting discount distance to np.inf means that the cost of a miss is always proportional to the distance of the observation from the threshold, and similarly for false alarms.
Example
>>> import numpy as np >>> import xarray as xr >>> from scores.categorical import firm >>> fcst = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) >>> obs = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) >>> categorical_thresholds = [0.2, 0.5, 0.8] >>> risk_parameter = 0.1 >>> firm_score = firm(fcst, obs, risk_parameter, categorical_thresholds)
References
Taggart, R., Loveday, N. and Griffiths, D., 2022. A scoring framework for tiered warnings and multicategorical forecasts based on fixed risk measures. Quarterly Journal of the Royal Meteorological Society, 148(744), pp.1389-1406.
- scores.categorical.probability_of_detection(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, check_args=True)#
Calculates the Probability of Detection (POD), also known as the Hit Rate. This is the proportion of observed events (obs = 1) that were correctly forecast as an event (fcst = 1).
- Parameters:
fcst (DataArray | Dataset) – An array containing binary values in the set {0, 1, np.nan}
obs (DataArray | Dataset) – An array containing binary values in the set {0, 1, np.nan}
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to sum when calculating the POD. All other dimensions will be not summed. As a special case, ‘all’ will allow all dimensions to be summed. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to sum across all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to not sum when calculating the POD. All other dimensions will be summed. As a special case, ‘all’ will allow all dimensions to be not summed. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the POD score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.check_args (bool | None) – Checks if fcst and `obs data only contains values in the set {0, 1, np.nan}. You may want to skip this check if you are sure about your input data and want to improve the performance when working with dask.
- Returns:
A DataArray of the Probability of Detection.
- Raises:
ValueError – if there are values in fcst and obs that are not in the set {0, 1, np.nan} and check_args is true.
- Return type:
DataArray | Dataset
- scores.categorical.probability_of_false_detection(fcst, obs, *, reduce_dims=None, preserve_dims=None, weights=None, check_args=True)#
Calculates the Probability of False Detection (POFD), also known as False Alarm Rate (not to be confused with the False Alarm Ratio). The POFD is the proportion of observed non-events (obs = 0) that were incorrectly forecast as event (i.e. fcst = 1).
- Parameters:
fcst (DataArray | Dataset) – An array containing binary values in the set {0, 1, np.nan}
obs (DataArray | Dataset) – An array containing binary values in the set {0, 1, np.nan}
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to sum when calculating the POFD. All other dimensions will be not summed. As a special case, ‘all’ will allow all dimensions to be summed. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to sum across all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to not sum when calculating the POFD. All other dimensions will be summed. As a special case, ‘all’ will allow all dimensions to be not summed. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the POD score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.check_args (bool | None) – Checks if fcst and `obs data only contains values in the set {0, 1, np.nan}. You may want to skip this check if you are sure about your input data and want to improve the performance when working with dask.
- Returns:
A DataArray of the Probability of False Detection.
- Raises:
ValueError – if there are values in fcst and obs that are not in the set {0, 1, np.nan} and check_args is true.
- Return type:
DataArray | Dataset
- class scores.categorical.BinaryContingencyManager(fcst_events, obs_events)#
See https://scores.readthedocs.io/en/stable/tutorials/Binary_Contingency_Scores.html for a detailed walkthrough showing the use in practice.
A BinaryContingencyManager holds the underlying binary forecast and observed event data, from which it builds a contingency table and provides scoring functionality based on that table.
A BinaryContingencyManager is typically created by an
EventOperator, such as aThresholdOperator, but can also be created directly from binary data if the user wishes.- Supported operations include:
“Transforming” the data in various ways, such as dimensional reduction
Producing contingency tables
Calculating scores and metrics based on contingency tables
- The full data comprises several n-dimensional binary arrays which can be considered maps of:
True positives (hits)
False positives (false alarms)
True negatives (correct negatives)
False negatives (misses)
These masks, in addition to supporting score calculations, can be accessed and used for:
Plotting these attributes on a map
Masking these values by a geographical region prior to score calculation
As such, the per-pixel information is useful as well as the overall ratios involved.
BinaryContingencyManager inherits from (uses) the
BasicContingencyManagerclass to provide score calculations on the final contingency table. Documentation for the available scores is found in theBasicContingencyManagerAPI entry but the calls can be made directly against instances of BinaryContingencyManager where performance or transformation are not a concern.- Parameters:
fcst_events (DataArray | Dataset | Series)
obs_events (DataArray | Dataset | Series)
- transform(*, reduce_dims=None, preserve_dims=None)#
Compute the contingency table, preserving or reducing the specified dimensions.
- Parameters:
reduce_dims (-) – Dimensions to reduce. Can be “all” to reduce all dimensions.
preserve_dims (-) – Dimensions to preserve. Can be “all” to preserve all dimensions.
- Returns:
A scores class which supports efficient calculation of contingency metrics.
- Return type:
- class scores.categorical.BasicContingencyManager(counts)#
See https://scores.readthedocs.io/en/stable/tutorials/Binary_Contingency_Scores.html for a detailed walkthrough showing the use of this class in practice.
A BasicContingencyManager object provides the scoring functions which are calculated from a contingency table. A BasicContingencyManager can be efficiently and repeatedly queried for a wide variety of scores.
A BasicContingencyManager is produced when a
BinaryContingencyManageris transformed. It is also possible to create a BasicContingencyManager from event counts or a contingency table, although this not a common user requirement.A contingency table is built only from event counts, losing the connection to the actual event tables in their full dimensionality. The event count data is much smaller than the full event tables, particularly when considering very large data sets like Numerical Weather Prediction (NWP) data, which could be terabytes to petabytes in size.
By contrast, A
BinaryContingencyManagerretains the full event data, which provides some more flexbility but may reduce efficiency.- Parameters:
counts (dict)
- accuracy()#
Identical to
fraction_correct().Accuracy calculates the proportion of forecasts which are correct.
- Returns:
A DataArray containing the accuracy score
- Return type:
xr.DataArray
\[\text{accuracy} = \frac{\text{true positives} + \text{true negatives}}{\text{total count}}\]Notes
Range: 0 to 1, where 1 indicates a perfect score.
“True positives” is the same at “hits”.
“False negatives” is the same as “misses”.
References
- base_rate()#
The observed event frequency.
- Returns:
An xarray object containing the base rate.
- Return type:
xr.DataArray
\[\text{base rate} = \frac{\text{true positives} + \text{false negatives}}{\text{total count}}\]Notes
Range: 0 to 1, where 1 indicates the event occurred every time.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
References
Hogan, R. J. & Mason, I. B. (2011). Deterministic forecasts of binary events. In I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner’s guide in atmospheric science (2nd ed., pp. 39-51). https://doi.org/10.1002/9781119960003.ch3
- bias_score()#
Identical to
frequency_bias().How did the forecast frequency of “yes” events compare to the observed frequency of “yes” events?
- Returns:
An xarray object containing the bias score
- Return type:
xr.DataArray
\[\text{frequency bias} = \frac{\text{true positives} + \text{false positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to ∞ (infinity), where 1 indicates a perfect score.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
“False negatives” is the same as “misses”.
References
- cohens_kappa()#
Identical to
heidke_skill_score().Calculates Cohen’s kappa.
What was the accuracy of the forecast relative to that of random chance?
- Returns:
An xarray object containing the Cohen’s Kappa score
- Return type:
xr.DataArray
\[\begin{split}\\ \text{Cohen's Kappa} \left(\kappa\right) = \frac{\text{true positives} + \text{true negatives} - E_{random}}{ \text{total count} - E_{\text{random}}}\end{split}\]where
\[\begin{split}\begin{aligned} E_{\text{random}} &= \text{expected correct matches due to random chance} \\&= \frac{\left(\text{tp} + \text{fn}\right) \cdot \left(\text{tp} + \text{fp}\right) + \left(\text{tn} + \text{fn}\right) \cdot \left(\text{tn} + \text{fp}\right)}{ \text{total count}} \end{aligned}\end{split}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” (\(\text{tp}\)) is the same as “hits”.
“False negatives” (\(\text{fn}\)) is the same as “misses”.
“False positives” (\(\text{fp}\)) is the same as “false alarms”.
“True negatives” (\(\text{tn}\)) is the same as “correct negatives”.
References
- critical_success_index()#
Identical to
threat_score().- Returns:
An xarray object containing the critical success index
- Return type:
xr.DataArray
\[\text{threat score} = \frac{\text{true positives}}{\text{true positives} + \text{false positives} + \text{false negatives}}\]Notes
Range: 0 to 1, 0 indicates no skill. Perfect score: 1.
Often known as CSI.
“True positives” is the same as “hits”
“False positives” is the same as “false alarms”
“True negatives” is the same as “correct negatives”
References
- equitable_threat_score()#
Identical to
gilberts_skill_score().Calculates the Equitable threat score.
How well did the forecast “yes” events correspond to the observed “yes” events (accounting for hits due to chance)?
- Returns:
An xarray object containing the equitable threat score
- Return type:
xr.DataArray
\[\text{ETS} = \frac{\text{true positives} - \text{true positives} _\text{random}} {\text{true positives} + \text{false negatives} + \text{false positives} - \text{true positives} _\text{random}}\]where
\[\text{true_positives}_{\text{random}} = \frac{(\text{true positives} + \text{false negatives}) (\text{true positives} + \text{false positives})}{\text{total count}}\]Notes
Range: -1/3 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”
References
Gilbert, G.K., 1884. Finley’s tornado predictions. American Meteorological Journal, 1(5), pp.166–172.
Hogan, R.J., Ferro, C.A., Jolliffe, I.T. and Stephenson, D.B., 2010. Equitability revisited: Why the “equitable threat score” is not equitable. Weather and Forecasting, 25(2), pp.710-726. https://doi.org/10.1175/2009WAF2222350.1
- f1_score()#
Calculates the F1 score.
- Returns:
An xarray object containing the F1 score
- Return type:
xr.DataArray
\[\text{F1} = \frac{2 \cdot \text{true positives}}{(2 \cdot \text{true positives}) + \text{false positives} + \text{false negatives}}\]Notes
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
“False negatives” is the same as “misses”.
References
- false_alarm_rate()#
Identical to
probability_of_false_detection.What fraction of the non-events were incorrectly predicted?
- Returns:
An xarray object containing the false alarm rate
- Return type:
xr.DataArray
\[\text{false alarm rate} = \frac{\text{false positives}}{\text{true negatives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 0.
Not to be confused with the false alarm ratio.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
- false_alarm_ratio()#
The false alarm ratio (FAR) calculates the fraction of the predicted “yes” events which did not eventuate (i.e., were false alarms).
- Returns:
An xarray object containing the false alarm ratio
- Return type:
xr.DataArray
\[\text{false alarm ratio} = \frac{\text{false positives}}{\text{true positives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 0.
Not to be confused with the False Alarm Rate.
“False positives” is the same as “false alarms”.
“True positives” is the same as “hits”.
References
- forecast_rate()#
The forecast event frequency.
- Returns:
An xarray object containing the forecast rate.
- Return type:
xr.DataArray
\[\text{forecast rate} = \frac{\text{true positives} + \text{false positives}}{\text{total count}}\]Notes
Range: 0 to 1, where 1 indicates the event was forecast every time.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
References
Hogan, R. J. & Mason, I. B. (2011). Deterministic forecasts of binary events. In I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner’s guide in atmospheric science (2nd ed., pp. 39-51). https://doi.org/10.1002/9781119960003.ch3
- format_table(positive_value_name='Positive', negative_value_name='Negative')#
Formats a contingency table from a 2x2 contingency manager into a confusion matrix.
- Parameters:
positive_value_name (str) – A string with the value to be displayed on the contingency table visual. The default value is ‘Positive’.
negative_value_name (str) – A string with the value to be displayed on the contingency table visual. The default value is ‘Negative’.
- Returns:
A Pandas DataFrame representing 2x2 binary contingency table if possible. xr.DataArray: An xarray DataArray showing higher-dimension contingency table information otherwise
- Return type:
pandas.DataFrame
This function retrieves the contingency table, extracts the relevant counts, and constructs a 2x2 confusion matrix represented as a Pandas DataFrame. This function does not support HTML rendering of tables where dimensionality has been preserved.
Example
Assuming self.xr_table returns an xr.Array:
array([ 5., 11., 1., 1., 18.]) Coordinates: contingency (contingency) <U11 'tp_count' ... 'total_count'
This function will return:
Positive Negative Total Positive 5 1 6 Negative 1 11 12 Total 6 12 18
- fraction_correct()#
Identical to
accuracy().Fraction correct calculates the proportion of forecasts which are correct.
- Returns:
An xarray object containing the fraction correct.
- Return type:
xr.DataArray
\[\text{fraction correct} = \frac{\text{true positives} + \text{true negatives}}{\text{total count}}\]Notes
Range: 0 to 1, where 1 indicates a perfect score.
“True positives” is the same as “hits”.
“True negatives” is the same as “correct negatives”.
References
- frequency_bias()#
Identical to
bias_score().How did the forecast frequency of “yes” events compare to the observed frequency of “yes” events?
- Returns:
An xarray object containing the frequency bias
- Return type:
xr.DataAray
\[\text{frequency bias} = \frac{\text{true positives} + \text{false positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to ∞ (infinity), where 1 indicates a perfect score.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
“False negatives” is the same as “misses”.
References
- get_counts()#
- Returns:
A dictionary of the contingency table counts. Values are xr.DataArray instances. The keys are:
tp_count for true positive count
tn_count for true negative count,
fp_count for false positive count
fn_count for false negative count
total_count for the total number of events.
- Return type:
dict
Here is an example of what may be returned:
{'tp_count': <xarray.DataArray ()> Size: 8B array(5.), 'tn_count': <xarray.DataArray ()> Size: 8B array(11.), 'fp_count': <xarray.DataArray ()> Size: 8B array(1.), 'fn_count': <xarray.DataArray ()> Size: 8B array(1.), 'total_count': <xarray.DataArray ()> Size: 8B array(18.)}
- get_table()#
- Returns:
The contingency table as an xarray object. Contains a coordinate dimension called ‘contingency’. Valid coordinates for this dimenstion are:
tp_count for true positive count
tn_count for true negative count
fp_count for false positive count
fn_count for false negative count
total_count for the total number of events.
- Return type:
xr.DataArray
Here is an example of what may be returned:
array([ 5., 11., 1., 1., 18.]) Coordinates: contingency (contingency) <U11 'tp_count' ... 'total_count' Indexes: contingency PandasIndex Attributes: (0)
- gilberts_skill_score()#
Identical to
equitable_threat_score().Calculates the Gilbert skill score.
How well did the forecast “yes” events correspond to the observed “yes” events (accounting for hits due to chance)?
- Returns:
An xarray object containing the Gilberts Skill Score
- Return type:
xr.DataArray
\[\text{GSS} = \frac{\text{true positives} - \text{true positives} _\text{random}} {\text{true positives} + \text{false negatives} + \text{false positives} - \text{true positivies} _\text{random}}\]where
\[\text{true_positives}_{\text{random}} = \frac{(\text{true positives} + \text{false negatives}) (\text{true positives} + \text{false positives})}{\text{total count}}\]Notes
Range: -1/3 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”
“False positives” is the same as “false alarms”.
References
Gilbert, G.K., 1884. Finley’s tornado predictions. American Meteorological Journal, 1(5), pp.166–172.
Hogan, R.J., Ferro, C.A., Jolliffe, I.T. and Stephenson, D.B., 2010. Equitability revisited: Why the “equitable threat score” is not equitable. Weather and Forecasting, 25(2), pp.710-726. https://doi.org/10.1175/2009WAF2222350.1
- hanssen_and_kuipers_discriminant()#
Identical to
peirce_skill_score()andtrue_skill_statistic().How well did the forecast separate the “yes” events from the “no” events?
- Returns:
An xarray object containing Hanssen and Kuipers’ Discriminant
- Return type:
xr.DataArray
\[\text{HK} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} - \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}\]where \(\text{HK}\) is Hansen and Kuipers Discriminant
Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
- heidke_skill_score()#
Identical to
cohens_kappa().Calculates the Heidke skill score.
What was the accuracy of the forecast relative to that of random chance?
- Returns:
An xarray object containing the Heidke skill score
- Return type:
xr.DataArray
\[\begin{split}\\ \text{HSS} = \frac{\text{true positives} + \text{true negatives} - E_{random}}{ \text{total count} - E_{\text{random}}}\end{split}\]where
\[\begin{split}\begin{aligned} E_{\text{random}} &= \text{expected correct matches due to random chance} \\&= \frac{\left(\text{tp} + \text{fn}\right) \cdot \left(\text{tp} + \text{fp}\right) + \left(\text{tn} + \text{fn}\right) \cdot \left(\text{tn} + \text{fp}\right)}{ \text{total count}} \end{aligned}\end{split}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
HSS = Heidke Skill Score
“True positives” (\(\text{tp}\)) is the same as “hits”.
“False negatives” (\(\text{fn}\)) is the same as “misses”.
“False positives” (\(\text{fp}\)) is the same as “false alarms”.
“True negatives” (\(\text{tn}\)) is the same as “correct negatives”.
References
- hit_rate()#
Identical to
true_positive_rate(),probability_of_detection,sensitivity()andrecall().Calculates the proportion of the observed events that were correctly forecast.
- Returns:
An xarray object containing the hit rate
- Return type:
xr.DataArray
\[\text{true positives} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
References
- negative_predictive_value()#
Calculates the negative predictive value (NPV).
Of all the times a negative result was predicted, how often was the prediction actually correct?
- Returns:
An xarray object containing the negative predictive value score
- Return type:
xr.DataArray
\[\text{negative predictive value} = \frac{\text{true negatives}}{\text{true negatives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True negatives” is the same as “correct rejections”.
“False negatives” is the same as “missed detections”.
References
https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values
Monaghan, T. F., Rahman, S. N., Agudelo, C. W., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Foundational statistical principles in medical research: Sensitivity, specificity, positive predictive value, and negative predictive value. Medicina, 57(5), 503. https://doi.org/10.3390/medicina57050503
- odds_ratio()#
Calculates the odds ratio
What is the ratio of the odds of a “yes” forecast being correct, to the odds of a “yes” forecast being wrong?
- Returns:
An xarray object containing the odds ratio
- Return type:
xr.DataArray
\[\begin{split}\begin{aligned} \text{odds ratio} &= \left[\frac{\text{POD}}{\text{1 - POD}}\right] \div \left[\frac{\text{POFD}}{\text{1 - POFD}}\right] \\ &= \frac{\text{true positives} \cdot \text{true negatives}}{\text{false positives} \cdot \text{false negatives}} \end{aligned}\end{split}\]where
\[\text{POD} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]and
\[\text{POFD} = \frac{\text{false positives}}{\text{true negatives} + \text{false positives}}\]Notes
Range: 0 to ∞, 1 indicates no skill. Perfect score: ∞.
POD = Probability of Detection
POFD = Probability of False Detection
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
Stephenson, D.B., 2000. Use of the “odds ratio” for diagnosing forecast skill. Weather and Forecasting, 15(2), pp.221-232. https://doi.org/10.1175/1520-0434(2000)015%3C0221:UOTORF%3E2.0.CO;2
- odds_ratio_skill_score()#
Identical to
yules_q().Calculates the odds ratio skill score.
What was the improvement of the forecast over random chance?
- Returns:
An xarray object containing the odds ratio skill score
- Return type:
xr.DataArray
\[\begin{split}\begin{aligned} \text{ORSS} &= \frac{\text{OR} - 1}{\text{OR} + 1} \\ &= \frac{\text{true positives} \cdot \text{true negatives} - \text{false positives} \cdot \text{false negatives}}{ \text{true positives} \cdot \text{true negatives} + \text{false positives} \cdot \text{false negatives}} \end{aligned}\end{split}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
ORSS = Odds Ratio Skill Score
OR = Odds ratio, see:
odds_ratio()“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
Stephenson, D.B., 2000. Use of the “odds ratio” for diagnosing forecast skill. Weather and Forecasting, 15(2), pp.221-232. https://doi.org/10.1175/1520-0434(2000)015%3C0221:UOTORF%3E2.0.CO;2
- peirce_skill_score()#
Identical to
hanssen_and_kuipers_discriminant()andtrue_skill_statistic().How well did the forecast separate the “yes” events from the “no” events?
- Returns:
An xarray object containing the Peirce Skill Score
- Return type:
xr.DataArray
\[\text{Peirce skill score} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} - \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
Peirce, C.S., 1884. The numerical measure of the success of predictions. Science, ns-4(93), pp.453-454. https://doi.org/10.1126/science.ns-4.93.453.b
- positive_predictive_value()#
Identical to
success_ratio()andprecision().What proportion of the forecast events actually eventuated?
- Returns:
An xarray object containing the positive predictive value score
- Return type:
xr.DataArray
\[\text{positive predictive value} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
References
https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values
Monaghan, T. F., Rahman, S. N., Agudelo, C. W., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Foundational statistical principles in medical research: Sensitivity, specificity, positive predictive value, and negative predictive value. Medicina, 57(5), 503. https://doi.org/10.3390/medicina57050503
- precision()#
Identical to
success_ratio()andpositive_predictive_value().What proportion of the forecast events actually eventuated?
- Returns:
An xarray object containing the precision score
- Return type:
xr.DataArray
\[\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
References
- probability_of_detection()#
Probability of detection (POD) is identical to
hit_rate(),true_positive_rate(),sensitivity()andrecall().Calculates the proportion of the observed events that were correctly forecast.
- Returns:
An xarray object containing the probability of detection
- Return type:
xr.DataArray
\[\text{hit rate} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”
“False negatives” is the same as “misses”
References
- probability_of_false_detection()#
Identical to
false_alarm_rate().What fraction of the non-events were incorrectly predicted?
- Returns:
An xarray object containing the probability of false detection
- Return type:
xr.DataArray
\[\text{probability of false detection} = \frac{\text{false positives}}{\text{true negatives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 0.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
- recall()#
Identical to
hit_rate(),probability_of_detection,true_positive_rate(), andsensitivity().Calculates the proportion of the observed events that were correctly forecast.
- Returns:
An xarray object containing the probability of detection
- Return type:
xr.DataArray
\[\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
References
- sensitivity()#
Identical to
hit_rate(),probability_of_detection,true_positive_rate(), andrecall().Calculates the proportion of the observed events that were correctly forecast.
- Returns:
An xarray object containing the probability of detection
- Return type:
xr.DataArray
\[\text{sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
References
Monaghan, T. F., Rahman, S. N., Agudelo, C. W., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Foundational statistical principles in medical research: Sensitivity, specificity, positive predictive value, and negative predictive value. Medicina, 57(5), 503. https://doi.org/10.3390/medicina57050503
- specificity()#
Identical to
true_negative_rate().The probability that an observed non-event will be correctly predicted.
- Returns:
An xarray object containing the true negative rate (specificity).
- Return type:
xr.DataArray
\[\text{specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}\]Notes
“True negatives” is the same as “correct negatives”.
“False positives” is the same as “false alarms”.
- Reference:
Monaghan, T. F., Rahman, S. N., Agudelo, C. W., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Foundational statistical principles in medical research: Sensitivity, specificity, positive predictive value, and negative predictive value. Medicina, 57(5), 503. https://doi.org/10.3390/medicina57050503
- success_ratio()#
Identical to
precision()andpositive_predictive_value().What proportion of the forecast events actually eventuated?
- Returns:
An xarray object containing the success ratio
- Return type:
xr.DataArray
\[\text{success ratio} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
References
- symmetric_extremal_dependence_index()#
Calculates the Symmetric Extremal Dependence Index (SEDI).
- Returns:
An xarray object containing the SEDI score
- Return type:
xr.DataArray
\[\frac{\ln(\text{POFD}) - \ln(\text{POD}) + \ln(\text{1-POD}) - \ln(\text{1 -POFD})} {\ln(\text{POFD}) + \ln(\text{POD}) + \ln(\text{1-POD}) + \ln(\text{1 -POFD})}\]where
\[\text{POD} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]and
\[\text{POFD} = \frac{\text{false positives}}{\text{true negatives} + \text{false positives}}\]Notes
POD = Probability of Detection
POFD = Probability of False Detection
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
Range: -1 to 1, Perfect score: 1.
References
Ferro, C.A.T. and Stephenson, D.B., 2011. Extremal dependence indices: Improved verification measures for deterministic forecasts of rare binary events. Weather and Forecasting, 26(5), pp.699-713. https://doi.org/10.1175/WAF-D-10-05030.1
- threat_score()#
Identical to
critical_success_index().- Returns:
An xarray object containing the threat score
- Return type:
xr.DataArray
\[\text{threat score} = \frac{\text{true positives}}{\text{true positives} + \text{false positives} + \text{false negatives}}\]Notes
Range: 0 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
- true_negative_rate()#
Identical to
specificity().The probability that an observed non-event will be correctly predicted.
- Returns:
An xarray object containing the true negative rate.
- Return type:
xr.DataArray
\[\text{true negative rate} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}\]Notes
“True negatives” is the same as “correct negatives”.
“False positives” is the same as “false alarms”.
- true_positive_rate()#
Identical to
hit_rate(),probability_of_detection,sensitivity()andrecall().The proportion of the observed events that were correctly forecast.
- Returns:
An xarray object containing the true positive rate
- Return type:
xr.DataArray
\[\text{true positive rate} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}\]Notes
Range: 0 to 1. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
References
- true_skill_statistic()#
Identical to
peirce_skill_score()andhanssen_and_kuipers_discriminant().How well did the forecast separate the “yes” events from the “no” events?
- Returns:
An xarray object containing the true skill statistic
- Return type:
xr.DataArray
\[\text{true skill statistic} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} - \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
- yules_q()#
Identical to
odds_ratio_skill_score().Calculates the Yule’s Q.
What was the improvement of the forecast over random chance?
- Returns:
An xarray object containing Yule’s Q
- Return type:
xr.DataArray
\[\begin{split}\begin{aligned} \text{Yule's Q} &= \frac{\text{OR} - 1}{\text{OR} + 1} \\ &= \frac{\text{true positives} \cdot \text{true negatives} - \text{false positives} \cdot \text{false negatives}}{ \text{true positives} \cdot \text{true negatives} + \text{false positives} \cdot \text{false negatives}} \end{aligned}\end{split}\]Notes
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
OR = Odds ratio, see:
odds_ratio().“True positives” is the same as “hits”.
“False negatives” is the same as “misses”.
“False positives” is the same as “false alarms”.
“True negatives” is the same as “correct negatives”.
References
Stephenson, D.B., 2000. Use of the “odds ratio” for diagnosing forecast skill. Weather and Forecasting, 15(2), pp.221-232. https://doi.org/10.1175/1520-0434(2000)015%3C0221:UOTORF%3E2.0.CO;2
- class scores.categorical.ThresholdEventOperator(*, precision=8, default_event_threshold=0.001, default_op_fn=<built-in function ge>)#
See https://scores.readthedocs.io/en/stable/tutorials/Binary_Contingency_Scores.html for a detailed walkthrough showing the use in practice.
A ThresholdEventOperator is used to produce a
BinaryContingencyManagerfrom forecast and observed data. It considers an event to be defined when the forecast and observed variables meet a particular threshold condition (e.g. rainfall above 1mm).The class may be used for any variable, for any threshold, and for any comparison operator (e.g. greater-than, less-than, greater-than-or-equal-to, … )
- make_contingency_manager(fcst, obs, *, event_threshold=None, op_fn=None)#
Using this function requires a careful understanding of the structure of the data and the use of the operator function. The default operator is a simple greater-than operator, so this will work on a simple DataArray. To work on a DataSet, a richer understanding is required. It is recommended to work through the tutorial at https://scores.readthedocs.io/en/stable/tutorials/Binary_Contingency_Scores.html . This tutorial reviews more complex use cases, including multivariate gridded model data, and station data structures.
- Parameters:
fcst (DataArray | Dataset | Series)
obs (DataArray | Dataset | Series)
- Return type:
- make_event_tables(fcst, obs, *, event_threshold=None, op_fn=None)#
Using this function requires a careful understanding of the structure of the data and the use of the operator function. The default operator is a simple greater-than operator, so this will work on a simple DataArray. To work on a DataSet, a richer understanding is required. It is recommended to work through the tutorial at https://scores.readthedocs.io/en/stable/tutorials/Binary_Contingency_Scores.html . This tutorial reviews more complex use cases, including multivariate gridded model data, and station data structures.
- Parameters:
fcst (DataArray | Dataset | Series)
obs (DataArray | Dataset | Series)
- class scores.categorical.EventOperator#
Abstract Base Class (ABC) for event operators which can be used in deriving contingency tables. ABCs are not used directly but instead define the requirements for other classes and may be used for type checking.
- abstractmethod make_contingency_manager(fcst, obs, *, event_threshold=None, op_fn=None)#
This method should be over-ridden to return a contingency table.
- Parameters:
fcst (DataArray | Dataset | Series)
obs (DataArray | Dataset | Series)
- abstractmethod make_event_tables(fcst, obs, *, event_threshold=None, op_fn=None)#
This method should be over-ridden to return forecast and observed event tables
- Parameters:
fcst (DataArray | Dataset | Series)
obs (DataArray | Dataset | Series)
- scores.categorical.risk_matrix_score(fcst, obs, decision_weights, severity_dim, prob_threshold_dim, *, threshold_assignment='lower', reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the risk matrix score of Taggart & Wilke (2025).
Let \((S_1, \ldots,S_m)\) denote the tuple of nested severity categories, let \((p_1, \ldots,p_n)\) denote the probability thresholds that delineate the certainty categories, and let \(w_{i,j}\) denote the weight applied to the decision point associated with severity category \(S_i\) and probability threshold \(p_j\).
In this implementation of the risk matrix score, a forecast \(F\) is of the form \(F=(f_1,\ldots,f_m)\) where \(f_i\) denotes the forecast probability that the observation lies in severity category \(S_i\). A corresponding observation \(y\) is given by \(y=(y_1,\ldots,y_m)\) where \(y_i\) is 1 if the observation lies in severity category \(S_i\) and 0 otherwise. Then the risk matrix score \(\text{RMaS}\) is given by the formula
\[\text{RMaS}(F,y) = \sum_{i=1}^m\sum_{j=1}^n w_{i,j} \, s_j(f_i, y_i),\]where
\[\begin{split}s_j(f_i,y_i) = \begin{cases} p_j & \text{if } y_i=0\text{ and } f_i \geq p_j \\ 1 - p_j & \text{if } y_i=1\text{ and } f_i < p_j \\ 0 & \text{otherwise.} \end{cases}\end{split}\]The formula above is for the case where the
threshold_assignmentis “lower”, with the adjustments (\(f_i > p_j\) and \(f_i \leq p_j\)) applied when thethreshold_assignmentis “upper”.- Parameters:
fcst (DataArray | Dataset) – an array of forecast probabilities for the observation lying in each severity category. Must have a dimension
severity_dim.obs (DataArray | Dataset) – an array of binary observations with a value of 1 if the observation was in the severity category and 0 otherwise. Must have a dimension
severity_dim.decision_weights (DataArray) – an array of non-negative weights to apply to each risk matrix decision point, indexed by coordinates in
severity_dimandprob_threshold_dim.severity_dim (str) – the dimension specifying severity categories.
prob_threshold_dim (str) – the dimension in
decision_weightsspecifying probability thresholds.threshold_assignment (str | None) – Either “upper” or “lower”. Specifies whether the probability intervals defining the certainty categories, with endpoints given by the decision thresholds in
decision_weights, are left or right closed. That is, whether the probability decision threshold is included in the upper (left closed) or lower (right closed) certainty category. Defaults to “lower”.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the mean risk matrix score, where the mean is taken over all forecast cases. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the mean risk matrix score, where the mean is taken over all forecast cases. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved, apart from
severity_dimandprob_threshold_dim. Only one ofreduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
An xarray object of risk matrix scores, averaged according to specified weights across appropriate dimensions.
- Raises:
ValueError – if
severity_dimis not a dimension offcst,obsordecision_weights.ValueError – if
severity_dimis a dimension ofweights.ValueError – if
prob_threshold_dimis not a dimension ofdecision_weights.ValueError – if
prob_threshold_dimis a dimension ofweights.ValueError – if
decision_weightsdoes not have exactly 2 dimensions.ValueError – if
fcstvalues are negative or greater than 1.ValueError – if
obsvalues are other than 0, 1 or nan.ValueError – if
prob_threshold_dimcoordinates are not strictly between 0 and 1.ValueError – if
severity_dimcoordinates differ for any offcst,obsordecision_weights.ValueError – if
threshold_assignmentis not “upper” or lower”.
- Return type:
DataArray | Dataset
References
Taggart, R. J., & Wilke, D. J. (2025). Warnings based on risk matrices: A coherent framework with consistent evaluation. Natural Hazards and Earth System Sciences, 25(8), 2657–2677. https://doi.org/10.5194/nhess-25-2657-2025
See also
scores.categorical.matrix_weights_to_array()scores.categorical.weights_from_warning_scaling()Examples
Calculate the risk matrix score where the risk matrix has three nested severity categories (“MOD+”, “SEV+” and “EXT”) and three probability thresholds (0.1, 0.3 and 0.5). The decision weights place greater emphasis on higher end severity.
>>> import xarray as xr >>> from scores.categorical import risk_matrix_score >>> decision_weights = xr.DataArray( >>> data=[[1, 2, 3], [1, 2, 3], [1, 2, 3]], >>> dims=["probability_threshold", "severity"], >>> coords={'probability_threshold': [0.1, 0.3, 0.5], 'severity': ["MOD+", "SEV+", "EXT"]} >>> ) >>> fcst = xr.DataArray( >>> data=[[0.45, 0.22, 0.05], [0.65, 0.32, 0.09]], >>> dims=["time", "severity"], >>> coords={'time': [0, 1], 'severity': ["MOD+", "SEV+", "EXT"]} >>> ) >>> obs = xr.DataArray( >>> data=[[1, 1, 0], [1, 0, 0]], >>> dims=["time", "severity"], >>> coords={'time': [0, 1], 'severity': ["MOD+", "SEV+", "EXT"]} >>> ) >>> risk_matrix_score(fcst, obs, decision_weights, "severity", "probability_threshold")
- scores.categorical.matrix_weights_to_array(matrix_weights, severity_dim, severity_coords, prob_threshold_dim, prob_threshold_coords)#
Generates a 2-dimensional xr.DataArray of weights for each decision point, which is used for the
scores.categorical.risk_matrix_score()calculation. Assumes that values toward the left inmatrix_weightscorrespond to less severe categories, while values towards the top inmatrix_weightscorrespond to higher probability thresholds.- Parameters:
matrix_weights (ndarray) – array of weights to place on each risk matrix decision point, with rows (ascending) corresponding to (increasing) probability thresholds, and columns (left to right) corresponding to (increasing) severity categories.
severity_dim (str) – name of the severity category dimension.
severity_coords (Iterable) – labels for each of the severity categories, in order of increasing severity. Does NOT include the lowest severity category for which no warning would be issued.
prob_threshold_dim (str) – name of the probability threshold dimension.
prob_threshold_coords (Iterable) – list of the probability decision thresholds in the risk matrix, strictly between 0 and 1.
- Returns:
xarray data array of weights for each risk matrix decision point, indexed by prob_threshold_dim and severity_dim.
- Raises:
ValueError – if
matrix_weightsisn’t two dimensional.ValueError – if number of rows of
matrix_weightsdoesn’t equal length ofprob_threshold_coords.ValueError – if number of columns of
matrix_weightsdoesn’t equal length ofseverity_coords.ValueError – if
prob_threshold_coordsaren’t strictly between 0 and 1.
- Return type:
DataArray
References
Taggart, R. J., & Wilke, D. J. (2025). Warnings based on risk matrices: A coherent framework with consistent evaluation. Natural Hazards and Earth System Sciences, 25(8), 2657–2677. https://doi.org/10.5194/nhess-25-2657-2025
Examples
Returns weights for each risk matrix decision point, where weights increase with increasing severity category and decrease with increasing probability threshold.
>>> import numpy as np >>> from scores.categorical import matrix_weights_to_array >>> matrix_weights = np.array([ >>> [1, 2, 3], >>> [2, 4, 6], >>> [3, 6, 9], >>> ]) >>> matrix_weights_to_array( >>> matrix_weights, "severity", ["MOD+", "SEV+", "EXT"], "prob_threshold", [0.1, 0.3, 0.5] >>> )
- scores.categorical.weights_from_warning_scaling(scaling_matrix, evaluation_weights, severity_dim, severity_coords, prob_threshold_dim, prob_threshold_coords)#
Given a warning scaling matrix, evaluation weights and other inputs, returns the weights for each risk matrix decision point as an xarray data array. The returned data array is designed to be used for the
scores.categorical.risk_matrix_score()calculation.Comprehensive checks are made on
scaling_matrixto ensure it satisfies the properties of warning scaling in Table 1 of Taggart & Wilke (2025).- Parameters:
scaling_matrix (ndarray) – a 2-dimensional matrix encoding the warning scaling. Warning levels are given integer values 0, 1, …, q. The top row corresponds to the highest certainty category while the right-most column corresponds to most severe category.
evaluation_weights (Iterable) – positive weights used for warning evaluation. The kth weight (corresponding to
evaluation_weights[k-1]) is proportional to the importance of accurately discriminating between warning states below level k and warning states at or above level k.len(evaluation_weights)must be at least q.severity_dim (str) – name of the severity category dimension.
severity_coords (Iterable) – labels for each of the severity categories, in order of increasing severity. Does NOT include the lowest severity category for which no warning would be issued.
prob_threshold_dim (str) – name of the probability threshold dimension.
prob_threshold_coords (Iterable) – list of the probability decision thresholds in the risk matrix, strictly between 0 and 1.
- Returns:
xarray data array of weights for each risk matrix decision point, indexed by prob_threshold_dim and severity_dim.
- Raises:
ValueError – if
matrix_weightsisn’t two dimensional.ValueError – if
scaling_matrixhas entries that are not non-negative integers.ValueError – if the first column or last row of
scaling_matrixhas nonzero entries.ValueError – if
scaling_matrixdecreases along any row (moving left to right).ValueError – if
scaling_matrixincreases along any column (moving top to bottom).ValueError – if number of rows of
matrix_weightsdoesn’t equal length ofprob_threshold_coords.ValueError – if number of columns of
matrix_weightsdoesn’t equal length ofseverity_coords.ValueError –
len(evaluation_weights)is less than the maximum value inscaling_matrix.ValueError – if
prob_threshold_coordsaren’t strictly between 0 and 1.ValueError – if
evaluation_weightsaren’t strictly positive.
- Return type:
DataArray
References
Taggart, R. J., & Wilke, D. J. (2025). Warnings based on risk matrices: A coherent framework with consistent evaluation. Natural Hazards and Earth System Sciences, 25(8), 2657–2677. https://doi.org/10.5194/nhess-25-2657-2025
Examples
Returns weights for each risk matrix decision point, for the SHORT-RANGE scaling matrix of Taggart & Wilke (2025), with ESCALATION evaluation weights.
>>> import numpy as np >>> from scores.categorical import weights_from_warning_scaling >>> scaling = np.array([ >>> [0, 2, 3, 3], >>> [0, 1, 2, 3], >>> [0, 1, 1, 2], >>> [0, 0, 0, 0], >>> ]) >>> weights_from_warning_scaling( >>> scaling, [1, 2, 3], "severity", ["MOD+", "SEV+", "EXT"], "prob_threshold", [0.1, 0.3, 0.5] >>> )
- scores.categorical.seeps(fcst, obs, prob_dry, light_heavy_threshold, *, dry_light_threshold=0.2, mask_clim_extremes=True, lower_masked_value=0.1, upper_masked_value=0.85, reduce_dims=None, preserve_dims=None, weights=None)#
Calculates the stable equitable error in probability space (SEEPS) score.
When used to evaluate precipitation forecasts, the SEEPS score calculates the performance of a forecast across three categories:
Dry weather (e.g., less than or equal to 0.2mm),
Light precipitation (the climatological lower two-thirds of rainfall conditioned on it raining),
Heavy precipitation (the climatological upper one-third of rainfall conditioned on it raining).
The SEEPS penalty matrix is defined as
\[\begin{split}s = \frac{1}{2} \left( \begin{matrix} 0 & \frac{1}{1-p_1} & \frac{1}{p_3}+\frac{1}{1-p_1} \\ \frac{1}{p_1} & 0 & \frac{1}{p_3} \\ \frac{1}{p_1}+\frac{1}{1-p_3} & \frac{1}{1-p_3} & 0 \end{matrix} \right)\end{split}\]- where
\(p_1\) is the climatological probability of the dry weather category,
\(p_3\) is the climatological probability of the heavy precipitation category,
The rows correspond to the forecast category (dry, light, heavy),
The columns correspond to the observation category (dry, light, heavy),
as defined in Eq 15 in Rodwell et al. (2010).
Let \(p_2\) denote the climatological probability of light precipitation occuring. Note that since \(p_2 = 2p_3\) and \(p_1 + p_2 + p_3 = 1\), then \(p_3 = (1 - p_1) / 3\) can be substituted into the penalty matrix. In this implementation, the user only provides \(p_1\) with the
prob_dryarg and the function calculates \(p_3\) internally. Additionally, this implementation of the score is negatively oriented, meaning that lower scores are better.Sometimes in the literature, a positively oriented version of SEEPS is used, which is defined as \(1 - \mathrm{SEEPS}\).
By default, the scores are only calculated for points where \(p_1 \in [0.1, 0.85]\) as per Rodwell et al. (2010). This can be changed by setting
mask_clim_extremestoFalseor by changing thelower_masked_valueandupper_masked_valueparameters.- Parameters:
fcst (DataArray) – An array of real-valued forecasts (e.g., precipitation forecasts in mm).
obs (DataArray) – An array of real-valued observations (e.g., precipitation forecasts in mm).
prob_dry (DataArray) – The climatological probability of the dry weather category. This is called \(p_1\) in the SEEPS penalty matrix. Must be in the range [0, 1].
light_heavy_threshold (DataArray) – An array of the rainfall thresholds (e.g., in mm) that separates light and heavy precipitation. The threshold itself is included in the light precipitation category.
dry_light_threshold (float | None) – The threshold (e.g., in mm) that separates dry weather from light precipitation. Defaults to 0.2. Dry weather is defined as less than or equal to this threshold.
mask_clim_extremes (bool | None) – If True, mask out the score at points where \(p_1\) is less than
lower_masked_valueor greater thanupper_masked_value. Instead a NaN is returned at these points. Defaults to True.lower_masked_value (float | None) – The SEEPS score is masked at points where
prob_dryis less than this value. Defaults to 0.1.upper_masked_value (float | None) – The SEEPS score is masked at points where
prob_dryis greater than this value. Defaults to 0.85.reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the SEEPS score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the SEEPS score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the SEEPS score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of reduce_dims and preserve_dims can be supplied. The default behaviour if neither are supplied is to reduce all dims.
weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.
- Returns:
An xarray DataArray containing the SEEPS score.
- Raises:
ValueError – if any values in prob_dry are outside the range [0, 1].
- Return type:
DataArray
Warning
This function raises a warning if any values in prob_dry are exactly equal to 0 or 1.
References
Rodwell, M. J., Richardson, D. S., Hewson, T. D., & Haiden, T. (2010). A new equitable score suitable for verifying precipitation in numerical weather prediction. Quarterly Journal of the Royal Meteorological Society, 136(650), 1344–1363. https://doi.org/10.1002/qj.656
Examples
>>> import numpy as np >>> import xarray as xr >>> from scores.categorical import seeps >>> fcst = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) >>> obs = xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) >>> prob_dry = xr.DataArray(np.random.rand(6, 8), dims=['lat', 'lon']) >>> light_heavy_threshold = 2 * xr.DataArray(np.random.rand(4, 6, 8), dims=['time', 'lat', 'lon']) >>> seeps(fcst, obs, prob_dry, light_heavy_threshold=light_heavy_threshold) <xarray.DataArray ()> Size: 8B array(0.84333334)
scores.spatial#
- class scores.fast.fss.typing.FssComputeMethod(value)#
Choice of compute backend for FSS.
- INVALID = -1#
invalid backend
- NUMPY = 1#
computations using numpy (default)
- NUMBA = 2#
computations using numba (currently unimplemented)
- scores.spatial.fss_2d(fcst, obs, *, event_threshold, window_size, spatial_dims, zero_padding=False, reduce_dims=None, preserve_dims=None, threshold_operator=None, compute_method=FssComputeMethod.NUMPY, dask='forbidden')#
Uses
fss_2d_single_field()to compute the Fractions Skill Score (FSS) for each 2D spatial field in the DataArray and then aggregates them over the output of gather dimensions.The aggregation method is the intended extension of the score defined by Robert and Leans (2008) for multiple forecasts [1,2]
Note
This method takes in a
threshold_operatorto compare the input fields against theevent_thresholdwhich defaults tonumpy.greater, and is compatible with lightweight numpy operators. If you would like to do more advanced binary discretization consider doing this separately and usingscores.spatial.fss_2d_binary()instead, which takes in a pre-discretized binary field.Optionally aggregates the output along other dimensions if reduce_dims or (mutually exclusive)
preserve_dimsare specified.For implementation for a single 2-D field, see:
fss_2d_single_field(). This offers the user the ability to use their own aggregation methods, rather thanxarray.ufuncWarning
daskoption is not fully tested- Parameters:
fcst (DataArray) – An array of forecasts
obs (DataArray) – An array of observations (same spatial shape as
fcst)event_threshold (float) – A scalar to compare
fcstandobsfields to generate a binary “event” field. (defaults tonp.greater).window_size (Tuple[int, int]) – A pair of positive integers
(height, width)of the sliding window_size; the window size must be greater than 0 and fit within the shape ofobsandfcst.spatial_dims (Tuple[str, str]) – A pair of dimension names
(x, y)wherexandyare the spatial dimensions to slide the window along to compute the FSS. e.g.("lat", "lon").zero_padding (bool) – If set to true, applies a 0-valued window border around the data field before computing the FSS. If set to false (default), it uses the edges (thickness = window size) of the input fields as the border. One can think of it as: - zero_padding = False => inner border - zero_padding = True => outer border with 0 values
reduce_dims (Iterable[Hashable] | None) – Optional set of additional dimensions to aggregate over (mutually exclusive to
preserve_dims).preserve_dims (Iterable[Hashable] | None) – Optional set of dimensions to keep (all other dimensions are reduced); By default all dimensions except
spatial_dimsare preserved (mutually exclusive toreduce_dims).threshold_operator (Callable | None) – The threshold operator used to generate the binary event field. E.g.
np.greater. Note: this may depend on the backendcompute methodand not all operators may be supported. Generally,np.greater,np.lessand their counterparts. Defaults to “np.greater”.compute_method (FssComputeMethod) – currently only supports
FssComputeMethod.NUMPYsee:scores.fast.fss.typing.FssComputeMethoddask (str) – See
xarray.apply_ufuncfor options
- Returns:
An
xarray.DataArraycontaining the FSS computed over thespatial_dimsThe resultant array will have the score grouped against the remaining dimensions, unless
reduce_dims/preserve_dimsare specified; in which case, they will be aggregated over the specified dimensions accordingly.For an exact usage please refer to
FSS.ipynbin the tutorials.- Raises:
DimensionError – Various errors are thrown if the input dimensions do not conform. e.g. if the window size is larger than the input arrays, or if the the spatial dimensions in the args are missing in the input arrays.
DimensionWarning – If
spatial_dimsare attempting to be preserved e.g. inpreserve_dims
- Return type:
DataArray
References
Roberts, N. M., and H. W. Lean, 2008: Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. Monthly Weather Review, 136, 78–97, https://doi.org/10.1175/2007mwr2123.1.
Mittermaier, M. P., 2021: A “Meta” Analysis of the Fractions Skill Score: The Limiting Case and Implications for Aggregation. Monthly Weather Review, 149, 3491–3504, https://doi.org/10.1175/mwr-d-18-0106.1.
- scores.spatial.fss_2d_binary(fcst, obs, *, window_size, spatial_dims, zero_padding=False, reduce_dims=None, preserve_dims=None, compute_method=FssComputeMethod.NUMPY, check_boolean=True, dask='forbidden')#
Computes the Fractions Skill Score (FSS) using a binary field.
Takes in a binary (True or False) field of fcst and obs events. For example the output of a binary threshold applied to a continuous field or an event operator.
Optionally the user can set
check_booleantoTrueto check that the field is boolean before any computations. Note: this asserts that the underlying fields are of typenp.bool_but does not coerce them. If the user is confident that the input field is binary (but not necessarily boolean), they may set this flag toFalse, to allow for more flexible binary configurations such as 0 and 1; this should give the same results.Uses
fss_2dfrom thescores.spatialmodule to perform fss computation, but without the threshold operation.See also
scores.spatial.fss_2d()for more details and the continuous version. As well as detailed argument definitions.- Parameters:
fcst (DataArray)
obs (DataArray)
window_size (Tuple[int, int])
spatial_dims (Tuple[str, str])
zero_padding (bool)
reduce_dims (Iterable[Hashable] | None)
preserve_dims (Iterable[Hashable] | None)
compute_method (FssComputeMethod)
check_boolean (bool)
dask (str)
- Return type:
DataArray
- scores.spatial.fss_2d_single_field(fcst, obs, *, event_threshold, window_size, zero_padding=False, threshold_operator=None, compute_method=FssComputeMethod.NUMPY)#
Calculates the Fractions Skill Score (FSS) [1] for a given forecast and observed 2-D field.
The FSS is computed by counting the squared sum of forecast and observation events in a given window size. This is repeated over all possible window positions that can fit in the input arrays. A common method to do this is via a sliding window.
While the various compute backends may have their own implementations; most of them use some form of prefix sum algorithm to compute this quickly.
For 2-D fields this data structure is known as the “Summed Area Table” [2].
Once the squared sums are computed, the final FSS value can be derived by accumulating the squared sums [3].
The caller is responsible for making sure the input fields are in the 2-D spatial domain. (Although it should work for any
np.arrayas long as it’s 2D, andwindow_sizeis appropriately sized.)- Parameters:
fcst (ndarray[tuple[Any, ...], dtype[float]]) – An array of forecasts
obs (ndarray[tuple[Any, ...], dtype[float]]) – An array of observations (same spatial shape as
fcst)event_threshold (float) – A scalar to compare
fcstandobsfields to generate a binary “event” field.window_size (Tuple[int, int]) – A pair of positive integers
height, width)of the sliding window; the window dimensions must be greater than 0 and fit within the shape ofobsandfcst.threshold_operator (Callable | None) – The threshold operator used to generate the binary event field. E.g.
np.greater. Note: this may depend on the backendcompute methodand not all operators may be supported. Generally,np.greater,np.lessand their counterparts. Defaults to “np.greater”.compute_method (FssComputeMethod) – currently only supports
FssComputeMethod.NUMPYsee: :py:class:FssComputeMethodzero_padding (bool)
- Returns:
A float representing the accumulated FSS.
- Raises:
DimensionError – Various errors are thrown if the input dimensions do not conform. e.g. if the window size is larger than the input arrays, or if the the spatial dimensions of the input arrays do not match.
- Return type:
float64
References
Roberts, N. M., and H. W. Lean, 2008: Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. Monthly Weather Review 136, 78–97, https://doi.org/10.1175/2007mwr2123.1.
FAGGIAN, N., B. ROUX, P. STEINLE, and B. EBERT, 2015: Fast calculation of the fractions skill score. MAUSAM, 66, 457–466, https://doi.org/10.54302/mausam.v66i3.555.
scores.stats#
- scores.stats.statistical_tests.diebold_mariano(da_timeseries, ts_dim, h_coord, *, method='HG', confidence_level=0.95, statistic_distribution='normal')#
Given an array of (multiple) timeseries, with each timeseries consisting of score differences for h-step ahead forecasts, calculates a modified Diebold-Mariano test statistic for each timeseries. Several other statistics are also returned such as the confidence that the population mean of score differences is greater than zero and confidence intervals for that mean.
Two methods for calculating the test statistic have been implemented: the “HG” method Hering and Genton (2011) and the “HLN” method of Harvey, Leybourne and Newbold (1997). The default “HG” method has an advantage of only generating positive estimates for the spectral density contribution to the test statistic. For further details see scores.stats.confidence_intervals.impl._dm_test_statistic.
Prior to any calculations, NaNs are removed from each timeseries. If there are NaNs in da_timeseries then a warning will occur. This is because NaNs may impact the autocovariance calculation.
To determine the value of h for each timeseries of score differences of h-step ahead forecasts, one may ask ‘How many observations of the phenomenon will be made between making the forecast and having the observation that will validate the forecast?’ For example, suppose that the phenomenon is afternoon precipitation accumulation in New Zealand (00z to 06z each day). Then a Day+1 forecast issued at 03z on Day+0 will be a 2-ahead forecast, since Day+0 and Day+1 accumulations will be observed before the forecast can be validated. On the other hand, a Day+1 forecast issued at 09z on Day+0 will be a 1-step ahead forecast. The value of h for each timeseries in the array needs to be specified in one of the sets of coordinates. See the example below.
Confidence intervals and “confidence_gt_0” statistics are calculated using the test statistic, which is assumed to have either the standard normal distribution or Student’s t distribution with n - 1 degrees of freedom (where n is the length of the timeseries). The distribution used is specified by statistic_distribution. See Harvey, Leybourne and Newbold (1997) for why the t distribution may be preferred, especially for shorter timeseries.
If da_timeseries is a chunked array, data will be brought into memory during this calculation due to the autocovariance implementation.
- Parameters:
da_timeseries (DataArray) – a 2 dimensional array containing the timeseries.
ts_dim (str) – name of the dimension which identifies each timeseries in the array.
h_coord (str) – name of the coordinate specifying, for each timeseries, that the timeseries is an h-step ahead forecast. h_coord coordinates must be indexed by the dimension ts_dim.
method (Literal['HG', 'HLN']) – method for calculating the test statistic, one of “HG” or “HLN”.
confidence_level (float) – the confidence level, between 0 and 1 exclusive, at which to calculate confidence intervals.
statistic_distribution (Literal['normal', 't']) – the distribution of the test-statistic under the null hypothesis of equipredictive skill. Used to calculate the “confidence_gt_0” statistic and confidence intervals. One of “normal” or “t” (for Student’s t distribution).
- Returns:
“mean”: the mean value for each timeseries, ignoring NaNs
”dm_test_stat”: the modified Diebold-Mariano test statistic for each timeseries
”timeseries_len”: the length of each timeseries, with NaNs removed.
”confidence_gt_0”: the confidence that the mean value of the population is greater than zero, based on the specified statistic_distribution. Precisely, it is the value of the cumululative distribution function evaluated at dm_test_stat.
”ci_upper”: the upper end point of a confidence interval about the mean at specified confidence_level.
”ci_lower”: the lower end point of a confidence interval about the mean at specified confidence_level.
- Return type:
Dataset, indexed by ts_dim, with six variables
- Raises:
ValueError – if method is not one of “HG” or “HLN”.
ValueError – if statistic_distribution is not one of “normal” or “t”.
ValueError – if 0 < confidence_level < 1 fails.
ValueError – if len(da_timeseries.dims) != 2.
ValueError – if ts_dim is not a dimension of da_timeseries.
ValueError – if h_coord is not a coordinate of da_timeseries.
ValueError – if ts_dim is not the only dimension of da_timeseries[h_coord].
ValueError – if h_coord values aren’t positive integers.
ValueError – if h_coord values aren’t less than the lengths of the timeseries after NaNs are removed.
RuntimeWarnning – if there is a NaN in diffs.
References
Diebold and Mariano, ‘Comparing predictive accuracy’, Journal of Business and Economic Statistics 13 (1995), 253-265.
Hering and Genton, ‘Comparing spatial predictions’, Technometrics 53 no. 4 (2011), 414-425.
Harvey, Leybourne and Newbold, ‘Testing the equality of prediction mean squared errors’, International Journal of Forecasting 13 (1997), 281-291.
Example
This array gives three timeseries of score differences. Coordinates in the “lead_day” dimension uniquely identify each timeseries. Here ts_dim=”lead_day”. Coordinates in the “valid_date” dimension give the forecast validity timestamp of each item in the timeseries. The “h” coordinates specify that the timeseries are for 2, 3 and 4-step ahead forecasts respectively. Here h_coord=”h”.
>>> da_timeseries = xr.DataArray( ... data=[[1, 2, 3.0, 4, np.nan], [2.0, 1, -3, -1, 0], [1.0, 1, 1, 1, 1]], ... dims=["lead_day", "valid_date"], ... coords={ ... "lead_day": [1, 2, 3], ... "valid_date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"], ... "h": ("lead_day", [2, 3, 4]), ... }, ... )
>>> dm_test_stats(da_timeseries, "lead_day", "h")
scores.processing#
- scores.processing.aggregate(values, *, reduce_dims, weights=None, method='mean')#
Computes a weighted or unweighted aggregation of the input data across specified dimensions. The input data is typically the “score” at each point.
This function applies a mean reduction or a sum over the dimensions given by
reduce_dimson the inputvalues, optionally using weights to compute a weighted mean or sum. Themethodarg specifies if you want to produce a weighted mean or weighted sum.If reduce_dims is None, no aggregation is performed and the original
valuesare returned unchanged.If
weightsis None, an unweighted mean or sum is computed. If weights are provided, negative weights are not allowed and will raise aValueError.If weights are provided but
reduce_dimsis None (i.e., no reduction), aUserWarningis emitted since the weights will be ignored.Weights must not contain NaN values. Missing values can be filled by
weights.fillna(0)if you would like to assign a weight of zero to those points (e.g., masking).- Parameters:
values (DataArray | Dataset) – Input data to be aggregated.
reduce_dims (Iterable[Hashable] | None) – Dimensions over which to apply the mean. Can be a string, list of strings, or None. If None, no reduction is performed.
weights (DataArray | Dataset | None) – Weights to apply for weighted averaging. Must be broadcastable to
valuesand contain no negative values. If None, an unweighted mean is calculated. Defaults to None.method (str) – Aggregation method to use. Either “mean” or “sum”. Defaults to “mean”.
- Returns:
An xarray object (same type as the input) with (un)weighted mean or sum of
values- Raises:
ValueError – If
weightscontains any negative values.ValueError – if
weightscontains any NaN valuesValueError – if
methodis not ‘mean’ or ‘sum’ValueError – if
weightsis an xr.Dataset whenvaluesis an xr.DataArray
- Return type:
DataArray | Dataset
Warning
UserWarning: If weights are provided but no reduction is performed (
reduce_dimsis None), a warning is issued since weights are ignored.Examples
>>> import xarray as xr >>> import numpy as np >>> da = xr.DataArray(np.arange(6).reshape(2, 3), dims=['x', 'y']) >>> weights = xr.DataArray([1, 2], dims=['x']) >>> apply_weighted_mean(da, reduce_dims=['x'], weights=weights) <xarray.DataArray (y: 3)> array([2., 3., 4.]) Dimensions without coordinates: y
- scores.processing.block_bootstrap(array_list, *, blocks, n_iteration, exclude_dims=None, circular=True)#
Perform block bootstrapping on provided arrays. The function creates new arrays by repeatedly bootstrapping along specified dimensions and stacking the new arrays along a new “iteration” dimension. Additionally, it includes internal functions for chunk size calculation and handling Dask arrays for chunk size limitation.
- Parameters:
array_list (List[DataArray | Dataset] | DataArray | Dataset) – The data to bootstrap, which can be a single xarray object or a list of multiple xarray objects. In the case where multiple datasets are passed, each dataset can have its own set of dimension. However, for successful bootstrapping, dimensions across all input arrays must be nested. For instance, for
block.keys=['d1', 'd2', 'd3'], an array with dimension ‘d1’ and ‘d2’ is valid, but an array with only dimension ‘d2’ is not valid. All datasets are bootstrapped according to the same random samples along available dimensions.blocks (Dict[str, int]) – A dictionary specifying the dimension(s) to bootstrap and the block sizes to use along each dimension:
{dimension: block_size}. The keys represent the dimensions to be bootstrapped, and the values indicate the block sizes along each dimension. The dimension provided here should exist in the data provided inarray_list.n_iteration (int) – The number of iterations to repeat the bootstrapping process. Determines how many bootstrapped arrays will be generated and stacked along the iteration dimension.
exclude_dims (List[List[str]] | None) – An optional parameter indicating the dimensions to be excluded during bootstrapping for each array provided in
array_list. This parameter expects a list of lists, where each inner list corresponds to the dimensions to be excluded for the respective array. By default, the assumption is that no dimensions are excluded, and all arrays are bootstrapped across all specified dimensions inblocks.circular (bool) – A boolean flag indicating whether circular block bootstrapping should be performed. Circular bootstrapping means that bootstrapping continues from the beginning when the end of the data is reached. By default, this parameter is set to True.
- Returns:
If a single Dataset/DataArray (XarrayLike) is provided, the functions returns a bootstrapped XarrayLike object along the “iteration” dimension. If multiple XarrayLike objects are provided, it returns a tuple of bootstrapped XarrayLike objects, each stacked along the “iteration” dimension.
- Raises:
ValueError – If bootstrapped dimensions don’t consistent sizes across
arrays_list.ValueError – If there is not at least one input array that contains all dimensions in blocks.keys().
ValueError – If
exclude_dimsis not a list of lists.ValueError – If the list
exclude_dimsis not the same length as the number of asarray_list.
- Return type:
DataArray | Dataset | Tuple[DataArray | Dataset, …]
References
Gilleland, E. (2020). Bootstrap Methods for Statistical Inference. Part I: Comparative Forecast Verification for Continuous Variables. Journal of Atmospheric and Oceanic Technology, 37(11), 2117–2134. https://doi.org/10.1175/jtech-d-20-0069.1
Wilks, D. S. (2011). Statistical methods in the atmospheric sciences. Academic press. https://doi.org/10.1016/C2017-0-03921-6
Examples
Bootstrap a fcst and obs dataset along the time and space dimensions with block sizes of 10 for each dimension. The bootstrapping is repeated 1000 times.
>>> import numpy as np >>> import xarray as xr >>> from scores.processing import block_bootstrap >>> obs = xr.DataArray(np.random.rand(100, 100), dims=["time", "space"]) >>> fcst = xr.DataArray(np.random.rand(100, 100), dims=["time", "space"]) >>> bootstrapped_obs, bootstrapped_fcst = block_bootstrap( ... [obs, fcst], ... blocks={"time": 10, "space": 10}, ... n_iteration=1000, ... )
- scores.processing.isotonic_fit(fcst, obs, *, weight=None, functional='mean', bootstraps=None, quantile_level=None, solver=None, confidence_level=0.9, min_non_nan=1, report_bootstrap_results=False)#
Calculates an isotonic regression fit to observations given forecasts as the explanatory values. Optionally returns confidence bands on the fit via bootstrapping. Forecasts and observations are scalar-valued (i.e, integers, floats or NaN).
If forecasts target the mean or a quantile functional, then the user can select that functional. Otherwise the user needs to supply a solver function
of one variable (if no weights are supplied), or
of two variables (if weights are supplied).
The solver function is used to find an optimal regression fit in each block, subject to a non-decreasing monotonicity constraint, using the pool adjacent violators (PAV) algorithm.
An example of a solver function of one variable is obs -> np.mean(obs) An example of a solver function of two variables is (obs, weight) -> np.average(obs, weights=weight).
Ties (where one forecast value has more than one observation value) are handled as follows. Forecasts are sorted in ascending order, and observations sorted to match. Where ties occur, observations are sorted in descending order within each subset of ties. Isotonic regression is then performed on this sorted observation sequence.
This implementation uses sklearn.isotonic.IsotonicRegression when functional=”mean”.
- Parameters:
fcst (ndarray | DataArray) – np.array or xr.DataArray of forecast (or explanatory) values. Values must be float, integer or NaN.
obs (ndarray | DataArray) – np.array or xr.DataArray of corresponding observation (or response) values. Must be the same shape as fcst. Values must be float, integer or NaN.
weight (ndarray | DataArray | None) – positive weights to apply to each forecast-observation pair. Must be the same shape as fcst, or None (which is equivalent to applying equal weights).
functional (Literal['mean', 'quantile'] | None) – Functional that the forecasts are targeting. Either “mean” or “quantile” or None. If None then solver must be supplied. The current implementation for “quantile” does not accept weights. If weighted quantile regression is desired then the user should should supply an appropriate solver.
bootstraps (int | None) – the number of bootstrap samples to perform for calculating the regression confidence band. Set to None if a confidence band is not required.
quantile_level (float | None) – the level of the quantile functional if functional=’quantile’. Must be strictly between 0 and 1.
solver (Callable[[ndarray, ndarray], float] | Callable[[ndarray], float] | None) – function that accepts 1D numpy array of observations and returns a float. Function values give the regression fit for each block as determined by the PAV algorithm. If weights are supplied, the function must have a second argument that accepts 1D numpy array of weights of same length as the observations.
confidence_level (float | None) – Confidence level of the confidence band, strictly between 0 and 1. For example, a confidence level of 0.5 will calculate the confidence band by computing the 25th and 75th percentiles of the bootstrapped samples.
min_non_nan (int | None) – The minimum number of non-NaN bootstrapped values to calculate confidence bands at a particular forecast value.
report_bootstrap_results (bool | None) – This specifies to whether keep the bootstrapped regression values in return dictionary. Default is False to keep the output result small.
- Returns:
- “unique_fcst_sorted”: 1D numpy array of remaining forecast values sorted in
ascending order, after any NaNs from fcst, obs and weight are removed, and only unique values are kept to keep the output size reasonably small.
”fcst_counts”: 1D numpy array of forecast counts for unique values of forecast sorted
- ”regression_values”: 1D numpy array of regression values corresponding to
”unique_fcst_sorted” values.
- ”regression_func”: function that returns the regression fit based on linear
interpolation of (“fcst_sorted”, “regression_values”), for any supplied argument (1D numpy array) of potential forecast values.
- ”bootstrap_results”: in the case of report_bootstrap_results=True, 2D numpy
array of bootstrapped regression values is included in return dictionary. Each row gives the interpolated regression values from a particular bootstrapped sample, evaluated at “fcst_sorted” values. If m is the number of bootstrap samples and n = len(fcst_sorted) then it is has shape (m, n). We emphasise that this array corresponds to fcst_sorted not unique_fcst_sorted.
- ”confidence_band_lower_values”: values of lower confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_upper_values”: values of upper confidence band threshold, evaluated
at “unique_fcst_sorted” values.
- ”confidence_band_lower_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_lower_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_upper_func”: function that returns regression fit based on linear
interpolation of (“fcst_sorted”, “confidence_band_upper_values”), given any argument (1D numpy array) of potential forecast values.
- ”confidence_band_levels”: tuple giving the quantile levels used to calculate the
confidence band.
- Return type:
Dictionary with the following keys
- Raises:
ValueError – if fcst and obs are np.arrays and don’t have the same shape.
ValueError – if fcst and weight are np.arrays and don’t have the same shape.
ValueError – if fcst and obs are xarray.DataArrays and don’t have the same dimensions.
ValueError – if fcst and weight are xarray.DataArrays and don’t have the same dimensions.
ValueError – if any entries of fcst, obs or weight are not an integer, float or NaN.
ValueError – if there are no fcst and obs pairs after NaNs are removed.
ValueError – if functional is not one of “mean”, “quantile” or None.
ValueError – if quantile_level is not strictly between 0 and 1 when functional=”quantile”.
ValueError – if weight is not None when functional=”quantile”.
ValueError – if functional and solver are both None.
ValueError – if not exactly one of functional and solver is None.
ValueError – if bootstraps is not a positive integer.
ValueError – if confidence_level is not strictly between 0 and 1.
- Note: This function only keeps the unique values of fcst_sorted to keep the volume of
the return dictionary small. The forecast counts is also included, so users can it to create forecast histogram (usually displayed in the reliability diagrams).
- References
de Leeuw, Hornik and Mair. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods”, Journal of Statistical Software, 2009.
Dimitriadis, Gneiting and Jordan. “Stable reliability diagrams for probabilistic classifiers”, PNAS, Vol. 118 No. 8, 2020. Available at https://www.pnas.org/doi/10.1073/pnas.2016191118
Jordan, Mühlemann, and Ziegel. “Optimal solutions to the isotonic regression problem”, 2020 (version 2), available on arxiv at https://arxiv.org/abs/1904.04761
- scores.processing.broadcast_and_match_nan(*args: Dataset) tuple[Dataset, ...]#
- scores.processing.broadcast_and_match_nan(*args: DataArray) tuple[DataArray, ...]
Input xarray data objects are ‘matched’ - they are broadcast against each other (forced to have the same dimensions), and the position of nans are forced onto all DataArrays. This matching process is applied across all supplied DataArrays, as well as all DataArrays inside supplied Datasets.
- Parameters:
*args – any number of xarray data objects supplied as positional arguments. See examples below.
- Returns:
A tuple of data objects of the same length as the number of data objects supplied as input. Each returned object is the ‘matched’ version of the input.
- Raises:
ValueError – if any input args is not an xarray data object.
Examples
>>> # Matching xarray data objects >>> da1_matched, ds_matched, da2_matched = scores.processing.broadcast_and_match_nan(da1, ds, da2)
>>> # Matching a tuple of xarray data objects >>> input_tuple = (da1, ds, da2) >>> matched_tuple = broadcast_and_match_nan(*input_tuple) >>> da1_matched = matched_tuple[0] >>> ds_matched = matched_tuple[1] >>> da2_matched = matched_tuple[2]
- scores.processing.comparative_discretise(data, comparison, mode, *, abs_tolerance=None)#
Converts the values of data to 0 or 1 based on how they relate to the specified values in comparison via the mode operator.
- Parameters:
data (DataArray | Dataset) – The data to convert to discrete values.
comparison (DataArray | float | int) – The values to which to compare data.
mode (Callable | str) –
Specifies the required relation of data to thresholds for a value to fall in the ‘event’ category (i.e. assigned to 1). The mode can be a string or a Python operator function from the list below. Allowed modes are:
operator.ge: Where values in data greater than or equal to the corresponding threshold are assigned as 1.operator.gt: Where values in data greater than the corresponding threshold are assigned as 1.operator.le: Where values in data less than or equal to the corresponding threshold are assigned as 1.operator.lt: Where values in data less than the corresponding threshold are assigned as 1.operator.eq: values in data equal to the corresponding threshold are assigned as 1.operator.ne: values in data not equal to the corresponding threshold are assigned as 1.’>=’ values in data greater than or equal to the corresponding threshold are assigned as 1.
’>’ values in data greater than the corresponding threshold are assigned as 1.
’<=’ values in data less than or equal to the corresponding threshold are assigned as 1.
’<’ values in data less than the corresponding threshold are assigned as 1.
’==’ values in data equal to the corresponding threshold are assigned as 1.
’!=’ values in data not equal to the corresponding threshold are assigned as 1.
abs_tolerance (float | None) – If supplied, values in data that are within abs_tolerance of a threshold are considered to be equal to that threshold. This is generally used to correct for floating point rounding, e.g. we may want to consider 1.0000000000000002 as equal to 1.
- Returns:
An xarray data object of the same type as data. The dimensions of the output are the union of all dimensions in data and comparison. The values of the output are either 0 or 1 or NaN, depending on the truth of the operation data <mode> comparison.
- Raises:
ValueError – if abs_tolerance is not a non-negative float.
ValueError – if mode is not valid.
TypeError – if comparison is not a float, int or xarray.DataArray.
- Return type:
DataArray | Dataset
- scores.processing.binary_discretise(data, thresholds, mode, *, abs_tolerance=None, autosqueeze=False)#
Converts the values of data to 0 or 1 for each threshold in thresholds according to the operation defined by mode.
- Parameters:
data (DataArray | Dataset) – The data to convert to discrete values.
thresholds (Real | Sequence[Real]) – Threshold(s) at which to convert the values of data to 0 or 1.
mode (Callable | str) –
Specifies the required relation of data to thresholds for a value to fall in the ‘event’ category (i.e. assigned to 1). The mode can be a string or a Python operator function from the list below. Allowed modes are:
operator.ge: Where values in data greater than or equal to the corresponding threshold are assigned as 1.operator.gt: Where values in data greater than the corresponding threshold are assigned as 1.operator.le: Where values in data less than or equal to the corresponding threshold are assigned as 1.operator.lt: Where values in data less than the corresponding threshold are assigned as 1.operator.eq: values in data equal to the corresponding threshold are assigned as 1.operator.ne: values in data not equal to the corresponding threshold are assigned as 1.’>=’ values in data greater than or equal to the corresponding threshold are assigned as 1.
’>’ values in data greater than the corresponding threshold are assigned as 1.
’<=’ values in data less than or equal to the corresponding threshold are assigned as 1.
’<’ values in data less than the corresponding threshold are assigned as 1.
’==’ values in data equal to the corresponding threshold are assigned as 1.
’!=’ values in data not equal to the corresponding threshold are assigned as 1.
abs_tolerance (float | None) – If supplied, values in data that are within abs_tolerance of a threshold are considered to be equal to that threshold. This is generally used to correct for floating point rounding, e.g. we may want to consider 1.0000000000000002 as equal to 1
autosqueeze (bool | None) – If True and only one threshold is supplied, then the dimension ‘threshold’ is squeezed out of the output. If thresholds is float-like, then this is forced to True, otherwise defaults to False.
- Returns:
An xarray data object with the type and dimensions of data, plus an extra dimension ‘threshold’ if autosqueeze is False. The values of the output are either 0 or 1, depending on whether data <mode> threshold is True or not (although NaNs are preserved).
- Raises:
ValueError – if ‘threshold’ is a dimension in data.
ValueError – if “Values in thresholds are not monotonic increasing”
- Return type:
DataArray | Dataset
- scores.processing.binary_discretise_proportion(data, thresholds, mode, *, reduce_dims=None, preserve_dims=None, abs_tolerance=None, autosqueeze=False)#
Returns the proportion of data in each category. The categories are defined by the relationship of data to threshold as specified by the operation mode.
- Parameters:
data (DataArray | Dataset) – The data to convert into 0 and 1 according the thresholds before calculating the proportion.
thresholds (Iterable) – The proportion of values equal to or exceeding these thresholds will be calculated.
mode (Callable | str) –
Specifies the required relation of data to thresholds for a value to fall in the ‘event’ category (i.e. assigned to 1). The mode can be a string or a Python operator function from the list below. Allowed modes are:
operator.ge: Where values in data greater than or equal to the corresponding threshold are assigned as 1.operator.gt: Where values in data greater than the corresponding threshold are assigned as 1.operator.le: Where values in data less than or equal to the corresponding threshold are assigned as 1.operator.lt: Where values in data less than the corresponding threshold are assigned as 1.operator.eq: values in data equal to the corresponding threshold are assigned as 1.operator.ne: values in data not equal to the corresponding threshold are assigned as 1.’>=’ values in data greater than or equal to the corresponding threshold are assigned as 1.
’>’ values in data greater than the corresponding threshold are assigned as 1.
’<=’ values in data less than or equal to the corresponding threshold are assigned as 1.
’<’ values in data less than the corresponding threshold are assigned as 1.
’==’ values in data equal to the corresponding threshold are assigned as 1.
’!=’ values in data not equal to the corresponding threshold are assigned as 1.
reduce_dims (Iterable[Hashable] | None) – Dimensions to reduce.
preserve_dims (Iterable[Hashable] | None) – Dimensions to preserve.
abs_tolerance (bool | None) – If supplied, values in data that are within abs_tolerance of a threshold are considered to be equal to that threshold. This is generally used to correct for floating point rounding, e.g. we may want to consider 1.0000000000000002 as equal to 1.
autosqueeze (bool) – If True and only one threshold is supplied, then the dimension ‘threshold’ is squeezed out of the output. If thresholds is float-like, then this is forced to True, otherwise defaults to False.
- Returns:
An xarray data object with the type of data, dimension dims + ‘threshold’. The values of the output are the proportion of data that satisfy the relationship to thresholds as specified by mode.
Examples
>>> import operator >>> import xarray as xr >>> from scores.processing import binary_discretise_proportion >>> data = xr.DataArray([0, 0.5, 0.5, 1]) >>> binary_discretise_proportion(data, [0, 0.5, 1], operator.eq) <xarray.DataArray (threshold: 3)> array([ 0.25, 0.5 , 0.25]) Coordinates: * threshold (threshold) float64 0.0 0.5 1.0 Attributes: discretisation_tolerance: 0 discretisation_mode: ==
>>> binary_discretise_proportion(data, [0, 0.5, 1], operator.ge) <xarray.DataArray (threshold: 3)> array([ 1. , 0.75, 0.25]) Coordinates: * threshold (threshold) float64 0.0 0.5 1.0 Attributes: discretisation_tolerance: 0 discretisation_mode: >=
- scores.processing.proportion_exceeding(data, thresholds, *, reduce_dims=None, preserve_dims=None)#
Calculates the proportion of data equal to or exceeding thresholds.
- Parameters:
data (xarray.Dataset or xarray.DataArray) – The data from which to calculate the proportion exceeding thresholds
thresholds (iterable) – The proportion of values equal to or exceeding these thresholds will be calculated.
reduce_dims (Iterable[Hashable] | None) – Dimensions to reduce.
preserve_dims (Iterable[Hashable] | None) – Dimensions to preserve.
- Returns:
An xarray data object with the type of data and dimensions dims + ‘threshold’. The values are the proportion of data that are greater than or equal to the corresponding threshold.
- scores.processing.cdf.round_values(array, rounding_precision, *, final_round_decpl=7)#
Round data array to specified precision.
Rounding is done differently to xarray.DataArray.round or numpy.round where the number of decimal places is specified in those cases. Instead, here the rounding precision is specified as a float. The value is rounded to the nearest value that is divisible by rounding_precision.
For example, 3.73 rounded to precision 0.2 is 3.8, and 37.3 rounded to precision 20 is 40.
Assumes that rounding_precision >=0, with 0 indicating no rounding to be performed. If rounding_precision > 0, a final round to final_round_decpl decimal places is performed to remove artefacts of python rounding process.
- Parameters:
array (xr.DataArray) – array of data to be rounded
rounding_precision (float) – rounding precision
final_round_decpl (int) – final round to specified number of decimal places when rounding_precision > 0.
- Returns:
DataArray with rounded values.
- Return type:
xr.DataArray
- Raises:
ValueError – If rounding_precision < 0.
- scores.processing.cdf.propagate_nan(cdf, threshold_dim)#
Propagates the NaN values from a “cdf” variable along the threshold_dim.
- Parameters:
cdf (xr.DataArray) – CDF values, so that P(X <= threshold) = cdf_value for each threshold in the threshold_dim dimension.
threshold_dim (str) – name of the threshold dimension in cdf.
- Returns:
cdf variable with NaNs propagated.
- Return type:
xr.DataArray
- Raises:
ValueError – If threshold_dim is not a dimension of cdf.
- scores.processing.cdf.observed_cdf(obs, threshold_dim, *, threshold_values=None, include_obs_in_thresholds=True, precision=0)#
Returns a data array of observations converted into CDF format.
- Such that:
returned_value = 0 if threshold < observation returned_value = 1 if threshold >= observation
- Parameters:
obs (xr.DataArray) – observations
threshold_dim (str) – name of dimension in returned array that contains the threshold values.
threshold_values (Optional[Iterable[float]]) – values to include among thresholds.
include_obs_in_thresholds (bool) – if True, include (rounded) observed values among thresholds.
precision (float) – precision applied to observed values prior to constructing the CDF and thresholds. Select 0 for highest precision (i.e. no rounding).
- Returns:
Observed CDFs and thresholds in the threshold_dim dimension.
- Return type:
xr.DataArray
- Raises:
ValueError – if precision < 0.
ValueError – if all observations are NaN and no non-NaN threshold_values are not supplied.
- scores.processing.cdf.integrate_square_piecewise_linear(function_values, threshold_dim)#
Calculates integral values and collapses threshold_dim.
- Calculates \(\int{(F(t)^2)}\), where:
If t is a threshold value in threshold_dim then F(t) is in function_values,
F is piecewise linear between each of the t values in threshold_dim.
- This function assumes that:
threshold_dim is a dimension of function_values
coordinates of threshold_dim are increasing.
- Parameters:
function_values (xr.DataArray) – array of function values F(t).
threshold_dim (xr.DataArray) – dimension along which to integrate.
- Returns:
Integral values and threshold_dim collapsed:
Returns value of the integral with threshold_dim collapsed and other dimensions preserved.
Returns NaN if there are less than two non-NaN function_values.
- Return type:
xr.DataArray
- scores.processing.cdf.add_thresholds(cdf, threshold_dim, new_thresholds, fill_method, *, min_nonnan=2)#
Takes a CDF data array with dimension threshold_dim and adds values from new_thresholds.
The CDF is then filled to replace any NaN values. The array cdf requires at least 2 non-NaN values along threshold_dim.
- Parameters:
cdf (xr.DataArray) – array of CDF values.
threshold_dim (str) – name of the threshold dimension in cdf.
new_thresholds (Iterable[float]) – new thresholds to add to cdf.
fill_method (Literal["linear", "step", "forward", "backward", "none"]) – one of “linear”, “step”, “forward” or “backward”, as described in fill_cdf. If no filling, set to “none”.
min_nonnan (int) – passed onto fill_cdf for performing filling.
- Returns:
Additional thresholds, and values at those thresholds determined by the specified fill method.
- Return type:
xr.DataArray
- scores.processing.cdf.fill_cdf(cdf, threshold_dim, method, min_nonnan)#
Fills NaNs in a CDF of a real-valued random variable along threshold_dim with appropriate values between 0 and 1.
- Parameters:
cdf (xr.DataArray) – CDF values, where P(Y <= threshold) = cdf_value for each threshold in threshold_dim.
threshold_dim (str) – the threshold dimension in the CDF, along which filling is performed.
method (Literal["linear", "step", "forward", "backward"]) –
one of:
”linear”: use linear interpolation, and if needed also extrapolate linearly. Clip to 0 and 1. Needs at least two non-NaN values for interpolation, so returns NaNs where this condition fails.
”step”: use forward filling then set remaining leading NaNs to 0. Produces a step function CDF (i.e. piecewise constant).
”forward”: use forward filling then fill any remaining leading NaNs with backward filling.
”backward”: use backward filling then fill any remaining trailing NaNs with forward filling.
min_nonnan (int) – the minimum number of non-NaN entries required along threshold_dim for filling to be performed. All CDF values are set to np.nan where this condition fails. min_nonnan must be at least 2 for the “linear” method, and at least 1 for the other methods.
- Returns:
Containing the same values as cdf but with NaNs filled.
- Return type:
xr.DataArray
- Raises:
ValueError – If threshold_dim is not a dimension of cdf.
ValueError – If min_nonnan < 1 when method=”step” or if min_nonnan < 2 when method=”linear”.
ValueError – If method is not “linear”, “step”, “forward” or “backward”.
ValueError – If any non-NaN value of cdf lies outside the unit interval [0,1].
- scores.processing.cdf.decreasing_cdfs(cdf, threshold_dim, tolerance)#
A CDF of a real-valued random variable should be nondecreasing along threshold_dim.
This is sometimes violated due to rounding issues or bad forecast process. decreasing_cdfs checks CDF values decrease beyond specified tolerance; that is, whenever the sum of the incremental decreases exceeds tolerarance.
For example, if the CDF values are [0, 0.4, 0.3, 0.9, 0.88, 1] then the sum of incremental decreases is -0.12. Given a specified positive tolerance, the CDF values decrease beyond tolerance if the sum of incremental decreases < -tolerance.
Intended use is for CDFs with increasing coordinates along threshold_dim dimension, and where either each CDF is always NaN or always non-NaN.
- Parameters:
cdf (xr.DataArray) – data array of CDF values
threshold_dim (str) – threshold dimension, such that P(Y < threshold) = cdf_value.
tolerance (float) – nonnegative tolerance value.
- Returns:
Containing threshold_dim collapsed and values True if and only if the CDF is decreasing outside tolerance. If the CDF consists only of NaNs then the value is False.
- Return type:
xr.DataArray
- Raises:
ValueError – If threshold_dim is not a dimension of cdf.
ValueError – If tolerance is negative.
ValueError – If coordinates are not increasing along threshold_dim.
ValueError – If some, but not all, CDF values in cdf along threshold_dim are NaN.
- scores.processing.cdf.cdf_envelope(cdf, threshold_dim)#
Forecast cumulative distribution functions (CDFs) for real-valued random variables.
CDFs that are reconstructed from known points on the distribution should be nondecreasing with respect to the threshold dimension. However, sometimes this may fail due to rounding or poor forecast process. This function returns the “envelope” of the original CDF, which consists of two bounding CDFs, both of which are nondecreasing.
The following example shows values from an original CDF that has a decreasing subsequence (and so is not a true CDF). The resulting “upper” and “lower” CDFs minimally adjust “original” so that “lower” <= “original” <= “upper”.
“original”: [0, .5, .2, .8, 1]
“upper”: [0, .5, .5, .8, 1]
“lower”: [0, .2, .2, .8, 1]
This function does not perform checks that 0 <= cdf <= 1.
- Parameters:
cdf (xr.DataArray) – forecast CDF with thresholds in the thresholds_dim.
threshold_dim (str) – dimension in fcst_cdf that contains the threshold ordinates.
- Returns:
An xarray DataArray consisting of three CDF arrays indexed along the “cdf_type” dimension with the following indices:
”original”: same data as cdf.
”upper”: minimally adjusted “original” CDF that is nondecreasing and satisfies “upper” >= “original”.
”lower”: minimally adjusted “original” CDF that is nondecreasing and satisfies “lower” <= “original”.
NaN values in cdf are maintained in “original”, “upper” and “lower”.
- Return type:
xr.DataArray
- Raises:
ValueError – If threshold_dim is not a dimension of cdf.
scores.plotdata#
- scores.plotdata.murphy_score(fcst, obs, thetas, *, functional, alpha, huber_a=None, decomposition=False, reduce_dims=None, preserve_dims=None)#
Returns the mean elementary score (Ehm et. al. 2016), also known as Murphy score, evaluated at decision thresholds specified by thetas. Optionally returns a decomposition of the score in terms of penalties for over- and under-forecasting.
Select
functional="quantile"andalpha=0.5for the median functional. Selectfunctional="expectile"andalpha=0.5for the mean (i.e., expectation) functional.Consider using
murphy_thetas()to generate thetas. If utilising dask, you may want to storethetasas a netCDF on disk and pass it in as an xarray object. Otherwise, very large objects may be created whenfcst,obsandthetasare broadcast together.- Parameters:
fcst (DataArray) – Forecast numerical values.
obs (DataArray) – Observed numerical values.
thetas (Sequence[float] | DataArray) – Theta thresholds.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
alpha (float) – Risk parameter (i.e. quantile or expectile level) for the functional. Must be between 0 and 1.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
decomposition (bool) – True to return penalty values due to under- and over-fcst as well as the total score, False to return total score only.
reduce_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to reduce when calculating the Murphy score. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Iterable[Hashable] | None) – Optionally specify which dimensions to preserve when calculating the Murphy score. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the errors will be the FIRM score at each point (i.e. single-value comparison against observed), and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.
- Returns:
An xr.Dataset with dimensions based on the
preserve_dimsorreduce_dimsarg as well as a “theta” dimension with values matchingthetasinput.If
decompositionis False, the dataset’s variables will contain 1 element “total”.If
decompositionis True, in addition to “total”, it will have “underforecast” and “overforecast” data_vars.- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
alphais not strictly between 0 and 1.ValueError – If
huber_ais not strictly greater than 0.
- Return type:
Dataset
References
- For mean elementary score definitions, see:
Theorem 1 of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Theorem 5.3 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.plotdata.murphy_thetas(forecasts, obs, functional, *, huber_a=None, left_limit_delta=None)#
Return the decision thresholds (theta values) at which to evaluate elementary scores for the construction of Murphy diagrams.
This function may generate a very large number of theta thresholds if the forecast and obs data arrays are large and their values have high precision, which may lead to long computational times for Murphy diagrams. To reduce the number of thetas, users can first round forecast and obs values to an appropriate resolution.
- Parameters:
forecasts (list[DataArray]) – Forecast values, one array per source.
obs (DataArray) – Observed values.
functional (Literal['quantile', 'huber', 'expectile']) – The type of elementary scoring function, one of “quantile”, “huber”, or “expectile”.
huber_a (float | None) – Huber transition parameter, used for “huber” functional only. Must be strictly greater than 0.
left_limit_delta (float | None) – Approximation of the left hand limit, used for “huber” and “expectile” functionals. Must be greater than or equal to 0. None will be treated as 0. Ideally, left_limit_delta should be small relative to the fcst and obs precision, and not greater than that precision.
- Returns:
theta thresholds to be used to compute murphy scores.
- Return type:
List[float]
- Raises:
ValueError – If
functionalis not one of the expected functions.ValueError – If
huber_ais not strictly greater than 0.ValueError – If
left_limit_deltais not greater than or equal to 0.
Notes
For theta values at which to evaluate elementary scores, see
Corollary 2 (p.521) of Ehm, W., Gneiting, T., Jordan, A., & Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(3), 505–562. https://doi.org/10.1111/rssb.12154
Corollary 5.6 of Taggart, R. J. (2022). Point forecasting and forecast evaluation with generalized Huber loss. Electronic Journal of Statistics, 16(1), 201-231. https://doi.org/10.1214/21-EJS1957
- scores.plotdata.qq(fcst, obs, *, quantiles, interpolation_method='linear', data_source_dim='data_source', reduce_dims=None, preserve_dims=None)#
Calculate the quantiles of empirical forecast and observation data for a quantile-quantile (Q-Q) plot.
Forecast and observation data are broadcast and NaN values are matched before calculating the quantiles. This function calculates the quantiles for the user to plot using their preferred plotting library.
Currently this function is implemented for working with raw forecast and observation data rather than a theoretical probability distribution.
- Parameters:
fcst (DataArray | Dataset) – Forecast data.
obs (DataArray | Dataset) – Observation data.
quantiles (Iterable[float]) – Quantiles to compute, e.g., [0.1, 0.5, 0.9].
interpolation_method (str) – Interpolation method for estimating the quantile value, default is ‘linear’. Valid options include ‘inverted_cdf’, ‘averaged_inverted_cdf’, ‘closest_observation’, ‘interpolated_inverted_cdf’, ‘hazen’, ‘weibull’, ‘linear’, ‘median_unbiased’, ‘normal_unbiased’, ‘lower’, ‘higher’, ‘midpoint’, and ‘nearest’. See
numpy.quantile()for more information on interpolation method options.data_source_dim (str) – Name of the new dimension to be created for the data source, default is ‘data_source’. This dimension will contain two values: ‘fcst’ and ‘obs’, representing the forecast and observation data, respectively.
reduce_dims (Iterable[Hashable] | None) – Dimensions to reduce over when calculating quantiles.
preserve_dims (Iterable[Hashable] | None) – Dimensions to preserve when calculating quantiles.
- Returns:
An xarray object with a new dimension called “data_source”.
- Raises:
ValueError – If the interpolation method is invalid.
ValueError – if quantiles are not between 0 and 1.
ValueError – If a dimension in the input data has the same name as data_source_dim
ValueError – If fcst and obs are Datasets with different data variables.
ValueError – If a user tries to preserve all dimensions.
TypeError – If fcst and obs are not both xarray DataArrays or Datasets.
- Return type:
DataArray | Dataset
References
Déqué, M. (2011). Deterministic forecasts of continuous variables. In I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner’s guide in atmospheric science (2nd ed., pp. 77–94). Wiley. https://doi.org/10.1002/9781119960003.ch5
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.plotdata import qq >>> fcst = xr.DataArray(np.random.rand(100), dims='time') >>> obs = xr.DataArray(np.random.rand(100), dims='time') >>> result = qq(fcst, obs, quantiles=[0.1, 0.5, 0.9])
- scores.plotdata.roc(fcst, obs, thresholds='auto', *, reduce_dims=None, preserve_dims=None, weights=None, check_args=True)#
Calculates data required for plotting a Receiver (Relative) Operating Characteristic (ROC) curve, including the area under the curve (AUC). The ROC curve is used as a way to measure the discrimination ability of a particular forecast.
The AUC is the probability that the forecast probability of a random event is higher than the forecast probability of a random non-event.
- Parameters:
fcst (DataArray) – An array of probabilistic forecasts for a binary event in the range [0, 1].
obs (DataArray) – An array of binary values where 1 is an event and 0 is a non-event.
thresholds (str | Iterable[float]) – By default, when
thresholds = "auto", the ROC thresholds are automatically generated. Otherwise, you can supply an iterable of floats with monotonic increasing values between 0 and 1, which are the thresholds at and above which to convert the probabilistic forecast to a value of 1 (an ‘event’). If there are many unique forecast values, this can lead to a very large number of automatically generated thresholds. If performance is slow, consider supplying thresholds manually as an iterable of floats. Np.inf is added automatically to the end of the thresholds to ensure that the full ROC curve is produced. Similarly, if 0 is not included, it will be added automatically to ensure the full ROC curve is produced.reduce_dims (Sequence[str] | None) – Optionally specify which dimensions to reduce when calculating the ROC curve data. All other dimensions will be preserved. As a special case, ‘all’ will allow all dimensions to be reduced. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.preserve_dims (Sequence[str] | None) – Optionally specify which dimensions to preserve when calculating ROC curve data. All other dimensions will be reduced. As a special case, ‘all’ will allow all dimensions to be preserved. In this case, the result will be in the same shape/dimensionality as the forecast, and the values will be the ROC curve at each point (i.e. single-value comparison against observed) for each threshold, and the forecast and observed dimensions must match precisely. Only one of
reduce_dimsandpreserve_dimscan be supplied. The default behaviour if neither are supplied is to reduce all dims.weights (DataArray | None) – An array of weights to apply to the score (e.g., weighting a grid by latitude). If None, no weights are applied. If provided, the weights must be broadcastable to the data dimensions and must not contain negative or NaN values. If appropriate, users can choose to replace NaN values in weights by calling
weights.fillna(0). The weighting approach followsxarray.computation.weighted.DataArrayWeighted. See the scores weighting tutorial for more information on how to use weights.check_args (bool) – Checks if
obsdata only contains values in the set {0, 1, np.nan}. You may want to skip this check if you are sure about your input data and want to improve the performance when working with dask.
- Returns:
‘POD’ (the probability of detection)
’POFD’ (the probability of false detection)
’AUC’ (the area under the ROC curve)
PODandPOFDhave dimensionsdims+ ‘threshold’, whileAUChas dimensionsdims.- Return type:
An xarray.Dataset with data variables
- Raises:
ValueError – if
fcstcontains values outside of the range [0, 1].ValueError – if
obscontains non-nan values not in the set {0, 1}.ValueError – if ‘threshold’ is a dimension in
fcst.ValueError – if values in thresholds are not monotonic increasing or are outside the range [0, 1].
ValueError – if
thresholdsis a string that is not “auto”.ValueError – if
thresholdsis an empty iterable.
Warning
If the number of automatically generated thresholds is very large (>1000), a warning is raised suggesting that the user supply thresholds manually as an iterable of floats if performance is slow.
Notes
If
thresholdsis an iterable of floats, the probabilisticfcstis converted to a deterministic forecast for each threshold inthresholds. If a value infcstis greater than or equal to the threshold, then it is converted into a ‘forecast event’ (fcst = 1), and a ‘forecast non-event’ (fcst = 0) otherwise. The probability of detection (POD) and probability of false detection (POFD) are calculated for the converted forecast. From the POD and POFD data, the area under the ROC curve is calculated. An additional threshold ofnp.infis added to the end ofthresholdsso that it always has a value when POD=0 and POFD=0.If
threshold="auto"which is the default, then the thresholds used are the ordered, unique forecast values.Ideally concave ROC curves should be generated rather than traditional ROC curves.
Example
>>> import xarray as xr >>> import numpy as np >>> from scores.probability import roc_curve_data >>> fcst = xr.DataArray(np.random.rand(3, 4), dims=["time", "location"]) >>> obs = xr.DataArray(np.random.randint(0, 2, size=(3, 4)), dims=["time", "location"]) >>> result = roc_curve_data(fcst, obs)
scores.pandas#
- scores.pandas.continuous.mse(fcst, obs, *, is_angular=False)#
Calculates the mean squared error from forecast and observed data.
A detailed explanation is on https://en.wikipedia.org/wiki/Mean_squared_error
\[\frac{1}{n} \sum_{i=1}^n (\text{forecast}_i - \text{observed}_i)^2\]Notes
Dimensional reduction is not supported for pandas and the user should convert their data to xarray to formulate the call to the base metric, scores.continuous.mse.
- Parameters:
fcst (Type[Series]) – Forecast or predicted variables in pandas.
obs (Type[Series]) – Observed variables in pandas.
is_angular (bool) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
An object containing a single floating point number representing the mean squared error for the supplied data. All dimensions will be reduced.
- Return type:
pandas.Series
- scores.pandas.continuous.rmse(fcst, obs, *, is_angular=False)#
Calculate the Root Mean Squared Error from xarray or pandas objects.
A detailed explanation is on https://en.wikipedia.org/wiki/Root-mean-square_deviation
\[\sqrt{\frac{1}{n} \sum_{i=1}^n (\text{forecast}_i - \text{observed}_i)^2}\]Notes
Dimensional reduction is not supported for pandas and users wishing this extra functionality should convert their data to xarray to formulate the call to scores.continuous.rmse.
- Parameters:
fcst (Type[Series]) – Forecast or predicted variables in pandas.
obs (Type[Series]) – Observed variables in pandas.
is_angular (bool) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
An object containing a single floating point number representing the root mean squared error for the supplied data. All dimensions will be reduced.
- Return type:
pandas.Series
- scores.pandas.continuous.mae(fcst, obs, *, is_angular=False)#
Calculates the mean absolute error from forecast and observed data.
A detailed explanation is on https://en.wikipedia.org/wiki/Mean_absolute_error
\[\frac{1}{n} \sum_{i=1}^n | \text{forecast}_i - \text{observed}_i |\]Notes
Dimensional reduction is not supported for pandas and the user should convert their data to xarray to formulate the call to scores.continuous.mae.
- Parameters:
fcst (Type[Series]) – Forecast or predicted variables in pandas.
obs (Type[Series]) – Observed variables in pandas.
is_angular (bool) – specifies whether fcst and obs are angular data (e.g. wind direction). If True, a different function is used to calculate the difference between fcst and obs, which accounts for circularity. Angular fcst and obs data should be in degrees rather than radians.
- Returns:
An object containing a single floating point number representing the mean absolute error for the supplied data. All dimensions will be reduced.
- Return type:
pandas.Series