
Fractions Skill Score (FSS)#

The original paper, and formal definitions can be found here:

Roberts, N. M., & Lean, H. W. (2008). Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. Monthly Weather Review, 136(1), 78-97. https://doi.org/10.1175/2007MWR2123.1

For an explanation of the FSS, and implementation considerations, see:

Faggian, N., Roux, B., Steinle, P., & Ebert, B. (2015). Fast calculation of the fractions skill score. MAUSAM, 66(3), 457–466. https://doi.org/10.54302/mausam.v66i3.555

FSS for a single 2D field#

FSS is computed over 2D arrays representing the observations and forecasts in the spatial domain. The user must ensure that the input dimensions correspond to the spatial domain, e.g. lat x lon. Generally, the computation involves sliding a window over the input field(s) and applying a threshold to the fcst and obs values.

The resulting binary field is summed over each window to give the populace (the number of ones/“true” values in the window).

The resulting 2-D field of rolling sums represents the “Integral Image” of the respective forecast and observation fields, which is then aggregated over all the sliding windows to compute the fractions skill score.
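As a brief aside, the following is a minimal sketch (not part of the scores API) showing how a summed-area table, i.e. the “integral image”, turns any windowed sum into a constant-time lookup; this is the trick that makes the fast FSS calculation possible.

import numpy as np

# a small binary field, e.g. the result of thresholding
x = (np.random.rand(6, 8) > 0.5).astype(int)

# summed-area table ("integral image"), padded with a leading row/column of zeros
sat = np.pad(x.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def window_sum(i, j, height, width):
    # sum of x[i:i+height, j:j+width] recovered in O(1) from four table lookups
    return (
        sat[i + height, j + width]
        - sat[i, j + width]
        - sat[i + height, j]
        + sat[i, j]
    )

assert window_sum(2, 3, 3, 4) == x[2:5, 3:7].sum()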

The FSS is then roughly defined as:

\[fss = 1 - \frac{\sum_{w}(p_o - p_f)^2}{\sum_{w}p_o^2 + \sum_{w}p_f^2}\]

where,

  • \(p_o\) is the observation populace > threshold, in one window

  • \(p_f\) is the forecast populace > threshold, in one window

  • \(\sum_{w}\) is the sum over all windows

The implementation details are beyond the scope of this tutorial; please refer to Fast calculation of the Fractions Skill Score for more information.
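That said, to make the formula above concrete, here is a deliberately naive (and very slow) sketch of the single-field FSS. It is purely illustrative and is not how scores computes the score:

import numpy as np

def naive_fss(fcst, obs, event_threshold, window_size):
    """Direct implementation of the FSS formula above; O(rows * cols * window area)."""
    height, width = window_size
    fcst_bin = fcst > event_threshold
    obs_bin = obs > event_threshold
    rows, cols = obs.shape
    numer = 0.0  # sum over windows of (p_o - p_f)^2
    denom = 0.0  # sum over windows of p_o^2 + p_f^2
    for i in range(rows - height + 1):
        for j in range(cols - width + 1):
            p_o = obs_bin[i:i + height, j:j + width].sum()
            p_f = fcst_bin[i:i + height, j:j + width].sum()
            numer += (p_o - p_f) ** 2
            denom += p_o ** 2 + p_f ** 2
    return 1.0 - numer / denom

On small inputs this should agree with scores.spatial.fss_2d_single_field (which uses the same “full window” convention by default); on the 400 x 600 fields generated below it would be impractically slow.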

In summary, computation of a single field requires the following parameters:

  • forecast 2-D field (in spatial domain)

  • observations 2-D field (in spatial domain)

  • window size (height x width): the size of the sliding window, in grid points

  • threshold: the value the input fields are compared against to generate binary fields

  • compute method: (optional) currently only numpy is supported

1. Setup#

First let’s create some random data for our forecast and observation fields. Let’s also try out a few scenarios:

scenario 1: obs distribution = fcst distribution = N(0, 1)
scenario 2: fcst distribution biased = N(1, 1)
scenario 3: fcst distribution variant = N(0, 2)

where N(mu, sigma) = normal distribution with mean = mu and standard deviation = sigma

Note: in practice, some fields like rainfall cannot be negative. The data here is synthesized purely to show the steps required to compute the score.

[1]:
import numpy as np

# specify input spatial dimensions
num_cols = 600
num_rows = 400

# set seed for reproducibility
np.random.seed(42)

# generate random fields
obs = np.random.normal(loc=0.0, scale=1.0, size = (num_rows, num_cols))
fcst_1 = np.random.normal(loc=0.0, scale=1.0, size = (num_rows, num_cols))
fcst_2 = np.random.normal(loc=1.0, scale=1.0, size = (num_rows, num_cols))
fcst_3 = np.random.normal(loc=0.0, scale=2.0, size = (num_rows, num_cols))

# print out the different scenarios
_summarize = lambda x, field: print(
    "{: >20}: shape={}, mean={:.2f}, stddev={:.2f}".format(
    field, x.shape, np.mean(x), np.std(x)
))
_summarize(obs, "observations")
_summarize(fcst_1, "forecast scenario 1")
_summarize(fcst_2, "forecast scenario 2")
_summarize(fcst_3, "forecast scenario 3")

        observations: shape=(400, 600), mean=0.00, stddev=1.00
 forecast scenario 1: shape=(400, 600), mean=-0.00, stddev=1.00
 forecast scenario 2: shape=(400, 600), mean=1.00, stddev=1.00
 forecast scenario 3: shape=(400, 600), mean=-0.01, stddev=2.00

2. Define inputs#

We now need to specify the threshold, window size and compute method. For now, let's choose a single window and threshold. While the current fss method doesn't allow more than one threshold and window definition per call, we'll see how to calculate multiple thresholds/windows in a later step.

[2]:
window_size = (100, 100)  # height * width or row size * col size
threshold = 0.5  # arbitrarily chosen

# Note: the compute method is a specialized argument, it doesn't need to be specified but is shown here for illustration purposes.
# If you do not specify a compute method it will default to `NUMPY` which is the only method currently supported and is sufficient
# for most purposes.
from scores.fast.fss.typing import FssComputeMethod
compute_method = FssComputeMethod.NUMPY

print("window_size:", window_size)
print("threshold:", threshold)
print("compute_method:", compute_method)
window_size: (100, 100)
threshold: 0.5
compute_method: FssComputeMethod.NUMPY

3. Run FSS#

Since we only have spatial dims it is sufficient to use scores.spatial.fss_2d_single_field for this purpose.

Here are reasons why a user may prefer to use this API:

  • The user can choose their own aggregation method

  • The user wants more choice on how to parallelize the computations.

  • The user wants to compute multiple windows/thresholds without the overhead of xarray.

In practice, however, the aggregated approach scores.spatial.fss_2d in the next part can be simpler to use, especially for the more common data types used in the field, i.e. netCDF/xarray.

[3]:
from scores.spatial import fss_2d_single_field

# compile scenarios
scenarios = {
    "scenario 1 (same distribution)": [obs, fcst_1],
    "scenario 2 (biased fcst)": [obs, fcst_2],
    "scenario 3 (variant fcst)": [obs, fcst_3],
}
result = []

# run through each scenario and compute FSS with inputs defined above
for s, v in scenarios.items():
    _obs, _fcst = v
    _fss = fss_2d_single_field(
        _fcst,
        _obs,
        event_threshold=threshold,
        window_size=window_size,
        compute_method=compute_method,
    )
    result.append((s, _fss))

# tabulate results
print(f"{' '*30} | fss score")
print(f"{' '*30} | ---------")
_ = [print("{:<30} | {}".format(s, v)) for (s, v) in result]
                               | fss score
                               | ---------
scenario 1 (same distribution) | 0.9998031509754236
scenario 2 (biased fcst)       | 0.7436622752127978
scenario 3 (variant fcst)      | 0.9658938686575778

As is apparent above, with the same distribution we get a score close to 1. This is because FSS is invariant to the exact location within the sliding window where the binary fields match; only the total count in each window is used. With a biased distribution the score drops considerably, which is expected with a threshold of 0.5 and a bias of 1.0. For the higher-variance forecast we still get a reasonable score, which is also expected, since the variation isn't too large and the distributions still overlap quite a bit.
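As a quick sanity check of the location-invariance point (a hypothetical example, not part of the scenarios above): a forecast whose event region is simply shifted by a few grid points still scores close to 1.

import numpy as np
from scores.spatial import fss_2d_single_field

# an "event" block in the obs, and the same block shifted by 10 grid points in the fcst
obs_demo = np.zeros((num_rows, num_cols))
fcst_demo = np.zeros((num_rows, num_cols))
obs_demo[100:150, 200:250] = 1.0
fcst_demo[110:160, 210:260] = 1.0

# the shift is small relative to the 100 x 100 window, so the per-window counts barely
# change and the FSS remains close to 1
print(fss_2d_single_field(fcst_demo, obs_demo, event_threshold=0.5, window_size=(100, 100)))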

4. Multiple inputs#

Suppose now that we want to collate data for multiple thresholds/windows. There are several ways of doing this, including vectorization. The following shows one way of doing it that, while more verbose, helps to decompose the operations required to create the final accumulated dataset.

Let's now create the argument mapping and run the FSS; we'll store the results in a W x T x S array before converting it to xarray for display.

W x T x S where,
W = number of windows
T = number of thresholds
S = number of scenarios
[4]:
import xarray as xr
import numpy as np

# as before
window_sizes = np.linspace((20,20), (100,100), 5, dtype=(int,int))
thresholds = np.linspace(0.0, 1.0, 5, dtype=float)
input_scenarios = [[obs, fcst_1], [obs, fcst_2], [obs, fcst_3]]
compute_method = FssComputeMethod.NUMPY

# create output array
W = len(window_sizes)
T = len(thresholds)
S = len(input_scenarios)
fss_out = np.zeros((W, T, S))

# let's iterate over the indices, setting the `multi_index` and `writeonly` flags
with np.nditer(fss_out, flags=['multi_index'], op_flags=['writeonly']) as it:
    for _fss in it:
        # gather argument indices
        window_idx, threshold_idx, input_idx = it.multi_index
        _window_size = window_sizes[window_idx]
        _threshold = thresholds[threshold_idx]
        _obs, _fcst = input_scenarios[input_idx]
        # compute the fss for each combination of arguments
        _fss[...] = fss_2d_single_field(
            _fcst,
            _obs,
            event_threshold=_threshold,
            window_size=_window_size,
            compute_method=compute_method
        )

# construct output xarray with results
da = xr.DataArray(
    data=fss_out,
    dims=["window_size", "threshold", "scenario"],
    coords=dict(
        window_size=[str(x) for x in window_sizes],
        threshold=[str(x) for x in thresholds],
        scenario=range(len(input_scenarios))
    ),
    attrs=dict(
        description="Fractions skill score",
    ),
)
da
[4]:
<xarray.DataArray (window_size: 5, threshold: 5, scenario: 3)> Size: 600B
array([[[0.99750859, 0.87817287, 0.9975254 ],
        [0.99610935, 0.8174165 , 0.99049278],
        [0.99434647, 0.74372786, 0.96275916],
        [0.99147751, 0.66198854, 0.90506356],
        [0.98687133, 0.57380178, 0.80923115]],

       [[0.99937889, 0.87861151, 0.99939787],
        [0.99896402, 0.81789448, 0.99288328],
        [0.9984607 , 0.74403561, 0.96545096],
        [0.99769463, 0.66265241, 0.90811018],
        [0.9967834 , 0.57450354, 0.81262522]],

       [[0.99973932, 0.87865374, 0.99974328],
        [0.99955596, 0.8179917 , 0.993333  ],
        [0.99935278, 0.74368841, 0.96571281],
        [0.99896659, 0.66250326, 0.90828967],
        [0.99866979, 0.57469957, 0.81327941]],

       [[0.99985211, 0.87883698, 0.99986294],
        [0.99975723, 0.81818986, 0.99351877],
        [0.99966926, 0.74359123, 0.96582882],
        [0.99942888, 0.66249762, 0.90833126],
        [0.99928146, 0.57517972, 0.81375657]],

       [[0.99990219, 0.87901309, 0.99991892],
        [0.99984583, 0.8184777 , 0.99364081],
        [0.99980315, 0.74366228, 0.96589387],
        [0.99964815, 0.66239176, 0.90830291],
        [0.99954721, 0.57541375, 0.81392659]]])
Coordinates:
  * window_size  (window_size) <U9 180B '[20 20]' '[40 40]' ... '[100 100]'
  * threshold    (threshold) <U4 80B '0.0' '0.25' '0.5' '0.75' '1.0'
  * scenario     (scenario) int64 24B 0 1 2
Attributes:
    description:  Fractions skill score
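For comparison, the same W x T x S array can be built without np.nditer; the following is an equivalent sketch using nested comprehensions, reusing the variables defined above.

fss_alt = np.array([
    [
        [
            fss_2d_single_field(
                _fcst,
                _obs,
                event_threshold=_threshold,
                window_size=tuple(_window_size),
            )
            for _obs, _fcst in input_scenarios
        ]
        for _threshold in thresholds
    ]
    for _window_size in window_sizes
])

# should match the values shown in `da` above
assert np.allclose(fss_alt, fss_out)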

Aggregated FSS#

The aggregated FSS gathers a list of 2-D spatial fields from the input data array and applies the single-field 2D FSS from the previous section to each field. It then accumulates the decomposed scores into the final score. A description of the algorithm is provided in the Appendix at the bottom of this tutorial.

1. Setup#

Similar to the previous section, let's start by setting up our inputs. This time we will include the scenarios within the input source itself. Let's define our inputs:

Fcst
----
Dims: T x L x N x M

Obs
---
Dims: T x N x M

where,
  T = time_idx [0..24]
  L = lead_time_idx [0..6]
  N = lat_idx [0..400]
  M = lon_idx [0..600]

We will also assume that the variance and bias increase as the lead time increases.

[5]:
import numpy as np
import xarray as xr

# let's start by defining a function to generate a T x N x M array since this will be commonly used
T = 24
L = 6
N = 400
M = 600
# set seed for reproducibility
np.random.seed(42)
def _generate_input(dims, mu, sigma):
    return np.random.normal(loc=mu, scale=sigma, size = dims)

# let's generate obs using the standard normal distribution N(0,1) for a T x N x M field
obs = _generate_input((T, N, M), 0.0, 1.0)

# now for the fcst we'll need to generate and stack the leadtime arrays
# since we only have 6 lead times we can be a bit lazy and use a for loop
# we will also increase the bias and variance as we increase the leadtime
var_inc = 0.5
bias_inc = 0.25
var, bias = 1.0, 0.0
fcst = []
for i in range(L):
    var += var_inc
    bias += bias_inc
    fcst.append(
        _generate_input((T, N, M), bias, var)
    )
fcst = np.stack(fcst)

# we now have the arrays in raw numpy format; we'll need some annotations
# let's convert them to xarray and check that our final dimensions match
da_obs = xr.DataArray(
    data=obs,
    dims=["time_idx", "lat_idx", "lon_idx"],
    coords=dict(
        time_idx=range(T),
        lat_idx=range(N),
        lon_idx=range(M),
    ),
    attrs=dict(
        description="observations",
    ),
)

da_fcst = xr.DataArray(
    data=fcst,
    dims=["lead_time_idx", "time_idx", "lat_idx", "lon_idx"],
    coords=dict(
        lead_time_idx=range(L),
        time_idx=range(T),
        lat_idx=range(N),
        lon_idx=range(M),
    ),
    attrs=dict(
        description="forecast",
    ),
)
[6]:
# check fcst coordinates
da_fcst.coords
[6]:
Coordinates:
  * lead_time_idx  (lead_time_idx) int64 48B 0 1 2 3 4 5
  * time_idx       (time_idx) int64 192B 0 1 2 3 4 5 6 ... 17 18 19 20 21 22 23
  * lat_idx        (lat_idx) int64 3kB 0 1 2 3 4 5 6 ... 394 395 396 397 398 399
  * lon_idx        (lon_idx) int64 5kB 0 1 2 3 4 5 6 ... 594 595 596 597 598 599
[7]:
# check obs coordinates
da_obs.coords
[7]:
Coordinates:
  * time_idx  (time_idx) int64 192B 0 1 2 3 4 5 6 7 ... 16 17 18 19 20 21 22 23
  * lat_idx   (lat_idx) int64 3kB 0 1 2 3 4 5 6 ... 393 394 395 396 397 398 399
  * lon_idx   (lon_idx) int64 5kB 0 1 2 3 4 5 6 ... 593 594 595 596 597 598 599

2. Run Aggregate FSS#

We choose the accumulation dimensions and run FSS. Here we want to preserve lead_time, and calculate the FSS reduced over space (lat, lon) followed by time.

Here we use scores.spatial.fss_2d, which computes the decomposed FSS score over space for each remaining 2-D field using scores.spatial.fss_2d_single_field (as in the previous section) and then accumulates the results.

Additionally, it accepts preserve_dims or (mutually exclusive) reduce_dims arguments to determine which dimensions to aggregate over.

[8]:
from scores.spatial import fss_2d

# As before we currently only support a single threshold/window argument.
threshold = 0.5
window_size = (100, 100)


# since our input arrays already encapsulate the various scenarios in the time/lead time dimensions,
# we just need to provide the inputs

# IMPORTANT: we need to provide the names of the spatial_dims explicitly. Unlike in the previous
# section, where the input is assumed to be a 2D spatial field, this function doesn't know which
# dimensions are spatial unless you tell it.
fss_out = fss_2d(
    da_fcst,
    da_obs,
    event_threshold=threshold,
    window_size=window_size,
    spatial_dims=["lat_idx", "lon_idx"],
    preserve_dims=["lead_time_idx"]
)

[9]:
fss_out
[9]:
<xarray.DataArray (lead_time_idx: 6)> Size: 48B
array([0.94464596, 0.89355836, 0.86178198, 0.84077059, 0.82559444,
       0.81435967])
Coordinates:
  * lead_time_idx  (lead_time_idx) int64 48B 0 1 2 3 4 5

Result: We should now see a dataset of FSS values with only lead_time_idx as the preserved dimension. Looking at the data, the FSS decreases as the lead time increases, because we made our simulated forecast distribution shift away from the obs as the lead time increased. So this roughly looks like what we'd expect.
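For reference, the same result should be obtainable by naming the dimension to collapse instead of the one to keep. This is only a sketch; it assumes that reducing over time_idx (with the spatial dims already consumed via spatial_dims) leaves lead_time_idx as the only remaining dimension.

fss_reduced = fss_2d(
    da_fcst,
    da_obs,
    event_threshold=threshold,
    window_size=window_size,
    spatial_dims=["lat_idx", "lon_idx"],
    reduce_dims=["time_idx"],  # reduce over time; lead_time_idx is kept implicitly
)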

Exercise - multiple windows/thresholds: An exercise is left to the reader: implement a similar method to support multiple input arguments for fss_2d. See “4. Multiple inputs” in the previous section for an idea. Note that this version would be slightly trickier since the result of fss_2d is not necessarily a scalar; however, there are still many approaches to solving this.
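One possible starting point (a sketch only, using hypothetical variable names): loop over thresholds, call fss_2d for each, and stack the resulting DataArrays along a new threshold dimension with xr.concat.

import xarray as xr

exercise_thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
fss_per_threshold = [
    fss_2d(
        da_fcst,
        da_obs,
        event_threshold=t,
        window_size=(100, 100),
        spatial_dims=["lat_idx", "lon_idx"],
        preserve_dims=["lead_time_idx"],
    )
    for t in exercise_thresholds
]
fss_multi = xr.concat(
    fss_per_threshold,
    dim=xr.DataArray(exercise_thresholds, dims="threshold", name="threshold"),
)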

2a. Binary input#

We can also run FSS with binary inputs. In this part we'll repeat the above computation, except we'll use a pre-discretised input, e.g. using scores.processing.discretise.comparative_discretise and passing the result into scores.spatial.fss_2d_binary. Note that the default threshold operator for fss_2d is np.greater, so for a fair comparison we'll have to use the “>” mode with comparative_discretise.

[10]:
from scores.spatial import fss_2d_binary
from scores.processing.discretise import comparative_discretise

threshold = 0.5
window_size = (100, 100)

# pre-discretise the inputs
da_fcst_binary = comparative_discretise(da_fcst, threshold, ">")
da_obs_binary = comparative_discretise(da_obs, threshold, ">")

fss_out = fss_2d_binary(
    da_fcst_binary,
    da_obs_binary,
    window_size=window_size,
    spatial_dims=["lat_idx", "lon_idx"],
    preserve_dims=["lead_time_idx"],
    check_boolean=False,
)

fss_out
[10]:
<xarray.DataArray (lead_time_idx: 6)> Size: 48B
array([0.94464596, 0.89355836, 0.86178198, 0.84077059, 0.82559444,
       0.81435967])
Coordinates:
  * lead_time_idx  (lead_time_idx) int64 48B 0 1 2 3 4 5

Result: We see that this gives the same result as using fss_2d.

2b. Zero-padding#

By default fss_2d slides the window over “full” windows only (i.e. all grid points in the window are interior to the input field). This is different from a convolutional kernel, which slides the window in from outside the domain and hence requires zero-padding to handle the boundary; this results in “partial” windows.

Most Fourier-based sliding-window algorithms (e.g. those using the Fast Fourier Transform) provide some option for padding. Without going into too much detail, the algorithm used here does not require this as it doesn't operate in the frequency domain. Regardless, there are various reasons to include zero-padding in the computation, so the option is provided via the optional zero_padding argument, set to True as shown below:

[11]:
from scores.spatial import fss_2d

threshold = 0.5
window_size = (100, 100)

fss_out = fss_2d(
    da_fcst,
    da_obs,
    event_threshold=threshold,
    window_size=window_size,
    spatial_dims=["lat_idx", "lon_idx"],
    preserve_dims=["lead_time_idx"],
    zero_padding=True,  # set zero-padding to true
)

fss_out

[11]:
<xarray.DataArray (lead_time_idx: 6)> Size: 48B
array([0.94448153, 0.89335672, 0.86154282, 0.84051093, 0.82539875,
       0.81418548])
Coordinates:
  * lead_time_idx  (lead_time_idx) int64 48B 0 1 2 3 4 5

Result: Because zero-padding uses partial windows in its aggregation, we get slightly different results. However, on quick inspection it's easy to see that they still match to a few decimal places. Note, however, that the two approaches tend to differ more for smaller input fields.

Things to try next#

  • Graph FSS over multiple thresholds and window sizes (a starting sketch is given after this list).

  • Derive theoretical convergence of FSS, for a given threshold. Verify using simulations.

  • Apply FSS over other distributions (e.g. uniform/beta). Validate the results.
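For the first suggestion, here is a rough plotting sketch (assuming matplotlib is installed) using the da array built in “4. Multiple inputs” above, showing FSS against threshold for each window size in scenario 2 (the biased forecast).

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for i, w in enumerate(da.window_size.values):
    # scenario index 1 corresponds to "scenario 2 (biased fcst)"
    ax.plot(da.threshold.values, da.values[i, :, 1], marker="o", label=f"window {w}")
ax.set_xlabel("threshold")
ax.set_ylabel("FSS")
ax.legend()
plt.show()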

References#

  1. Roberts, N. M., and H. W. Lean, 2008: Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. Monthly Weather Review, 136(1), 78-97. https://doi.org/10.1175/2007MWR2123.1

  2. Faggian, N., B. Roux, P. Steinle, and B. Ebert, 2015: Fast calculation of the Fractions Skill Score. MAUSAM, 66(3), 457-466. https://doi.org/10.54302/mausam.v66i3.555

  3. Summed-area table

Appendix#

Accumulated FSS Algorithm

Let's assume our data set is of the form lat x lon x time x lead_time => N x M x T x L
where,
    lat x lon = spatial dims

1. [Map] Map the input field into a list of fields to be aggregated over. i.e.

        lat x lon x time x lead_time -> [lat x lon] x lead_time,
        where,
            [lat x lon] is a list of T spatial fields.
            keep_dim = lead_time; spatial_dims = lat, lon; collapse_dim = time

2. [Compute] For each lead time compute the decomposed fss score for [lat x lon] (len = T).

        The decomposed FSS score is a tuple containing the components required
        to formulate the final score:

        (obs_sum / T, fcst_sum / T, diff_sum / T)

        where, obs_sum = sum(p_o^2), fcst_sum = sum(p_f^2) and
        diff_sum = sum((p_o - p_f)^2), summed over the field's windows.

        The division by T is optional, but good to have for larger datasets to
        avoid integer overflow issues when aggregating.

3. [Reduce] Sum the decomposed score across the accumulated dimension.

        fss <- [0; 1 x L]
        for each leadtime, lt {
            obs_sum <- sum_T(fss_decomposed[lt][0])
            fcst_sum <- sum_T(fss_decomposed[lt][1])
            diff_sum <- sum_T(fss_decomposed[lt][2])
            fss[lt] <- 1 - diff_sum / (obs_sum + fcst_sum)
        }
        return fss :: size := 1 x L


The reason why we need to keep track of the decomposed score, instead of simply
aggregating the final score, is that ratios do not add:

1/x + 1/y != 1 / (x+y)
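A tiny numeric illustration of this point, using made-up component sums for two groups (the numbers are hypothetical, not taken from the data above):

# hypothetical per-group component sums
obs_sq = [10.0, 200.0]   # sum of p_o^2 in each group
fcst_sq = [12.0, 180.0]  # sum of p_f^2 in each group
diff_sq = [2.0, 15.0]    # sum of (p_o - p_f)^2 in each group

# averaging the per-group FSS values ...
per_group = [1 - d / (o + f) for d, o, f in zip(diff_sq, obs_sq, fcst_sq)]
mean_of_scores = sum(per_group) / len(per_group)

# ... is not the same as computing FSS from the summed components
score_from_sums = 1 - sum(diff_sq) / (sum(obs_sq) + sum(fcst_sq))

print(mean_of_scores, score_from_sums)  # approx. 0.935 vs 0.958 -- they differ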