Interactive online version:

Nash-Sutcliffe Efficiency (NSE)#

In this tutorial, we will calculate the Nash–Sutcliffe model efficiency coefficient (NSE) using the function scores.continuous.nse.

NSE is a widely used metric in hydrology and other fields to evaluate the performance of a model by comparing its predictions to observed data.

Definition#

From Nash and Sutcliffe (1970), paraphrased to the context in scores, NSE is defined as follows:

\[\begin{split}\begin{align*} \large{\text{NSE}} &= 1 - \large\frac{\large\sum_{i=1}^{n}(f_i - o_i)^2}{\large\sum_{i=1}^{n}(o_i - \bar{o})^2} \\ &= 1 - \large\frac{\text{MSE}}{\sigma_{o}^2} \\ \end{align*}\end{split}\]

Where:

\(f_i\) is the forecast or predicted value
\(o_i\) is the observed value
\(\bar{o}\) is the mean of the observed values
\(n\) is the number of data points
\(\sigma_{\large{o}}^2\) is the un-weighted observation variance
MSE is the mean-squared error, equivalent to the scores function: scores.continuous.mse

Scores also supports the use of weights. These can be used to scale both the individual forecast error in the numerator and the deviation from the observation mean in the denominator. This is also known as the weighted-NSE or wNSE; for an application of this, refer to Hundecha and Bárdossy (2004).

\[\begin{split}\begin{align*} \large{\text{wNSE}} &= 1 - \large\frac{\large\sum_{i=1}^{n}w_i.(f_i - o_i)^2}{\large\sum_{i=1}^{n}w_i.(o_i - \bar{o})^2} \\ &= 1 - \large\frac{\text{MWSE}}{\sigma_{o_w}^2} \\ \end{align*}\end{split}\]

Where:

\(\vec{w} = (w_1, w_2,..., w_n)\) are the weights for each index \(i\)
The weights must be non-negative: \(\forall i \ : \ 0 \leq w_i\ , \ \text{and there exists at least one}\ \ i \ : \ 0 \lt w_i\) (specifiable using weights option)
\(\sigma_{\large{o_w}}^2\) is the weighted obs variance
\(\small\text{MWSE}\) is the mean of the weighted square errors - again this is also equivilent to scores.continuous.mse, if using the weights argument

caution: \(\small\text{MWSE}\) (scores) should not be confused with \(\small\text{WMSE}\) (weighted mean square errror) where weights are normalised (e.g. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html):

\(\small\text{WMSE}\) is a “weighted mean” of “unweighted” error terms.

On the other hand, \(\small\text{MWSE}\) is a “unweighted mean” of “weighted” error terms.

The distinction is subtle, essentially: \(\small\text{WMSE}\) treats 0 weights as data to exclude; additionally it requires that the weights sum to 1.

Whereas \(\small\text{MWSE}\) (what scores does), treats 0 weights as data that is zero forced and has no constraints on the upper bound of the weights (at the time of writing).

Incidentally \(\small\text{MWSE}\) is also what is needed to compute a “weighted” NSE (\(\small\text{wNSE}\)).

Further, while \(i\) above is shown as a integer index from 0 to n, its definition can be abstracted to any multi-index i.e.

\(\vec{i} \in \{ \left(\ i_0,\ i_1,\ ...\ \right) :\ 0\leq i_0 \lt N_0;\ 0\leq i_1 \lt N_1; \ ...\ \}\)

If one wishes to accumulate the NSE scores over multiple indices, the reduce_dims and preserve_dims arguments can be used (similar to other scores).

Interpretation#

In hydrological modeling, the Nash–Sutcliffe Efficiency (NSE) is essential for assessing model performance. An NSE of 1 indicates perfect prediction, while 0 suggests the model performs as well as predicting the mean of the data. Negative values imply the observed mean is a better predictor. NSE values closer to 1 denote superior predictive ability. In regression analyses, NSE parallels the coefficient of determination (R²) as a measure of model fit, but it ranges from -∞ to 1, rather than 0 to 1.

Note:

The image (co-domain) of \(R^2 : (x, x_\text{fit}) \rightarrow \text{score}\) does not necessarily need to be constrained to \([0, 1]\). It’s just that a linear least squares optimisation makes it impossible for a fit to do worse than the mean.

Analysis using NSE explicitly doesn’t pose such restrictions. Its more akin to the signal to noise ratio in this sense (Duc & Sawada, 2023). In fact,

\(\text{SNR} \Large = \frac{E[{O^2}]}{E[\xi_\text{model}^2]} = \frac{\sigma_{o}^2}{\text{MSE}} = \frac{1}{1 - NSE}\)

Where, \(O\) is the observed process (“signal”), and \(\xi_\text{model}\) is the error of the modelled predictions/forecasts/simulations (“noise”).

References#

Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I — A discussion of principles. Journal of Hydrology, 10(3), 282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Hundecha, Y., & Bárdossy, A. (2004). Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. Journal of Hydrology, 292(1–4), 281–295. https://doi.org/10.1016/j.jhydrol.2004.01.002
Duc, L., & Sawada, Y. (2023). A signal-processing-based interpretation of the Nash–Sutcliffe efficiency. Hydrology and Earth System Sciences, 27(9), 1827–1839. https://doi.org/10.5194/hess-27-1827-2023

Using the `nse` Function#

Let’s start by importing the nse function from our module and exploring its usage with different types of input data.

[1]:

from scores.continuous import nse
import numpy as np
import xarray as xr
import pandas as pd

np.random.seed(0)  # set the seed to make notebook reproducible

Example 1: Xarray DataArray#

[2]:

fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
nse_xr = nse(fcst_xr, obs_xr)
print("NSE for Xarray DataArray:", nse_xr)

NSE for Xarray DataArray: <xarray.DataArray 'NSE' ()> Size: 8B
array(0.5)

Example 2: Large Xarray DataArrays#

[3]:

fcst_large = xr.DataArray(
    data=np.random.random_sample((1000, 1000)) * 360,
    dims=["space", "time"],
    coords=[np.arange(0, 1000), np.arange(0, 1000)],
)
obs_large = xr.DataArray(
    data=np.random.random_sample((1000, 1000)) * 360,
    dims=["space", "time"],
    coords=[np.arange(0, 1000), np.arange(0, 1000)],
)
nse_large = nse(fcst_large, obs_large)
print("NSE for large Xarray DataArrays:", nse_large)

NSE for large Xarray DataArrays: <xarray.DataArray 'NSE' ()> Size: 8B
array(-0.9995806)

Example 3: Angular and Array#

[4]:

fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
nse_anular = nse(fcst_xr, obs_xr, is_angular=True)
print("NSE for angular types:", nse_anular)

NSE for angular types: <xarray.DataArray 'NSE' ()> Size: 8B
array(0.5)

Example 4: Weight and Array#

[5]:

fcst_xr = xr.DataArray([3, 4, 5, 6, 7])
obs_xr = xr.DataArray([2, 3, 4, 5, 6])
weights = xr.DataArray(np.array([1, 2, 3, 2, 1]))
nse_weights = nse(fcst_xr, obs_xr, weights=weights)
print("NSE with weights types:", nse_weights)

NSE with weights types: <xarray.DataArray 'NSE' ()> Size: 8B
array(0.25)

Example 5: 2D Array: Time and Station#

[6]:

def create_synthetic_2d_data():
    # Define dimensions
    time = pd.date_range("2024-01-01", periods=5, freq="D")
    stations = ["Station1", "Station2", "Station3"]

    # Use specified forecast and observed values
    forecast_data = np.array(
        [[3, 4, 5, 6, 7], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7]]
    ).T  # Transpose to align with dimensions (time, station)
    observed_data = np.array(
        [[2, 3, 4, 5, 6], [2, 3, 4, 5, 6], [2, 3, 4, 5, 6]]
    ).T  # Transpose to align with dimensions (time, station)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data, coords={"time": time, "station": stations}, dims=["time", "station"], name="forecast"
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data, coords={"time": time, "station": stations}, dims=["time", "station"], name="observed"
    )

    return forecast_da, observed_da

[7]:

# Create synthetic forecast and observed DataArrays
fcst_xr, obs_xr = create_synthetic_2d_data()

# Calculate the NSE for the test case, reducing over time - giving one value per station
nse_value = nse(fcst_xr, obs_xr, reduce_dims="time")
print("NSE for station:", nse_value)

NSE for station: <xarray.DataArray 'NSE' (station: 3)> Size: 24B
array([0.5, 0.5, 0.5])
Coordinates:
  * station  (station) <U8 96B 'Station1' 'Station2' 'Station3'

Example 7: 3D Array: Time, Station and Lead-time#

[11]:

import numpy as np
import pandas as pd
import xarray as xr


def create_synthetic_deterministic_data():
    # Define dimensions
    time = pd.date_range("2024-01-01", "2024-01-31", freq="D")
    stations = ["Station1", "Station2", "Station3", "Station4", "Station5"]
    lead_times_forecast = np.arange(1, 8)  # Lead times from 1 to 7 for forecast
    lead_times_observed = np.array([1])  # Lead time of 1 day for observed data

    # Generate synthetic hydrograph data
    np.random.seed(0)  # For reproducibility

    # Base sine wave to simulate periodic streamflow variations
    days = len(time)
    base_flow = 150 + 50 * np.sin(2 * np.pi * np.arange(days) / days)  # Mean of 150, amplitude of 50

    # Reshape base_flow to match dimensions (31, 1, 1) for broadcasting
    base_flow = base_flow[:, np.newaxis, np.newaxis]

    # Adding random fluctuations around the base flow for forecast and observed data
    forecast_data = np.clip(base_flow + 20 * np.random.randn(days, len(stations), len(lead_times_forecast)), 0, 300)
    observed_data = np.clip(base_flow + 20 * np.random.randn(days, len(stations), len(lead_times_observed)), 0, 300)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={"time": time, "station": stations, "lead_time": lead_times_forecast},
        dims=["time", "station", "lead_time"],
        name="forecast",
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={"time": time, "station": stations, "lead_time": lead_times_observed},
        dims=["time", "station", "lead_time"],
        name="observed",
    )

    return forecast_da, observed_da

[12]:

# Display the DataArrays
# Create forecast and observed DataArrays
forecast_da, observed_da = create_synthetic_deterministic_data()
print(f"obs shape: {observed_da.sizes}")
print(f"fcst shape: {forecast_da.sizes}")

obs shape: Frozen({'time': 31, 'station': 5, 'lead_time': 1})
fcst shape: Frozen({'time': 31, 'station': 5, 'lead_time': 7})

NSE Calculation along Lead-time#

So far we’ve been only passing a string to reduce_dims, as mentioned in the introduction of this tutorial, the indexers can span multiple dimensions. Suppose we would like to find NSE for each lead time, we simply need to pass reduce_dims=("station", "time") alternatively, preserve_dims="lead_time" should also work. Before doing so, we need to broadcast the two arrays prior to computations. Normally, this will happen automatically - but fill it with NaNs when broadcasted.

[13]:

print("\n\nNaN entries with xarray broadcasting:")
display(sum(np.isnan(xr.broadcast(forecast_da, observed_da)[1].values)))
print("\n\nNaN entries with xarray broadcasting:")
display((forecast_da - observed_da).shape)
print("\n\nNaN entries with `.values` and numpy broadcasting:")
display(sum(np.isnan(np.broadcast_to(observed_da, forecast_da.shape))))
print("\nshape when doing a basic numpy subtraction instead")
display((forecast_da.values - observed_da.values).shape)



NaN entries with xarray broadcasting:

array([[ 0, 31, 31, 31, 31, 31, 31],
       [ 0, 31, 31, 31, 31, 31, 31],
       [ 0, 31, 31, 31, 31, 31, 31],
       [ 0, 31, 31, 31, 31, 31, 31],
       [ 0, 31, 31, 31, 31, 31, 31]])



NaN entries with xarray broadcasting:

(31, 5, 1)



NaN entries with `.values` and numpy broadcasting:

array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])


shape when doing a basic numpy subtraction instead

(31, 5, 7)

Proper Broadcasting#

The reason this happens is because of the strict coordinate defintions. To circumvent this we actually need to use numpy’s broadcast method, so that we are dealing with an attribute-less data-array. Since nse relies on mse for performing any computations under-the-hood, where the latter relies on using xarray for computational API calls rather than numpy. If we do want to mimic numpy’s method we just have to use numpy to call broadcast instead.

Caution:

Using numpy for broadcasting will almost certainly load everything to memory. However, unlike xarray which relies on merge strategies, numpy basically repeats data if the dimensions are compatible.

In any case - broadcasting is not always trivial - especially for large datasets. In particular, nse (and by extension mse) do not guarantee operations are broadcasted between the obs, fcst and optionally weights, if their variables and coordinates/dimensions do not match. Instead, they usually only apply operations to variables and coordinates that do match. This behaviour (inherited from xarray) is intentionally left as the default, as there is no obvious alternative broadcasting option (both broadcasting with nans and broadcasting using vectorisation - or replication - are viable for different scenarios).

[14]:

observed_np_broadcast = np.broadcast_to(observed_da, forecast_da.shape)
observed_da_broadcast = observed_da.broadcast_like(forecast_da)
observed_da_broadcast = observed_da_broadcast.copy(data=observed_np_broadcast)

# to breakdown the above:
# - np.broadcast_to: takes the observed_da and broadcasts it to the forecast_da's shape - it repeats the data if compatible.
# - DataArray.broadcast_like: is a relatively cheap operation that just setsup extra space to match the forecast data array's shape
# - .copy: creates a shallow copy, but it's mostly used for its "data" argument which can be used to update the entries in the newly broadcasted array.
# Now if we look at the shape:
print("\nbroadcasted obs shape:")
display(observed_da_broadcast.sizes)
print("\nCheck index 0 against index 1 have the same repeated values")
display(np.all(observed_da_broadcast[:,:,0] == observed_da_broadcast[:,:,1]).values)
print("\nCheck index 0 against last index (-1) have the same repeated values")
display(np.all(observed_da_broadcast[:,:,0] == observed_da_broadcast[:,:,-1]).values)
print("\nNow we have repeating obs... let's compute NSE\n")

print("\n------------------------------")
print(" NSE broadcasted using xarray")
print("------------------------------\n")
from scores.continuous import mse

# note: see how we can specify reduce_dims on an array of dimensions instead of a single string
display(nse(forecast_da, observed_da, reduce_dims=["time", "station"]).values)
print("Note that this still works, but we only get a single output - for lead time 1...")

print("\n-----------------------------")
print(" NSE broadcasted using numpy")
print("-----------------------------\n")
display(nse(forecast_da, observed_da_broadcast, reduce_dims=["time", "station"]).values)
print("\n\n...However using the numpy based broadcasting we get all 7 leadtimes")
print("Note we could have also used `preserve_dims='lead_time'` to get the same result:\n\n")
display(nse(forecast_da, observed_da_broadcast, preserve_dims="lead_time").values)


broadcasted obs shape:

Frozen({'time': 31, 'station': 5, 'lead_time': 7})


Check index 0 against index 1 have the same repeated values

array(True)


Check index 0 against last index (-1) have the same repeated values

array(True)


Now we have repeating obs... let's compute NSE


------------------------------
 NSE broadcasted using xarray
------------------------------

array([0.57235442])

Note that this still works, but we only get a single output - for lead time 1...

-----------------------------
 NSE broadcasted using numpy
-----------------------------

array([0.57235442, 0.5626212 , 0.51905304, 0.45527247, 0.60358371,
       0.53880208, 0.50453494])



...However using the numpy based broadcasting we get all 7 leadtimes
Note we could have also used `preserve_dims='lead_time'` to get the same result:

array([0.57235442, 0.5626212 , 0.51905304, 0.45527247, 0.60358371,
       0.53880208, 0.50453494])

Is NSE causing the quirky broadcasting or is it MSE? (spoiler: neither)

As mentioned nse uses mse under the hood for consistency. We can show that this broadcasting behaviour is not unique to nse by replacing those operations with mse.

In any case - the answer is neither. In fact it has nothing to do with scores - it’s a fundamental between how numpy and xarray decide to do broadcasting. Noting that scalars are exceptions to the rule.

[15]:

from scores.continuous import mse

print("\nNSE - leadtime just has one value")
display(nse(forecast_da, observed_da, reduce_dims=["time", "station"]).values)

print("\nMSE - same deal")
display(mse(forecast_da, observed_da, reduce_dims=["time", "station"]).values)

print("\ntaking the mean difference between two datasets - same deal")
display((forecast_da - observed_da).mean(["time", "station"]).values)

print("\nSubtracting a scalar is fine though")
display((forecast_da - 5).mean(["time", "station"]).values)


NSE - leadtime just has one value

array([0.57235442])


MSE - same deal

array([707.48065628])


taking the mean difference between two datasets - same deal

array([-2.36811031])


Subtracting a scalar is fine though

array([145.02236298, 144.6017132 , 145.24734202, 143.92826001,
       143.55788964, 144.84651296, 142.14080276])

Example 8: 4D Array: Time, Station, lead times and ensemble#

The following example shows NSE computed on ensemble forecasts that are synthetically generated. We would still hit the broadcasting issue above, since the observed data will not match the dimensions of the ensembles and to compute the scores we want to use numpy’s style of replicating data. This example deals with this by looping through the ensembles, iteratively computing NSE for each ensemble to avoid memory bloat. Broadcasted operations will then apply to any remaining dimensions that match in size between obs and fcst.

[1]:

def create_synthetic_ensemble_data():
    # Define dimensions
    time = pd.date_range("2024-01-01", "2024-01-31", freq="D")
    stations = ["Station1", "Station2", "Station3", "Station4", "Station5"]
    lead_times_forecast = np.arange(1, 8)  # Lead times from 1 to 7 for forecast
    lead_times_observed = np.array([1])  # Lead time of 1 day for observed data
    ensemble = [f"Ensemble{i}" for i in range(1, 21)]  # Ensemble members from Ensemble1 to Ensemble20

    # Generate synthetic hydrograph data
    np.random.seed(0)  # For reproducibility

    # Create base flow as a constant value of 30 cumec
    base_flow = 30

    # Adding random fluctuations around the base flow for forecast and observed data
    forecast_data = base_flow + np.random.randint(
        0, 301, size=(len(time), len(stations), len(lead_times_forecast), len(ensemble))
    )
    observed_data = base_flow + np.random.randint(0, 301, size=(len(time), len(stations), len(lead_times_observed)))

    # Clip the data to ensure it stays within a certain range
    forecast_data = np.clip(forecast_data, 0, 300)
    observed_data = np.clip(observed_data, 0, 300)

    # Create forecast DataArray
    forecast_da = xr.DataArray(
        forecast_data,
        coords={"time": time, "station": stations, "lead_time": lead_times_forecast, "ensemble": ensemble},
        dims=["time", "station", "lead_time", "ensemble"],
        name="forecast",
    )

    # Create observed DataArray
    observed_da = xr.DataArray(
        observed_data,
        coords={"time": time, "station": stations, "lead_time": lead_times_observed},
        dims=["time", "station", "lead_time"],
        name="observed",
    )

    return forecast_da, observed_da

[17]:

# Create forecast and observed DataArrays
forecast_ensemble_da, observed_da = create_synthetic_ensemble_data()

# Display the DataArrays
display(forecast_ensemble_da.sizes)
display(observed_da.sizes)

Frozen({'time': 31, 'station': 5, 'lead_time': 7, 'ensemble': 20})

Frozen({'time': 31, 'station': 5, 'lead_time': 1})

[18]:

# Initialize an empty list to store results
results_list = []

# Iterate through each ensemble member and lead time
for imember in forecast_ensemble_da["ensemble"].values:
    for ilead in forecast_ensemble_da["lead_time"].values:
        # Select the forecast data for the current ensemble member and lead time
        cur_fcst_da = forecast_ensemble_da.sel(ensemble=imember, lead_time=ilead)

        # Calculate NSE value for the selected forecast data and observed data
        nse_value = nse(cur_fcst_da, observed_da).values

        # Append the results as a dictionary to the list
        results_list.append({"ensemble": imember, "lead_time": ilead, "NSE": nse_value})

# Create a DataFrame from the list of dictionaries
nse_results = pd.DataFrame(results_list)

[19]:

# Print the DataFrame
print(nse_results.head(3))

    ensemble  lead_time                  NSE
0  Ensemble1          1  -1.1721371704833192
1  Ensemble1          2  -1.0448237401444582
2  Ensemble1          3   -1.060897482889457

Exercise for the Reader#

The above can also be done using xarrays’s map_blocks or apply_ufunc methods - which may reduce the load on memory further by using dask to partition and compute chunks more efficiently.

Note: that dask should not be the first solution and a simple loop like above may usually accomplish things sufficiently (and explicitly) - even for very large datasets - noting that xarray/netcdf4 loads data lazily with or without dask. dask on the other hand is useful when dealing with distributed compute and storage.

Fore more info see:

dask guidelines - in particular section on start small: https://docs.dask.org/en/stable/best-practices.html#start-small

xarray ufunc/map_blocks - in particular any tutorials referenced: https://docs.xarray.dev/en/stable/user-guide/dask.html#dask-automatic-parallelization

numpy broadcasting - these are helpful integrations to the underlying ufunc allowing observations to be broadcast to forecast ensembles while still being compatible with the above xarray functions: https://numpy.org/doc/stable/reference/generated/numpy.broadcast_arrays.html#numpy.broadcast_arrays

Other Things to Try#

There are several variants of NSE that are useful for particular applications, see: https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient

Scores currently supports the use of weights. This allows the user to calculate wNSE, which is typically scaled by the obs itself. See: Hundecha and Bárdossy (2004).
weights can also be used as “selectors” by specifying binary values (in this case 1 => include datum or np.nan => exclude datum. This is useful for slicing hydrographs by their observation values and accumulating over the resultant extents.
NNSE a variant of NSE that works with machine learning optimisation functions.
LNSE logarithmic processing of observation and forecasts prior to computing NSE for better representation of smaller values.

Most of the above variations can be computed either by:

preprocessing the observations/forecast,
composing the NSE computation with another function,
or scaling using the weights input argument.

Nash-Sutcliffe Efficiency (NSE)

Contents

Nash-Sutcliffe Efficiency (NSE)#

Definition#

Interpretation#

References#

Using the `nse` Function#

Example 1: Xarray DataArray#

Example 2: Large Xarray DataArrays#

Example 3: Angular and Array#

Example 4: Weight and Array#

Example 5: 2D Array: Time and Station#

Example 6: 3D Array: Ensemble, Station and Time#

Example 7: 3D Array: Time, Station and Lead-time#

NSE Calculation along Lead-time#

Proper Broadcasting#

Example 8: 4D Array: Time, Station, lead times and ensemble#

Exercise for the Reader#

Other Things to Try#

Nash-Sutcliffe Efficiency (NSE)

Contents

Nash-Sutcliffe Efficiency (NSE)#

Definition#

Interpretation#

References#

Using the nse Function#

Example 1: Xarray DataArray#

Example 2: Large Xarray DataArrays#

Example 3: Angular and Array#

Example 4: Weight and Array#

Example 5: 2D Array: Time and Station#

Example 6: 3D Array: Ensemble, Station and Time#

Example 7: 3D Array: Time, Station and Lead-time#

NSE Calculation along Lead-time#

Proper Broadcasting#

Example 8: 4D Array: Time, Station, lead times and ensemble#

Exercise for the Reader#

Other Things to Try#

Using the `nse` Function#