[1]:
import operator
import scores.categorical
import xarray as xr

Binary Categorical Scores and Binary Contingency Tables (Confusion Matrices)

Examples of binary categorical scores include: hit rate (true positive rate), false alarm rate and accuracy.

These scores are calculated based on a binary contingency table, which comprises true positives (hits), true negatives (correct negatives), false positives (false alarms) and false negatives (misses).

If you would like to know more, visit the Methods for dichotomous (yes/no) forecasts page produced by the WWRP/WGNE Joint Working Group on Forecast Verification Research.

Producing these categorical statistics is inherently a three-step process.

  1. For data that is either continuous or categorical, first convert it to binary data using an “event definition”. For data that is already binary, go straight to step two.

  2. Produce a contingency manager object, which:

    • stores the true positives, true negatives, false positives and false negatives, and

    • allows the data to be aggregated and transformed (for example, reducing the latitude and longitude dimensions while preserving the height dimension in order to assess the accuracy at different vertical levels).

  3. Calculate the desired metrics from the contingency tables produced in step two. scores contains the following binary categorical scores:

    • accuracy (fraction correct)

    • base rate

    • forecast rate

    • frequency bias (bias score)

    • hit rate (true positive rate, probability of detection, sensitivity, recall)

    • false alarm ratio

    • false alarm rate (probability of false detection) - not to be confused with false alarm ratio

    • success ratio (precision, positive predictive value)

    • negative predictive value

    • threat score (critical success index)

    • Peirce skill score (true skill statistic, Hanssen and Kuipers discriminant)

    • specificity (true negative rate)

    • F1 score

    • equitable threat score (Gilbert’s skill score)

    • Heidke skill score (Cohen’s kappa)

    • odds ratio

    • odds ratio skill score (Yule’s Q)

    • Symmetric Extremal Dependence Index (SEDI)

scores provides support for all three steps, plus convenient functions for working efficiently. Most of the scores APIs are functional in nature; however, the classes introduced below make binary categorical metrics much easier to calculate and work with. This notebook starts with simple examples and builds up to more complex ones.
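
As a preview of the full workflow developed in the steps below, here is a minimal end-to-end sketch on tiny one-dimensional data. It uses only the API demonstrated later in this notebook, with an event defined as “value >= 50”:

[ ]:
# Step one: define the event with an event operator (event: value >= 50)
preview_op = scores.categorical.ThresholdEventOperator(default_event_threshold=50.0)

# Tiny illustrative data
preview_fcst = xr.DataArray([49.0, 51.0, 53.0, 20.0])
preview_obs = xr.DataArray([50.0, 52.0, 30.0, 10.0])

# Step two: build a contingency manager; step three: calculate a score from it
preview_manager = preview_op.make_contingency_manager(preview_fcst, preview_obs)
preview_manager.accuracy()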

Step One: Converting continuous (or categorical) data into binary data

In some instances, data will already be binary. Examples of binary data include forecasts of “events” - such as “it will rain” or “a tropical cyclone will occur”.

If the data is already binary, ignore this step and go straight to step two.

However, in many situations, data is not binary and is instead either continuous or categorical. An example of continuous data is a physical quantity, such as a temperature in degrees Celsius. An example of categorical data is the Beaufort Wind Scale.

In order to use binary contingency scores, any non-binary data must first be converted into binary data.

Options include either:

  • convert the non-binary data into binary data outside of scores, then input the binary data into scores (in which case, go to step two), OR

  • use scores to efficiently convert the non-binary data into binary data, using the scores “event operator” class.

scores provides a general class called an “EventOperator”. scores provides in-built event operator types (e.g. greater-than, less-than and many others). Alternatively, users can define their own event operator implementation for use within scores (for example, for more complex event definitions involving multiple variables).

The rest of this section focuses on using scores to convert continuous data into binary data using the “event operator” class.

For instance, let’s say you need to verify how accurately a numerical weather prediction (NWP) system forecasts “heavy rainfall”. As the NWP precipitation rate predictions are continuous, you will need to first convert them into “heavy rainfall events” as binary data. To do this, you:

  • first define an “event operator” - in this case by defining what constitutes a “heavy rainfall event”, for example precipitation above a specified threshold, and

  • then apply this “heavy rainfall event” operator to the continuous precipitation rate prediction data. Doing so will convert the predictions into binary data - into either “heavy rainfall event = true” or “heavy rainfall event = false”.

A more complex event like “a thunderstorm” may involve more complex logic to determine when the event occurs, incorporating factors such as wind and lightning data as well as the rainfall rate.
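
For complex events like these, one option (as noted above) is to construct the binary event data yourself with ordinary xarray operations and feed it straight into step two. A minimal sketch, where the variables and thresholds (rain rate >= 10 combined with wind gusts >= 25, as a crude “thunderstorm” proxy) are purely illustrative:

[ ]:
# Hypothetical continuous fields; the values and thresholds are illustrative only
rain_rate = xr.DataArray([2.0, 12.0, 15.0, 30.0])
wind_gust = xr.DataArray([30.0, 20.0, 28.0, 40.0])

# Combine conditions with xarray boolean logic, then cast to 0/1 event data
storm_event = ((rain_rate >= 10.0) & (wind_gust >= 25.0)).astype(float)
print(storm_event)

A binary array like this (alongside a matching observed event array) can be passed directly to scores.categorical.BinaryContingencyManager in step two.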

Worked Example

Here is an example of using scores to convert non-binary data into binary data using the “event operator” class.

Consider the following two idealized tables of forecast and observed values (in mm) for precipitation.

[2]:
# Provides a basic forecast data structure in three dimensions
simple_forecast = xr.DataArray(
    [
        [
            [35.9, 44.0, 49.0],
            [50.7, 51.4, 52.8],
            [38.4, 38.5, 22.3],
        ],
        [
            [25.6, 38.1, 38.2],
            [52.4, 55.9, 51.0],
            [29.1, 33.1, 24.5],
        ],
    ],
    coords=[[10, 11], [0, 1, 2], [5, 6, 7]],
    dims=["lead_hour", "lat", "lon"],
)
[3]:
# Idealized observations are within 1.0 or 4.0 of the forecast in all cases except one
# This can be used to find some exact matches, and some close matches
simple_obs = xr.DataArray(
    [
        [
            [35.8, 44.2, 52.0],
            [53.7, 52.3, 53.7],
            [38.3, 36.4, 21.4],
        ],
        [
            [25.7, 38.2, 38.7],
            [51.7, 56.2, 49.9],
            [29.6, 30.2, 16.9],
        ],
    ],
    coords=[[10, 11], [0, 1, 2], [5, 6, 7]],
    dims=["lead_hour", "lat", "lon"],
)

For the purposes of this example, let’s say an event occurs if the precipitation value reaches 50 mm. We therefore define a threshold-based event operator with a threshold of 50.

Applying this operator converts the non-binary data into binary event data: precipitation values less than 50 become “false”, and values greater than or equal to 50 become “true”.

The following code creates a “greater than or equal to” operator - using the ThresholdEventOperator built into scores - which creates events with a threshold of “>= 50”. It can be repeatedly called with different event thresholds, for reasons that will be explained later.

[4]:
# An event here is defined as precipitation with a value greater than 50

# The ThresholdEventOperator can take a variety of operators from the python "operator" module, or a user-defined function
# The default is operator.ge, which is the same as ">=" but in functional form.
# Here we pass it in explicitly so it's very clear what will happen

event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50.0, default_op_fn=operator.ge)
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50.0) # This is the same thing

# https://docs.python.org/3/library/operator.html provides an overview of what standard operator types can be supplied
# You may wish to experiment with the following alternatives now

# Greater-than (not equal to) operator
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50, default_op_fn=operator.gt)

# Less-than operator
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50, default_op_fn=operator.lt)
[5]:
# The event operator creates binary "event tables".
# If you don't wish to visualise the binary "event tables", you can ignore this section.
# However, if you wish to visualise or utilise the "event tables" directly, you can do so as follows:

forecast_binary, observed_binary = event_operator.make_event_tables(simple_forecast, simple_obs)

# It is also possible to vary both the event threshold and the event operator from the defaults during this step
# This commented-out code uses the less-than operator with a threshold of 45
# forecast_binary, observed_binary = event_operator.make_event_tables(simple_forecast, simple_obs, event_threshold=45, op_fn=operator.lt)

print(forecast_binary)
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B
array([[[0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]]])
Coordinates:
  * lead_hour  (lead_hour) int64 16B 10 11
  * lat        (lat) int64 24B 0 1 2
  * lon        (lon) int64 24B 5 6 7
[6]:
print(observed_binary)
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B
array([[[0., 0., 1.],
        [1., 1., 1.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 1., 0.],
        [0., 0., 0.]]])
Coordinates:
  * lead_hour  (lead_hour) int64 16B 10 11
  * lat        (lat) int64 24B 0 1 2
  * lon        (lon) int64 24B 5 6 7

Step Two: Making Binary Contingency Information

Once the event tables have been made, the next step is to make a contingency manager object. It is possible (and quicker) to make a contingency manager from the event operator directly, but we will first show the process as separate steps, and then show how to do both at once. The contingency manager stores the binary forecast event data, the binary observed event data and the contingency table itself. It also provides functions for aggregating the data along various dimensions, and for calculating metrics from that information.

[7]:
contingency_manager = scores.categorical.BinaryContingencyManager(forecast_binary, observed_binary)
contingency_manager.format_table()   # Render a view of the contingency table as HTML (e.g. in notebooks). Only works for 2x2 tables.
[7]:
                   Positive Observed  Negative Observed  Total
Positive Forecast                  5                  1      6
Negative Forecast                  1                 11     12
Total                              6                 12     18
[8]:
# Uncomment this line to view the full functionality provided by a contingency manager
# help(contingency_manager)
[9]:
# It is also possible, and briefer, to make the binary contingency table directly from the event operator, as follows:
contingency_manager = event_operator.make_contingency_manager(simple_forecast, simple_obs)

# It is also possible to vary both the event threshold and the event operator from the defaults during this step
# contingency_manager = event_operator.make_contingency_manager(simple_forecast, simple_obs, event_threshold=2.0, op_fn=operator.lt)

contingency_manager.get_table()  # A simplified xarray object containing the contingency table can be obtained
[9]:
<xarray.DataArray (contingency: 5)> Size: 40B
array([ 5., 11.,  1.,  1., 18.])
Coordinates:
  * contingency  (contingency) <U11 220B 'tp_count' 'tn_count' ... 'total_count'
[10]:
# Further, if it turns out you want them, the event tables are still there

contingency_manager.fcst_events
[10]:
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B
array([[[0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]]])
Coordinates:
  * lead_hour  (lead_hour) int64 16B 10 11
  * lat        (lat) int64 24B 0 1 2
  * lon        (lon) int64 24B 5 6 7

Step Three: Calculating Binary Contingency Scores

The binary contingency manager can then be utilised to calculate a wide variety of scores that are based on the contingency table.

Below are two examples of using scores to calculate metrics directly from the contingency manager:

  • “accuracy”: (true positives + true negatives)/(total number of samples)

  • “false_alarm_rate”: (number of false positives)/(false positives + true negatives)

[11]:
contingency_manager.accuracy()
[11]:
<xarray.DataArray ()> Size: 8B
array(0.88888889)
[12]:
contingency_manager.false_alarm_rate()
[12]:
<xarray.DataArray ()> Size: 8B
array(0.08333333)
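
As a quick cross-check, the same values can be computed by hand from the counts in the table returned by get_table(). The 'tp_count', 'tn_count' and 'total_count' labels appear in the output of cell [9] above; the sketch below assumes the elided labels follow the same naming pattern ('fp_count' for false positives):

[ ]:
table = contingency_manager.get_table()

tp = table.sel(contingency="tp_count")
tn = table.sel(contingency="tn_count")
fp = table.sel(contingency="fp_count")  # label assumed, by analogy with tp_count
total = table.sel(contingency="total_count")

print(((tp + tn) / total).values)  # accuracy, as in cell [11]
print((fp / (fp + tn)).values)     # false alarm rate, as in cell [12]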

Transforming and Computing Large Tables

Using the “transform” operation on the contingency table can be helpful for handling very large data sets.

Transforming the table triggers the ‘compute’ function against any underlying dask arrays, which until this point may have been evaluated lazily. This is insignificant for small data sets, but can be time-consuming when calculating a contingency table for a multi-terabyte gridded data set. The transformed table contains only the resultant contingency table, which is thereafter very fast to work with.

Here we call ‘transform’ with no arguments to trigger the computation of the fully aggregated contingency table. This may be an expensive operation, but the resultant table is very small, and all the relevant scores can then be calculated cheaply as the table is re-used.

[13]:
computed = contingency_manager.transform()
computed.accuracy()
[13]:
<xarray.DataArray ()> Size: 8B
array(0.88888889)
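
The same pattern applies to lazily evaluated, dask-backed data, which is where transform() earns its keep. A minimal sketch, assuming dask is installed; the .chunk() calls below convert the small in-memory example arrays into dask-backed ones purely for illustration (real large data sets would typically be opened lazily, e.g. with xr.open_mfdataset):

[ ]:
# Convert the example arrays to (tiny) dask-backed arrays for illustration
lazy_fcst = simple_forecast.chunk({"lead_hour": 1})
lazy_obs = simple_obs.chunk({"lead_hour": 1})

lazy_manager = event_operator.make_contingency_manager(lazy_fcst, lazy_obs)

# transform() triggers the computation; the result is small and cheap to re-use
lazy_computed = lazy_manager.transform()
lazy_computed.accuracy()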

Exploring Dimensionality

The transform function can also be utilised to explore the dimensionality of the data. This function accepts preserve_dims and reduce_dims as per the rest of the scores codebase.

The team behind scores intends to add support for weights during transformation, but this functionality has not yet been added.

At present, the following approach can be used to examine contingency scores along the data dimensions.

[14]:
# Transform the table, preserving the lead_hour dimension and reducing the others
new_manager = contingency_manager.transform(preserve_dims='lead_hour')
print(new_manager)
Contingency Manager (xarray view of table):
<xarray.DataArray (contingency: 5, lead_hour: 2)> Size: 80B
array([[3., 2.],
       [5., 6.],
       [0., 1.],
       [1., 0.],
       [9., 9.]])
Coordinates:
  * lead_hour    (lead_hour) int64 16B 10 11
  * contingency  (contingency) <U11 220B 'tp_count' 'tn_count' ... 'total_count'
[15]:
# Calculating the accuracy on this transformed manager will show the accuracy along the lead_hour dimension

new_manager.accuracy()
[15]:
<xarray.DataArray (lead_hour: 2)> Size: 16B
array([0.88888889, 0.88888889])
Coordinates:
  * lead_hour  (lead_hour) int64 16B 10 11

Next Steps

Interesting next steps could include:

  • Calculating contingency tables for a range of event thresholds, to observe how accuracy changes with the threshold (see the sketch below)
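
A minimal sketch of that idea, re-using the event operator and data from this notebook (the choice of thresholds is illustrative only):

[ ]:
# Sweep over several event thresholds and compare the resulting accuracy
for threshold in [30.0, 40.0, 50.0]:
    manager = event_operator.make_contingency_manager(
        simple_forecast, simple_obs, event_threshold=threshold
    )
    print(threshold, float(manager.accuracy()))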