Release Notes (What’s New)#

Version 2.6.0 (July 17, 2026)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

scores has introduced support for pandas version 3. See PR #1062.
Added three new metrics:
- Relative economic value: scores.continuous.relative_economic_value, scores.probability.relative_economic_value, scores.categorical.relative_economic_value and scores.plotdata.relative_economic_value. See PR #999 and PR #1088.
- Relative economic value from rates: scores.continuous.relative_economic_value_from_rates, scores.probability.relative_economic_value_from_rates, scores.categorical.relative_economic_value_from_rates and scores.plotdata.relative_economic_value_from_rates. See PR #999.
- Receiver (relative) operating characteristic area under curve (ROC AUC): scores.probability.roc_auc. This is significantly computationally more efficient than calculating the area under the curve using scores.probability.roc_curve_data. See PR #1036.
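As a rough illustration of what the area under the ROC curve measures, the sketch below computes it in two ways in plain Python: by trapezoidal integration over ROC points (probability of false detection against probability of detection), and directly from forecast and observation pairs. The function names and data are hypothetical, and neither function is the implementation used by scores.probability.roc_auc.

```python
def auc_from_roc_points(pofd, pod):
    """Trapezoidal area under ROC points ordered by increasing POFD."""
    area = 0.0
    for i in range(1, len(pofd)):
        area += (pofd[i] - pofd[i - 1]) * (pod[i] + pod[i - 1]) / 2
    return area


def auc_from_pairs(fcst, obs):
    """Fraction of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [f for f, o in zip(fcst, obs) if o == 1]
    non_events = [f for f, o in zip(fcst, obs) if o == 0]
    correct = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return correct / (len(events) * len(non_events))


# Hypothetical probability forecasts and binary observations.
fcst = [0.1, 0.4, 0.35, 0.8]
obs = [0, 0, 1, 1]

# ROC points for thresholds placed at each forecast value, plus the point (0, 0).
pofd = [0.0, 0.0, 0.5, 0.5, 1.0]
pod = [0.0, 0.5, 0.5, 1.0, 1.0]

auc_curve = auc_from_roc_points(pofd, pod)
auc_pairs = auc_from_pairs(fcst, obs)
```

For this small dataset both approaches give the same area. The second needs no intermediate curve data, which hints at why a dedicated function can avoid the cost of building the full curve.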
Added a kwarg to scores.continuous.kge to allow switching between the original Kling-Gupta Efficiency (KGE) formulation of Gupta et al. (2009) and the modified formulation of Kling et al. (2012). The default remains the original implementation. See PR #1069.
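The difference between the two formulations can be sketched in plain Python. This is a minimal illustration rather than the scores implementation: the function name and the `modified` flag are hypothetical (the release notes do not name the kwarg), and population standard deviations are assumed.

```python
import math


def kge_sketch(fcst, obs, modified=False):
    """Kling-Gupta Efficiency from plain Python sequences (illustrative only)."""
    n = len(obs)
    mean_f = sum(fcst) / n
    mean_o = sum(obs) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fcst) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(fcst, obs)) / n

    r = cov / (std_f * std_o)  # linear correlation
    beta = mean_f / mean_o  # bias ratio
    if modified:
        # Kling et al. (2012): ratio of coefficients of variation.
        variability = (std_f / mean_f) / (std_o / mean_o)
    else:
        # Gupta et al. (2009): ratio of standard deviations.
        variability = std_f / std_o
    return 1 - math.sqrt((r - 1) ** 2 + (variability - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
fcst = [2.0 * o for o in obs]  # forecast is exactly double the observations

kge_original = kge_sketch(fcst, obs)
kge_modified = kge_sketch(fcst, obs, modified=True)
```

With a forecast that is exactly double the observations, the original formulation penalises both the bias and the larger standard deviation, whereas the modified formulation penalises only the bias, because the coefficient of variation is unchanged.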

Deprecations#

This deprecation was first introduced in Version 2.5.0. Support for include_components will be removed from threshold-weighted continuous ranked probability score (twCRPS) functions in a future version of scores. The scores development team believe using include_components=True may lead to misleading results when used with twCRPS functions. As such, the following are now deprecated:
- support for include_components in scores.probability.tw_crps_for_ensemble,
- support for include_components in scores.probability.tail_tw_crps_for_ensemble and
- support for include_components in scores.probability.interval_tw_crps_for_ensemble.
  See PR #991.

Bug Fixes#

Improved NaN handling in scores.probability.roc_curve_data. See PR #1036.
Improved dimension reduction code to ensure consistent dimension ordering in returned objects in scores.probability.roc_curve_data. See PR #1036.

Documentation#

Added “Relative Economic Value (REV)” tutorial. See PR #999.
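For readers new to the metric, the following is a minimal sketch of relative economic value under the standard cost-loss model, computed from the probability of detection, probability of false detection, base rate and cost-loss ratio. The function and argument names are hypothetical and are not the signatures of the scores functions; the tutorial remains the authoritative description.

```python
def rev_sketch(pod, pofd, base_rate, cost_loss_ratio):
    """Relative economic value, with all expenses expressed in units of the loss."""
    alpha = cost_loss_ratio
    s = base_rate
    # Expense of the cheaper fixed strategy: always protect or never protect.
    climate = min(alpha, s)
    # Expense when acting on the forecast.
    forecast = pofd * (1 - s) * alpha - pod * s * (1 - alpha) + s
    # Expense with perfect foreknowledge: protect only when the event occurs.
    perfect = s * alpha
    return (climate - forecast) / (climate - perfect)


# Hypothetical decision problem: events occur 30% of the time, cost-loss ratio 0.2.
rev_perfect = rev_sketch(pod=1.0, pofd=0.0, base_rate=0.3, cost_loss_ratio=0.2)
rev_always_warn = rev_sketch(pod=1.0, pofd=1.0, base_rate=0.3, cost_loss_ratio=0.2)
```

A perfect forecast gives a value of 1. Always forecasting the event is the best fixed strategy here, because the cost-loss ratio is below the base rate, so it gives a value of 0.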
Updated the “Receiver Operating Characteristic (ROC)” tutorial to include information about the newly-added scores.probability.roc_auc function (which is significantly more computationally efficient). See PR #1036.
Updated the “Kling–Gupta Efficiency (KGE)” tutorial to include information about the newly-added kwarg which allows users to switch between the original KGE formulation of Gupta et al. (2009) and the modified formulation of Kling et al. (2012). See PR #1069.
Updated the “Contributing Guide” to include guidelines for generative tool usage. See PR #1027.
Added a commit template to the scores repository and added instructions in the “Contributing Guide” for setting up and using the commit template. See PR #1045 and PR #1048.
Added an entry for “Aggregate” to the “Processing” table in docs/included.md. See PR #1038.
Updated docstrings (e.g. added examples and improved grammar) for multiple functions in the API documentation. See PR #996 and PR #1056.
Corrected erroneous namespaces and added :py:func: before the correct namespaces in src/scores/probability/crps_impl.py. Specifically, changed scores.probability.functions.fill_cdf to :py:func:`scores.processing.cdf.fill_cdf` and changed scores.probability.functions.cdf_envelope to :py:func:`scores.processing.cdf.cdf_envelope`. See PR #1050.
Updated links to the new verification site https://jwgfvr.github.io/forecastverification (which will replace the prior site: https://www.cawcr.gov.au/projects/verification) in tutorials/Additive_and_multiplicative_bias.ipynb, tutorials/Binary_Contingency_Scores.ipynb and src/scores/categorical/contingency_impl.py. See PR #1029, PR #1030 and PR #1031.
Updated link from https://www.openradar.io/ (which is no longer active) to https://nci.org.au/aura/ in docs/data.md. See PR #1081.
Replaced link to ERA5 dataset in docs/data.md from https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5 to https://doi.org/10.24381/cds.adbb2d47. See PR #1081.
Updated the “Quantile Interval Score and Interval Score” tutorial to use the updated lower-case “h” date range syntax, due to changes in pandas. See PR #1065.
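The syntax change in question can be illustrated as follows. This is a small sketch using pandas directly, not the code from the tutorial, and the dates are arbitrary.

```python
import pandas as pd

# Lower-case "h" is the hourly frequency alias in recent pandas versions;
# the upper-case "H" alias was deprecated in pandas 2.2.
times = pd.date_range("2026-01-01", periods=4, freq="h")
step = times[1] - times[0]
```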
Fixed integration of tutorials with Binder. See PR #1083.

Internal Changes#

Moved tutorial files from a top-level directory, tutorials, into the docs/tutorials/ subdirectory. This change was made to support how recent Sphinx versions handle symbolic links to files located outside of the docs subdirectory. Users who run the tutorials will need to use the updated location. There is no change to how the tutorials render in the documentation. Unpinned the version of Sphinx used by scores, so that the most recent versions of Sphinx can be used. See PR #1042.
Introduced the use of the Python doctest module for automated testing of examples in API documentation (docstrings). Revised existing docstrings as appropriate to meet doctest tool requirements. Added the doctest tool to CI/CD and developer tooling via pre-commit. See PR #1056.
Updated the CI pipeline to run tests against all versions of Python even if one fails. See PR #1063.

Contributors to this Release#

Oisín M. Morrison* (@Oisin-M), Daniel Karney* (@danielkarney), Thomas C. Pagano (@thomaspagano), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong), Nicholas Loveday (@nicholasloveday), John Sharples (@John-Sharples), Mohammadreza Khanarmuei (@reza-armuei), Nikeeth Ramanathan (@nikeethr), Maree Carroll (@mareecarroll) and Durga Shrestha (@durgals).

* indicates that this release contains their first contribution to scores.

Version 2.5.0 (February 14, 2026)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

scores has introduced support for Python 3.14. See PR #989.
Added probability integral transform (PIT) classes:
- PIT for ensembles or cumulative distribution functions (CDFs): scores.probability.Pit
- PIT for predictive CDFs evaluated at observations: scores.probability.PitFcstAtObs.
  See PR #919.
Added a new function for generating data for rank histograms:
- Rank histogram: scores.plotdata.rank_histogram (also available as scores.probability.rank_histogram). See PR #919 and PR #1012.

Deprecations#

Support for include_components will be removed from threshold-weighted continuous ranked probability score (twCRPS) functions in a future version of scores. The scores development team believe using include_components=True may lead to misleading results when used with twCRPS functions. As such, the following are now deprecated:
- support for include_components in scores.probability.tw_crps_for_ensemble,
- support for include_components in scores.probability.tail_tw_crps_for_ensemble and
- support for include_components in scores.probability.interval_tw_crps_for_ensemble.
  See PR #991.

Bug Fixes#

Fixed an IndexError in receiver (relative) operating characteristic (ROC) calculations. As a result, multidimensional input arrays are now supported when using automatic thresholds. See PR #963.
Fixed a situation where receiver (relative) operating characteristic (ROC) calculations could trigger a NotImplementedError within Dask. scores.probability.roc_curve_data and scores.plotdata.roc have been updated so that if fcst or obs is an xarray object backed by a dask array, and check_args is True, the min and max of the arrays will be calculated immediately, which triggers computation. This can be avoided by setting check_args=False. See PR #987.

Documentation#

Added two new tutorials:
- “The Probability Integral Transform (PIT)”. See PR #919.
- “Rank Histogram”. See PR #919.
Updated documentation to say there are now over 75 metrics, statistical techniques and data processing tools contained in scores. See PR #1014.
Updated “Acknowledging or Citing scores” to include citation details for both our Journal of Open Source Software paper and the Zenodo record for the version of scores being used. See PR #1003.
Updated the “Contributing Guide” to include additional information about running pre-commit checks. See PR #977 and PR #1010.
Corrected a function name in an example in the scores.stats.statistical_tests.diebold_mariano docstring. See PR #978.
Pinned version of Sphinx to prior to version 8 (i.e. sphinx<8), due to a change in symlink handling. This will need to be resolved before scores can migrate to more recent versions of Sphinx. See commit f60ae0c.

Internal Changes#

Fixed Numba warnings in fast Continuous Ranked Probability Score (CRPS) implementation when NaNs are present in input data. See PR #957.
Set join explicitly to “outer” to be compatible with upcoming changes in Xarray. See PR #964.
Replaced implementations of SciPy's legacy function interpolate.interp1d with a wrapper function. See PR #971.
Removed the use of NetCDF data on disk from tests. Data is now created on the fly. See PR #966.
Added Ruff. Ruff is now included in pre-commit and replaces Pylint, Black, Bandit and isort. See PR #967, PR #972, PR #979 and PR #990.
Replaced mypy with ty. Added ty to pre-commit for type checking. See PR #984 and PR #993.
Updated CI/CD and pre-commit hooks to treat warnings as test failures. See commit a28042d.
The directory name tests/probabilty/ was spelled incorrectly and has been renamed to tests/probability/. See PR #1021.

Contributors to this Release#

Felix Esperson* (@fesperson), Jurian Beunk* (@jurianbeunk), Xiaoxi Wu* (@wuxx66), Robert J. Taggart (@rob-taggart), John Sharples (@John-Sharples), Belinda Trotta (@btrotta-bom), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Stephanie Chong (@Steph-Chong), Durga Shrestha (@durgals), Mohammadreza Khanarmuei (@reza-armuei) and Nikeeth Ramanathan (@nikeethr).

* indicates that this release contains their first contribution to scores.

Version 2.4.0 (January 14, 2026)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

Added an optional installation variant “fast” which introduces Numba as an optional dependency to support optimised implementations for some metrics. scores.probability.crps_cdf will now automatically switch to an optimised implementation if Numba is installed in the environment. The “fast” variant can be installed with pip install scores[fast]. See PR #931.
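Because the switch depends on whether Numba is present, it can be useful to check the environment. The snippet below is a generic standard-library check, not a description of how scores itself detects Numba, and the `backend` label is purely illustrative.

```python
import importlib.util

# True if the optional Numba dependency can be imported in this environment.
numba_available = importlib.util.find_spec("numba") is not None

# Illustrative label only; scores does not expose a setting with this name.
backend = "optimised (Numba)" if numba_available else "default"
```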

Bug Fixes#

Fixed a bug in threshold-weighted scoring methods that caused the code to fail if the first object in the tuple for interval_where_one was an xr.DataArray and the second was a float, e.g. np.inf. This method has now been corrected to allow a float, int, or xr.DataArray for the interval arguments. See PR #948.

Documentation#

Updated links to the new verification site https://jwgfvr.github.io/forecastverification (which will replace the prior site: https://www.cawcr.gov.au/projects/verification) in docs/included.md, tests/categorical/test_contingency.py and src/scores/continuous/standard_impl.py. See PR #933, PR #934 and PR #935.
Updated the documentation and citation links for the scoringrules entry in “Related Works”. See PR #937.
Fixed rendering (removed an unintentional block quote), and thereby also resolved a sphinx build error, in the scores.continuous.nse docstring. See PR #936.

Internal Changes#

Sped up (improved the computational efficiency of) the continuous ranked probability score (CRPS) for ensembles, by sorting the ensemble members to compute the CRPS spread term. See PR #928.
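The idea of using sorted members for the spread term can be sketched with a well-known identity: the sum of absolute differences over all pairs of members can be rewritten as a single weighted sum over the sorted members. The code below is a plain Python illustration, not the code from PR #928, and it assumes the spread term is normalised by 2M² for M members.

```python
def spread_naive(ensemble):
    """Spread term from all pairwise absolute differences, O(M^2)."""
    m = len(ensemble)
    total = sum(abs(a - b) for a in ensemble for b in ensemble)
    return total / (2 * m**2)


def spread_sorted(ensemble):
    """Same spread term from sorted members, O(M log M)."""
    m = len(ensemble)
    ordered = sorted(ensemble)
    # The i-th smallest member (1-indexed) is the larger value in (i - 1) pairs
    # and the smaller value in (m - i) pairs, giving the weight (2i - m - 1).
    total = sum((2 * i - m - 1) * x for i, x in enumerate(ordered, start=1))
    return total / m**2


members = [1.0, 3.0, 2.0, 5.0]  # hypothetical ensemble
naive = spread_naive(members)
fast = spread_sorted(members)
```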

Contributors to this Release#

Belinda Trotta* (@btrotta-bom), Taylor Mandelbaum* (@aaTman), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Stephanie Chong (@Steph-Chong), Robert J. Taggart (@rob-taggart) and Nikeeth Ramanathan (@nikeethr).

* indicates that this release contains their first contribution to scores.

We also acknowledge the developers of xskillscore and properscoring as we have adapted code from their repositories under a suitable compatible license. This acknowledgment has also been added to NOTICE.md as is best practice. The xarray wrapper function scores.probability.crps_numba.crps_cdf_exact_fast is based on the code for crps_ensemble from xskillscore (https://github.com/xarray-contrib/xskillscore/blob/main/xskillscore/core/probabilistic.py), released under the Apache-2.0 License with copyright attributed to xskillscore developers (as at 11 Dec 2025). The vectorisation of crps_at_point follows the example of _crps_ensemble_gufunc from properscoring (https://github.com/properscoring/properscoring/blob/master/properscoring/_gufuncs.py), released under the Apache-2.0 License with copyright attributed to The Climate Corporation (2015).

Version 2.3.0 (October 14, 2025)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

Added a new metric:
- Percent within X: scores.continuous.percent_within_x. See PR #865.
Added one new metric and two supporting functions. Following the publication of Taggart & Wilke (2025), these have been moved from scores.emerging to scores.categorical:
- Risk matrix score: scores.categorical.risk_matrix_score.
- Risk matrix score - matrix weights to array: scores.categorical.matrix_weights_to_array.
- Risk matrix score - warning scaling to weight array: scores.categorical.weights_from_warning_scaling.
  Note: while removing the functions from scores.emerging is technically a breaking change, breaking changes that only impact the “emerging” section of the API do not trigger major releases. This is because the “emerging” section of the API is designed to hold metrics while they are undergoing peer review and it is expected they will be moved out of “emerging” once peer review has concluded.
  See PR #904.
Updated the weighting method used by all scores functions that allow the user to supply weights. The updated weighting method normalises the user-supplied weights rather than applying them directly. While both approaches can be valid, the revised approach is more in keeping with general expectations and is consistent with the default approach taken by other libraries. As a part of this change, users can no longer supply weights that contain NaNs (zeroes may be used instead where appropriate). The “Introduction to weighting and masking” tutorial has been updated and substantially expanded to explain what the weighting does mathematically. See PR #899.
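The distinction between applying weights directly and normalising them can be sketched as below. This is an assumption about what the two approaches look like in the simplest case (a mean over one dimension); the function names are hypothetical and the tutorial remains the authoritative description of the mathematics.

```python
def mean_with_direct_weights(errors, weights):
    """Multiply each error by its weight, then take an unweighted mean."""
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


def mean_with_normalised_weights(errors, weights):
    """Divide by the sum of the weights, giving a conventional weighted mean."""
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


errors = [1.0, 2.0, 3.0]  # hypothetical per-point errors
weights = [1.0, 1.0, 2.0]  # hypothetical user-supplied weights

direct = mean_with_direct_weights(errors, weights)
normalised = mean_with_normalised_weights(errors, weights)
```

In this sketch, multiplying every weight by a constant changes the directly weighted result but leaves the normalised result unchanged.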
Added optional automatic generation of thresholds for the receiver (relative) operating characteristic (ROC) curve (scores.probability.roc_curve_data). See PR #882.

Bug Fixes#

Updated scores.continuous.quantile_interval_score and scores.continuous.quantile_score so they now recognise preserve_dims='all'. Beforehand, these functions were not recognising the special case of preserve_dims='all' and were raising an error unless a list of dimensions was supplied. (Note: the score calculations were not incorrect, it was only that preserve_dims='all' was not recognised.) See PR #893.

Documentation#

Added “Percent Within X” tutorial. See PR #865.
Substantially updated and expanded the “Introduction to weighting and masking” tutorial, following changes to the weighting method used by all scores functions that allow the user to supply weights. The updated and expanded tutorial explains what the weighting does mathematically. See PR #899.
Updated the “Quantile-Quantile (Q-Q) Plots for Comparing Forecasts and Observations” tutorial so that the plots render in Read the Docs. See PR #883.
Updated the description of the second figure in the “Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles” tutorial. See PR #897.
Updated multiple sections of the documentation following the risk matrix score moving from scores.emerging to scores.categorical, including:
- updating docstrings and docs/included.md,
- updating the tutorial with the new categorical methods, and
- updating references in several sections of the documentation, following the publication of Taggart & Wilke (2025).
  See PR #904.
Updated several tutorials to subtract the LEAD_TIME Timedelta from the base times in the forecast data to make the forecast and observation data line up correctly. See PR #920.
In the README, “Detailed Installation Guide” and “Contributing Guide”, updated pip install commands to use quotation marks where square brackets are used to specify optional dependencies. This is to ensure compatibility with zsh (the default on macOS) while still working as expected on bash. See PR #917.
Added thumbnail images to multiple entries in the tutorial gallery. See PR #874, PR #875, PR #877, PR #879, PR #880, PR #881 and PR #884.

Internal Changes#

In multiple tutorials, added the keyword argument decode_timedelta=True to xarray.open_dataset for the downloaded files forecast_grid.nc and analysis_grid.nc. See PR #894.
Performed input checking earlier in various function calls, so that errors are raised before any expensive computation takes place. See PR #905.

Contributors to this Release#

Thomas C. Pagano* (@thomaspagano), Paul R. Smith* (@prs247au), J. Smallwood* (@jdgsmallwood), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Nikeeth Ramanathan (@nikeethr), Stephanie Chong (@Steph-Chong), Robert J. Taggart (@rob-taggart) and Mohammadreza Khanarmuei (@reza-armuei).

* indicates that this release contains their first contribution to scores.

Version 2.2.0 (July 26, 2025)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

Added a new metric:
- Spearman’s correlation coefficient: scores.continuous.correlation.spearmanr. See PR #773.
Added a new function for generating data for diagrams:
- Quantile-Quantile (QQ) plots: scores.plotdata.qq. See PR #852.
Added new features to the FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm):
- Added support for xr.Datasets in addition to the existing support for xr.DataArrays. See PR #853.
- Added the optional argument include_components. If include_components is set to True the function will return the overforecast and underforecast penalties along with the FIRM score. See PR #853 and PR #864.
Added a new scores.plotdata section to the API for functions that generate data for verification plots. See PR #852.

Bug Fixes#

Fixed an issue where scores.plotdata.roc didn’t add the point (0, 0) in some instances. See PR #863.
Fixed an issue in scores.continuous.quantile_interval_score where broadcasting wasn’t being done correctly in some cases. See PR #867.

Documentation#

Added two new tutorials:
- “Spearman’s Correlation Coefficient”. See PR #773.
- “Quantile-Quantile (Q-Q) Plots for Comparing Forecasts and Observations”. See PR #852.
Substantially updated “The FIxed Risk Multicategorical (FIRM) Score” tutorial. See PR #853.
Fixed an error in the formula in the docstring for the quantile interval score (scores.continuous.quantile_interval_score). (Note: this error was only present in the docstring - the code implementation of the function was correct and the tutorial listed the correct formula.) See PR #851.
Updated several “full changelog” URLs in the release notes. See PR #859.

Internal Changes#

Improved the efficiency of the FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm) by moving the call to gather dimensions to earlier within the method. See PR #853.
Added a new scores.plotdata section to the API for functions that generate data for verification plots. See PR #852. The following internal changes were made:
- Receiver (Relative) Operating Characteristic (ROC):
  - scores.probability.roc_curve_data was moved to scores.plotdata.roc, but can still be imported as scores.probability.roc_curve_data.
- Murphy Score:
  - scores.continuous.murphy_score was moved to scores.plotdata.murphy_score, but can still be imported as scores.continuous.murphy_score and scores.probability.murphy_score.
  - scores.continuous.murphy_thetas was moved to scores.plotdata.murphy_thetas, but can still be imported as scores.continuous.murphy_thetas and scores.probability.murphy_thetas.
Added an additional CI/CD pipeline for testing without Dask. See PR #856.

Contributors to this Release#

Liam Bluett (@lbluett), Nicholas Loveday (@nicholasloveday), Nikeeth Ramanathan (@nikeethr), Tennessee Leeuwenburg (@tennlee), Robert J. Taggart (@rob-taggart), Stephanie Chong (@Steph-Chong) and Mohammadreza Khanarmuei (@reza-armuei).

Version 2.1.0 (April 30, 2025)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

Added a new function:
- Block bootstrap: scores.processing.block_bootstrap. See PR #418.
Added two new metrics:
- Stable equitable error in probability space (SEEPS): scores.categorical.seeps. See PR #809 and PR #833.
- Nash-Sutcliffe model efficiency coefficient (NSE): scores.continuous.nse. See PR #815.

Documentation#

Added “Block Bootstrapping” tutorial. See PR #418.
Added “Stable Equitable Error in Probability Space (SEEPS)” tutorial. See PR #809.
Added “Nash-Sutcliffe Efficiency (NSE)” tutorial. See PR #815.
Updated the “Continuous Ranked Probability Score (CRPS) for Ensembles” tutorial:
- Labelled dimensions in fcst/obs data.
- Updated description of the plot to say the area squared corresponds to the CRPS.
- Added an example with multiple coordinates along a dimension. See PR #805.
Updated “Data Sources”:
- Added links to two additional datasets for gridded global numerical weather prediction.
- Added links to several additional datasets for point-based data. See PR #823 and PR #831.
Updated references in several sections of the documentation, following the publication of a preprint for the risk matrix score. See PR #827.

Internal Changes#

Tested and added compatibility for recent Xarray versions (2025 and onwards) and adjusted dependency specification so new year “major version” rollovers will be permitted by default in future. See commit #f109f2f and commit #8428d64.
In scores.emerging.weights_from_warning_scaling, changed the name of the argument assessment_weights to evaluation_weights. See PR #806. Note: This is technically a breaking change, but does not trigger a major release as it is contained within the “emerging” section of the API. This area of the API is designated for metrics which are still undergoing peer review and as such are expected to undergo change. Once peer review is concluded, the implementation will be finalised and moved.
Added support for developers of scores who choose to use the pixi tool for environment management. See PR #835, PR #839 and PR #840.

Contributors to this Release#

Dougal T. Squire* (@dougiesquire), Mohammad Mahadi Hasan* (@engrmahadi), Mohammadreza Khanarmuei (@reza-armuei), Nikeeth Ramanathan (@nikeethr), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Robert J. Taggart (@rob-taggart), Durga Shrestha (@durgals) and Stephanie Chong (@Steph-Chong).

* indicates that this release contains their first contribution to scores.

Version 2.0.0 (December 7, 2024)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Breaking Changes#

The function scores.probability.tw_crps_for_ensemble previously took an optional (mis-spelled) argument chainging_func_kwargs. The spelling has been corrected and the argument is now chaining_func_kwargs. See PR #780 and PR #772.
For those who develop on scores, you will need to update your installation of the scores package with pip install -e .[all], to get updated versions of black, pylint and mypy. See PR #768, PR #769 and PR #771.

Features#

Added three new metrics:
- Brier score for ensembles: scores.probability.brier_score_for_ensemble. See PR #735.
- Negative predictive value: scores.categorical.BasicContingencyManager.negative_predictive_value. See PR #759.
- Positive predictive value: scores.categorical.BasicContingencyManager.positive_predictive_value. See PR #761 and PR #756.
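For reference, the two predictive values are simple ratios of contingency table counts. The sketch below uses hypothetical standalone functions and made-up counts; in scores these metrics are methods on BasicContingencyManager, whose construction is not shown here.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of forecast events that were observed."""
    return true_positives / (true_positives + false_positives)


def negative_predictive_value(true_negatives, false_negatives):
    """Fraction of forecast non-events that were not observed."""
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical 2x2 contingency table counts.
ppv = positive_predictive_value(true_positives=30, false_positives=10)
npv = negative_predictive_value(true_negatives=40, false_negatives=20)
```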
Also added one new emerging metric and two supporting functions:
- Risk matrix score: scores.emerging.risk_matrix_score.
- Risk matrix score - matrix weights to array: scores.emerging.matrix_weights_to_array.
- Risk matrix score - warning scaling to weight array: scores.emerging.weights_from_warning_scaling. See PR #724 and PR #794.
A new method called format_table was added to the class BasicContingencyManager to improve visualisation of 2x2 contingency tables. The tutorial Binary_Contingency_Scores was updated to demonstrate the use of this function. See PR #775.
The functions scores.processing.comparative_discretise, scores.processing.binary_discretise and scores.processing.binary_discretise_proportion now accept either a string indicating the choice of operator to be used, or an operator from the Python core library operator module. Using one of the operators from the Python core module is recommended, as doing so is more reliable for a variety of reasons. Support for the use of a string may be removed in future. See PR #740 and PR #758.
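The pattern of accepting either a string or a function from the operator module can be sketched as follows. The function `discretise_sketch`, the lookup table and the set of accepted strings are all hypothetical; they illustrate the idea only and do not reproduce the scores.processing functions or their signatures.

```python
import operator

# Hypothetical mapping from string names to functions in the operator module.
_STRING_MODES = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def discretise_sketch(values, threshold, mode):
    """Return 1 where the comparison with the threshold holds, otherwise 0."""
    # A string is looked up in the table; anything else is used as a callable.
    compare = _STRING_MODES[mode] if isinstance(mode, str) else mode
    return [1 if compare(v, threshold) else 0 for v in values]


values = [0.2, 0.5, 0.9]
from_operator = discretise_sketch(values, 0.5, operator.ge)
from_string = discretise_sketch(values, 0.5, ">=")
```

Passing the operator directly avoids the string lookup step, so a mistyped string cannot silently select the wrong comparison.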

Documentation#

Added “The Risk Matrix Score” tutorial. See PR #724 and PR #794.
Updated the “Brier Score” tutorial to include a new section about the Brier score for ensembles. See PR #735.
Updated the “Binary Categorical Scores and Binary Contingency Tables (Confusion Matrices)” tutorial:
- Included “positive predictive value” in the list of binary categorical scores.
- Included “negative predictive value” in the list of binary categorical scores.
- Demonstrated the use of the new format_table method for visualising 2x2 contingency tables. See PR #759 and PR #775.
Updated the “Contributing Guide”:
- Added a new section: “Creating Your Own Fork of scores for the First Time”.
- Updated the section: “Workflow for Submitting Pull Requests”.
- Added a new section: “Pull Request Etiquette”. See PR #787.
Updated the README:
- Added a link to a video of a PyCon AU 2024 conference presentation about scores. See PR #783.
- Added a link to the archives of scores on Zenodo. See PR #784.
Added Scoringrules to “Related Works”. See PR #746, PR #766 and PR #789.

Internal Changes#

Removed scikit-learn as a dependency. scores has replaced the use of scikit-learn with a similar function from SciPy (which was an existing scores dependency). This change was manually tested and found to be faster. See PR #774.
Version pinning of dependencies in release files (the wheel and sdist files used by PyPI and conda-forge) is now managed and set by the hatch_build script. This allows development versions to be free-floating, while being more specific about dependencies in releases. The previous process also aimed to do this, but was error-prone. A new entry called pinned_dependencies was added to pyproject.toml to specify the release dependencies. See PR #760.

Contributors to this Release#

Arshia Sharma* (@arshiaar), A.J. Fisher* (@AJTheDataGuy), Liam Bluett* (@lbluett), Jinghan Fu* (@JinghanFu), Sam Bishop* (@techdragon), Robert J. Taggart (@rob-taggart), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Nicholas Loveday (@nicholasloveday).

* indicates that this release contains their first contribution to scores.

Version 1.3.0 (November 15, 2024)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Introduced Support for Python 3.13 and Dropped Support for Python 3.9#

In line with other scientific Python packages, scores has dropped support for Python 3.9 in this release. scores has added support for Python 3.13. See PR #710.

Features#

Added four new metrics:
- Quantile Interval Score: scores.continuous.quantile_interval_score. See PR #704, PR #733 and PR #738.
- Interval Score: scores.continuous.interval_score. See PR #704, PR #733 and PR #738.
- Kling-Gupta Efficiency (KGE): scores.continuous.kge. See PR #679, PR #700 and PR #734.
- Interval threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.interval_tw_crps_for_ensemble. See PR #682 and PR #734.
Added an optional include_components argument to several continuous ranked probability score (CRPS) functions for ensembles. If supplied, the include_components argument will return the underforecast penalty, the overforecast penalty and the forecast spread term, in addition to the overall CRPS value. This applies to the following CRPS functions:
- continuous ranked probability score (CRPS) for ensembles: scores.probability.crps_for_ensemble
- threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tw_crps_for_ensemble
- tail threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tail_tw_crps_for_ensemble
- interval threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.interval_tw_crps_for_ensemble.
  See PR #708 and PR #734.
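The relationship between the components and the overall value can be sketched for a single observation in plain Python. This is an illustration under stated assumptions, not the scores implementation: the function name and dictionary keys are hypothetical, and the spread term is assumed to be normalised by 2M² for M members.

```python
def crps_components_sketch(ensemble, obs):
    """CRPS for one observation, split into penalties and a spread term."""
    m = len(ensemble)
    # Members below the observation contribute to the underforecast penalty.
    under = sum(max(obs - x, 0.0) for x in ensemble) / m
    # Members above the observation contribute to the overforecast penalty.
    over = sum(max(x - obs, 0.0) for x in ensemble) / m
    # Mean absolute difference between all pairs of members, halved.
    spread = sum(abs(a - b) for a in ensemble for b in ensemble) / (2 * m**2)
    return {
        "total": under + over - spread,
        "underforecast_penalty": under,
        "overforecast_penalty": over,
        "spread": spread,
    }


components = crps_components_sketch([1.0, 2.0, 4.0], obs=3.0)
```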

Documentation#

Added “Kling–Gupta Efficiency (KGE)” tutorial. See PR #679, PR #700 and PR #734.
Added “Quantile Interval Score and Interval Score” tutorial. See PR #704, PR #736 and PR #738.
Added “Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles” tutorial. See PR #706 and PR #722.
Updated the title in the “Binary Categorical Scores and Binary Contingency Tables (Confusion Matrices)” tutorial and the description for the corresponding thumbnail in the tutorial gallery. See PR #741 and PR #743.
Updated the pull request template. See PR #719.

Internal Changes#

Sped up (improved the computational efficiency of) the continuous ranked probability score (CRPS) for ensembles. This also addresses memory issues when a large number of ensemble members are present. See PR #694.

Contributors to this Release#

Mohammadreza Khanarmuei (@reza-armuei), Nicholas Loveday (@nicholasloveday), Durga Shrestha (@durgals), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).

Version 1.2.0 (September 13, 2024)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

Added three new metrics:
- Percent bias (PBIAS): scores.continuous.pbias. See PR #639 and PR #655.
- Threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tw_crps_for_ensemble. See PR #644.
- Tail threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tail_tw_crps_for_ensemble. See PR #644.
The FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm) can now take a sequence of multidimensional arrays (xr.DataArray) of thresholds. This allows the FIRM score to be used with categorical thresholds that vary across the domain. See PR #661.

Documentation#

Added information about percent bias to the “Additive Bias and Multiplicative Bias” tutorial. See PR #639 and PR #656.
Updated documentation to say there are now over 60 metrics, statistical techniques and data processing tools contained in scores. See PR #659.
In the “Contributing Guide”, updated instructions for installing a conda-based virtual environment. See PR #654.

Internal Changes#

Modified automated tests to work with NumPy 2.1. Incorporated a union type of array and generic in assert statements for Dask operations. See PR #643.

Contributors to this Release#

Durga Shrestha* (@durgals), Maree Carroll (@mareecarroll), Nicholas Loveday (@nicholasloveday), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).

* indicates that this release contains their first contribution to scores.

Version 1.1.0 (August 9, 2024)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Features#

scores is now available on conda-forge.
Added five new metrics (see PR #609):
- Threshold weighted squared error: scores.continuous.tw_squared_error.
- Threshold weighted absolute error: scores.continuous.tw_absolute_error.
- Threshold weighted quantile score: scores.continuous.tw_quantile_score.
- Threshold weighted expectile score: scores.continuous.tw_expectile_score.
- Threshold weighted Huber loss: scores.continuous.tw_huber_loss.

Documentation#

Added “Threshold Weighted Scores” tutorial. See PR #609.
Removed nbviewer link from documentation. See PR #615.

Internal Changes#

Modified numpy.trapezoid call to work with either NumPy 1 or 2. See PR #610.
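For context, NumPy 2.0 introduced numpy.trapezoid and deprecated the older numpy.trapz. The sketch below shows one common way to write code that runs under either major version. It is an illustration only and is not necessarily the approach taken in PR #610. The names trapezoid, x, y and area are chosen for this example.

```python
import numpy as np

# Use np.trapezoid where it exists (NumPy 2), otherwise fall back to
# np.trapz (NumPy 1). Only the selected attribute is looked up.
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

# Integrate y = x over [0, 2] with the trapezoidal rule.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0])
area = trapezoid(y, x=x)
```

Selecting the function once at import time keeps call sites unchanged and avoids triggering the deprecation warning for numpy.trapz on NumPy 2.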

Contributors to this Release#

Nicholas Loveday (@nicholasloveday), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).

Version 1.0.0 (July 10, 2024)#

We are happy to have reached the point of releasing “Version 1.0.0” of scores. While we look forward to many version increments to come, version 1.0.0 represents a milestone. It signifies a stabilisation of the API, and marks a turning point from the initial construction period. We have also published a paper in the Journal of Open Source Software (see citation further below).

From this point forward, scores will be following the Semantic Versioning Specification (SemVer) in its release management.

This is a good moment to acknowledge and thank the contributors who helped us reach this point. They are: Tennessee Leeuwenburg, Nicholas Loveday, Elizabeth E. Ebert, Harrison Cook, Mohammadreza Khanarmuei, Robert J. Taggart, Nikeeth Ramanathan, Maree Carroll, Stephanie Chong, Aidan Griffiths and John Sharples.

Please consider a citation of our paper if you use our code. The citation is:

Leeuwenburg, T., Loveday, N., Ebert, E. E., Cook, H., Khanarmuei, M., Taggart, R. J., Ramanathan, N., Carroll, M., Chong, S., Griffiths, A., & Sharples, J. (2024). scores: A Python package for verifying and evaluating models and predictions with xarray. Journal of Open Source Software, 9(99), 6889. https://doi.org/10.21105/joss.06889

BibTeX:

@article{Leeuwenburg_scores_A_Python_2024,
author = {Leeuwenburg, Tennessee and Loveday, Nicholas and Ebert, Elizabeth E. and Cook, Harrison and Khanarmuei, Mohammadreza and Taggart, Robert J. and Ramanathan, Nikeeth and Carroll, Maree and Chong, Stephanie and Griffiths, Aidan and Sharples, John},
doi = {10.21105/joss.06889},
journal = {Journal of Open Source Software},
month = jul,
number = {99},
pages = {6889},
title = {{scores: A Python package for verifying and evaluating models and predictions with xarray}},
url = {https://joss.theoj.org/papers/10.21105/joss.06889},
volume = {9},
year = {2024}
}

For a list of all changes in this release, see the full changelog.

Version 0.9.3 (July 9, 2024)#

For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.

Breaking Changes#

Renamed and relocated function scores.continuous.correlation to scores.continuous.correlation.pearsonr. See PR #583.

Documentation#

Added “Dimension Handling” tutorial, which describes reducing and preserving dimensions. See PR #589.
Updated “Detailed Installation Guide” with information on installing kernels in a Jupyter environment. See PR #586 and PR #587.

Internal Changes#

Introduced pinned versions for dependencies on main. See PR #580.

Contributors to this Release#

Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Nicholas Loveday (@nicholasloveday).

Version 0.9.2 (June 26, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.9.1 (June 14, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.9.0 (June 12, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.6 (June 11, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.5 (June 9, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.4 (June 3, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.3 (June 2, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.2 (May 21, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8.1 (May 16, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.8 (May 14, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.7 (May 8, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.6 (April 6, 2024)#

For a list of all changes in this release, see the full changelog.

Note: version 0.6 was initially tagged as “v0.6” and released on 6th April 2024. On 7th April 2024, an identical version was released with the tag “0.6” (i.e. with the “v” omitted from the tag).

Version 0.5 (April 6, 2024)#

For a list of all changes in this release, see the full changelog.

Version 0.4 (September 15, 2023)#

For a list of all changes in this release, see the full changelog.

Version 0.0.2 (June 9, 2023)#

For a list of all changes in this release, see the full changelog.

Version 0.0.1 (January 16, 2023)#

Version 0.0.1 was released on PyPI as a placeholder while very early development and package design were being undertaken.
