Data Sources#

Introduction#

All metrics, statistical techniques and data processing tools in scores work with xarray data structures. Some metrics also work with pandas.

As such, scores works with any data source for which xarray or pandas can be used.
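As a rough illustration of the pandas route, the sketch below reads a small, made-up table of station observations with pandas and converts it to an xarray DataArray. The column names and values are invented for this example and do not come from any dataset mentioned on this page. It assumes pandas and xarray are both installed.

```python
import io

import pandas as pd

# Hypothetical station observations. In practice this would usually be a
# CSV file on disk rather than an in-memory string.
csv_text = """time,station,temperature
2024-01-01T00:00,A,21.5
2024-01-01T00:00,B,19.0
2024-01-01T01:00,A,22.1
2024-01-01T01:00,B,18.4
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Index by the columns that should become dimensions, then convert the
# temperature column to an xarray.DataArray (requires xarray).
obs = df.set_index(["time", "station"])["temperature"].to_xarray()

print(obs.dims)  # ('time', 'station')
```

The resulting object has one dimension per index level, which is the labelled form that xarray-based tools expect.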

Users will need to supply the dataset(s) they wish to work with, as scores does not contain datasets.

Data referred to on this page is available under various licenses, and the onus is on the user to understand the conditions of those licenses.

For additional information about downloading and preparing sample data, see this tutorial.

Working with Different File Formats#

Working with GRIB Data#

To use scores with GRIB data, install cfgrib and use engine='cfgrib' when opening a GRIB file with xarray.
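A minimal sketch of this, assuming xarray and cfgrib are installed. The helper names `grib_open_kwargs` and `open_grib` are hypothetical and are not part of scores, xarray or cfgrib. The optional `filter_by_keys` entry is a cfgrib backend option. It is often needed in practice because a GRIB file that mixes several level types may not open as a single dataset.

```python
def grib_open_kwargs(type_of_level=None):
    """Build keyword arguments for xarray.open_dataset when reading GRIB.

    Hypothetical helper for illustration only.
    """
    kwargs = {"engine": "cfgrib"}
    if type_of_level is not None:
        # Restrict the read to GRIB messages with a single level type,
        # for example "surface" or "isobaricInhPa".
        kwargs["backend_kwargs"] = {
            "filter_by_keys": {"typeOfLevel": type_of_level}
        }
    return kwargs


def open_grib(path, type_of_level=None):
    """Open a GRIB file as an xarray.Dataset using the cfgrib engine."""
    # Imported here so that grib_open_kwargs can be used without xarray.
    import xarray as xr

    return xr.open_dataset(path, **grib_open_kwargs(type_of_level))
```

For example, `open_grib("forecast.grib2", type_of_level="surface")` would read only the surface fields, where `forecast.grib2` is a placeholder file name.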

Working with NetCDF Data#

To use scores with NetCDF or HDF5 data, install h5netcdf. The h5netcdf library is included in the scores “all” and “tutorial” installation options. Opening NetCDF data is demonstrated in this tutorial.
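One possible way to handle both formats described in this section is to choose the xarray engine from the file extension. The sketch below is illustrative only: `ENGINE_BY_SUFFIX`, `engine_for` and `open_with_engine` are hypothetical names, and the suffix list is an assumption rather than an exhaustive or official mapping. It also assumes that `.nc` files are NetCDF4 (HDF5-based), since h5netcdf does not read the older NetCDF3 classic format.

```python
from pathlib import Path

# Assumed mapping from file suffix to xarray engine; extend as needed.
# ".nc" is treated as NetCDF4 (HDF5-based), which is what h5netcdf reads.
ENGINE_BY_SUFFIX = {
    ".nc": "h5netcdf",
    ".nc4": "h5netcdf",
    ".h5": "h5netcdf",
    ".hdf5": "h5netcdf",
    ".grib": "cfgrib",
    ".grib2": "cfgrib",
    ".grb": "cfgrib",
    ".grb2": "cfgrib",
}


def engine_for(path):
    """Return the xarray engine name assumed for this file's suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No engine configured for suffix {suffix!r}") from None


def open_with_engine(path, **kwargs):
    """Open a file as an xarray.Dataset with the engine chosen by suffix."""
    # Imported here so that engine_for can be used without xarray installed.
    import xarray as xr

    return xr.open_dataset(path, engine=engine_for(path), **kwargs)
```

Raising an error for unknown suffixes, rather than falling back to a default engine, keeps a mislabelled file from being read with the wrong backend.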

Weather and Climate Data#

This section provides a brief overview of some commonly used weather and climate datasets, and software packages for accessing such data. All datasets and software packages listed below are available free of charge.

Datasets#

Gridded Global Numerical Weather Prediction Data#

Global numerical weather prediction (NWP) models are used to generate medium-range forecasts and to provide the initial and boundary conditions for higher-resolution regional models. Their global coverage makes them a good starting point for demonstrating the application of scoring methods in any region of interest.

Archived datasets are available for:

Point-Based Data#

Point-based observations (e.g. from weather stations or buoys) are shared routinely between countries for the purposes of weather modelling.

The NOAA Integrated Surface Database (ISD) provides hourly point-based (in-situ) weather station data from around the globe. It is a good starting point for understanding how to work with point-based data. For more information about the NOAA ISD, see https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-database.

Gridded Model Reanalysis Data#

Reanalysis datasets provide a reliable and detailed reconstruction of past weather and climate conditions, spanning years if not decades.

The ECMWF Reanalysis v5 (ERA5) dataset is a well-known and widely used global reanalysis dataset. For more information, see https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5. ERA5 is also included in WeatherBench 2; see this section of their documentation.

Gridded Radar (Observation) Data#

Radar data provides remotely sensed precipitation estimates at high spatial and temporal resolution. Radar data varies by region, and there is no globally standardised radar dataset.

Information on Australian radar data can be found at https://www.openradar.io/.

Software for Accessing Data#

CliMetLab#

The European Centre for Medium-Range Weather Forecasts (ECMWF) has developed the CliMetLab Python package to simplify access to a large range of climatological and meteorological datasets. See https://climetlab.readthedocs.io/.

WeatherBench 2#

WeatherBench 2 provides a framework for evaluating and comparing a range of machine learning (ML) and physics-based weather forecasting models. It includes ground-truth and baseline datasets (including ERA5), and code for evaluating models. The website includes scorecards measuring the skill of ML and physics-based models. For more information, see https://sites.research.google/weatherbench/ and the google-research/weatherbench2 repository on GitHub.