auroc

hydrostats.ens_metrics.auroc(fcst_ens=None, obs=None, threshold=None, ens_threshold=None, obs_threshold=None, fcst_ens_bin=None, obs_bin=None)

Calculates the area under the relative operating characteristic curve (AUROC) for an ensemble forecast and its verifying binary observations, and estimates the variance of the AUROC.

Range: 0 ≤ AUROC ≤ 1; higher is better. A value of 0.5 corresponds to a forecast with no skill.

Parameters:
  • obs (1D ndarray) – Array of observations for each start date.
  • fcst_ens (2D ndarray) – Array of ensemble forecast of dimension n x M, where n = number of start dates and M = number of ensemble members.
  • threshold (float) – The threshold for an event (e.g. if the event is a 100-year flood, the streamflow value that a 100-year flood would have to exceed).
  • ens_threshold (float) – Threshold applied to the ensemble forecast. Set this together with ‘obs_threshold’ when the ensemble forecast and the observed data need different thresholds.
  • obs_threshold (float) – Threshold applied to the observed data. Set this together with ‘ens_threshold’ when the ensemble forecast and the observed data need different thresholds.
  • fcst_ens_bin (2D ndarray) – Binary array of ensemble forecast of dimension n x M, where n = number of start dates and M = number of ensemble members. 1 for an event and 0 for a non-event.
  • obs_bin (1D ndarray) – Binary array of observations for each start date. 1 for an event and 0 for a non-event.
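
The relationship between the threshold parameters and the binary inputs can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper `to_binary` is hypothetical, and it assumes that an event means strictly exceeding the threshold (whether hydrostats uses > or ≥ is not stated here).

```python
def to_binary(values, threshold):
    """Hypothetical helper: 1 where a value exceeds the threshold, else 0.

    Assumption: an event is a strict exceedance (value > threshold).
    """
    return [1 if v > threshold else 0 for v in values]


# Three start dates (n = 3) and two ensemble members (M = 2).
obs = [120.0, 180.0, 150.0]
fcst_ens = [[110.0, 130.0], [176.0, 190.0], [174.0, 181.0]]

# One shared threshold, as with the 'threshold' parameter.
obs_bin = to_binary(obs, 175)
fcst_ens_bin = [to_binary(row, 175) for row in fcst_ens]

# Separate thresholds, as with 'obs_threshold' and 'ens_threshold'.
obs_bin_sep = to_binary(obs, 140)
fcst_ens_bin_sep = [to_binary(row, 180) for row in fcst_ens]

print(obs_bin, fcst_ens_bin)
print(obs_bin_sep, fcst_ens_bin_sep)
```

Arrays binarized this way have the same shapes as the raw inputs: one value per start date for the observations, and one row per start date for the ensemble.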

Notes

NaN and inf treatment: If any value in obs or fcst_ens is NaN or inf, then the corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted. A warning will be shown that informs the user of the rows that have been removed.
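
The row-wise removal described above can be sketched as follows. This is an illustrative stand-in using only the standard library, not the hydrostats implementation; the function name `drop_non_finite_rows` is hypothetical, and the warning is replaced here by a returned list of removed row indices.

```python
import math


def drop_non_finite_rows(obs, fcst_ens):
    """Hypothetical helper: drop every start date where the observation
    or any ensemble member is NaN or inf."""
    clean_obs, clean_ens, removed = [], [], []
    for i, (o, row) in enumerate(zip(obs, fcst_ens)):
        if math.isfinite(o) and all(math.isfinite(v) for v in row):
            clean_obs.append(o)
            clean_ens.append(list(row))
        else:
            # The whole row goes, for the observation and all members.
            removed.append(i)
    return clean_obs, clean_ens, removed


obs = [1.0, float("nan"), 3.0, 4.0]
fcst_ens = [[1.0, 2.0], [3.0, 4.0], [float("inf"), 5.0], [6.0, 7.0]]

clean_obs, clean_ens, removed = drop_non_finite_rows(obs, fcst_ens)
print("removed rows:", removed)
```

Row 1 is dropped because the observation is NaN, and row 2 because one ensemble member is inf, even though the other values in those rows are valid.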

Returns: An array of two elements, the AUROC and the estimated variance, respectively.
Return type: 1D ndarray
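
To give a feel for the first returned element, the sketch below computes an AUROC from binary inputs with the Mann-Whitney pairwise comparison, which is equivalent to the area under the ROC curve. It rests on assumptions that are not confirmed by this page: the function name `auroc_sketch` is hypothetical, and the forecast probability for each start date is taken to be the fraction of ensemble members that predict the event. The variance estimate is not reproduced here.

```python
def auroc_sketch(fcst_ens_bin, obs_bin):
    """Hypothetical AUROC sketch based on pairwise comparisons.

    Assumption: the forecast probability of a start date is the
    fraction of ensemble members equal to 1.
    """
    probs = [sum(row) / len(row) for row in fcst_ens_bin]
    events = [p for p, o in zip(probs, obs_bin) if o == 1]
    non_events = [p for p, o in zip(probs, obs_bin) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    # Count event/non-event pairs ranked correctly; ties count as half.
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


# Four start dates, four members each.
fcst_ens_bin = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]

perfect = auroc_sketch(fcst_ens_bin, [1, 0, 0, 1])   # events get the highest probabilities
inverted = auroc_sketch(fcst_ens_bin, [0, 1, 1, 0])  # events get the lowest probabilities
no_skill = auroc_sketch([[1, 0], [1, 0], [1, 0], [1, 0]], [1, 0, 0, 1])  # identical forecasts
print(perfect, inverted, no_skill)
```

The three cases land at the top, bottom, and middle of the 0 to 1 range, which matches the interpretation that higher values indicate better discrimination between events and non-events.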

Examples

>>> import numpy as np
>>> import hydrostats.ens_metrics as em
>>> np.random.seed(3849590438)

Creating an observed 1D array and an ensemble 2D array filled with random numbers.

>>> ens_array_random = (np.random.rand(100, 52) + 1) * 100
>>> obs_array_random = (np.random.rand(100) + 1) * 100

Creating an observed 1D array from a sine curve and an ensemble 2D array equal to the observations plus Gaussian noise.

>>> noise = np.random.normal(scale=1, size=(100, 52))
>>> x = np.linspace(1, 10, 100)
>>> observed_array = np.sin(x) + 10
>>> ensemble_array_noise = (np.ones((100, 52)).T * observed_array).T + noise

Calculating the AUROC with random values. Note that the area under the curve is close to 0.5 because the random forecast carries no information about the observations.

>>> print(em.auroc(obs=obs_array_random, fcst_ens=ens_array_random, threshold=175))
[0.45560516 0.06406262]

Calculating the AUROC with the noisy forecast. Note that the AUROC is high because the forecast closely tracks the observations.

>>> print(em.auroc(obs=observed_array, fcst_ens=ensemble_array_noise, threshold=10))
[0.99137931 0.00566026]

References

  • DeLong et al. (1988): Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics. doi: 10.2307/2531595
  • Sun and Xu (2014): Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Sign Proc Let 21(11). doi: 10.1109/LSP.2014.2337313
  • Siegert (2017): SpecsVerification: Forecast Verification Routines for Ensemble Forecasts of Weather and Climate. R package version 0.5-2. https://CRAN.R-project.org/package=SpecsVerification