auroc
hydrostats.ens_metrics.auroc(fcst_ens=None, obs=None, threshold=None, ens_threshold=None, obs_threshold=None, fcst_ens_bin=None, obs_bin=None)
Calculates the Area Under the Relative Operating Characteristic curve (AUROC) for a forecast and its verifying binary observation, and estimates the variance of the AUROC.
Range: 0 ≤ AUROC ≤ 1. Higher is better; 0.5 indicates no ability to discriminate events from non-events.
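How the AUROC and its variance are obtained is not spelled out here; the References at the end of this page point to DeLong's nonparametric estimator. The sketch below is one way to compute both, under stated assumptions: the ensemble is first reduced to one event probability per start date (assumed to be the fraction of members forecasting the event), and the variance uses the simple pairwise O(m·n) form of DeLong's estimator rather than the faster midrank algorithm of Sun and Xu. `auroc_delong_sketch` is a hypothetical name; this is not hydrostats' implementation, and its numbers are not guaranteed to match `em.auroc`.

```python
import numpy as np


def auroc_delong_sketch(prob, obs_bin):
    """Hypothetical helper: AUROC and its DeLong variance estimate.

    prob    -- 1D array of forecast event probabilities, one per start date.
               Assumption: for an ensemble this is the fraction of members
               forecasting the event, e.g. fcst_ens_bin.mean(axis=1).
    obs_bin -- 1D array of observed outcomes, 1 = event, 0 = non-event.
    Needs at least two events and two non-events (sample variances use ddof=1).
    """
    prob = np.asarray(prob, dtype=float)
    obs_bin = np.asarray(obs_bin)
    x = prob[obs_bin == 1]  # forecast probabilities on event dates
    y = prob[obs_bin == 0]  # forecast probabilities on non-event dates
    m, n = x.size, y.size

    # psi[i, j] = 1 if event i got a higher probability than non-event j,
    # 0.5 for a tie, 0 otherwise
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    auc = psi.mean()  # Mann-Whitney form of the AUROC

    # DeLong's structural components and the resulting variance estimate
    v10 = psi.mean(axis=1)  # one value per event
    v01 = psi.mean(axis=0)  # one value per non-event
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    return np.array([auc, var])
```

For instance, probabilities `[0.9, 0.2, 0.8, 0.3]` against observations `[1, 1, 0, 0]` give an AUROC of 0.5 with a variance estimate of 0.25, while perfectly separated probabilities give an AUROC of 1.0 with zero variance.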
Parameters:
- obs (1D ndarray) – Array of observations for each start date.
- fcst_ens (2D ndarray) – Array of ensemble forecasts of dimension n x M, where n = number of start dates and M = number of ensemble members.
- threshold (float) – The threshold for an event (e.g. if the event is a 100-year flood, the streamflow value that a 100-year flood would have to exceed).
- ens_threshold (float) – If different thresholds are desired for the ensemble forecast and the observed data, set this parameter together with the ‘obs_threshold’ parameter; ‘ens_threshold’ is the threshold applied to the ensemble forecast.
- obs_threshold (float) – If different thresholds are desired for the ensemble forecast and the observed data, set this parameter together with the ‘ens_threshold’ parameter; ‘obs_threshold’ is the threshold applied to the observations.
- fcst_ens_bin (2D ndarray) – Binary array of the ensemble forecast of dimension n x M, where n = number of start dates and M = number of ensemble members. 1 for an event and 0 for a non-event.
- obs_bin (1D ndarray) – Binary array of observations for each start date. 1 for an event and 0 for a non-event.
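The threshold parameters turn continuous values into event/non-event arrays. The helper below sketches that step, assuming an event means a value strictly greater than its threshold (the wording "would have to exceed" suggests this, but the library's exact comparison is an assumption). `binarize_sketch` is hypothetical and not part of hydrostats; it returns an n x M forecast array and a length-n observation array of 1s and 0s.

```python
import numpy as np


def binarize_sketch(fcst_ens, obs, threshold=None, ens_threshold=None, obs_threshold=None):
    """Hypothetical helper: convert continuous data to 1 (event) / 0 (non-event).

    Assumption: an event is a value strictly greater than its threshold.
    Either pass `threshold`, or pass both `ens_threshold` and `obs_threshold`.
    """
    if threshold is not None:
        # a single threshold applies to both the ensemble and the observations
        ens_threshold = obs_threshold = threshold
    if ens_threshold is None or obs_threshold is None:
        raise ValueError("set threshold, or both ens_threshold and obs_threshold")
    fcst_ens_bin = (np.asarray(fcst_ens) > ens_threshold).astype(int)  # n x M
    obs_bin = (np.asarray(obs) > obs_threshold).astype(int)  # length n
    return fcst_ens_bin, obs_bin
```

Arrays of this form are the kind of input the `fcst_ens_bin` and `obs_bin` parameters describe.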
Notes
NaN and inf treatment: If any value in obs or fcst_ens is NaN or inf, the corresponding row is removed from both fcst_ens (all ensemble members) and obs. A warning informs the user which rows were removed.
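The NaN/inf rule amounts to a boolean row mask. A minimal NumPy sketch follows; `drop_nonfinite_rows` and its warning text are illustrative only, not hydrostats' own function or message.

```python
import warnings

import numpy as np


def drop_nonfinite_rows(fcst_ens, obs):
    """Hypothetical helper: drop every start date (row) where the observation
    or any ensemble member is NaN or inf, and warn about the removed rows."""
    fcst_ens = np.asarray(fcst_ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # a row is bad if its observation or any of its members is non-finite
    bad = ~np.isfinite(obs) | np.any(~np.isfinite(fcst_ens), axis=1)
    if bad.any():
        warnings.warn(f"Removed rows with NaN or inf values: {np.flatnonzero(bad).tolist()}")
    return fcst_ens[~bad], obs[~bad]
```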
Returns: An array of two elements, the AUROC and the estimated variance, respectively.
Return type: 1D ndarray
Examples
>>> import numpy as np
>>> import hydrostats.ens_metrics as em
>>> np.random.seed(3849590438)
Creating an observed 1D array and an ensemble 2D array filled with random numbers.
>>> ens_array_random = (np.random.rand(100, 52) + 1) * 100
>>> obs_array_random = (np.random.rand(100) + 1) * 100
Creating an observed 1D array (a sine wave) and an ensemble 2D array in which every member equals the observations plus Gaussian noise.
>>> noise = np.random.normal(scale=1, size=(100, 52))
>>> x = np.linspace(1, 10, 100)
>>> observed_array = np.sin(x) + 10
>>> ensemble_array_noise = (np.ones((100, 52)).T * observed_array).T + noise
Calculating the AUROC with random values. Note that the AUROC is close to 0.5 because the forecast is unrelated to the observations.
>>> print(em.auroc(obs=obs_array_random, fcst_ens=ens_array_random, threshold=175))
[0.45560516 0.06406262]
Calculating the AUROC with noise in the forecast values. Note that the AUROC is high because the noisy forecast still closely follows the observations.
>>> print(em.auroc(obs=observed_array, fcst_ens=ensemble_array_noise, threshold=10))
[0.99137931 0.00566026]
References
- DeLong et al. (1988): Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics. doi: 10.2307/2531595
- Sun and Xu (2014): Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves. IEEE Signal Processing Letters 21(11). doi: 10.1109/LSP.2014.2337313
- Siegert (2017): SpecsVerification: Forecast Verification Routines for Ensemble Forecasts of Weather and Climate. R package version 0.5-2. https://CRAN.R-project.org/package=SpecsVerification