make_table

hydrostats.analyze.make_table(merged_dataframe, metrics, seasonal_periods=None, mase_m=1, dmod_j=1, nse_mod_j=1, h6_mhe_k=1, h6_ahe_k=1, h6_rmshe_k=1, d1_p_obs_bar_p=None, lm_x_obs_bar_p=None, kge2009_s=(1, 1, 1), kge2012_s=(1, 1, 1), replace_nan=None, replace_inf=None, remove_neg=False, remove_zero=False, location=None)

Create a table of user selected metrics with optional seasonal analysis.

Creates a table with the metrics specified by the user. Seasonal periods can also be specified to compare how well the simulated data matches the observed data in different seasons. The function returns a pandas DataFrame, so the table can be saved to a CSV file or an Excel workbook with pandas (see Notes). An optional column can also be added for the location of the data.
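Each seasonal period is a start/end pair of 'MM-DD' strings that is applied to every year in the record. The following is a minimal sketch of how such a pair can select rows from a datetime-indexed DataFrame with pandas. It is an illustration, not the hydrostats implementation: the helper `select_season` and the sample data are hypothetical, and the sketch assumes the start date falls on or before the end date within one calendar year (periods that wrap past December 31 are not handled).

```python
import numpy as np
import pandas as pd


def select_season(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Return rows of df whose month-day lies in [start, end] ('MM-DD').

    Hypothetical helper for illustration only; assumes start <= end
    within a single calendar year.
    """
    # Zero-padded 'MM-DD' strings sort in calendar order, so a plain
    # string comparison picks the same days in every year.
    month_day = df.index.strftime("%m-%d")
    mask = (month_day >= start) & (month_day <= end)
    return df[mask]


# Hypothetical two-column frame (simulated, observed) with a daily index.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
df = pd.DataFrame(
    {"sim": np.arange(len(idx), dtype=float),
     "obs": np.arange(len(idx), dtype=float)},
    index=idx,
)

june = select_season(df, "06-01", "06-30")
print(len(june))  # 60: June 1-30 in both 2000 and 2001
```

Because the comparison ignores the year, one pair such as ['06-01', '06-30'] gathers the matching days from every year before the metrics are computed.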

Parameters:
  • merged_dataframe (DataFrame) – A pandas DataFrame with a datetime index and two columns: predicted (simulated) data in column 0 and observed data in column 1.
  • metrics (list of str) – A list of the metrics to calculate. The metric abbreviations must be used (e.g. the abbreviation for the mean error is “ME”). Each metric function has attributes holding its name and abbreviation, so these can be used instead (see example). Alternatively, the abbreviation strings can be typed directly; they are listed in the quick reference table in this documentation.
  • seasonal_periods (2D list of str, optional) – If given, the seasonal periods to analyze, each as a [start, end] pair of ‘MM-DD’ strings (e.g. [[‘06-01’, ‘06-30’], [‘08-12’, ‘11-23’]] would analyze the dates from June 1st to June 30th and also from August 12th to November 23rd). Note that the entire time series is analyzed with the selected metrics by default.
  • mase_m (int, optional) – Parameter for the mean absolute scaled error (MASE) metric.
  • dmod_j (int or float, optional) – Parameter for the modified index of agreement (dmod) metric.
  • nse_mod_j (int or float, optional) – Parameter for the modified Nash-Sutcliffe (nse_mod) metric.
  • h6_mhe_k (int or float, optional) – Parameter for the H6 (MHE) metric.
  • h6_ahe_k (int or float, optional) – Parameter for the H6 (AHE) metric.
  • h6_rmshe_k (int or float, optional) – Parameter for the H6 (RMSHE) metric.
  • d1_p_obs_bar_p (float, optional) – Parameter for the Legates-McCabe Index of Agreement (d1_p).
  • lm_x_obs_bar_p (float, optional) – Parameter for the Legates-McCabe Efficiency Index (lm_index).
  • kge2009_s (tuple of floats, optional) – A tuple of three floats specifying the weights applied to the three components of the Kling-Gupta Efficiency (2009) metric.
  • kge2012_s (tuple of floats, optional) – A tuple of three floats specifying the weights applied to the three components of the Kling-Gupta Efficiency (2012) metric.
  • replace_nan (float, optional) – If given, the value used to replace NaN values in both arrays. If None, when a NaN value is found at the i-th position in the observed OR simulated array, the i-th values of both the observed and simulated arrays are removed before the computation.
  • replace_inf (float, optional) – If given, the value used to replace Inf values in both arrays. If None, when an Inf value is found at the i-th position in the observed OR simulated array, the i-th values of both the observed and simulated arrays are removed before the computation.
  • remove_neg (boolean, optional) – If True, when a negative value is found at the i-th position in the observed OR simulated array, the i-th values of both the observed AND simulated arrays are removed before the computation.
  • remove_zero (boolean, optional) – If True, when a zero value is found at the i-th position in the observed OR simulated array, the i-th values of both the observed AND simulated arrays are removed before the computation.
  • location (str, optional) – If given, the name of the location, added as a column in the resulting table. Useful for building a large table that combines several datasets.
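The replace_nan, replace_inf, remove_neg, and remove_zero options work pairwise: a value is either replaced in place, or the whole (simulated, observed) pair at that position is dropped. The following is a minimal NumPy sketch of those rules as described above. It is illustrative only; the function `clean_pairs` is hypothetical, and the assumption that replacements happen before the negative/zero removal is ours, not taken from the hydrostats source.

```python
import numpy as np


def clean_pairs(sim, obs, replace_nan=None, replace_inf=None,
                remove_neg=False, remove_zero=False):
    """Hypothetical sketch of the pairwise cleaning rules for make_table."""
    sim = np.asarray(sim, dtype=float).copy()
    obs = np.asarray(obs, dtype=float).copy()

    # Assumption: replacement is applied first, in both arrays.
    if replace_nan is not None:
        sim[np.isnan(sim)] = replace_nan
        obs[np.isnan(obs)] = replace_nan
    if replace_inf is not None:
        sim[np.isinf(sim)] = replace_inf
        obs[np.isinf(obs)] = replace_inf

    # Keep a position only if BOTH values pass every active rule.
    keep = np.ones(sim.shape, dtype=bool)
    if replace_nan is None:
        keep &= ~(np.isnan(sim) | np.isnan(obs))
    if replace_inf is None:
        keep &= ~(np.isinf(sim) | np.isinf(obs))
    with np.errstate(invalid="ignore"):  # comparisons involving NaN are False
        if remove_neg:
            keep &= ~((sim < 0) | (obs < 0))
        if remove_zero:
            keep &= ~((sim == 0) | (obs == 0))
    return sim[keep], obs[keep]


sim = [1.0, np.nan, 3.0, -1.0, 0.0, 5.0]
obs = [2.0, 2.0, np.inf, 4.0, 5.0, 6.0]

s, o = clean_pairs(sim, obs, remove_neg=True, remove_zero=True)
print(s, o)  # [1. 5.] [2. 6.]
```

Here the NaN at position 1 and the Inf at position 2 drop those pairs, and remove_neg and remove_zero drop positions 3 and 4, leaving only the first and last pairs.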
Returns:

A DataFrame with one row per time range (the full time series plus each seasonal period, if given) and one column per metric specified, plus a Location column if location is given.

Return type:

DataFrame

Notes

If desired, users can export the tables to a CSV file or an Excel workbook using the built-in methods of pandas. Documentation for the CSV method can be found at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html and documentation for the Excel method can be found at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
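As a brief sketch of that export, the snippet below writes a small stand-in table to CSV and reads it back. The table is built by hand from two rows of the example output further down, rather than produced by make_table, and the file names are arbitrary placeholders. The Excel call is left commented out because DataFrame.to_excel needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

# Stand-in for a make_table result, built from two rows of the example output.
table = pd.DataFrame(
    {"Location": ["Magdalena", "Magdalena"],
     "MAE": [1157.669988, 631.984177],
     "NSE": [0.873684, 0.861163]},
    index=["Full Time Series", "January-01:March-31"],
)

# The index holds the time-range labels, so it is written as the first column.
table.to_csv("magdalena_metrics.csv")
# table.to_excel("magdalena_metrics.xlsx")  # requires an engine such as openpyxl

# Reading back with index_col=0 restores the time-range labels as the index.
roundtrip = pd.read_csv("magdalena_metrics.csv", index_col=0)
print(roundtrip.loc["Full Time Series", "MAE"])  # 1157.669988
```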

Examples

First we need to get some data. The data here is pulled from the Streamflow Prediction Tool model and the ECMWF forecasting model (GloFAS). We are comparing the two models in this example.

>>> import hydrostats.analyze as ha
>>> import hydrostats.data as hd
>>> from hydrostats.metrics import mae, r_squared, nse, kge_2012
>>>
>>> # Defining the URLs of the datasets
>>> sfpt_url = r'https://github.com/waderoberts123/Hydrostats/raw/master/Sample_data/sfpt_data/magdalena-calamar_interim_data.csv'
>>> glofas_url = r'https://github.com/waderoberts123/Hydrostats/raw/master/Sample_data/GLOFAS_Data/magdalena-calamar_ECMWF_data.csv'
>>> # Merging the data
>>> merged_df = hd.merge_data(sfpt_url, glofas_url, column_names=('SFPT', 'GLOFAS'))

Here we make a table with four seasonal periods and display the results:

>>> my_metrics = [mae.abbr, r_squared.abbr, nse.abbr, kge_2012.abbr]  # HydroErr 1.24 or greater is required to use these properties
>>> seasonal = [['01-01', '03-31'], ['04-01', '06-30'], ['07-01', '09-30'], ['10-01', '12-31']]
>>> table = ha.make_table(merged_df, my_metrics, seasonal, remove_neg=True, remove_zero=True, location='Magdalena')
>>> table
                         Location          MAE     ...           NSE  KGE (2012)
Full Time Series        Magdalena  1157.669988     ...      0.873684    0.872871
January-01:March-31     Magdalena   631.984177     ...      0.861163    0.858187
April-01:June-30        Magdalena  1394.640050     ...      0.813737    0.876890
July-01:September-30    Magdalena  1188.542871     ...      0.829492    0.831188
October-01:December-31  Magdalena  1410.852917     ...      0.793927    0.791257