make_table
- hydrostats.analyze.make_table(merged_dataframe: DataFrame, metrics: Sequence[str], seasonal_periods: Sequence[tuple[str, str]] | None = None, mase_m: int = 1, dmod_j: float = 1, nse_mod_j: float = 1, h6_mhe_k: float = 1, h6_ahe_k: float = 1, h6_rmshe_k: float = 1, d1_p_obs_bar_p: float | None = None, lm_x_obs_bar_p: float | None = None, kge2009_s: tuple[float, float, float] = (1, 1, 1), kge2012_s: tuple[float, float, float] = (1, 1, 1), replace_nan: float | None = None, replace_inf: float | None = None, remove_neg: bool = False, remove_zero: bool = False, location: str | None = None) -> DataFrame
Create a table of user-selected metrics with optional seasonal analysis.
Creates a table of the metrics specified by the user. Seasonal periods can also be specified to compare how well the simulated data matches the observed data in different seasons. The returned table can be saved to a CSV file or an Excel workbook with the built-in pandas methods (see Notes). A column naming the location of the data can also be added.
- Parameters:
merged_dataframe – A pandas DataFrame with a datetime index and two columns: predicted data (Col 0) and observed data (Col 1).
metrics – A list of all the metrics that the user wants to calculate. The metric abbreviations must be used (e.g. the abbreviation for the mean error is “ME”). Each metric function has an attribute with its name and abbreviation, so this can be used instead (see example). The abbreviation strings are also listed in the quick reference table in this documentation.
seasonal_periods – If given, specifies the seasonal periods that the user wants to analyze (e.g. [[‘06-01’, ‘06-30’], [‘08-12’, ‘11-23’]] would analyze the dates from June 1st to June 30th and also August 12th to November 23rd). Note that the entire time series is analyzed with the selected metrics by default.
mase_m – Parameter for the mean absolute scaled error (MASE) metric.
dmod_j – Parameter for the modified index of agreement (dmod) metric.
nse_mod_j – Parameter for the modified Nash-Sutcliffe (nse_mod) metric.
h6_mhe_k – Parameter for the H6 (MHE) metric.
h6_ahe_k – Parameter for the H6 (AHE) metric.
h6_rmshe_k – Parameter for the H6 (RMSHE) metric.
d1_p_obs_bar_p – Parameter for the Legate McCabe Index of Agreement (d1_p).
lm_x_obs_bar_p – Parameter for the Legate McCabe Efficiency Index (lm_index).
kge2009_s – A tuple of floats of length three signifying how to weight the three values used in the Kling Gupta (2009) metric.
kge2012_s – A tuple of floats of length three signifying how to weight the three values used in the Kling Gupta (2012) metric.
replace_nan – If given, indicates which value to replace NaN values with in the two arrays. If None, when a NaN value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.
replace_inf – If given, indicates which value to replace Inf values with in the two arrays. If None, when an inf value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.
remove_neg – If True, when a negative value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.
remove_zero – If True, when a zero value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.
location – The name of the location that will be created as a column in the table that is created. Useful for creating a large table with different datasets.
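The pairwise removal described for replace_nan, replace_inf, remove_neg and remove_zero can be illustrated with a minimal sketch. The helper drop_invalid_pairs below is hypothetical and is not part of hydrostats; it only shows the idea that an invalid value in either array removes the i-th value from both, assuming NumPy is available.

```python
import numpy as np


def drop_invalid_pairs(sim, obs, remove_neg=False, remove_zero=False):
    """Hypothetical helper (not hydrostats code): drop index i from both
    arrays when either array holds an invalid value at position i."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # NaN and Inf are dropped pairwise, mirroring replace_nan=None / replace_inf=None
    keep = np.isfinite(sim) & np.isfinite(obs)
    if remove_neg:
        keep &= (sim >= 0) & (obs >= 0)
    if remove_zero:
        keep &= (sim != 0) & (obs != 0)
    return sim[keep], obs[keep]


sim = np.array([1.0, np.nan, 3.0, -4.0, 0.0, 6.0])
obs = np.array([1.5, 2.0, np.inf, 4.0, 5.0, 6.5])
clean_sim, clean_obs = drop_invalid_pairs(sim, obs, remove_neg=True, remove_zero=True)
print(clean_sim)  # [1. 6.]
print(clean_obs)  # [1.5 6.5]
```

Only positions 0 and 5 survive, because every other position has a NaN, Inf, negative or zero value in at least one of the two arrays.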
- Returns:
Dataframe with rows containing the metric values at the different time ranges, and columns
containing the metrics specified.
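As a rough illustration of what a seasonal row covers, the sketch below selects every date between two month-day strings across all years of a datetime-indexed DataFrame. The select_season helper, the column names and the data are all made up for this example, and this is not necessarily how hydrostats slices the seasons internally.

```python
import numpy as np
import pandas as pd

# Made-up daily series covering two full years (column names are illustrative)
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
df = pd.DataFrame(
    {
        "Simulated": np.arange(len(index), dtype=float),
        "Observed": np.arange(len(index), dtype=float) + 1.0,
    },
    index=index,
)


def select_season(frame, start, end):
    """Hypothetical helper: keep rows whose month-day lies in [start, end]."""
    month_day = frame.index.strftime("%m-%d")
    mask = (month_day >= start) & (month_day <= end)
    return frame[mask]


june = select_season(df, "06-01", "06-30")
print(len(june))  # 60 rows: 30 June days in each of the two years
```

This simple string comparison does not handle a period that wraps around the end of the year (for example "12-01" to "02-28").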
Notes
If desired, users can export the tables to a CSV or Excel Workbook. This can be done using the built-in methods of pandas. A link to the CSV method can be found at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html, and a link to the Excel method can be found at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
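A minimal sketch of the CSV export follows. The table here is a small stand-in with made-up values rather than real make_table output, and it is written to an in-memory buffer so the example has no side effects.

```python
import io

import pandas as pd

# Stand-in for the DataFrame returned by make_table; the values are made up
table = pd.DataFrame(
    {"Location": ["Example", "Example"], "MAE": [1.5, 2.5]},
    index=["Full Time Series", "January-01:March-31"],
)

buffer = io.StringIO()
table.to_csv(buffer)  # pass a file path such as "metrics.csv" to write to disk
csv_text = buffer.getvalue()

# Reading the CSV back with the first column as the index recovers the row labels
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```

Calling table.to_excel with a file path works in a similar way, but it needs an Excel engine such as openpyxl to be installed.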
Examples
First, we need to get some data. The data here is pulled from the Streamflow Prediction Tool model and the ECMWF forecasting model. We are comparing the two models in this example.
>>> import hydrostats.analyze as ha
>>> import hydrostats.data as hd
>>> from hydrostats.metrics import mae, r_squared, nse, kge_2012
>>>
>>> # Defining the URLs of the datasets
>>> sfpt_url = r"https://github.com/waderoberts123/Hydrostats/raw/master/Sample_data/sfpt_data/magdalena-calamar_interim_data.csv"
>>> glofas_url = r"https://github.com/waderoberts123/Hydrostats/raw/master/Sample_data/GLOFAS_Data/magdalena-calamar_ECMWF_data.csv"
>>> # Merging the data
>>> merged_df = hd.merge_data(sfpt_url, glofas_url, column_names=("SFPT", "GLOFAS"))
Here we make a table and print the results:
>
>>> my_metrics = [
...     mae.abbr,
...     r_squared.abbr,
...     nse.abbr,
...     kge_2012.abbr,
... ]  # HydroErr 1.24 or greater is required to use these properties
>>> seasonal = [
...     ["01-01", "03-31"],
...     ["04-01", "06-30"],
...     ["07-01", "09-30"],
...     ["10-01", "12-31"],
... ]
>>> table = ha.make_table(
...     merged_df,
...     my_metrics,
...     seasonal,
...     remove_neg=True,
...     remove_zero=True,
...     location="Magdalena",
... )
>>> table
                         Location          MAE  ...       NSE  KGE (2012)
Full Time Series        Magdalena  1157.669988  ...  0.873684    0.872871
January-01:March-31     Magdalena   631.984177  ...  0.861163    0.858187
April-01:June-30        Magdalena  1394.640050  ...  0.813737    0.876890
July-01:September-30    Magdalena  1188.542871  ...  0.829492    0.831188
October-01:December-31  Magdalena  1410.852917  ...  0.793927    0.791257