Commit
add examples to the ams_analysis
MAfarrag committed Aug 18, 2024
1 parent 991b1fe commit 00ff8eb
Showing 21 changed files with 268 additions and 61 deletions.
55 changes: 55 additions & 0 deletions examples/data/ams-gauges.csv
@@ -0,0 +1,55 @@
date,Frankfurt,Mainz,Kaub,Andernach,Cologne,Rees
1951,-9,4250,4480,6080,6490,6830
1952,-9,4490,4610,6970,7110,7340
1953,-9,4270,4380,7300,7610,7970
1954,-9,2850,2910,3440,3620,3840
1955,-9,5940,6050,9460,9460,9500
1956,-9,5000,5150,7140,7270,7540
1957,-9,4500,4520,6650,6750,6950
1958,-9,5020,5260,8480,8600,9080
1959,-9,3050,3180,4950,5100,5420
1960,-9,2560,2660,3310,3400,3550
1961,-9,2940,3220,5000,5250,5840
1962,-9,3460,3620,5270,5460,5880
1963,-9,2360,2550,3300,3540,3820
1964,432,2780,3090,5160,5480,5510
1965,913,4430,4460,5890,6130,6350
1966,967,4430,4560,6960,7410,7720
1967,962,3950,4240,6920,7290,7570
1968,1100,4130,4280,6970,7160,7520
1969,797,3370,3540,5130,5270,5340
1970,1760,6630,6670,9990,9690,9900
1971,440,2410,2510,3220,3520,3810
1972,292,2100,2040,2220,2270,2330
1973,347,3680,3690,5220,5380,5290
1974,470,2940,2910,3880,4050,4260
1975,834,3740,3740,5440,5750,6190
1976,447,2150,2180,3100,3270,3560
1977,619,4120,4320,6460,6680,6560
1978,540,5110,5310,6200,6340,6330
1979,882,4490,4600,6580,6730,6900
1980,1240,5470,5630,8450,8800,8760
1981,1160,4490,4553,6271,6130,6500
1982,1490,5480,5600,7793,7967,7787
1983,890,5770,6090,9570,9801,9868
1984,1230,4520,4880,7980,8433,8502
1985,548,3040,3230,4420,4880,4780
1986,795,3850,3980,5860,5890,6210
1987,1280,4670,4870,6960,7130,7590
1988,1760,6920,7160,9250,9550,10200
1989,723,3480,3680,5420,5300,5480
1990,830,4820,5130,7450,7250,6970
1991,673,3400,3710,6190,6190,6590
1992,489,3750,3930,5230,5210,5440
1993,500,3640,3780,5400,5480,5810
1994,1220,5490,6310,10400,10600,10600
1995,1990,5900,6520,10100,10700,11300
1996,669,3760,3860,4480,4280,4250
1997,914,4120,4210,6960,7080,6970
1998,1060,4720,4790,6910,6700,6150
1999,1420,5480,5730,8160,8530,9240
2000,625,3750,3900,6390,6370,6550
2001,1140,5420,5710,8320,8410,8410
2002,1170,4950,5140,7260,7240,7940
2003,1800,5090,5350,8620,8840,9470
2004,197,1150,1190,1470,1580,1810
7 changes: 7 additions & 0 deletions examples/data/distribution_properties.csv
@@ -0,0 +1,7 @@
id,c,loc,scale,D-static,P-Value
Frankfurt,0.0519,718.7208,376.1886,0.0732,0.9999
Mainz,0.3073,3743.8060,1214.6170,0.0556,1.0000
Kaub,0.2826,3881.5735,1262.4261,0.0556,1.0000
Andernach,0.3215,5649.0760,2084.3831,0.0741,0.9987
Cologne,0.3061,5783.0175,2090.2240,0.0741,0.9987
Rees,0.2842,5960.0225,2107.1972,0.0741,0.9987
Binary file added examples/data/gauges/figures/Andernach.png
Binary file added examples/data/gauges/figures/Cologne.png
Binary file added examples/data/gauges/figures/Frankfurt.png
Binary file added examples/data/gauges/figures/Kaub.png
Binary file added examples/data/gauges/figures/Mainz.png
Binary file added examples/data/gauges/figures/Rees.png
Binary file added examples/data/gauges/figures/date.png
Binary file added examples/data/gauges/figures/f-Andernach.png
Binary file added examples/data/gauges/figures/f-Cologne.png
Binary file added examples/data/gauges/figures/f-Frankfurt.png
Binary file added examples/data/gauges/figures/f-Kaub.png
Binary file added examples/data/gauges/figures/f-Mainz.png
Binary file added examples/data/gauges/figures/f-Rees.png
7 changes: 7 additions & 0 deletions examples/data/statistical_properties.csv
@@ -0,0 +1,7 @@
id,mean,std,min,5%,25%,median,75%,95%,max,start_year,end_year,nyr,q1.5,q2,q5,q10,q25,q50,q100,q200,q500,q1000
Frankfurt,917.4390,433.9829,197.0000,347.0000,548.0000,882.0000,1170.0000,1760.0000,1990.0000,1964.0000,2004.0000,40.0000,683.2546,855.2969,1261.5965,1517.7588,1827.4881,2047.6221,2047.6221,2258.3329,2460.8234,2717.0370
Mainz,4153.3333,1181.7078,1150.0000,2286.5000,3415.0000,4190.0000,4987.5000,5914.0000,6920.0000,1951.0000,2004.0000,53.0000,3627.9072,4164.8247,5203.5025,5716.9056,6217.2439,6504.7768,6504.7768,6734.8834,6919.9487,7110.7671
Kaub,4327.0926,1243.0196,1190.0000,2394.5000,3635.0000,4350.0000,5147.5000,6383.5000,7160.0000,1951.0000,2004.0000,53.0000,3761.2533,4321.1147,5425.0055,5983.7382,6539.6898,6865.8500,6865.8500,7131.4309,7348.7381,7577.2635
Andernach,6333.4074,2016.2113,1470.0000,3178.0000,5175.0000,6425.0000,7412.5000,9717.0000,10400.0000,1951.0000,2004.0000,53.0000,5450.0504,6369.7349,8129.5379,8987.5813,9813.8563,10283.0847,10283.0847,10654.8745,10950.9409,11252.7701
Cologne,6489.2778,2037.0057,1580.0000,3354.5000,5277.5000,6585.0000,7560.0000,9728.8500,10700.0000,1951.0000,2004.0000,53.0000,5583.5790,6507.6947,8296.9962,9182.3978,10046.1020,10542.9299,10542.9299,10940.8513,11261.1394,11591.6871
Rees,6701.4259,2074.9944,1810.0000,3556.5000,5450.0000,6575.0000,7901.7500,10005.0000,11300.0000,1951.0000,2004.0000,53.0000,5759.1727,6693.4716,8533.3085,9463.0691,10386.9206,10928.1713,10928.1713,11368.3842,11728.1679,12106.0276
210 changes: 178 additions & 32 deletions statista/eva.py
@@ -1,4 +1,35 @@
"""Extreme value analysis."""
"""Extreme value analysis.
Annual Maximum Series (AMS) Analysis is a statistical method commonly used in fields like hydrology, meteorology, and
environmental engineering to analyze extreme events, such as floods, rainfall, or temperatures. The primary goal of AMS
analysis is to assess the frequency and magnitude of extreme events over time.
Key Concepts of AMS Analysis
Definition:
The Annual Maximum Series is a time series composed of the maximum values observed within each year. For example,
in hydrology, the AMS might consist of the highest daily flow recorded in each year for a river.
Purpose:
The AMS is used to model and predict the probability of extreme events occurring in the future. This is crucial for
risk assessment and the design of infrastructure to withstand such events (e.g., dams, levees, drainage systems).
Advantages of AMS Analysis
- Simplicity: AMS analysis is straightforward and focuses on extreme events, which are often of primary interest.
- Historical Context: Provides insights based on historical extreme values, which are directly relevant for
planning and design.
Limitations of AMS Analysis
- Data Limitations: The accuracy of AMS analysis depends on the availability and quality of long-term data.
- Ignores Sub-Annual Events: AMS considers only one value per year, potentially ignoring significant events that
occur more than once in a year.
Common Applications:
- Flood Frequency Analysis: AMS is often used to estimate the probability of extreme flood events to help design
flood control infrastructure.
- Rainfall Analysis: Used to assess the risk of extreme rainfall events for urban drainage design.
- Temperature Extremes: AMS can be used to evaluate the risk of extremely high or low temperatures.
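
As a minimal illustration of the definition above, an AMS can be built from a sub-annual record by resampling
to the hydrological year and taking the maximum. A sketch, assuming `daily_flow` is a hypothetical
`pandas.Series` with a `DatetimeIndex`:

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2000-11-01", "2004-10-31", freq="D")
>>> daily_flow = pd.Series(np.random.default_rng(1).gamma(2.0, 150.0, len(idx)), index=idx)
>>> ams = daily_flow.resample("A-OCT").max()  # one maximum per hydrological year (Nov-Oct)
>>> len(ams)
4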
"""

from typing import Union, Tuple
from pathlib import Path
@@ -31,29 +62,29 @@ def ams_analysis(
Parameters
----------
time_series_df : [DataFrame]
time_series_df: DataFrame
DataFrame containing multiple time series to do the statistical analysis on.
ams: [bool]
ams: bool
True if the given time series is already an annual maximum series. Default is False.
ams_start: [str]
ams_start: str
The beginning of the year which is used to resample the time series to get the annual maximum series.
Default is "A-OCT".
save_plots : [Bool]
save_plots: bool
True if you want to save the plots.
save_to : [str]
save_to: str
The directory where you want to save the plots and the statistical properties.
filter_out: [Bool]
filter_out: bool
Observed or hydraulic-model data can have gaps (periods when the model did not run, or gaps in the observed
data). If these gap days are filled with a specific value and you want to ignore them here, set
`filter_out` to that value.
distribution: str, Default is "GEV".
distribution name.
method: str, Default is "lmoments".
available methods are 'mle', 'mm', 'lmoments', 'optimization'.
obj_func: [callable]
obj_func: callable
Objective function to be used with the optimization method; default is None. For the Gumbel distribution there
is `Gumbel.objective_fn`, and similarly for the GEV distribution there is `GEV.objective_fn`.
quartile: [float]
quartile: float
The quartile is only used when estimating the distribution parameters with the optimization method and a
threshold value; the threshold is calculated as the quantile corresponding to the value of this parameter.
alpha: float, optional, Default is 0.1.
@@ -86,6 +117,111 @@ def ams_analysis(
Cologne,0.3,5783.0,2090.2,0.1,1.0
Rees,0.3,5960.0,2107.2,0.1,1.0
date,0.3,1971.8,16.2,0.1,1.0
Examples
--------
- First, read the data as a `pandas.DataFrame`.
>>> import pandas as pd
>>> ams_gauges = pd.read_csv("examples/data/ams-gauges.csv", index_col=0)
>>> print(ams_gauges) # doctest: +SKIP
Frankfurt Mainz Kaub Andernach Cologne Rees
date
1951 -9 4250 4480 6080 6490 6830
1952 -9 4490 4610 6970 7110 7340
1953 -9 4270 4380 7300 7610 7970
1954 -9 2850 2910 3440 3620 3840
1955 -9 5940 6050 9460 9460 9500
1956 -9 5000 5150 7140 7270 7540
1957 -9 4500 4520 6650 6750 6950
.....
1998 1060 4720 4790 6910 6700 6150
1999 1420 5480 5730 8160 8530 9240
2000 625 3750 3900 6390 6370 6550
2001 1140 5420 5710 8320 8410 8410
2002 1170 4950 5140 7260 7240 7940
2003 1800 5090 5350 8620 8840 9470
2004 197 1150 1190 1470 1580 1810
- The time series we have just read are the annual maximum series of the gauges; the first column is an
index of the year (54 years in total) and the rest are discharge values in m3/s for each station. A value of
"-9" is used to fill the missing data.
- The `ams_analysis` function takes the time series `DataFrame` as the first and only positional argument;
all the other arguments are optional. Since the time series is already an annual maximum series, we don't
need the function to do any resampling, so we set `ams=True`. The `ams_start` parameter can be used to provide
the beginning of the hydrological year when resampling is needed (e.g., `ams_start="A-OCT"`).
- We want to save the plots, so we set `save_plots=True` and provide the directory where we want to save the plots in
`save_to`.
- We also want to filter out the missing data, so we set `filter_out=-9`.
- In order to fit the time series to a distribution, we also need to provide the parameter estimation method
(i.e., `lmoments`, `mle`, `mm`, `optimization`); the default is `lmoments`. We also need to provide the name
of the distribution we want to fit the time series to (i.e., `GEV`, `Gumbel`). So we will use
`method="lmoments"` and `distribution="GEV"`.
- The `alpha` parameter is the significance level used for the confidence-interval calculation; the default
is 0.1.
>>> method = "lmoments"
>>> save_to = "examples/data/gauges"
>>> statistical_properties, distribution_properties = ams_analysis(
... time_series_df=ams_gauges,
... ams=True,
... save_plots=True,
... save_to=save_to,
... filter_out=-9,
... method=method,
... alpha=0.05,
... ) # doctest: +SKIP
-----KS Test--------
Statistic = 0.07317073170731707
Accept Hypothesis
P value = 0.9999427584427157
-----KS Test--------
Statistic = 0.07317073170731707
Accept Hypothesis
P value = 0.9999427584427157
2024-08-18 12:45:04.779 | DEBUG | statista.confidence_interval:boot_strap:104 - Some values used top 10 low/high samples; results may be unstable.
2024-08-18 12:45:05.221 | INFO | statista.eva:ams_analysis:300 - Gauge Frankfurt done.
- The `ams_analysis` function iterates over all the gauges in the time series, fits each one to the
distribution, and calculates the statistical properties and the distribution properties of the fitted
distribution.
- One of the outputs of the function is the statistical properties of the time series, which include the mean,
std, min, several quantiles (5%, 25%, median, 75%, 95%), and the max.
>>> print(statistical_properties.loc[:, statistical_properties.columns[:9]]) # doctest: +SKIP
mean std min 5% 25% median 75% 95% max
id
Frankfurt 917.439024 433.982918 197.0 347.00 548.00 882.0 1170.00 1760.00 1990.0
Mainz 4153.333333 1181.707804 1150.0 2286.50 3415.00 4190.0 4987.50 5914.00 6920.0
Kaub 4327.092593 1243.019565 1190.0 2394.50 3635.00 4350.0 5147.50 6383.50 7160.0
Andernach 6333.407407 2016.211257 1470.0 3178.00 5175.00 6425.0 7412.50 9717.00 10400.0
Cologne 6489.277778 2037.005658 1580.0 3354.50 5277.50 6585.0 7560.00 9728.85 10700.0
Rees 6701.425926 2074.994365 1810.0 3556.50 5450.00 6575.0 7901.75 10005.00 11300.0
- The rest of the columns in the `statistical_properties` are start_year, end_year, nyr, q1.5, q2, q5, q10,
q25, q50, q100, q200, q500, and q1000, where the q-columns are the discharge quantiles estimated from the
fitted distribution for the corresponding return periods (1.5 to 1000 years).
>>> print(statistical_properties.loc[:, statistical_properties.columns[9:]]) # doctest: +SKIP
start_year end_year nyr q1.5 q2 ... q200 q500 q1000
id
Frankfurt 1964.0 2004.0 40.0 683.254634 855.296864 ... 2258.332886 2460.823383 2717.037039
Mainz 1951.0 2004.0 53.0 3627.907224 4164.824744 ... 6734.883442 6919.948680 7110.767115
Kaub 1951.0 2004.0 53.0 3761.253314 4321.114689 ... 7131.430892 7348.738113 7577.263513
Andernach 1951.0 2004.0 53.0 5450.050443 6369.734950 ... 10654.874462 10950.940916 11252.770123
Cologne 1951.0 2004.0 53.0 5583.579049 6507.694660 ... 10940.851299 11261.139356 11591.687060
Rees 1951.0 2004.0 53.0 5759.172691 6693.471602 ... 11368.384249 11728.167908 12106.027638
- The other output is the distribution properties of the fitted distribution, which include the shape,
location, and scale parameters, plus the `D-static` (Kolmogorov–Smirnov D statistic) and `P-Value` of the
KS test.
>>> print(distribution_properties) # doctest: +SKIP
c loc scale D-static P-Value
id
Frankfurt 0.051852 718.720761 376.188608 0.073171 0.999943
Mainz 0.307295 3743.806013 1214.617042 0.055556 0.999998
Kaub 0.282580 3881.573477 1262.426086 0.055556 0.999998
Andernach 0.321513 5649.076008 2084.383132 0.074074 0.998738
Cologne 0.306146 5783.017454 2090.224037 0.074074 0.998738
Rees 0.284227 5960.022503 2107.197210 0.074074 0.998738
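- The two result tables can be saved for later use with `DataFrame.to_csv`; a minimal sketch (the output paths
here are assumptions that mirror the example CSV files shipped alongside this module):
>>> statistical_properties.to_csv("examples/data/statistical_properties.csv")  # doctest: +SKIP
>>> distribution_properties.to_csv("examples/data/distribution_properties.csv")  # doctest: +SKIP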
"""
gauges = time_series_df.columns.tolist()
# List of the table output, including some general data and the return periods.
@@ -99,8 +235,8 @@ def ams_analysis(
"75%",
"95%",
"max",
"t_beg",
"t_end",
"start_year",
"end_year",
"nyr",
]
rp_name = [
@@ -119,7 +255,7 @@ def ams_analysis(

# In a table where duplicates are removed (np.unique), find the number of
# gauges contained in the .csv file.
# Declare a dataframe for the output file, with as index the gaugne numbers
# Declare a dataframe for the output file, with as index the gauge numbers
# and as columns all the output names.
statistical_properties = pd.DataFrame(np.nan, index=gauges, columns=col_csv)
statistical_properties.index.name = "id"
@@ -151,21 +287,24 @@ def ams_analysis(
rpath.mkdir(parents=True, exist_ok=True)

for i in gauges:
q_ts = time_series_df.loc[:, i]
q_ts = time_series_df.loc[:, i].to_frame()
# The time series is resampled to the annual maxima, and turned into a numpy array.
# The hydrological year is 1-Nov/31-Oct (from Petrow and Merz, 2009, JoH).
if not ams:
ams_df = q_ts.resample(ams_start).max().values
ams_df = q_ts.resample(ams_start).max()
ams_arr = ams_df.values
else:
ams_df = q_ts.values
ams_df = q_ts
ams_arr = q_ts.values

if filter_out is not None:
ams_df = ams_df[ams_df != filter_out]
ams_df = ams_df.loc[ams_df[ams_df.columns[0]] != filter_out, :]
ams_arr = ams_arr[ams_arr != filter_out]
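# Note: boolean masking flattens `ams_arr` to 1-D (used for fitting), while `ams_df` keeps its
# index for the start_year/end_year lookup further down.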

dist = Distributions(distribution, data=ams_df)
dist = Distributions(distribution, data=ams_arr)
# estimate the parameters through the given method
try:
threshold = np.quantile(ams_df, quartile)
threshold = np.quantile(ams_arr, quartile)
param_dist = dist.fit_model(
method=method,
obj_func=obj_func,
@@ -209,21 +348,28 @@ def ams_analysis(
fig2.savefig(f"{save_to}/figures/f-{i}.png", format="png")
plt.close()

statistical_properties.loc[i, "mean"] = q_ts.mean()
statistical_properties.loc[i, "std"] = q_ts.std()
statistical_properties.loc[i, "min"] = q_ts.min()
statistical_properties.loc[i, "5%"] = q_ts.quantile(0.05)
statistical_properties.loc[i, "25%"] = q_ts.quantile(0.25)
statistical_properties.loc[i, "median"] = q_ts.quantile(0.50)
statistical_properties.loc[i, "75%"] = q_ts.quantile(0.75)
statistical_properties.loc[i, "95%"] = q_ts.quantile(0.95)
statistical_properties.loc[i, "max"] = q_ts.max()
statistical_properties.loc[i, "t_beg"] = q_ts.index.min()
statistical_properties.loc[i, "t_end"] = q_ts.index.max()
if not ams:
quantiles = np.quantile(ams_arr, [0.05, 0.25, 0.50, 0.75, 0.95])
statistical_properties.loc[i, "mean"] = ams_arr.mean()
statistical_properties.loc[i, "std"] = ams_arr.std()
statistical_properties.loc[i, "min"] = ams_arr.min()
statistical_properties.loc[i, "5%"] = quantiles[0]
statistical_properties.loc[i, "25%"] = quantiles[1]
statistical_properties.loc[i, "median"] = quantiles[2]
statistical_properties.loc[i, "75%"] = quantiles[3]
statistical_properties.loc[i, "95%"] = quantiles[4]
statistical_properties.loc[i, "max"] = ams_arr.max()
statistical_properties.loc[i, "start_year"] = ams_df.index.min()
statistical_properties.loc[i, "end_year"] = ams_df.index.max()

if ams:
statistical_properties.loc[i, "nyr"] = (
statistical_properties.loc[i, "end_year"]
- statistical_properties.loc[i, "start_year"]
)
else:
statistical_properties.loc[i, "nyr"] = (
statistical_properties.loc[i, "t_end"]
- statistical_properties.loc[i, "t_beg"]
statistical_properties.loc[i, "end_year"]
- statistical_properties.loc[i, "start_year"]
).days / 365.25
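# For a datetime index, the difference above is a Timedelta, so the span in days is
# divided by 365.25 for a leap-aware year count.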

for irp, irp_name in zip(q_rp, rp_name):
1 change: 1 addition & 0 deletions tests/conftest.py
@@ -617,6 +617,7 @@ def ams_gauges() -> DataFrame:
"""AMS gauges"""
ams = pd.read_csv("tests/data/ams-gauges.csv")
ams.index = ams["date"]
ams.drop("date", axis=1, inplace=True)
return ams
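
# A hypothetical smoke test consuming the fixture above (a sketch: the test name, the
# `tmp_path` usage, and the assertions are assumptions that mirror the call documented
# in `statista.eva.ams_analysis`):
def test_ams_analysis_smoke(ams_gauges: DataFrame, tmp_path):
    from statista.eva import ams_analysis

    stats, dist_props = ams_analysis(
        time_series_df=ams_gauges,
        ams=True,
        save_plots=True,
        save_to=str(tmp_path),
        filter_out=-9,
    )
    assert not stats.empty
    assert not dist_props.empty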

