`pybmpdb`: A convenient interface between `wqio` and the International BMP Database.

Data preparation

Intitial Prep

The data are prepared for the BMP Database in the following ways:

Fetch Site data from the DOT Sites endpoint (/DOTSites), rename columns(see column mapping section)
Fetch BMP Category data from the BMP Code endpoint (/vBMPCodes), rename columns (see column mapping section)
Fetch WQ data from the raw WQ Flat File endpoint (/vWQRaw), rename columns (see column mapping section)
Select only WQ Data where the 'webscreen' column is "y"
Drop any rows with nulls in the "units" column (there should be 0 rows that meet this condition)
Fill nulls in the 'epazone' column with -99, convert column to an integer type
Convert the "storm" (event) column to an integer type
Create a "bmpcode" column that is equal to the "bmptype" column when the "_category" column is 'Manufactured Device', otherwise use the "_catcode" column
Join in the BMPCodes dataset on the "bmpcode" and "category" columns
Drop any rows with nulls in the "ws_id" columns (there should be 0 rows that meet this condition)
Pipe the resulting dataset to `pybmpdb._clean_raw_data, which does the following:
1. Fills nulls in the "qual" column with '='
2. Drops rows with nulls in the "res" column
3. Any leading or trailing whitespace from values in the "qual" column
4. Standardizes values in the "wq_initialscreen" column to either 'y' or 'n'
5. Standardizes values in the "ms_indivscreen" column to either 'y' or 'n'
6. Standardizes values in the "wq_catscreen" column to either 'y' or 'n'
7. Makes values in the "station" column fully lower case (e.g., 'Inflow' becomes 'inflow')
8. Standardizes values in the "sampletype" column to either 'composite', 'grab', or 'unknown'
9. Combines the "sampledate" and "sampletime" columns in a column called "sampledatetime"
10. Normalize all results to the "preferred units" for each parameter
11. Creates and fills a "fraction" column with either 'total' or 'dissolved'
12. Filter out results where the "res" column is less than 0 (there should be 0 rows that meet this condition)
13. Checks that none of the "header" columns have null values (header columns together should uniquely define each observation and are listed below)
14. Group the dataset by the "header" column, compute the mean of the "res" column, minimum of the "qual" column, and minimum of the "sampledatetime" column
15. Confirm that each row has a unique combination of the header columns
Pipe that dataframe to pybmpdb._prepare_for_summary, which does the following:
1. Combines Wetland Basin and Retention Pond data into a serparate dataset where BMP category is shown as "Retention Pond/Wetland Basin" and appended to the main dataaset
2. NO2+NO3 and NO3 datasets are combined (with preference given to NO2+NO3) and assigned a parameter called NOx
3. Grab samples are removed from the dataset except for:
  1. Biological data at all BMP categories
  2. All parameter groups at Retention Pond, Wetland Basin, and Wetland Basin/Retention Pond BMPs
Then with that dataset, go through each event and:
1. Select (prefer) composite samples when both composite and grab samples exist
2. Fall back to subsurface samples if an outflow sample is not available (reclassify as outflow)
3. Fall back to reference samples in an inflow sample is not available (reclassify as inflow)
Save the dataset in this state as the the "flat" (unpaired) dataset
Pivot the dataset such that the values "station" column nest with the "res" and "qual" columns. In effect, we now have:
1. Half as many rows
2. Two columns for "res" and "qual" at the inflow and outflow monitoring stations
Drop all rows where either the "res_inflow" or "res_outflow" column is null (i.e., dropped unpaired observations)
Save this dataset as the "paired" dataset

These two dataset ("flat" and "paired") are merged with the Site data to include the DOT-related information then uploaded to the BMP Database as "WQRecords" and "WQPairs", respectively.

Final Prep

Prior to the analysis for the main WQ and DOT summary report, the following steps are taken:

The data are read in from the /WQRecords endpoint
Non-detect values reported at 0.5 * DL are converted to the full detection limit
A "paramunit" column is created and filled in the format: '{parameter} ({units})'
A "DOT_Activity" column is created from the existing "dot_type" column, replacing 'Not Applicable' values with 'Non-DOT'
Group by "site", "bmp", "paramunit", "station", "Is_DOT" columns and remove groups with fewer than 3 observed events (storms)
Group by "category", "paramunit", "station", "Is_DOT" acolumns nd remove groups with fewer than 3 unique BMP IDs
Select all rows with the specific parameters and BMP catgories to be included in the analysis.

Renaming column mappings

WQ Data Column Map

SiteID → site_id
SiteName → site
City → city
State → state
Country → country
EPARainZone → epazone
DOT_flag → dot_flag
BMPID → bmp_id
BMPName → bmp
BMPCategory_Code → _catcode
BMPCategory_Desc → _category
BMPType → bmptype
BMPType_Desc → bmpdesc
MSID → ms_id
MSName → ms
MSType → station
EventID → storm
EventType → event_type
DateSample → sampledate
TimeSample → sampletime
SampleMedia → watertype
SampleType → sampletype
WQID → wq_id
WSID → ws_id
ParameterName → parameter
Value_SubHalfDL → res
Value_Unit → units
WQQualifier → qual
DetectionLimit → DL
InitialScreen_flag → wq_initialscreen
CategoryAnalysisScreen_flag → _screenflag
UseIndividualAnalysis_Flag → ms_indivscreen
UseCateogoryAnalysis_Flag → _cat
UseInCategoricalAnalysis → wq_catscreen
UseInWebTool → webscreen
DOT_ActivityType_flag → dot_type
ParameterGroupCode → paramgroup

Site Data Column Map

SiteID → site_id
DOT_AADT → aadt
BMPID → bmp_id
WSID → ws_id

BMP Codes Column Map

category_name → category
category_code → bmpcode

Header Columns

WQ Data Header Columns

category
epazone
state
site
bmp
station
storm
sampletype
watertype
paramgroup
units
parameter
fraction
wq_initialscreen
ms_indivscreen
wq_catscreen
bmptype
ws_id
site_id
bmp_id

Name		Name	Last commit message	Last commit date
Latest commit History 451 Commits
conda.recipe		conda.recipe
pybmpdb		pybmpdb
.coveragerc		.coveragerc
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
check_pybmpdb.py		check_pybmpdb.py
codecov.yml		codecov.yml
generate_vstasks.py		generate_vstasks.py
readme.md		readme.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`pybmpdb`: A convenient interface between `wqio` and the International BMP Database.

Data preparation

Intitial Prep

Final Prep

Renaming column mappings

WQ Data Column Map

Site Data Column Map

BMP Codes Column Map

Header Columns

WQ Data Header Columns

About

Releases 2

Packages

Contributors 2

Languages

License

International-BMP-Database/pybmpdb

Folders and files

Latest commit

History

Repository files navigation

pybmpdb: A convenient interface between wqio and the International BMP Database.

Data preparation

Intitial Prep

Final Prep

Renaming column mappings

WQ Data Column Map

Site Data Column Map

BMP Codes Column Map

Header Columns

WQ Data Header Columns

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Languages

`pybmpdb`: A convenient interface between `wqio` and the International BMP Database.

Packages