Documentation
This page details the process of creating and analyzing reduced-order models (ROMs) for the GEMS combustion data. See Problem Statement for an overview of the setting and the data; see Installation and Setup for initial instructions on downloading the code and the data.
In the code examples, $ indicates the command line and >>> indicates Python.
The code itself is internally documented and can be accessed on the fly with dynamic object introspection, e.g.,
>>> import utils
>>> help(utils.load_gems_data)
- 1. Unpack: extract GEMS data from raw .tar files.
- 2. Preprocess: prepare a set of training data.
- 3. Train: learn ROMs from training data.
- 4. Plot: simulate trained ROMs and plot results.
- 5. Export: write Tecplot-readable files for full-domain visualization.
- Complete Example
The script step1_unpack.py reads the GEMS output directly from the .tar archives downloaded from Globus, gathers the data into a single data set, and saves it in HDF5 format.
The process runs in parallel and takes several minutes.
After the process completes successfully, the .tar archives from Globus may be deleted.
Usage
python3 step1_unpack.py --help
python3 step1_unpack.py DATAFOLDER [--overwrite] [--serial]
positional arguments:
DATAFOLDER folder containing the raw GEMS .tar data files
optional arguments:
-h, --help show this help message and exit
--overwrite overwrite the existing HDF5 data file
Examples
# Process the raw .tar data files that are placed in /storage/combustion/.
$ python3 step1_unpack.py /storage/combustion
# Process the raw .tar data files that are placed in the current directory, overwriting the resulting HDF5 file if it already exists.
$ python3 step1_unpack.py . --overwrite
# Process the raw .tar data files in /storage/combustion/ serially (not in parallel).
$ python3 step1_unpack.py /storage/combustion --serial
Loading Results: utils.load_gems_data().
>>> import utils
>>> gems_data, t = utils.load_gems_data()
Each column of gems_data is a single snapshot, i.e., gems_data[:,j] is the full GEMS solution for all 8 native variables at time t[j].
The first DOF = 38523 rows of data represent the first variable, and so on.
The variables are, in order,
- Pressure [Pa]
- x-velocity [m/s]
- y-velocity [m/s]
- Temperature [K]
- CH4 (methane) Mass Fraction
- O2 (oxygen) Mass Fraction
- H2O (water) Mass Fraction
- CO2 (carbon dioxide) Mass Fraction
See Problem Statement for more details.
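The row layout described above means that each variable occupies one contiguous block of DOF rows. The following is a minimal sketch of extracting one variable from such an array. It uses a small synthetic array in place of the output of utils.load_gems_data(), and the short variable labels and the helper get_variable() are hypothetical names chosen for illustration, not part of the repository.

```python
import numpy as np

# Hypothetical small sizes for illustration; the real data has DOF = 38523.
DOF = 5                # Spatial degrees of freedom per variable.
NUM_VARS = 8           # Number of native GEMS variables.
num_snapshots = 3

# Synthetic stand-in for the array returned by utils.load_gems_data().
gems_data = np.arange(NUM_VARS * DOF * num_snapshots, dtype=float)
gems_data = gems_data.reshape(NUM_VARS * DOF, num_snapshots)

# Hypothetical labels, listed in the variable order given above.
VARIABLES = ["p", "vx", "vy", "T", "CH4", "O2", "H2O", "CO2"]


def get_variable(data, name):
    """Return the block of DOF rows of `data` that holds variable `name`."""
    i = VARIABLES.index(name)
    return data[i * DOF:(i + 1) * DOF]


temperature = get_variable(gems_data, "T")   # Shape (DOF, num_snapshots).
print(temperature.shape)
```

Because the result is a slice, it is a view of the original array and no snapshot data is copied.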
The GEMS snapshot data must be preprocessed to be suitable for Operator Inference.
The script step2_preprocess.py
generates training data for reduced-order model learning in three steps:
- Transform the GEMS variables to the learning variables, then scale each learning variable to the interval [-1,1].
- Compute the POD basis (the dominant left singular vectors) of the lifted, scaled snapshot training data and save the basis and the corresponding singular values.
- Project the lifted, scaled snapshot training data to the low-dimensional subspace defined by the POD basis, compute time derivative information for the projected snapshots, and save the projected data.
These three steps can also be performed separately by step2a_transform.py, step2b_basis.py, and step2c_project.py, respectively.
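To make the three steps concrete, here is a rough NumPy sketch on synthetic data. It is not the code used by the scripts: the variable transformation (lifting) is omitted, the min-max scaling formula and the finite-difference scheme are assumptions, and all array sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for lifted snapshots: 2 variables with 50 spatial
# points each and 200 snapshots in time (all sizes are hypothetical).
num_vars, dof, k = 2, 50, 200
t = np.linspace(0, 1, k)
time_modes = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
Q_lifted = rng.standard_normal((num_vars * dof, 3)) @ time_modes
Q_lifted[dof:] *= 1e3   # The second variable lives on a much larger scale.

# Step 1 (scaling only): map each variable block to the interval [-1, 1].
Q = Q_lifted.copy()
for i in range(num_vars):
    block = Q[i * dof:(i + 1) * dof]    # View into Q, modified in place.
    lo, hi = block.min(), block.max()
    block -= lo
    block *= 2 / (hi - lo)
    block -= 1

# Step 2: the POD basis is the set of dominant left singular vectors.
r = 3
U, svdvals, _ = np.linalg.svd(Q, full_matrices=False)
V = U[:, :r]

# Step 3: project the snapshots, then estimate time derivatives of the
# projected data with finite differences (np.gradient is an assumption).
Q_ = V.T @ Q
Qdot_ = np.gradient(Q_, t, axis=1)
```

Scaling each variable separately keeps large-magnitude variables from dominating the singular value decomposition, which is why the scaling comes before the basis computation.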
Usage
python3 step2_preprocess.py --help
python3 step2_preprocess.py TRAINSIZE MODES
positional arguments:
TRAINSIZE number of snapshots in the training data
MODES number of POD modes for projecting data
Examples
# Get training data from 10,000 snapshots and with a maximum of 50 POD modes.
$ python3 step2_preprocess.py 10000 50
# Equivalently, do the three steps separately.
$ python3 step2a_transform.py 10000 # Transform (lift and scale) 10,000 GEMS snapshots.
$ python3 step2b_basis.py 10000 50 # Compute a rank-50 POD basis from the transformed snapshots.
$ python3 step2c_project.py 10000 # Project the transformed snapshots and estimate time derivatives.
# Get training data from 15,000 snapshots and with a maximum of 100 POD modes.
$ python3 step2_preprocess.py 15000 100
Loading Results:
- utils.load_scaled_data() for the lifted, scaled snapshots and scaling information;
- utils.load_basis() for the POD basis and associated singular values;
- utils.load_projected_data() for the projected snapshots and related data.
>>> import utils
>>> trainsize = 10000 # Number of snapshots used as training data.
>>> num_modes = 44 # Number of POD modes.
>>> Q, t, scales = utils.load_scaled_data(trainsize)
>>> V, scales = utils.load_basis(trainsize, num_modes)
>>> Q_, Qdot_, t = utils.load_projected_data(trainsize, num_modes)
Here,
- Q[:,j] is a lifted, scaled snapshot corresponding to time t[j];
- V[:,j] is the jth basis vector, i.e., the jth left singular vector of Q;
- Q_[:,j] is a projected snapshot with approximate time derivative Qdot_[:,j], both corresponding to time t[j];
- scales[i,:] contains the shifting / dilation factors for learning variable i used in the scaling (see data_processing.scale() and data_processing.unscale()).
The scales returned by utils.load_scaled_data() and utils.load_basis() are identical, as are the t returned by utils.load_scaled_data() and utils.load_projected_data().
The script step3_train.py uses data prepared in step 2 to learn reduced-order models (ROMs) with Tikhonov-regularized Operator Inference with hyperparameter selection.
The regularization is determined by the non-negative scalar hyperparameters λ1 and λ2: λ1 is the penalization for non-quadratic terms in the ROM, and λ2 is the penalization for quadratic terms only (see this paper for more details).
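As a generic illustration of how two separate penalties can enter a least-squares problem, the sketch below solves min ||D X - R||² + ||Γ X||², where Γ is diagonal with λ1 on the entries for non-quadratic terms and λ2 on the entries for quadratic terms. This is not the rom_operator_inference implementation: the function name, the column ordering (non-quadratic columns first), and the convention that the penalty weights appear squared in the objective are all assumptions.

```python
import numpy as np


def tikhonov_solve(D, R, num_nonquadratic, reg1, reg2):
    """Solve min_X ||D X - R||_F^2 + ||Gamma X||_F^2 (hypothetical helper).

    Gamma is diagonal: reg1 for the first `num_nonquadratic` columns of D
    and reg2 for the remaining (quadratic) columns.
    """
    d = D.shape[1]
    gamma = np.full(d, reg2, dtype=float)
    gamma[:num_nonquadratic] = reg1
    # Stack the penalty under the data to get an ordinary least-squares problem.
    D_aug = np.vstack([D, np.diag(gamma)])
    R_aug = np.vstack([R, np.zeros((d, R.shape[1]))])
    return np.linalg.lstsq(D_aug, R_aug, rcond=None)[0]


# Synthetic data: 30 samples, 2 non-quadratic and 4 quadratic columns.
rng = np.random.default_rng(1)
D = rng.standard_normal((30, 6))
R = rng.standard_normal((30, 2))

X_weak = tikhonov_solve(D, R, num_nonquadratic=2, reg1=1.0, reg2=1.0)
X_strong = tikhonov_solve(D, R, num_nonquadratic=2, reg1=1.0, reg2=100.0)

# A larger reg2 shrinks the rows of X that multiply the quadratic columns.
print(np.linalg.norm(X_weak[2:]), np.linalg.norm(X_strong[2:]))
```

Stacking the penalty under the data matrix avoids forming the normal equations explicitly, which tends to be better conditioned.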
The learned ROM operators are saved in HDF5 format for later use.
This script has three modes for designating or determining appropriate regularization hyperparameters λ1 and λ2, indicated with the following command line flags.
- --single: train and save a ROM for a given choice of λ1 and λ2, passed as REG1 and REG2.
- --gridsearch: train one ROM for each (λ1,λ2) pair in the two-dimensional REG3 x REG6 hyperparameter grid [REG1,REG2]x[REG4,REG5]; save the stable ROM with the least training error.
- --minimize: specify initial guesses for λ1 and λ2 as REG1 and REG2, then use Nelder-Mead search to find a locally optimal hyperparameter pair (λ1,λ2).
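For readers unfamiliar with Nelder-Mead search, the following toy example shows the general pattern with scipy.optimize.minimize. The objective here is a hypothetical stand-in for "train a ROM and return its training error", and searching over the base-10 logarithms of the hyperparameters is an assumption made for this sketch, not a statement about how step3_train.py is implemented.

```python
import numpy as np
from scipy.optimize import minimize


def training_error(log_regs):
    """Hypothetical stand-in for training a ROM and measuring its error.

    A smooth bowl whose minimum sits at (λ1, λ2) = (10**2.5, 10**4.2).
    """
    log_reg1, log_reg2 = log_regs
    return (log_reg1 - 2.5) ** 2 + (log_reg2 - 4.2) ** 2


# Initial guesses λ1 = 300 and λ2 = 7000, expressed in log10 space.
x0 = np.log10([300, 7000])
result = minimize(training_error, x0, method="Nelder-Mead")

reg1_opt, reg2_opt = 10 ** result.x
print(f"locally optimal λ1 = {reg1_opt:.1f}, λ2 = {reg2_opt:.1f}")
```

Nelder-Mead needs only objective values, not gradients, which suits an objective that involves training and simulating a model. It finds a local optimum only, so the result depends on the initial guesses.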
Usage
python3 step3_train.py --help
python3 step3_train.py --single TRAINSIZE MODES REG1 REG2
python3 step3_train.py --gridsearch TRAINSIZE MODES REG1 ... REG6 [--testsize TESTSIZE] [--margin MARGIN]
python3 step3_train.py --minimize TRAINSIZE MODES REG1 REG2 [--testsize TESTSIZE] [--margin MARGIN]
subcommands:
--single train and save a single ROM with regularization hyperparameters REG1 (non-quadratic penalizer) and REG2 (quadratic penalizer)
--gridsearch train over the REG3xREG6 grid [REG1,REG2]x[REG4,REG5] of regularization hyperparameter candidates, saving only the stable ROM with the least training error
--minimize given initial guesses REG1 (non-quadratic penalizer) and REG2 (quadratic penalizer), use Nelder-Mead search to train and save a ROM that is locally optimal in the regularization hyperparameter space
positional arguments:
TRAINSIZE number of snapshots in the training data
MODES number of POD modes used to project the data (dimension of ROM to be learned)
REG1 REG2 [...REG6] regularization parameters for ROM training, interpreted differently by --single, --gridsearch, and --minimize
optional arguments:
-h, --help show this help message and exit
--testsize TESTSIZE number of time steps for which the trained ROM must satisfy the POD bound (remain stable)
--margin MARGIN factor by which the POD coefficients of the ROM simulation are allowed to deviate in magnitude from the training data
Examples
## --single: train and save a single ROM for a given λ1, λ2.
# Use 10,000 projected snapshots to learn a ROM of dimension r = 24
# with regularization parameters λ1 = 400, λ2 = 21000.
$ python3 step3_train.py --single 10000 24 400 21000
## --gridsearch: train over a grid of candidates for λ1 and λ2, saving only the stable ROM with least training error.
# Use 20,000 projected snapshots to learn a ROM of dimension r = 40 and save the one with the regularization resulting in the least training error and for which the integrated POD modes stay within 150% of the training data in magnitude for 60,000 time steps. For the regularization parameters, test each point in the 4x5 logarithmically-spaced grid [500,9000]x[8000,10000]
$ python3 step3_train.py --gridsearch 20000 40 5e2 9e3 4 8e3 1e4 5 --testsize 60000 --margin 1.5
## --minimize: given initial guesses for λ1 and λ2, use Nelder-Mead search to train and save a ROM that is locally optimal in the regularization hyperparameter space.
# Use 10,000 projected snapshots to learn a ROM of dimension r = 30 and save the one with the regularization resulting in the least training error and for which the integrated POD modes stay within 150% of the training data in magnitude for 60,000 time steps. For the regularization parameters, search starting from λ1 = 300, λ2 = 7000.
$ python3 step3_train.py --minimize 10000 30 300 7000 --testsize 60000 --margin 1.5
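The --testsize and --margin options describe a boundedness check on the ROM simulation. One plausible reading is sketched below: a ROM passes if the magnitude of its POD coefficients never exceeds MARGIN times the largest magnitude in the training data. The function is_bounded() and the use of a single global maximum are assumptions for illustration; the script may apply the bound differently.

```python
import numpy as np


def is_bounded(q_rom, q_train, margin=1.5):
    """Check that ROM coefficients stay within margin * max|q_train|.

    Hypothetical helper: q_rom and q_train hold POD coefficients, with one
    column per time step.
    """
    bound = margin * np.abs(q_train).max()
    return bool(np.abs(q_rom).max() <= bound)


t = np.linspace(0, 10, 101)
q_train = np.vstack([np.sin(t), np.cos(t)])     # Training coefficients.

q_stable = 1.2 * q_train                        # Stays inside the bound.
q_unstable = q_train * np.exp(0.2 * t)          # Grows without bound.

print(is_bounded(q_stable, q_train), is_bounded(q_unstable, q_train))
```

In this reading, TESTSIZE would set the number of columns in q_rom, that is, how long the ROM is integrated before the check is applied.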
Loading Results: utils.load_rom().
>>> import utils
>>> trainsize = 10000 # Number of snapshots used as training data.
>>> num_modes = 44 # Number of POD modes.
>>> regs = (1e4, 1e5) # Regularization hyperparameters for Operator Inference.
>>> rom = utils.load_rom(trainsize, num_modes, regs)
Here rom is an object of type rom_operator_inference.InferredContinuousROM.
See the rom_operator_inference API for documentation.
The script step4_plot.py loads and simulates ROMs trained in step 3, then plots results in time against the corresponding GEMS data. While predictions at a single point are not representative of accuracy as a whole for this problem, these plots are a good first step for evaluating a ROM.
There are three available plot types, indicated with the following command line flags.
- --point-traces: plot learning variables in time at fixed points of the computational domain. See Problem Statement for the default locations.
- --relative-errors: plot relative projection and prediction errors as a function of time. This routine is memory intensive.
- --spatial-statistics: plot spatial averages of pressure, velocities, and temperature, as well as spatial integrals (sums) of species molar concentrations, both as functions of time.
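To clarify the difference between projection error and prediction error, here is a small sketch on synthetic data. The projection error measures how well the POD basis can represent a snapshot at all, while the prediction error measures how well the ROM output, mapped back to the full space, matches the snapshot. The column-wise relative 2-norm used here and the noisy stand-in for ROM output are assumptions; the script may normalize differently.

```python
import numpy as np


def relative_errors(Q_true, Q_approx):
    """Relative 2-norm error of each column (one value per snapshot)."""
    diff_norms = np.linalg.norm(Q_true - Q_approx, axis=0)
    return diff_norms / np.linalg.norm(Q_true, axis=0)


rng = np.random.default_rng(2)
Q = rng.standard_normal((40, 25))               # Synthetic snapshots.

# Rank-5 POD basis from the snapshots.
V = np.linalg.svd(Q, full_matrices=False)[0][:, :5]

# Projection error: best achievable error within the span of the basis.
Q_projected = V.T @ Q
projection_error = relative_errors(Q, V @ Q_projected)

# Prediction error: stand-in ROM output = projection plus a perturbation.
Q_rom = Q_projected + 0.1 * rng.standard_normal(Q_projected.shape)
prediction_error = relative_errors(Q, V @ Q_rom)

print(projection_error.mean(), prediction_error.mean())
```

With an orthonormal basis, the prediction error can never be smaller than the projection error, so the projection error is a useful lower bound when judging a ROM. Forming the full-size reconstructions V @ Q_rom for every time step is also the reason this kind of computation is memory intensive.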
Usage
python3 step4_plot.py --help
python3 step4_plot.py --point-traces TRAINSIZE MODES REG1 REG2 [--location L [...]]
python3 step4_plot.py --relative-errors TRAINSIZE MODES REG1 REG2
python3 step4_plot.py --spatial-statistics TRAINSIZE MODES REG1 REG2
subcommands:
--point-traces plot point traces in time at the specified monitoring locations
--relative-errors plot relative errors in time, averaged over the spatial domain
--spatial-statistics plot spatial averages and species integrals
positional arguments:
TRAINSIZE number of snapshots in the training data
MODES number of POD modes used to project the data (dimension of the learned ROM)
REG1 regularization hyperparameter for non-quadratic ROM terms
REG2 regularization hyperparameter for quadratic ROM terms
optional arguments:
-h, --help show this help message and exit
--location L [...] monitor locations for time trace plots
Examples
## --point-traces: plot results in time at fixed spatial locations.
# Plot time traces of each variable at the monitor locations for the ROM trained from 10,000 snapshots with 22 POD modes and regularization hyperparameters λ1 = 300, λ2 = 21000.
$ python3 step4_plot.py --point-traces 10000 22 300 21000
## --spatial-statistics: plot results in time averaged over the spatial domain.
# Plot spatial averages and species integrals for the ROM trained from 20,000 snapshots with 40 POD modes and regularization hyperparameters λ1 = 9e3, λ2 = 1e4.
$ python3 step4_plot.py --spatial-statistics 20000 40 9e3 1e4
## --relative-errors: plot relative projection and prediction errors in time, averaged over the spatial domain.
# Plot errors for the ROM trained from 20,000 snapshots with 43 POD modes and regularization parameters λ1 = 350, λ2 = 18500.
$ python3 step4_plot.py --relative-errors 20000 43 350 18500
Loading Results: figures are saved as PDFs in the folder specified by config.figures_path().
>>> import config
>>> print("figures are saved to", config.figures_path())
The script step5_export.py writes Tecplot-readable ASCII (text) files from simulation data.
The resulting files can be used with Tecplot to visualize snapshots over the computational domain.
There are three types of output files, indicated with the following positional command line arguments:
- gems: write full-order GEMS data in the ROM learning variables.
- rom: write reconstructed ROM outputs. The specific ROM is selected via command line arguments --trainsize k, --modes r, and --regularization λ1 and λ2.
- error: write the absolute error between the GEMS data and the ROM outputs.
Usage
python3 step5_export.py -h
python3 step5_export.py (gems | rom | error) --timeindex T [...] --variables V [...] [--trainsize TRAINSIZE] [--modes MODES] [--regularization REG1 REG2]
positional arguments:
SNAPTYPE which snapshot types to save (gems, rom, error)
optional arguments:
-h, --help show this help message and exit
--timeindex T [...] indices of snapshots to save (default every 100th snapshot)
--variables V [...] variables to save, a subset of config.ROM_VARIABLES (default all)
--trainsize TRAINSIZE number of snapshots in the ROM training data
--modes MODES ROM dimension (number of retained POD modes)
--regularization REG1 REG2
regularization hyperparameters in the ROM training
Examples
# Export every 100th snapshot (default) of GEMS data (all variables).
$ python3 step5_export.py gems
# Export only snapshot 5000 of GEMS data (all variables).
$ python3 step5_export.py gems --timeindex 5000
# Export only snapshot 4000 of GEMS pressure and temperature data.
$ python3 step5_export.py gems --timeindex 4000 --variables p T
# Export snapshot 4000 of reconstructed pressure, temperature, and methane data from the ROM trained from 10,000 snapshots, 22 POD modes, and regularization hyperparameters 200 and 30000.
$ python3 step5_export.py rom --timeindex 4000 --variables p T CH4 --trainsize 10000 --modes 22 --regularization 2e2 3e4
# Export every 100th snapshot of reconstructed ROM data (all variables) and the absolute errors, derived from the ROM trained from 20,000 snapshots, 44 POD modes, and regularization hyperparameters 100 and 40000.
$ python3 step5_export.py rom error --trainsize 20000 --modes 44 --regularization 1e2 4e4
Loading Results: data files are saved in the folder specified by config.tecplot_path().
>>> import config
>>> print("Tecplot-friendly files are exported to", config.tecplot_path())
The files can be visualized with Tecplot (File >> Load Data, then check the Contours box).
For this walkthrough, we assume the (small) code files exist in a folder ~/Desktop/combustion, the (large) data files exist in a folder /storage/combustion, and the BASE_FOLDER variable in config.py is set to /storage/combustion.
Suppose we want to create a ROM from 20,000 snapshots with 44 POD modes and create some visualizations to analyze its performance. We don't have appropriate values for the regularization hyperparameters λ1 and λ2 as of yet.
# Navigate to the code directory.
$ cd ~/Desktop/combustion
# Unpack the raw data in the data directory.
$ python3 step1_unpack.py /storage/combustion
# Prepare a set of training data with 20,000 snapshots and 50 POD modes.
$ python3 step2_preprocess.py 20000 50 # this suffices as 50 > 44.
# Do a gridsearch over [100,500]x[15000,25000] with 10 logarithmically spaced values for λ1 and 15 logarithmically spaced values for λ2
$ python3 step3_train.py --gridsearch 20000 44 1e2 5e2 10 1.5e4 2.5e4 15
The grid search selects λ1 = 245 and λ2 = 19365, so we do a more targeted hyperparameter search in that vicinity.
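The selected values can be checked against the candidate grid. The sketch below builds logarithmically spaced candidates over [100,500] and [15000,25000], assuming that both endpoints are included (as with numpy.logspace). The helper log_grid() is a hypothetical name used only for this illustration.

```python
import math


def log_grid(lo, hi, num):
    """Return `num` logarithmically spaced values from lo to hi, inclusive."""
    step = (math.log10(hi) - math.log10(lo)) / (num - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(num)]


reg1_candidates = log_grid(100, 500, 10)        # Candidates for λ1.
reg2_candidates = log_grid(15000, 25000, 15)    # Candidates for λ2.

print([round(x) for x in reg1_candidates])
print([round(x) for x in reg2_candidates])
```

Under this assumption, the sixth λ1 candidate rounds to 245 and the eighth λ2 candidate rounds to 19365, which matches the pair reported by the grid search.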
# Train a ROM with locally optimal regularization hyperparameters near λ1=245, λ2=19365.
$ python3 step3_train.py --minimize 20000 44 245 19365
The minimization selects λ1 = 322, λ2 = 18199. Now we plot point-wise results and export data for visualization with Tecplot.
# Plot learning variable point traces and spatial statistics against the corresponding GEMS data.
$ python3 step4_plot.py --point-traces 20000 44 322 18199
$ python3 step4_plot.py --spatial-statistics 20000 44 322 18199
# Export every 100th snapshot to Tecplot for visualization.
$ python3 step5_export.py gems rom --trainsize 20000 --modes 44 --regularization 322 18199
The figures will be in ~/Desktop/combustion/figures/ and the Tecplot-friendly files will be in /storage/combustion/tecdata/.
Problem Statement: computational domain, state variables, and description of the data.
Installation and Setup: how to download the source code and the data files.
File Summary: short descriptions of each file in the repository.
Documentation: how to use the repository for reduced-order model learning.
Results: plots and figures, including many additional results that are not in the publications.
References: short list of primary references.