diff --git a/docs/megabeast/core_api.rst b/docs/megabeast/core_api.rst index 46bf7c3..7e716da 100644 --- a/docs/megabeast/core_api.rst +++ b/docs/megabeast/core_api.rst @@ -2,8 +2,6 @@ Core **** -.. automodapi:: megabeast.singlepop_dust_model - -.. automodapi:: megabeast.mbsettings +.. automodapi:: megabeast.mbmodel .. automodapi:: megabeast.helpers diff --git a/docs/megabeast/images/metal_small_sim_f475w_nsim_plot.png b/docs/megabeast/images/metal_small_sim_f475w_nsim_plot.png new file mode 100644 index 0000000..af78ffa Binary files /dev/null and b/docs/megabeast/images/metal_small_sim_f475w_nsim_plot.png differ diff --git a/docs/megabeast/images/metal_small_sim_f475w_prior_plot.png b/docs/megabeast/images/metal_small_sim_f475w_prior_plot.png new file mode 100644 index 0000000..212ec59 Binary files /dev/null and b/docs/megabeast/images/metal_small_sim_f475w_prior_plot.png differ diff --git a/docs/megabeast/simulate.rst b/docs/megabeast/simulate.rst index 32e4801..a21c747 100644 --- a/docs/megabeast/simulate.rst +++ b/docs/megabeast/simulate.rst @@ -1,11 +1,182 @@ -############################# -Simulations for the MegaBEAST -############################# +########### +Simulations +########### -To test that the MegaBEAST is behaving as expected, it is important to -simulate data and evaluate if the results match the input. +Simulations of observations can be made based on already computed physics and +observation model grids. Uses for such simulations include testing the +sensitivity of a specific set of observations to BEAST model parameters and +testing ensemble fitters like the MegaBEAST. -Simulations should be done with using the BEAST tool to simulate the -fluxes for an ensemble of stars using a BEAST physics+observation model. -Then the BEAST should be used to fit the simulated fluxes and and the -resulting BEAST outputs (especially lnp files) used for testing the MegaBEAST. +Simulations are done using the +`megabeast.simujlate_obs.gen_SimObs_from_sedgrid` function. The +script `tools/simulate_obs.py` provides a command line interface and can be run +using the `mb_simulate_obs` command once the megabeast has been installed. +Simulations require already created BEAST physics and observation model grids. + +The physics model grid includes the ensemble parameters as these are the same as +the BEAST (see `beast priors `_). +The module uses already created BEAST physics and +observation model grids by sampling the full nD prior function that is part of +the physics model grid. The observation model grid provides the information on +the flux uncertainties and biases as well as the completeness. + +The ensemble parameters can be changed from those in the BEAST physics model grid +using ...to be added... + +The number of observations to simulate can be explicitly specified by using the `nsim` +parameter or calculated from the age and mass prior/ensemble models. +To do the latter, pass the `beastinfo` ASDF file to the `simulate_obs` script. +This is done by +computing the mass formed for each unique age in the physics grid (combination +of age and mass priors) accounting for stars that have already disappeared and +dividing by the average mass (mass prior model). This number of stars is +simulated at each age and then the completeness function is applied to remove +all the simulated stars that would not be observed. + +The `mb_simulate_obs` also can be used to just simulate extra ASTs. This +simulation is replacing generating additional input artificial stars and running +those through a photomtery pipeline. Thus, it is very powerful if you need more +ASTs for generating MATCH-satisfied ASTs in addition to what you already have +from the real ASTs. This means that you can avoid extensive/additional real ASTs +as long as your initial/existing real ASTs properly cover the entire CMD space, +i.e., a combination of BEAST-optimized ASTs supplemented by MATCH-friendly ASTs +should be enough. This simulation provides the information on the flux uncertainties, +biases, and the completeness for the additionally selected model SEDs. + +********* +Toothpick +********* + +The files for the physicsgrid and obsgrid files are required inputs to +the script. The output filename is also required. Note that the extension +of this file will determine the type of file output (e.g. filebase.fits for +a FITS file). +The filter to use for the completeness function is given by the +`--compl_filter` parameter (default=F475W). +Set `compl_filter=max` to use the max completeness value across all the filters. +The SEDs are picked weighted by the product of the grid+prior weights +and the completeness from the noisemodel. The grid+prior weights can be replaced +with either grid, prior, or uniform weights by explicitly setting the `--weight_to_use` +parameter. With the uniform weight, the SEDs are randomly selected from the given +physicsgrid, and this option is only valid when `nsim` is explicitly set and useful +for simulating ASTs. + +.. code-block:: console + + $ mb_simulate_obs -p physicsgrid -n obsgrid -o outfile --nsim 200 --compl_filter=F475W + +There are two optional keyword augments specific to the AST simulations: `--complcut` +and `--magcut`. `--complcut` allows you to exclude model SEDs below the supplied +completeness cut in the `--compl_filter`. Note that the brightest models will be exlcuded +as well because of the rapid drop in completeness at the bright end. `--magcut` allows +you to exclude model SEDs fainter than the supplied magnitude in the `--compl_filter`. + +.. code-block:: console + + $ mb_simulate_obs -p physicsgrid -n obsgrid -o outfile --nsim 100000 --compl_filter=F475W \ + --magcut 30 --weight_to_use uniform + +The output file gives the simulated data in the observed data columns +identified in the physicsgrid file along with all the model parameters +from the physicsgrid file. The simulated observations in each band are given +as `band_flux` in physical units (ergs cm^-2 s^-1 A^-1), +`band_rate` as normalized Vega fluxes (`band_flux`/vega_flux to match how +the observed data are given), and `band_vega` as vega magnitudes with zero and +negative fluxes given as -99.999. +The physicsgrid values without noise/bias are given as `band_input_flux`, +`band_input_rate`, and `band_input_vega`. + +Examples +-------- + +Example for the `metal_small` example for a simulation based on the prior model. +Plot created with `beast plot_cmd`. Prior model has a flat star formation history +with a SFR=1e-5 M_sun/year and a Kroupa IMF. + +.. code-block:: console + + $ mb_simulate_obs -p beast_metal_small_seds.grid.hd5 \ + -n beast_metal_small_noisemodel.grid.hd5 \ + -o sim_475w_prior.fits \ + --beastinfo_list beast_metal_small_beast_info.asdf + +.. image:: images/metal_small_sim_f475w_prior_plot.png + +Example for the `metal_small` example for a simulation based on input number of +stars. Plot created with `beast plot_cmd`. Note that not all 1000 simulated +sources are shown as the simulated sources include those with negative fluxes +that cannot be plotted on a standard CMD. + +.. code-block:: console + + $ mb_simulate_obs -p beast_metal_small_seds.grid.hd5 \ + -n beast_metal_small_noisemodel.grid.hd5 \ + -o sim_475w_nsim.fits \ + --nsim=1000 + +.. image:: images/metal_small_sim_f475w_nsim_plot.png + +High-mass star biased simulations +--------------------------------- + +When creating simulated observations, using the standard IMF mass prior will +skew your catalog to lower-mass stars. If you wish to have similar weights for +stars of all masses, use a flat IMF and a log-flat age prior. To do this, +set the mass prior to `{'name': 'flat'}` and the age prior to +`{'name': 'flat_log'}` in `beast_settings.txt` before creating the model grid. + +********* +Truncheon +********* + +The code does not handle the truncheon model at this point. While this model +is doable in the BEAST, it has not been done yet due to several potentially +complex modeling questions for actually using it that might impact how the model +is implemented. + +******** +Plotting +******** + +To plot a color-magnitude diagram of the simulated observations, a +sample call from the command line may be: + +.. code-block:: console + + $ beast plot_cmd outfile.fits --mag1 F475W --mag2 F814W --mag3 F475W + +where `outfile.fits` may be the output from `simulate_obs`. +`mag1`-`mag2` is the color, and `mag3` the magnitude. If you would like to save +(rather than simply display) the figure, include ``--savefig png`` (or another +preferred file extension), and the figure will be saved as `outfile_plot.png` in +the directory of `outfile.fits`. + +************** +Remove Filters +************** + +.. note:: + This needs to be updated. remove_filters.py in the beast repository. + Maybe move to megabeast or incorporate in to the `beast script` paradigm. + +One use case for simulations is to test the impact of specific filters +on the BEAST results. One solution is to create multiple physics/observation +model grids, create simulations from each set of grids, and then fit the +simulations with the BEAST. A quicker way to do this is to create the +physics/observation grid set with the full set of desired filters, create +the desired simulations, remove filters from the model and simulations as +needed, and then fit with the BEAST. This has the benefit of the simulations +with different filter sets are exactly the same except for the removed filters. + +As an example, to remove the filters F275W and F336W from the simulated +observations contained in 'catfile.fits' and the 'physgrid.hd5'/'obsgrid.hd5' +set of models use the following command. + +.. code-block:: console + + $ python remove_filters.py catfile.fits --physgrid physgrid.hd5 \ + --obsgrid obsgrid.hd5 --outbase outbase --rm_filters F275W F336W + +New physics/observation model grids and simulated observation files are +created as 'outbase_seds.grid.hd5', 'outbase_noisemodel.grid.hd5', and +'outbase_cat.fits'. diff --git a/megabeast/fit_single.py b/megabeast/fit_single.py index 80b6c04..5e1dacd 100644 --- a/megabeast/fit_single.py +++ b/megabeast/fit_single.py @@ -4,9 +4,8 @@ from beast.physicsmodel.grid import SEDGrid -from megabeast.mbsettings import mbsettings -from megabeast.helpers import get_likelihoods -from megabeast.singlepop_dust_model import MB_Model, fit_ensemble, lnprob +from megabeast.helpers import get_likelihoods, read_mbmodel +from megabeast.mbmodel import MBModel, fit_ensemble, lnprob def main(): @@ -17,12 +16,9 @@ def main(): ) args = parser.parse_args() - # read the parameters into a class - mbparams = mbsettings(args.settings_file) - # should validate the inputs - make sure the required info is provided - # define the model to fit - megabeast_model = MB_Model(mbparams) + mbparams = read_mbmodel(args.settings_file) + megabeast_model = MBModel(mbparams.stellar_model, mbparams.dust_model) # print(megabeast_model.physics_model) # BEAST files used by MegaBEAST diff --git a/megabeast/helpers.py b/megabeast/helpers.py index ff49a2d..0aedf41 100644 --- a/megabeast/helpers.py +++ b/megabeast/helpers.py @@ -1,10 +1,14 @@ import numpy as np from numpy.random import default_rng +import h5py +from beast.physicsmodel.grid import SEDGrid from beast.tools.read_beast_data import read_lnp_data from beast.physicsmodel.grid_weights_stars import compute_bin_boundaries __all__ = [ + "read_mbmodel", + "read_beast_moddata", "get_likelihoods", "precompute_mass_multipliers", "get_predicted_num_stars", @@ -12,6 +16,81 @@ ] +def read_mbmodel(settings_file): + """ + Read in the settings file and set parameters + + Parameters + ---------- + settings_file : string + filename that has the stellar_model and dust_model dictionaries + """ + + # read everything in as strings + with open(settings_file, "r") as f: + temp_data = f.readlines() + # remove empty lines and comments + input_data = [ + line.strip() + for line in temp_data + if line.strip() != "" and line.strip()[0] != "#" + ] + # remove comments that are mid-line (e.g., "x = 5 #comment") + for i, line in enumerate(input_data): + try: + input_data[i] = line[: line.index("#")] + except ValueError: + pass + # if parameters are defined over multiple lines, combine lines + for i in reversed(range(len(input_data))): + if ("import " not in input_data[i]) and ("=" not in input_data[i]): + input_data[i - 1] += input_data[i] + del input_data[i] + + # parse it into a dictionary + params = {} + + for i in range(len(input_data)): + # execute imports + if "import " in input_data[i]: + exec(input_data[i]) + + # extract parameter and value (as strings) + else: + param = input_data[i].split("=")[0].strip() + # exec the string to get parameter values accessible + exec(input_data[i]) + params[param] = eval(param) + + return params + + +def read_beast_moddata(physmodfile, obsmodfile, params): + """ + Read in the BEAST model data. Only read the physics and observation + information needed for the fitting. + """ + + # get the BEAST physics model info needed + # using SEDGrid as it is faster than beast.tools.read_beast_data.read_sed_data + # only read in the columns specifically needed + beast_moddata = {} + beast_physmod_param_list = params + ["prior_weight", "grid_weight"] + + sgrid = SEDGrid(physmodfile, backend="disk") + for cparam in beast_physmod_param_list: + beast_moddata[cparam] = sgrid.grid[cparam] + + # get the completeness from the BEAST observation model + # use the maximum completeness across the bands as the correct obsmodel + # would only have one completeness value per model + # max is the best approximation for the toothpick model (maybe??? average??) + with h5py.File(obsmodfile, "r") as obs_hdf: + beast_moddata["completeness"] = np.max(obs_hdf["completeness"], axis=1) + + return beast_moddata + + def get_likelihoods(ppdf_file, beast_model_data): """ Read in the saved BEAST sparse posterior PDFs and divide by the BEAST @@ -30,7 +109,7 @@ def get_likelihoods(ppdf_file, beast_model_data): # BEAST saves posterior PDFs labeled as log(pPDF) lnpdata = read_lnp_data(ppdf_file) - # divide by the BEAST prior weights to return to recover the likelihoods + # divide by the BEAST prior weights to recover the likelihoods n_lnps, n_stars = lnpdata["indxs"].shape for i in range(n_stars): indxs = lnpdata["indxs"][:, i] @@ -102,9 +181,7 @@ def precompute_mass_multipliers(bphysparams, physmodmass): } -def get_predicted_num_stars( - massmult_info, bphysparams, bphysmod, physmodage -): +def get_predicted_num_stars(massmult_info, bphysparams, bphysmod, physmodage): """ Calculate the expected number of stars based on the physics model as given on the BEAST model grid including completeness. @@ -164,9 +241,7 @@ def get_predicted_num_stars( return n_totstars -def get_predicted_num_stars_simulate( - massmult_info, bphysparams, bphysmod, physmodage -): +def get_predicted_num_stars_simulate(massmult_info, bphysparams, bphysmod, physmodage): """ Calculate the expected number of stars based on the physics model as given on the BEAST model grid including completeness. diff --git a/megabeast/singlepop_dust_model.py b/megabeast/mbmodel.py similarity index 85% rename from megabeast/singlepop_dust_model.py rename to megabeast/mbmodel.py index c787cf8..a0a8978 100644 --- a/megabeast/singlepop_dust_model.py +++ b/megabeast/mbmodel.py @@ -5,32 +5,28 @@ from beast.physicsmodel.priormodel import PriorAgeModel as PhysAgeModel from beast.physicsmodel.priormodel import PriorMassModel as PhysMassModel +from beast.physicsmodel.priormodel import PriorMetallicityModel as PhysMetallicityModel +from beast.physicsmodel.priormodel import PriorDistanceModel as PhysDistanceModel from beast.physicsmodel.priormodel import PriorDustModel as PhysDustModel from megabeast.helpers import precompute_mass_multipliers, get_predicted_num_stars -__all__ = ["MB_Model", "fit_ensemble"] +__all__ = ["MBModel", "fit_ensemble"] -class MB_Model: +class MBModel: """ MegaBEAST model that provides member functions to compute the likelihood and priors for a specific physical model """ - def __init__(self, params): - self.star_model = params.stellar_model - self.dust_model = params.fd_model + def __init__(self, stellar_model, dust_model): + self.star_model = stellar_model + self.dust_model = dust_model - # setup the physics model for the beast parameters - # uses the same format as the beast priors = megabeast physics model - # --> needs to be generalized to also handle stellar parameters - # define a dict that translates between mb params and physical models - self.params = ["logA", "M_ini", "Av", "Rv", "fA"] + # setup the megabeast physics model for the beast parameters + self.params = ["logA", "M_ini", "Z", "distance", "Av", "Rv", "f_A"] self.physics_model = {} - print(self.params) - print(self.star_model.keys()) - print(self.dust_model.keys()) for cparam in self.params: if cparam in self.star_model.keys(): cmod = self.star_model[cparam] @@ -40,15 +36,22 @@ def __init__(self, params): raise ValueError("requested parameter not in mbsetting file") self.physics_model[cparam] = {"name": cmod["name"]} - self.physics_model[cparam]["varnames"] = cmod["varnames"] - self.physics_model[cparam]["prior"] = cmod["prior"] - for cname, cval in zip(cmod["varnames"], cmod["varinit"]): - self.physics_model[cparam][cname] = cval + if "varnames" in cmod.keys(): + self.physics_model[cparam]["varnames"] = cmod["varnames"] + else: + self.physics_model[cparam]["varnames"] = None + if "prior" in cmod.keys(): + self.physics_model[cparam]["prior"] = cmod["prior"] + else: + self.physics_model[cparam]["prior"] = None + if self.physics_model[cparam]["varnames"] is not None: + for cname, cval in zip(cmod["varnames"], cmod["varinit"]): + self.physics_model[cparam][cname] = cval # setup the physics model for this parameter if cparam in self.star_model.keys(): if cparam == "logA": - self.physics_model[cparam]["x"] = self.star_model["x"] + self.physics_model[cparam]["x"] = self.star_model[cparam]["x"] self.physics_model[cparam]["nsubvars"] = len( self.physics_model[cparam]["x"] ) @@ -60,13 +63,25 @@ def __init__(self, params): self.physics_model[cparam]["model"] = PhysMassModel( self.physics_model[cparam] ) + elif cparam == "Z": + self.physics_model[cparam]["model"] = PhysMetallicityModel( + self.physics_model[cparam] + ) + elif cparam == "distance": + self.physics_model[cparam]["model"] = PhysDistanceModel( + self.physics_model[cparam] + ) + else: + raise NotImplementedError( + f"{cparam} is not an allowed stellar model parameter" + ) elif cparam in self.dust_model.keys(): self.physics_model[cparam]["nsubvars"] = 1 self.physics_model[cparam]["model"] = PhysDustModel( self.physics_model[cparam] ) - # variable to control if N stars detected is computed + # variable to control if N stars is computed # not done of SFH is fixed if self.physics_model["logA"]["prior"]["name"] == "fixed": self.compute_N_stars = False @@ -75,7 +90,7 @@ def __init__(self, params): # variable to allow for the computation of the mass mulitplier for each # age, mass, met to be done only once if the IMF is fixed - # must be True so during the 1st call to lnlike it is computed for all cases + # must be True so during the 1st call to lnlike, it is computed for all cases self.compute_massmult = True self.massmultipliers = None @@ -313,7 +328,6 @@ def _get_best_fit_params(sampler): nwalkers, nsteps = sampler.lnprobability.shape for k in range(nwalkers): tmax_lnp = np.nanmax(sampler.lnprobability[k]) - print(tmax_lnp) if tmax_lnp > max_lnp: max_lnp = tmax_lnp (indxs,) = np.where(sampler.lnprobability[k] == tmax_lnp) @@ -322,7 +336,7 @@ def _get_best_fit_params(sampler): return fit_params_best -def fit_ensemble(megabeast_model, star_lnpgriddata, beast_moddata): +def fit_ensemble(megabeast_model, star_lnpgriddata, beast_moddata, nsteps=300): """ Run the MegaBEAST on a single set of BEAST results. @@ -350,15 +364,15 @@ def chi2(*args): sparams = megabeast_model.start_params()[1] - # result = op.minimize( - # result = op.least_squares( - # chi2, - # sparams, - # args=(megabeast_model, star_lnpgriddata, beast_moddata), - # ftol=1e-20, - # xtol=1e-20 - # method="Nelder-Mead", - # ) + result = scipy.optimize.minimize( + chi2, + sparams, + args=(megabeast_model, star_lnpgriddata, beast_moddata), + #ftol=1e-20, + #xtol=1e-20, + method="Nelder-Mead", + ) + return result["x"] # exit() ndim, nwalkers = len(sparams), 5 * len(sparams) @@ -368,7 +382,6 @@ def chi2(*args): sampler = emcee.EnsembleSampler( nwalkers, ndim, lnprob, args=(megabeast_model, star_lnpgriddata, beast_moddata) ) - nsteps = 300 sampler.run_mcmc(pos, nsteps, progress=True) # samples = sampler.get_chain() diff --git a/megabeast/mbsettings.py b/megabeast/mbsettings.py deleted file mode 100644 index 2203da8..0000000 --- a/megabeast/mbsettings.py +++ /dev/null @@ -1,77 +0,0 @@ -__all__ = ["mbsettings"] - - -class mbsettings: - """ - All of the settings for the MegaBEAST - - Attributes - ---------- - settings_file : string - name of the parameter file that contains the settings in python format - (e.g., supports dictonaries) - """ - - def __init__(self, settings_file=None): - """ - Parameters - ---------- - settings_file : string - input file name - """ - if settings_file is not None: - self.settings_file = settings_file - self.read() - self.verify() - - def read(self): - """ - Read in the settings file and set parameters - """ - - # read everything in as strings - with open(self.settings_file, "r") as f: - temp_data = f.readlines() - # remove empty lines and comments - input_data = [ - line.strip() - for line in temp_data - if line.strip() != "" and line.strip()[0] != "#" - ] - # remove comments that are mid-line (e.g., "x = 5 #comment") - for i, line in enumerate(input_data): - try: - input_data[i] = line[: line.index("#")] - except ValueError: - pass - # if parameters are defined over multiple lines, combine lines - for i in reversed(range(len(input_data))): - if ("import " not in input_data[i]) and ("=" not in input_data[i]): - input_data[i - 1] += input_data[i] - del input_data[i] - - # parse it into a dictionary - beast_params = {} - - for i in range(len(input_data)): - # execute imports - if "import " in input_data[i]: - exec(input_data[i]) - - # extract parameter and value (as strings) - else: - param = input_data[i].split("=")[0].strip() - # exec the string to get parameter values accessible - exec(input_data[i]) - beast_params[param] = eval(param) - - # turn dictionary into attributes - for key in beast_params: - setattr(self, key, beast_params[key]) - - def verify(self): - """ - Run the verification code on the settings - """ - pass - # verify_beast_settings.verify_input_format(self) diff --git a/megabeast/tests/basic_test.py b/megabeast/tests/basic_test.py deleted file mode 100644 index c5a4689..0000000 --- a/megabeast/tests/basic_test.py +++ /dev/null @@ -1,7 +0,0 @@ -from megabeast.mbsettings import mbsettings - - -def test_basic(): - mbparams = mbsettings() - - assert isinstance(mbparams, mbsettings) diff --git a/megabeast/tests/test_mbmodel_init.py b/megabeast/tests/test_mbmodel_init.py new file mode 100644 index 0000000..02ad6f3 --- /dev/null +++ b/megabeast/tests/test_mbmodel_init.py @@ -0,0 +1,91 @@ +import astropy.units as u +from megabeast.mbmodel import MBModel + + +def test_mbmodel_init(): + + # stellar population model + stellar_model = { + "logA": { # star formation history SFH + "name": "bins_histo", + "x": [6.0, 7.0, 8.0, 9.0, 10.0], # units are log(years) + "varnames": ["values"], + "varinit": [[1e-8, 1e-8, 1e-8, 1e-8, 1e-8]], # units are M_sun/year + "prior": { + "name": "flat", + "minmax": [[0.0, 0.0, 0.0, 0.0, 0.0], [1e-3, 1e-3, 1e-3, 1e-3, 1e-3]], + }, + }, + "M_ini": { # initial mass function + "name": "salpeter", + "varnames": ["slope"], + "varinit": [2.35], + "prior": { + "name": "flat", + "minmax": [[2.0, 3.0]], + }, + }, + "Z": { + "name": "flat", + }, + "distance": { + "name": "absexponential", + "varnames": ["dist0", "tau", "amp"], + "varinit": [60.0 * u.kpc, 5.0 * u.kpc, 1.0], + "prior": { + "name": "flat", + "minmax": [[50.0, 3.0, 0.9], [70.0, 7.0, 1.1]], + }, + }, + } + + # foreground dust cloud + dust_model = { + "Av": { + "name": "lognormal", + "varnames": ["mean", "sigma"], + "varinit": [1.0, 0.1], + "prior": { + "name": "flat", + "minmax": [[0.005, 5.0], [0.05, 1.0]], + }, + }, + "Rv": { + "name": "lognormal", + "varnames": ["mean", "sigma"], + "varinit": [3.1, 0.25], + "prior": { + # "name": "fixed", + "name": "flat", + "minmax": [[2.0, 6.0], [0.05, 1.0]], + }, + }, + "f_A": { + "name": "lognormal", + "varnames": ["mean", "sigma"], + "varinit": [1.0, 0.25], + "prior": { + "name": "flat", + "minmax": [[0.0, 1.0], [0.05, 0.5]], + }, + }, + } + + mod = MBModel(stellar_model, dust_model) + + assert isinstance(mod.star_model, dict) + assert isinstance(mod.dust_model, dict) + + assert isinstance(mod.physics_model, dict) + + assert isinstance(mod.compute_N_stars, bool) + assert isinstance(mod.compute_massmult, bool) + + # parameters expected + expparam = ["logA", "M_ini", "Z", "distance", "Av", "Rv", "f_A"] + for cparam in mod.physics_model.keys(): + assert cparam in expparam + for csubparam in ["name", "model"]: + assert ( + csubparam in mod.physics_model[cparam].keys() + ), f"required {csubparam} not found in {cparam}" diff --git a/megabeast/tools/simulate_obs.py b/megabeast/tools/simulate_obs.py new file mode 100644 index 0000000..a3edcc6 --- /dev/null +++ b/megabeast/tools/simulate_obs.py @@ -0,0 +1,543 @@ +import numpy as np +import argparse +import asdf + +from numpy.random import default_rng + +from astropy.table import Table, Column + +from beast.observationmodel.vega import Vega +from beast.physicsmodel.grid import SEDGrid +from beast.physicsmodel.priormodel import PriorAgeModel, PriorMassModel +from beast.physicsmodel.grid_weights_stars import compute_bin_boundaries +from beast.physicsmodel.grid_and_prior_weights import ( + compute_distance_age_mass_metallicity_weights, + compute_av_rv_fA_prior_weights, +) +import beast.observationmodel.noisemodel.generic_noisemodel as noisemodel + +from megabeast.helpers import read_mbmodel +from megabeast.mbmodel import MBModel + +from astropy.table import vstack + +__all__ = ["gen_SimObs_from_sedgrid", "simulate_obs"] + + +def gen_SimObs_from_sedgrid( + sedgrid, + sedgrid_noisemodel, + nsim=0, + compl_filter="max", + complcut=None, + magcut=None, + ranseed=None, + vega_fname=None, + weight_to_use="weight", + age_prior_model=None, + mass_prior_model=None, +): + """ + Generate simulated observations using the physics and observation grids. + The priors are sampled as they give the ensemble model for the stellar + and dust distributions (IMF, Av distribution etc.). + The physics model gives the SEDs based on the priors. + The observation model gives the noise, bias, and completeness all of + which are used in simulating the observations. + + Currently written to only work for the toothpick noisemodel. + + Parameters + ---------- + sedgrid: grid.SEDgrid instance + model grid + + sedgrid_noisemodel: beast noisemodel instance + noise model data + + nsim : int + number of observations to simulate + + compl_filter : str + Filter to use for completeness (required for toothpick model). + Set to max to use the max value in all filters. + + complcut : float (defualt=None) + completeness cut for only including model seds above the cut + where the completeness cut ranges between 0 and 1. + + magcut : float (defualt=None) + faint-end magnitude cut for only including model seds brighter + than the given magnitude in compl_filter. + + ranseed : int + used to set the seed to make the results reproducable, + useful for testing + + vega_fname : string + filename for the vega info, useful for testing + + weight_to_use : string (default='weight') + Set to either 'weight' (prior+grid), 'prior_weight', 'grid_weight', + or 'uniform' (this option is valid only when nsim is supplied) to + choose the weighting for SED selection. + + age_prior_model : dict + age prior model in the BEAST dictonary format + + mass_prior_model : dict + mass prior model in the BEAST dictonary format + + Returns + ------- + simtable : astropy Table + table giving the simulated observed fluxes as well as the + physics model parmaeters + """ + n_models, n_filters = sedgrid.seds.shape + flux = sedgrid.seds + + # get the vega fluxes for the filters + _, vega_flux, _ = Vega(source=vega_fname).getFlux(sedgrid.filters) + + # cache the noisemodel values + model_bias = sedgrid_noisemodel["bias"] + model_unc = np.fabs(sedgrid_noisemodel["error"]) + model_compl = sedgrid_noisemodel["completeness"] + + # only use models that have non-zero completeness in all filters + # zero completeness means the observation model is not defined for that filters/flux + # if complcut is provided, only use models above that completeness cut + if complcut is not None: + finalcomplcut = complcut + else: + finalcomplcut = 0.0 + + ast_defined = model_compl > finalcomplcut + sum_ast_defined = np.sum(ast_defined, axis=1) + goodobsmod = sum_ast_defined >= n_filters + + # completeness from toothpick model so n band completeness values + # require only 1 completeness value for each model + # max picked to best "simulate" how the photometry detection is done + if compl_filter.lower() == "max": + model_compl = np.max(model_compl, axis=1) + else: + short_filters = [ + filter.split(sep="_")[-1].upper() for filter in sedgrid.filters + ] + if compl_filter.upper() not in short_filters: + raise NotImplementedError( + "Requested completeness filter not present:" + + compl_filter.upper() + + "\nPossible filters:" + + "\n".join(short_filters) + ) + + filter_k = short_filters.index(compl_filter.upper()) + print("Completeness from %s" % sedgrid.filters[filter_k]) + model_compl = model_compl[:, filter_k] + + # if magcut is provided, only use models brighter than the magnitude cut + # in addition to the non-zero completeness criterion + if magcut is not None: + fluxcut_compl_filter = 10 ** (-0.4 * magcut) * vega_flux[filter_k] + goodobsmod = (goodobsmod) & (flux[:, filter_k] >= fluxcut_compl_filter) + + # initialize the random number generator + rangen = default_rng(ranseed) + + # if the age and mass prior models are given, use them to determine the + # total number of stars to simulate + model_indx = np.arange(n_models) + if (age_prior_model is not None) and (mass_prior_model is not None): + print("determining the number of simulated stars from age and mass priors") + nsim = 0 + # logage_range = [min(sedgrid["logA"]), max(sedgrid["logA"])] + mass_range = [min(sedgrid["M_ini"]), max(sedgrid["M_ini"])] + + # compute the total mass and average mass of a star given the mass_prior_model + nmass = 100 + masspts = np.logspace(np.log10(mass_range[0]), np.log10(mass_range[1]), nmass) + mass_prior = PriorMassModel(mass_prior_model) + massprior = mass_prior(masspts) + totmass = np.trapz(massprior, masspts) + avemass = np.trapz(masspts * massprior, masspts) / totmass + + # compute the mass of the remaining stars at each age and + # simulate the stars assuming everything is complete + gridweights = sedgrid[weight_to_use] + gridweights = gridweights / np.sum(gridweights) + + grid_ages = np.unique(sedgrid["logA"]) + age_prior = PriorAgeModel(age_prior_model) + ageprior = age_prior(grid_ages) + bin_boundaries = compute_bin_boundaries(grid_ages) + bin_widths = np.diff(10 ** (bin_boundaries)) + totsim_indx = np.array([], dtype=int) + for cage, cwidth, cprior in zip(grid_ages, bin_widths, ageprior): + gmods = sedgrid["logA"] == cage + cur_mass_range = [ + min(sedgrid["M_ini"][gmods]), + max(sedgrid["M_ini"][gmods]), + ] + gmass = (masspts >= cur_mass_range[0]) & (masspts <= cur_mass_range[1]) + curmasspts = masspts[gmass] + curmassprior = massprior[gmass] + totcurmass = np.trapz(curmassprior, curmasspts) + + # compute the mass remaining at each age -> this is the mass to simulate + simmass = cprior * cwidth * totcurmass / totmass + nsim_curage = int(round(simmass / avemass)) + + # simluate the stars at the current age + curweights = gridweights[gmods] + if np.sum(curweights) > 0: + curweights /= np.sum(curweights) + cursim_indx = rangen.choice( + model_indx[gmods], size=nsim_curage, p=curweights + ) + + totsim_indx = np.concatenate((totsim_indx, cursim_indx)) + + nsim += nsim_curage + # totsimcurmass = np.sum(sedgrid["M_ini"][cursim_indx]) + # print(cage, totcurmass / totmass, simmass, totsimcurmass, nsim_curage) + + totsimmass = np.sum(sedgrid["M_ini"][totsim_indx]) + print(f"number total simulated stars = {nsim}; mass = {totsimmass}") + compl_choice = rangen.random(nsim) + compl_indx = model_compl[totsim_indx] >= compl_choice + sim_indx = totsim_indx[compl_indx] + totcompsimmass = np.sum(sedgrid["M_ini"][sim_indx]) + print( + f"number of simulated stars w/ completeness = {len(sim_indx)}; mass = {totcompsimmass}" + ) + + else: # total number of stars to simulate set by command line input + + if weight_to_use == "uniform": + # sample to get the indices of the picked models + sim_indx = rangen.choice(model_indx[goodobsmod], nsim) + + else: + gridweights = sedgrid[weight_to_use][goodobsmod] * model_compl[goodobsmod] + gridweights = gridweights / np.sum(gridweights) + + # sample to get the indexes of the picked models + sim_indx = rangen.choice(model_indx[goodobsmod], size=nsim, p=gridweights) + + print(f"number of simulated stars = {nsim}") + + # setup the output table + ot = Table() + qnames = list(sedgrid.keys()) + # simulated data + for k, filter in enumerate(sedgrid.filters): + simflux_wbias = flux[sim_indx, k] + model_bias[sim_indx, k] + + simflux = rangen.normal(loc=simflux_wbias, scale=model_unc[sim_indx, k]) + + bname = filter.split(sep="_")[-1].upper() + fluxname = f"{bname}_FLUX" + colname = f"{bname}_RATE" + magname = f"{bname}_VEGA" + ot[fluxname] = Column(simflux) + ot[colname] = Column(ot[fluxname] / vega_flux[k]) + pindxs = ot[colname] > 0.0 + nindxs = ot[colname] <= 0.0 + ot[magname] = Column(ot[colname]) + ot[magname][pindxs] = -2.5 * np.log10(ot[colname][pindxs]) + ot[magname][nindxs] = 99.999 + + # add in the physical model values in a form similar to + # the output simulated (physics+obs models) values + # useful if using the simulated data to interpolate ASTs + # (e.g. for MATCH) + fluxname = f"{bname}_INPUT_FLUX" + ratename = f"{bname}_INPUT_RATE" + magname = f"{bname}_INPUT_VEGA" + ot[fluxname] = Column(flux[sim_indx, k]) + ot[ratename] = Column(ot[fluxname] / vega_flux[k]) + pindxs = ot[ratename] > 0.0 + nindxs = ot[ratename] <= 0.0 + ot[magname] = Column(ot[ratename]) + ot[magname][pindxs] = -2.5 * np.log10(ot[ratename][pindxs]) + ot[magname][nindxs] = 99.999 + + # model parmaeters + for qname in qnames: + ot[qname] = Column(sedgrid[qname][sim_indx]) + + return ot + + +def simulate_obs( + physgrid_list, + noise_model_list, + output_catalog=None, + beastinfo_list=None, + mbmodel=None, + nsim=0, + compl_filter="max", + complcut=None, + magcut=None, + weight_to_use="weight", + ranseed=None, +): + """ + Simulate photometry based on a physicsmodel grid(s) and observation model(s). + + Parameters + ---------- + physgrid_list : list of strings + Name of the physics model file. If there are multiple physics model + grids (i.e., if there are subgrids), list them all here, and they will + each be sampled nsim/len(physgrid_list) times. + + noise_model_list : list of strings + Name of the noise model file. If there are multiple files for + physgrid_list (because of subgrids), list the noise model files + associated with each physics model file. + + output_catalog : string + Name of the output simulated photometry catalog + + beastinfo_list : list of strings (default=None) + Name of the beast info file. The mass and age prior models are read + from this model to use to compute the number of stars to simulate. If + there are multiple files for physgrid_list (because of subgrids), list + the beast info files associated with each physics model file. + Cannot be used at the same time as mbmodel. + + mbmodel: megabeast.MBModel + MegaBEAST model that provides a full ensemble physics model. + Cannot be used at the same time as the beastinfo_list. + + n_sim : int (default=100) + Number of simulated objects to create if beastinfo_list is not given. If + nsim/len(physgrid_list) isn't an integer, this will be increased so that + each grid has the same number of samples. + + compl_filter : str (default=max) + filter to use for completeness (required for toothpick model) + set to max to use the max value in all filters + + complcut : float (defualt=None) + completeness cut for only including model seds above the given + completeness, where the cut ranges from 0 to 1. + + magcut : float (defualt=None) + faint-end magnitude cut for only including model seds brighter + than the given magnitude in compl_filter. + + weight_to_use : string (default='weight') + Set to either 'weight' (prior+grid), 'prior_weight', 'grid_weight', + or 'uniform' (this option is valid only when nsim is supplied) to + choose the weighting for SED selection. + + ranseed : int + seed for random number generator + """ + if (beastinfo_list is not None) & (mbmodel is not None): + raise Exception("beastinfo_list and mbmodel cannot both be set") + + # numbers of samples to do + # (ensure there are enough for even sampling of multiple model grids) + n_phys = len(np.atleast_1d(physgrid_list)) + nsim = int(nsim) + samples_per_grid = int(np.ceil(nsim / n_phys)) + + if complcut is not None: + complcut = float(complcut) + + if magcut is not None: + magcut = float(magcut) + + if ranseed is not None: + ranseed = int(ranseed) + + # list to hold all simulation tables + simtable_list = [] + + # make a table for each physics model + noise model + for k, physgrid in enumerate(np.atleast_1d(physgrid_list)): + noise_model = np.atleast_1d(noise_model_list)[k] + + # get the physics model grid - includes priors + modelsedgrid = SEDGrid(str(physgrid)) + + # read in the noise model - includes bias, unc, and completeness + noisegrid = noisemodel.get_noisemodelcat(str(noise_model)) + + if beastinfo_list is not None: + with asdf.open(np.atleast_1d(beastinfo_list)[k]) as af: + binfo = af.tree + age_prior_model = binfo["age_prior_model"] + mass_prior_model = binfo["mass_prior_model"] + else: + age_prior_model = None + mass_prior_model = None + + # update the prior_weights if mbmodel is set + if mbmodel is not None: + + if nsim > 0: + age_prior_model = None + mass_prior_model = None + else: + age_prior_model = (mbmodel.physics_model["logA"],) + mass_prior_model = (mbmodel.physics_model["M_ini"],) + + # mock up a beast spectral grid so that the BEAST code can be used + cur_physmod = Table() + for cparam in ["logA", "M_ini", "Z", "distance", "Av", "Rv", "f_A"]: + cur_physmod[cparam] = modelsedgrid.grid[cparam] + n_mods = len(cur_physmod["logA"]) + for cparam in ["grid_weight", "prior_weight", "weight"]: + cur_physmod[cparam] = np.full(n_mods, 1.0) + + print("computing megabeast ensemble physics model (replaces beast priors)") + + dust_prior = compute_av_rv_fA_prior_weights( + cur_physmod["Av"], + cur_physmod["Rv"], + cur_physmod["f_A"], + cur_physmod["distance"], + av_prior_model=mbmodel.physics_model["Av"], + rv_prior_model=mbmodel.physics_model["Rv"], + fA_prior_model=mbmodel.physics_model["f_A"], + ) + + compute_distance_age_mass_metallicity_weights( + cur_physmod, + distance_prior_model={"name": "flat"}, + age_prior_model=mbmodel.physics_model["logA"]["model"], + mass_prior_model=mbmodel.physics_model["M_ini"]["model"], + met_prior_model={"name": "flat"}, + ) + + cur_physmod["prior_weight"] *= dust_prior + + modelsedgrid.grid["prior_weight"] = cur_physmod["prior_weight"].data + modelsedgrid.grid["grid_weight"] = cur_physmod["grid_weight"].data + modelsedgrid.grid["weight"] = ( + modelsedgrid.grid["prior_weight"] * modelsedgrid.grid["grid_weight"] + ) + + # generate the table + simtable = gen_SimObs_from_sedgrid( + modelsedgrid, + noisegrid, + age_prior_model=age_prior_model, + mass_prior_model=mass_prior_model, + nsim=samples_per_grid, + compl_filter=compl_filter, + complcut=complcut, + magcut=magcut, + weight_to_use=weight_to_use, + ranseed=ranseed, + ) + + # append to the list + simtable_list.append(simtable) + + # stack all the tables into one and write it out + fincat = vstack(simtable_list) + if output_catalog is not None: + fincat.write(output_catalog, overwrite=True) + + return fincat + + +def main(): + # commandline parser + parser = argparse.ArgumentParser() + + parser.add_argument( + "--physgrid_list", + "-p", + metavar="PHYS_MODEL", + required=True, + nargs="+", + help="filename(s) of physics grid(s)", + ) + parser.add_argument( + "--noise_model_list", + "-n", + metavar="NOISE_MODEL", + required=True, + nargs="+", + help="filename(s) of observation/noise grid(s)", + ) + parser.add_argument( + "--output_catalog", + "-c", + required=True, + help="filename for simulated observations", + ) + parser.add_argument( + "--beastinfo_list", + metavar="BEAST_INFO", + nargs="+", + help="filename(s) of beast info file(s)", + ) + parser.add_argument("--ensembleparams", help="ensemble parameters file") + parser.add_argument( + "--nsim", default=0, type=int, help="number of simulated objects" + ) + parser.add_argument( + "--compl_filter", + default="F475W", + help="filter for completeness, set to max for max of values from all filters", + ) + parser.add_argument( + "--complcut", + default=None, + type=float, + help="completeness cut for selecting seds above the completeness cut", + ) + parser.add_argument( + "--magcut", + default=None, + type=float, + help="magnitdue cut for selecting seds brighter than the magcut in compl_filter", + ) + parser.add_argument( + "--weight_to_use", + default="weight", + type=str, + help="""Set to either 'weight' (prior+grid), 'prior_weight', or + 'grid_weight' to choose the weighting for SED selection.""", + ) + parser.add_argument( + "--ranseed", default=None, type=int, help="seed for random number generator" + ) + args = parser.parse_args() + + # read in the megabeast model if supplied + if args.mbmodel: + params = read_mbmodel(args.mbmodel) + mbmodel = MBModel(params.stellar_model, params.dust_model) + else: + mbmodel = None + + # run observation simulator + simulate_obs( + args.physgrid_list, + args.noise_model_list, + output_catalog=args.output_catalog, + beastinfo_list=args.beastinfo_list, + mbmodel=mbmodel, + nsim=args.nsim, + compl_filter=args.compl_filter, + complcut=args.complcut, + magcut=args.magcut, + weight_to_use=args.weight_to_use, + ranseed=args.ranseed, + ) + + +if __name__ == "__main__": # pragma: no cover + + main() diff --git a/setup.cfg b/setup.cfg index 2c1de7d..ddfd804 100644 --- a/setup.cfg +++ b/setup.cfg @@ -45,6 +45,7 @@ addopts = --doctest-rst [options.entry_points] console_scripts = mb_fit_single = megabeast.fit_single:main + mb_simulate_obs = megabeast.tools.simulate_obs:main [coverage:run] omit =