Improved gcorr iteration script (#31)
Refactor gcorr script to work locally or submitted online, iterate until
convergence, and condense outputs into a single directory with easily
identifiable intermediate iteration files.

Closes #21

---------

Co-authored-by: Alexandra Rahlin <[email protected]>
Co-authored-by: Sasha Rahlin <[email protected]>
3 people authored Nov 20, 2023
1 parent 3804f0b commit 9e80a9e
Showing 8 changed files with 897 additions and 544 deletions.
120 changes: 56 additions & 64 deletions scripts/gcorr/README
@@ -2,79 +2,71 @@ Code package for calculating correction factors to g_ell using pre-generated
simulation maps for each single mask. This procedure assumes you are running
this code on a cluster-- it is extremely slow to do otherwise.

There are three scripts used:
There is one main script that calls two functions from the xfaster.gcorr_tools
library:

1. run_xfaster.py ---- This calls XFaster either to run or submit jobs.
The only part that you'll need to touch is at the beginning-- opts and
submit_opts, to match the options you use for data/for your cluster.
2. compute_gcorr.py ---- A script that computes gcorr.npz file from all the
bandpower.npz files. You'll never need to touch or run this script.
3. iterate.py ---- An iteration script, that calls script 1 and 2 to get a
new gcorr.npz file each time. This is the main script you'll run.
1. xfaster_gcorr ---- This function calls XFaster to run or submit jobs for
gcorr runs.
2. process_gcorr ---- A function that computes the gcorr correction from the
ensemble of bandpowers, updates the running total, and backs up the necessary
files from each iteration.
3. iterate_gcorr.py ---- An iteration script that calls functions 1 and 2 to get
a new gcorr.npz file each time. This is the main script you'll run.

There is also a config file with options specific to computing gcorr.

---------------------
Gcorr run procedure:

1. Edit the gcorr config file to suit your purposes. An example is given.
Required fields are:
* null -- must be true for null tests and false for signal runs
* map_tags -- comma-separated list of map tags
* data_subset -- the globbable data_subset argument to xfaster_run,
but without map tags. So, "full", "chunk*", etc.
* output_root -- the parent directory where your gcorr XFaster runs will
be written
* nsim -- the number of simulations to use to compute gcorr
* [xfaster_opts] -- this is where you'll put any string-type options
that will be directly input to xfaster_run
Required fields are:
* null -- must be true for null tests and false for signal runs
* map_tags -- comma-separated list of map tags
* data_subset -- the globbable data_subset argument to xfaster_run,
but without map tags. So, "full", "chunk*", etc.
* output_root -- the parent directory where your gcorr XFaster runs will
be written
* nsim -- the number of simulations to use to compute gcorr
* [xfaster_opts] -- this is where you'll put any options
that will be directly input to xfaster_run
* [submit_opts] -- this is where you'll put any options
that will be directly input to xfaster_submit,
in addition to those in [xfaster_opts]

2. Edit the beginning of run_xfaster.py for non-string input options
to xfaster_run (opts dictionary) or xfaster_submit (submit_opts).
Here you might change things like lmin, lmax, bin size, etc. and
omp_threads.
2. Run iterate_gcorr.py once to get the full set of XFaster output files in the
output directory. Since we haven't computed gcorr yet, this will set
apply_gcorr=False. Make sure to use as many OMP threads as possible since
this is the step where the sims_xcorr file, which benefits the most from
extra threads, is computed. Your command should look like this:

3. Run run_xfaster.py once to get the full set of XFaster output files.
Since we haven't computed gcorr yet, you must use --no-gcorr. Make sure
to use as many OMP threads as possible since this is the step where the
sims_xcorr file, which benefits the most from extra threads, is computed.
Your command should look like this:
python run_xfaster.py --gcorr-config path-to-my-gcorr-config.ini --no-gcorr
python iterate_gcorr.py path-to-my-gcorr-config.ini

4. Run iterate.py until convergence is reached. In practice, you will do:
iterate.py --gcorr-config path-to-my-gcorr-config.ini
then wait for it to finish. Then look at the correction-to-the-correction
that it both prints and plots (it should converge to 1s for TT, EE, BB),
and if it hasn't converged, up+enter (redo) the same command you just did.
In much more detail, here's what the code does:
1. If this is the first iteration, copy the whole output directory into
one next to it with tag _iter. This is the directory that will now be
updated with new transfer functions and bandpowers on each iteration.
In the code, it's called rundir.
2. Make a plot directory in that _iter directory-- look here for new plots
of the total gcorr and the correction-to-gcorr each iteration.
3. For the first iteration, initialize a starting guess for gcorr as all
ones. This total gcorr is saved as gcorr_<tag>_total.npz in the original
(reference) output directory.
4. If not the first iteration, load up the correction-to-gcorr computed
in the previous iteration. Multiply it by the total gcorr, and save that
to the reference directory as gcorr_total. Also save the previous
iteration's total gcorr as gcorr_<tag>_prev.npz.
5. Plot gcorr total and the correction to gcorr total. Save in rundir/plots.
6. Clear out rundir bandpowers/transfer functions/logs.
7. Call run_xfaster.py for the 0th sim seed while also reloading gcorr.
This does a couple things-- saves the new gcorr in the masks_xcorr file,
so later seeds will use the right thing. And recompute the transfer
function, which doesn't depend on the sim_index, so is only necessary to
do once.
8. After the transfer functions are all on disk, submit individual jobs for
all the other seeds, just doing the bandpowers step for those.
9. Monitor the queue, checking every 10 seconds for jobs still running.
10. Once they're all done, run compute_gcal.py, which saves a
correction-to-gcorr as gcorr_corr_<tag>.npz in the rundir.
11. Print out the values of the correction-to-gcorr.
12. Exit.
3. Run iterate_gcorr.py until convergence is reached. In practice, you will run
the command above and wait for it to finish. If you include the `--max-iters`
option with a non-zero value, the code will try to determine whether
convergence or max_iters has been reached and stop on its own. Otherwise,
you can look at the correction-to-the-correction that it both prints and
plots (it should converge to 1s for TT, EE, BB), and if it hasn't converged,
up+enter (redo) the same command you just did. In much more detail, here's
what the code does:

5. After convergence is reached, copy the gcorr_total file from the refdir
to the mask directory, labeling it mask_map_<tag>_gcorr.npz for signal or
mask_map_<tag>_gcorr_null.npz for null.
1. Call xfaster_gcorr for the 0th sim seed while also reloading gcorr (if
this is not the first iteration). This does two things: it saves the new
gcorr in the masks_xcorr file, so later seeds will use the right one, and
it recomputes the transfer function, which doesn't depend on the
sim_index and so only needs to be done once.
2. After the transfer functions are all on disk, submit individual jobs for
all the other seeds, just doing the bandpowers step for those.
3. Once they're all done, run compute_gcal, and save a correction-to-gcorr
as gcorr_corr_<tag>_iter<iter>.npz in the rundir.
4. If not the first iteration, load up the correction-to-gcorr computed for
this iteration. Multiply it by the total gcorr, and save that to the
output directory as gcorr_total_<tag>_iter<iter>.npz.
5. Plot gcorr total and the correction to gcorr total. Save in rundir/plots.
6. Clear out rundir bandpowers/transfer functions/logs.
7. Exit.

4. After convergence is reached, copy the gcorr_total file for the last
iteration from the rundir to the mask directory, labeling it
mask_map_<tag>_gcorr.npz for signal or mask_map_<tag>_gcorr_null.npz for
null.
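The update and stopping logic described in the procedure above can be sketched roughly as follows. This is an illustration only, not the code in xfaster.gcorr_tools: the helper names (update_total, has_converged, final_gcorr_name), the lower-case spectrum keys, the one-array-per-spectrum layout, and the 1% tolerance are all assumptions made for the example.

```python
import numpy as np

# Spectra the README says should converge to 1 (the key names are an assumption).
SPECS = ("tt", "ee", "bb")


def update_total(gcorr_total, gcorr_corr):
    """Multiply the running total gcorr by this iteration's correction."""
    return {spec: gcorr_total[spec] * gcorr_corr[spec] for spec in gcorr_total}


def has_converged(gcorr_corr, specs=SPECS, tol=0.01):
    """True when the correction-to-gcorr is within tol of 1 in every bin.

    The tolerance is a hypothetical choice, not a value taken from XFaster.
    """
    return all(np.all(np.abs(gcorr_corr[s] - 1.0) < tol) for s in specs)


def final_gcorr_name(tag, null):
    """File name for the converged gcorr in the mask directory."""
    suffix = "_gcorr_null.npz" if null else "_gcorr.npz"
    return "mask_map_{}{}".format(tag, suffix)


# Toy run with three bins per spectrum; the total starts as all ones.
total = {s: np.ones(3) for s in SPECS}

# First iteration: a large correction, so not converged yet.
corr1 = {s: np.array([1.2, 1.1, 1.05]) for s in SPECS}
total = update_total(total, corr1)

# Second iteration: the correction is close to 1 everywhere.
corr2 = {s: np.array([1.004, 0.998, 1.001]) for s in SPECS}
total = update_total(total, corr2)
converged = has_converged(corr2)
```

In this toy run the loop would stop after the second iteration, and the total for a hypothetical tag "95" would then be copied to the mask directory under the name returned by final_gcorr_name("95", null=False).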
145 changes: 0 additions & 145 deletions scripts/gcorr/compute_gcal.py

This file was deleted.

18 changes: 17 additions & 1 deletion scripts/gcorr/gcorr_config_null.ini
@@ -17,4 +17,20 @@ signal_type = synfast
# Noise type needed for nulls
noise_type = gaussian
mask_type = rectangle
verbose = debug
likelihood = false
residual_fit = false
foreground_fit = false
tbeb = true
bin_width = 25
lmin = 2
lmax = 500
verbose = info

# Options for submitting to a cluster
[submit_opts]
nodes = 1
ppn = 1
mem = 6
omp_threads = 10
wallt = 4
num_jobs = 10
19 changes: 17 additions & 2 deletions scripts/gcorr/gcorr_config_signal.ini
@@ -6,7 +6,6 @@ data_subset = full
output_root = ../../example/gcorr_run
nsim = 100


# Options we can directly pass to XFaster
[xfaster_opts]
config = ../../example/config_example.ini
@@ -16,4 +15,20 @@ signal_type = synfast
# noise type ignored for signal gcorr
noise_type = gaussian
mask_type = rectangle
verbose = debug
likelihood = false
residual_fit = false
foreground_fit = false
tbeb = true
bin_width = 25
lmin = 2
lmax = 500
verbose = info

# Options for submitting to a cluster
[submit_opts]
nodes = 1
ppn = 1
mem = 6
omp_threads = 10
wallt = 4
num_jobs = 10
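
A config laid out like the ones above could be read with Python's standard configparser, as in the sketch below. The diff does not show the name of the first section, so [gcorr_opts] is a placeholder, and the map tags are invented for the example. The merge at the end reflects the README's statement that [submit_opts] are passed to xfaster_submit in addition to [xfaster_opts]. How iterate_gcorr.py actually parses and type-converts these values is not shown in this commit view.

```python
import configparser

# Trimmed-down example config. The [gcorr_opts] section name and the
# map tags are hypothetical; the other values are copied from the diff.
EXAMPLE = """
[gcorr_opts]
null = false
map_tags = 95,150
data_subset = full
output_root = ../../example/gcorr_run
nsim = 100

[xfaster_opts]
bin_width = 25
lmin = 2
lmax = 500
tbeb = true
verbose = info

[submit_opts]
nodes = 1
omp_threads = 10
wallt = 4
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE)

# Required fields, converted to useful Python types.
null = cfg.getboolean("gcorr_opts", "null")
map_tags = [tag.strip() for tag in cfg.get("gcorr_opts", "map_tags").split(",")]
nsim = cfg.getint("gcorr_opts", "nsim")

# Options for a local run, and the superset used when submitting to a cluster.
# configparser returns every value as a string; type conversion is left out here.
run_opts = dict(cfg["xfaster_opts"])
submit_opts = dict(run_opts)
submit_opts.update(cfg["submit_opts"])
```

Keeping the cluster options in their own section means the same file works for both local and submitted runs: a local run only needs run_opts.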