Improved gcorr iteration script (#31)

Refactor gcorr script to work locally or submitted online, iterate until convergence, and condense outputs into a single directory with easily identifiable intermediate iteration files. Closes #21 --------- Co-authored-by: Alexandra Rahlin <[email protected]> Co-authored-by: Sasha Rahlin <[email protected]>
SPIDERCMB · Nov 20, 2023 · 9e80a9e · 9e80a9e
1 parent 3804f0b
commit 9e80a9e
Show file tree

Hide file tree

Showing 8 changed files with 897 additions and 544 deletions.
diff --git a/scripts/gcorr/README b/scripts/gcorr/README
@@ -2,79 +2,71 @@ Code package for calculating correction factors to g_ell using pre-generated
 simulation maps for each single mask. This procedure assumes you are running
 this code on a cluster-- it is extremely slow to do otherwise.
 
-There are three scripts used:
+There is one main script that calls two functions from the xfaster.gcorr_tools
+library:
 
-1. run_xfaster.py ---- This calls XFaster either to run or submit jobs.
-The only part that you'll need to touch is at the beginning-- opts and
-submit_opts, to match the options you use for data/for your cluster.
-2. compute_gcorr.py  ---- A script that computes gcorr.npz file from all the
-bandpower.npz files. You'll never need to touch or run this script.
-3. iterate.py ---- A iteration script, that calls script 1 and 2 to get a
-new gcorr.npz file each time. This is the main script you'll run.
+1. xfaster_gcorr ---- This function calls XFaster to run or submit jobs for
+   gcorr runs.
+2. process_gcorr ---- A function that computes the gcorr correction from the
+   ensemble of bandpowers, updates the running total, and backs up the necessary
+   files from each iteration.
+3. iterate_gcorr.py ---- A iteration script, that calls function 1 and 2 to get
+   a new gcorr.npz file each time. This is the main script you'll run.
 
 There is also a config file with options specific to computing gcorr.
 
 ---------------------
 Gcorr run procedure:
 
 1. Edit the gcorr config file to suit your purposes. An example is given.
-Required fields are:
- * null -- must be true for null tests and false for signal runs
- * map_tags -- comma-separated list of map tags
- * data_subset -- the globabble data_subset argument to xfaster_run,
-     but without map tags. So, "full", "chunk*", etc.
- * output_root -- the parent directory where your gcorr XFaster runs will
-     be written
- * nsim -- the number of simulations to use to compute gcorr
- * [xfaster_opts] -- this is where you'll put any string-type options
-     that will be directly input to xfaster_run
+   Required fields are:
+   * null -- must be true for null tests and false for signal runs
+   * map_tags -- comma-separated list of map tags
+   * data_subset -- the globabble data_subset argument to xfaster_run,
+       but without map tags. So, "full", "chunk*", etc.
+   * output_root -- the parent directory where your gcorr XFaster runs will
+       be written
+   * nsim -- the number of simulations to use to compute gcorr
+   * [xfaster_opts] -- this is where you'll put any options
+       that will be directly input to xfaster_run
+   * [submit_opts] -- this is where you'll put any options
+       that will be directly input to xfaster_submit,
+       in addition to those in [xfaster_opts]
 
-2. Edit the beginning of run_xfaster.py for non-string input options
-  to xfaster_run (opts dictionary) or xfaster_submit (submit_opts).
-  Here you might change things like lmin, lmax, bin size, etc. and
-  omp_threads.
+2. Run iterate_gcorr.py once to get the full set of XFaster output files in the
+   output directory.  Since we haven't computed gcorr yet, this will set
+   apply_gcorr=False. Make sure to use as many OMP threads as possible since
+   this is the step where the sims_xcorr file, which benefits the most from
+   extra threads, is computed.  Your command should look like this:
 
-3. Run run_xfaster.py once to get the full set of XFaster output files.
-  Since we haven't computed gcorr yet, you must use --no-gcorr. Make sure
-  to use as many OMP threads as possible since this is the step where the
-  sims_xcorr file, which benefits the most from extra threads, is computed.
-  Your command should look like this:
-  python run_xfaster.py --gcorr-config path-to-my-gcorr-config.ini --no-gcorr
+      python iterate_gcorr.py path-to-my-gcorr-config.ini
 
-4. Run iterate.py until convergence is reached. In practice, you will do:
-  iterate.py --gcorr-config path-to-my-gcorr-config.ini
-  then wait for it to finish. Then look at the correction-to-the-correction
-  that it both prints and plots (it should converge to 1s for TT, EE, BB),
-  and if it hasn't converged, up+enter (redo) the same command you just did.
-  In much more detail, here's what the code does:
-  1. If this is the first iteration, copy the whole output directory into
-     one next to it with tag _iter. This is the directory that will now be
-     updated with new transfer functions and bandpowers on each iteration.
-     In the code, it's called rundir.
-  2. Make a plot directory in that _iter directory-- look here for new plots
-     of the total gcorr and the correction-to-gcorr each iteration.
-  3. For the first iteration, initialize a starting guess for gcorr as all
-     ones. This total gcorr is saved as gcorr_<tag>_total.npz in the original
-     (reference) output directory.
-  4. If not the first iteration, load up the correction-to-gcorr computed
-     in the previous iteration. Multiply it by the total gcorr, and save that
-     to the reference directory as gcorr_total. Also save the previous
-     iteration's total gcorr as gcorr_<tag>_prev.npz.
-  5. Plot gcorr total and the correction to gcorr total. Save in rundir/plots.
-  6. Clear out rundir bandpowers/transfer functions/logs.
-  7. Call run_xfaster.py for the 0th sim seed while also reloading gcorr.
-     This does a couple things-- saves the new gcorr in the masks_xcorr file,
-     so later seeds will use the right thing. And recompute the transfer
-     function, which doesn't depend on the sim_index, so is only necessary to
-     do once.
-  8. After the transfer functions are all on disk, submit individual jobs for
-     all the other seeds, just doing the bandpowers step for those.
-  9. Monitor the queue, checking every 10 seconds for jobs still running.
-  10. Once they're all done, run compute_gcal.py, which saves a
-     correction-to-gcorr as gcorr_corr_<tag>.npz in the rundir.
-  11. Print out the values of the correction-to-gcorr.
-  12. Exit.
+3. Run iterate_gcorr.py until convergence is reached. In practice, you will run
+   the command above and wait for it to finish. If you include the `--max-iters`
+   option with a non-zero value, the code will try to determine whether
+   convergence or max_iters has been reached and stop on its own.  Otherwise,
+   you can look at the correction-to-the-correction that it both prints and
+   plots (it should converge to 1s for TT, EE, BB), and if it hasn't converged,
+   up+enter (redo) the same command you just did.  In much more detail, here's
+   what the code does:
 
-5. After convergence is reached, copy the gcorr_total file from the refdir
-  to the mask directory, labeling it mask_map_<tag>_gcorr.npz for signal or
-  mask_map_<tag>_gcorr_null.npz for null.
+   1. Call xfaster_gcorr for the 0th sim seed while also reloading gcorr (if
+      this is not the first iteration).  This does a couple things-- saves the
+      new gcorr in the masks_xcorr file, so later seeds will use the right
+      thing. And recompute the transfer function, which doesn't depend on the
+      sim_index, so is only necessary to do once.
+   2. After the transfer functions are all on disk, submit individual jobs for
+      all the other seeds, just doing the bandpowers step for those.
+   3. Once they're all done, run compute_gcal, and save a correction-to-gcorr
+      as gcorr_corr_<tag>_iter<iter>.npz in the rundir.
+   4. If not the first iteration, load up the correction-to-gcorr computed for
+      this iteration. Multiply it by the total gcorr, and save that to the
+      output directory as gcorr_total_<tag>_iter<iter>.npz.
+   5. Plot gcorr total and the correction to gcorr total. Save in rundir/plots.
+   6. Clear out rundir bandpowers/transfer functions/logs.
+   7. Exit.
+
+4. After convergence is reached, copy the gcorr_total file for the last
+   iteration from the rundir to the mask directory, labeling it
+   mask_map_<tag>_gcorr.npz for signal or mask_map_<tag>_gcorr_null.npz for
+   null.
diff --git a/scripts/gcorr/compute_gcal.py b/scripts/gcorr/compute_gcal.py
diff --git a/scripts/gcorr/gcorr_config_null.ini b/scripts/gcorr/gcorr_config_null.ini
@@ -17,4 +17,20 @@ signal_type = synfast
 # Noise type needed for nulls
 noise_type = gaussian
 mask_type = rectangle
-verbose = debug
+likelihood = false
+residual_fit = false
+foreground_fit = false
+tbeb = true
+bin_width = 25
+lmin = 2
+lmax = 500
+verbose = info
+
+# Options for submitting to a cluster
+[submit_opts]
+nodes = 1
+ppn = 1
+mem = 6
+omp_threads = 10
+wallt = 4
+num_jobs = 10
diff --git a/scripts/gcorr/gcorr_config_signal.ini b/scripts/gcorr/gcorr_config_signal.ini
@@ -6,7 +6,6 @@ data_subset = full
 output_root = ../../example/gcorr_run
 nsim = 100
 
-
 # Options we can directly pass to XFaster
 [xfaster_opts]
 config = ../../example/config_example.ini
@@ -16,4 +15,20 @@ signal_type = synfast
 # noise type ignored for signal gcorr
 noise_type = gaussian
 mask_type = rectangle
-verbose = debug
+likelihood = false
+residual_fit = false
+foreground_fit = false
+tbeb = true
+bin_width = 25
+lmin = 2
+lmax = 500
+verbose = info
+
+# Options for submitting to a cluster
+[submit_opts]
+nodes = 1
+ppn = 1
+mem = 6
+omp_threads = 10
+wallt = 4
+num_jobs = 10