CHANGES.txt

===============================================================================
NEW IN VERSION 4
===============================================================================

- Changed the initialization scheme slightly to avoid integer rounding issues.
  In my testing, this has resulted in no change in performance.

===============================================================================
NEW IN VERSION 3
===============================================================================

- Altered some of the defaults and added new variables. This applies to both
  the CPU/Phi and GPU versions. This change increases the data structure size
  by about 35%, to 36 MB total.

- The user variables are now: 2D source regions, coarse axial intervals,
  fine axial intervals, and the axial decomposition

- 2D source regions: (5000 default) representative of the number of 2D source
  regions that would be present on a single node out of ~5000 nodes running a
  full reactor problem

- coarse axial intervals: (27 default) number of coarse axial intervals
  for a full assembly in the reactor (i.e., not yet domain decomposed)

- fine axial intervals: (5 default) number of intervals per coarse
  axial interval

- axial decomposition: (20 default) number of nodes that will be used
  to handle a single assembly

- These variables are then used to calculate the number of 3D source regions,
  as 2D source regions * coarse axial intervals / axial decomposition
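
  Note: the calculation above can be sketched as follows. This is only an
  illustration that assumes plain integer arithmetic; the function name and
  signature are hypothetical and are not taken from the source code.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only (not the code's actual API).
 * Computes 2D source regions * coarse axial intervals / axial decomposition. */
long source_regions_3d(long source_regions_2d,
                       long coarse_axial_intervals,
                       long axial_decomposition)
{
    /* The axial decomposition is a node count, so it must be positive. */
    assert(axial_decomposition > 0);

    /* Multiply before dividing so that integer truncation can only
     * happen once, in the final division. */
    return source_regions_2d * coarse_axial_intervals / axial_decomposition;
}
```

  With the defaults listed above, source_regions_3d(5000, 27, 20) gives 6750.
  The defaults divide evenly, so no rounding occurs in that case.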

===============================================================================
NEW IN VERSION 2
===============================================================================
- CUDA code has been optimized significantly.

- Default inputs for all versions now use 128 energy groups (rather than the
  prior 100 energy groups). This is done because 128 is a multiple of 32,
  making for more efficient use of SIMD vectors. This is an acceptable change
  because a moderate increase in energy groups increases the fidelity of a
  real-world calculation, and should not cause an unreasonably large increase
  in wall time given the potential gains in SIMD vector efficiency.

- Some documentation has been cleaned up.

- Added the option in the CUDA code to vary the number of segments per CUDA
  block. This can be used to find the most efficient configuration for a
  particular system. The value is set using the "-p" command line option
  on the CUDA version (defaults to 100).

- Exponential table lookup has been disabled by default. It was found that
  on most architectures, it is actually faster to make the EXP call directly
  than it is to read the table from memory.
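
  Note: the arithmetic behind the move from 100 to 128 energy groups, noted
  above, can be sketched as follows. This is only an illustration; the helper
  names are hypothetical, and the width of 32 lanes is assumed from the
  "multiple of 32" remark (the real vector width depends on the hardware).

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only (not part of the code). */

/* Number of vectors of `width` lanes needed to cover n elements
 * (ceiling division). */
int vectors_needed(int n, int width)
{
    assert(n >= 0 && width > 0);
    return (n + width - 1) / width;
}

/* Number of lanes left unused in the final, partially filled vector. */
int idle_lanes(int n, int width)
{
    return vectors_needed(n, width) * width - n;
}
```

  Under that assumption, vectors_needed(100, 32) and vectors_needed(128, 32)
  are both 4, while idle_lanes(100, 32) is 28 and idle_lanes(128, 32) is 0.
  In other words, 128 groups occupy the same number of vectors as 100 groups,
  but leave no lanes idle.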

===============================================================================
NEW IN VERSION 1
===============================================================================
- The code has been ported to CUDA. The repository is now split up into two
  source folders, "cpu" and "cuda", which contain the CPU/Phi optimized code
  and the GPU CUDA code, respectively.

- A flat branch of the repo is also available. This branch has more GPU
  friendly data structures that should make the repo easier to port to
  accelerator languages such as OCCA or OpenACC.
===============================================================================