-
Notifications
You must be signed in to change notification settings - Fork 2
/
CHANGES.txt
68 lines (52 loc) · 3.18 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
===============================================================================
NEW IN VERSION 4
===============================================================================
- Changed the initialization scheme slightly to avoid integer rounding issues.
No change in performance has resulted this in my testing.
===============================================================================
NEW IN VERSION 3
===============================================================================
- Altered some of the defaults and added new variables. Applies to both
cpu/Phi and GPU versions. This change increases the data structure size
by about 35% to 36 MB total.
- Now user variables are: 2D source regions, coarse axial intervals,
fine axial intervals, and the axial deomposition
- 2D source regions: (5000 default) representative of the number of 2D source
regions that would be present on a single node out of ~5000 nodes running a
full reactor problem
- coarse axial intervals: (27 default) Number of coarse axial intervals
for a full assembly in the reactor (i.e., not yet domain decomposed)
- fine axial intervals: (5 default) number of intervals per coarse
axial interval
- Axial Decomposition: (20 default) number of nodes that will be used
to handle a single assembly
- These variables are then used to calculated the number of 3D source regions,
as 2D_sources * coarse axial intervals / axial decomposition
===============================================================================
NEW IN VERSION 2
===============================================================================
- CUDA code has been optimized significantly.
- Default inputs for all version now use 128 energy groups (rather than the
prior 100 energy groups). This is done as 128 is a multiple of 32, making
for more efficient use of SIMD vectors. This is an acceptable change because
a moderate increase in energy groups increases the fidelity of a real world
calculation and would not cost an unnecessarily large increase in wall time
given the potential gains in SIMD vector efficiency.
- Some documentation has been cleaned up.
- Added the option in the CUDA code to vary the number of segments per CUDA
block. This can be used to test the most efficient configuration for a
particular system. This option is altered using the "-p" command line option
on the CUDA version (defaults to 100).
- Exponential Table lookup has been disabled by default. It was found that
on most architectures, it's actually faster to just do the EXP call than
it is to have to read from memory.
===============================================================================
NEW IN VERSION 1
===============================================================================
- The code has been ported to CUDA. The repository is now split up into two
source folders - "cpu" and "cuda", which contain the CPU/Phi optimized code
and GPU CUDA code respectively.
- A flat branch of the repo is also available. This branch has more GPU
friendly data structures that should make the repo easier to port to
accelerator languages such as OCCA or OpenACC.
===============================================================================