forked from jmrosinski/GPTL
gptl4_0: When THREADED_PTHREADS is enabled, mutex locking only happens
when a new thread is found instead of on each start/stop
call. Can improve efficiency dramatically (on AIX anyway).
New entries start_handle and stop_handle added to improve
efficiency in fine-grained regions.
Moved threadutil.c contents back inside gptl.c for efficiency.
Can now call GPTLinitialize after GPTLfinalize.
Bugfix for GPTLstart* routines: Ensure no more than
strlen(name) characters are read from input name.
Added support for Cray compilers crayftn and craycc.
Bugfix for add: bump tout->count even when wallstats disabled.
Removed obsolete utr_rtc() code and ifdefs.
Added AIX-based read_real_time() function.
Added "volatile" attribute in appropriate places.
Cut tablesize to 1024, and added option to set it via GPTLsetoption.
Added diagnostic print about how GPTL was built.
Bugfix for diagnostic prints when nanotime is true.
Print of memusage stats no longer based on dopr_collision.
Added to ./suggestions (Run autoconf, move configure to
suggestions, then set "silent=yes" in the script).
Added checks to MPI returns in MPIpr_summary.
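
The start_handle and stop_handle entries mentioned in gptl4_0 can be pictured with a small sketch. This is not GPTL's code: sketch_start_handle, lookup_timer, and the fixed-size table are hypothetical names, and the linear search stands in for whatever lookup the library really does. The idea shown is only that the caller keeps an integer handle, initially zero, so the name search happens once per region rather than on every call.

```c
#include <assert.h>
#include <string.h>

#define MAX_TIMERS 64            /* hypothetical capacity for this sketch */

static const char *timer_names[MAX_TIMERS];
static long timer_counts[MAX_TIMERS];
static int ntimers = 0;
static int nlookups = 0;         /* counts name searches, to show the saving */

/* Slow path: find or create a timer by name. Returns its index, -1 if full. */
static int lookup_timer(const char *name)
{
    ++nlookups;
    for (int i = 0; i < ntimers; ++i)
        if (strcmp(timer_names[i], name) == 0)
            return i;
    if (ntimers == MAX_TIMERS)
        return -1;
    timer_names[ntimers] = name; /* caller must keep the string alive */
    return ntimers++;
}

/* Handle-based start: *handle must be 0 on first use. After the first call
 * it holds index+1, so later calls skip the name search entirely. */
int sketch_start_handle(const char *name, int *handle)
{
    assert(name != NULL && handle != NULL);
    if (*handle == 0) {
        int idx = lookup_timer(name);
        if (idx < 0)
            return -1;
        *handle = idx + 1;
    }
    ++timer_counts[*handle - 1];
    return 0;
}
```

In a fine-grained region the handle would typically be a static or module-level variable next to the instrumented loop, so that it survives between calls.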
gptl3_6_2: print_memusage converts to MB by default (if possible).
Added print_memusage test to testbasics.F90
Added macros.make.bluegene (tested at ORNL)
Bugfix for GPTLpr_summary: slave tried to receive too much data.
Bugfix for process_namelist.F90: Needs to close opened file
on error.
Bugfix for gptl_papilibraryinit (Fortran): returns an int.
Changed LINUX ifdef to HAVE_SLASHPROC. Not all Linux systems
have /proc/pid/statm.
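
A minimal sketch of the kind of guard the HAVE_SLASHPROC change in gptl3_6_2 describes, assuming the macro is defined by the build when /proc is usable. The function names are hypothetical and this is not the library's get_memusage. It reads only the first two statm fields (total size and resident size, both in pages).

```c
#include <assert.h>
#include <stdio.h>

/* Parse the first two fields of a statm-style line: total program size and
 * resident set size, both in pages. Returns 0 on success, -1 on bad input. */
int parse_statm(const char *line, long *size_pages, long *resident_pages)
{
    assert(line != NULL && size_pages != NULL && resident_pages != NULL);
    if (sscanf(line, "%ld %ld", size_pages, resident_pages) != 2)
        return -1;
    return 0;
}

/* Read /proc/self/statm only when the build says it exists. How
 * HAVE_SLASHPROC gets defined is an assumption of this sketch. */
int sketch_get_memusage(long *size_pages, long *resident_pages)
{
#ifdef HAVE_SLASHPROC
    char buf[256];
    FILE *fp = fopen("/proc/self/statm", "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_statm(buf, size_pages, resident_pages);
#else
    (void) size_pages;
    (void) resident_pages;
    return -1;                   /* no /proc support compiled in */
#endif
}
```

Keying the guard on a feature macro rather than on the operating system name lets a Linux system without /proc/pid/statm fall through to the error return instead of failing at run time.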
gptl3_6_1: Bugfix for auto-profiling MPI_Recv: it could hang in gptl3_6.
Auto-profiling support for additional MPI routines.
Better estimates of bytes transferred for some auto-profiled
MPI routines.
gptl3_6: Support for auto-profiling some MPI routines.
gptl3_5_3: Makefile simplification. Can now run "make" from ctests/ and
ftests/. Initial set of PMPI wrappers, and synchronize option.
gptl3_5_2: Bugfix for when omp_get_max_threads() returns zero.
gptl3_5_1: No more OMP pragmas: PAPI events started from get_thread_num().
Thus GPTL built with PTHREADS=yes now works when called from
OpenMP applications.
gptl3_4_7: Option to not run summary at all.
gptl3_4_6: totent counts collisions not entries.
Add target "testnompi" to only run tests that don't require MPI.
gptl3_4_5: Add realloc memory checks. Fix typo in uninstall target.
Bugfix for instrumented codes when orphan encounters
parent. Print additional hash table stats. Decrease MAX_AUX
from 16 to 8 to save memory.
gptl3_4_4: Added GPTLevent_name_to_code and GPTLevent_code_to_name. Add
some derived events. More description to print.
gptl3_4_2: Enable "make test". An erroneous call to stop_instr no longer
invokes a null function pointer.
gptl3_4_1: Fix tests for robustness. Move initialized check in GPTLstop
higher to avoid one of the error tests in ftests/errtest
gptl3_4: Put back GPTLpr_summary. Some people were using it.
Get ctests/ and ftests/ into a usable state.
Proper handling of input defines by Fortran.
gptl3_2: Handle multiple parents
Add parsegptlout.pl for handling MPI output
Add capability for derived counters (currently just IPC and CI)
Add GPTL_PAPIlibraryinit to allow enabling PAPI native events
Junk autoconf in favor of macros.make ("suggestions" retained as
an autoconf-generated script to suggest settings)
Vastly faster start/stop for auto-instrumented codes
hex2name.pl properly indents output
Efficiency enhancements (e.g. no character copying on start/stop)
Use stack to determine parent, for simplicity, efficiency, correctness
Arrange timers in tree for proper handling of parent/child
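
The "use stack to determine parent" item in gptl3_2 can be illustrated roughly as follows. This is a sketch under assumptions, not the library's implementation: timers are plain integer ids, the names sketch_start and sketch_stop are hypothetical, and a stop that does not match the most recent start is simply rejected.

```c
#include <assert.h>

#define MAX_DEPTH 128            /* hypothetical limit for this sketch */
#define NO_PARENT (-1)

static int callstack[MAX_DEPTH]; /* ids of timers currently running */
static int depth = 0;

/* Start timer `id`: its parent is whatever is on top of the stack.
 * Returns the parent id, or NO_PARENT for a top-level timer. */
int sketch_start(int id)
{
    assert(depth < MAX_DEPTH);
    int parent = (depth > 0) ? callstack[depth - 1] : NO_PARENT;
    callstack[depth++] = id;
    return parent;
}

/* Stop timer `id`: it must be the most recently started one.
 * Returns 0 on success, -1 on a mismatched stop. */
int sketch_stop(int id)
{
    if (depth == 0 || callstack[depth - 1] != id)
        return -1;
    --depth;
    return 0;
}
```

Because the parent is read off the top of the stack at start time, the same timer started from two different callers naturally gets two different parents, which is what a tree (or multiple-parent) printout needs.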
gptl2_16: Rename GPTLpr_mpistats to GPTLpr_summary. Make it behave
reasonably even when MPI not enabled.
Close print files after writing to them.
gptl2_14: Add GPTLpr_mpistats entry point.
Check string starts with "PAPI_" before lopping off.
Remove "inline" attribute to timing functions--they're probably
not inlined anyway.
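
The prefix check described in gptl2_14 amounts to something like the sketch below. The function name strip_papi_prefix is hypothetical; only the "PAPI_" prefix comes from the entry above.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer past a leading "PAPI_" if present, else the input itself. */
const char *strip_papi_prefix(const char *name)
{
    static const char prefix[] = "PAPI_";
    const size_t n = sizeof prefix - 1;  /* length without the NUL */

    assert(name != NULL);
    if (strncmp(name, prefix, n) == 0)
        return name + n;
    return name;
}
```

Without the check, skipping five characters unconditionally would truncate names that do not carry the prefix, or read past the end of names shorter than five characters.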
gptl2_13: verbose = false by default. Wrap informational printout in
test on "verbose". Add verbose option to gptl_papi.c
Modified behavior of scale.c
gptl2_11: Man pages. Default narrowprint to true. Fix minor formatting
bug in GPTLpr.
gptl2_10: Add GPTLpercent (of 1st timers[0]), and GPTLpersec (for PAPI).
Add timingModule.F90
gptl2_9: Update camgptl.inc to match gptl.inc (forgot in gptl2_8)
gptl2_8: Add option to retain parent/child relationships for printing.
Change behavior of ambiguous timers to put a "*" in column 1.
gptl2_7: Add GPTLget_nregions and GPTLget_regionname
gptl2_6: Add GPTLquerycounters (C and Fortran)
gptl2_5: Enable native events
gptl2_3: Minor changes to papiomptest.F
gptl2_2: Multiplexing: can enable, disable, or always enable.
GPTLnarrowprint (fewer digits)
Add cmd-line PAPI option arg to utrtest.
papiomptest no longer does malloc: caused trouble on SGI
gptl1_7: Split tests/ into ctests/ and ftests/.
Fix logic on Fortran underscoring.
Return error when selected PAPI option not available.
Further fix (1.6 was not complete) on PAPI overhead.
gptl1_6: - Fix (bogus) print occurring from PAPI when overheadstats disabled.
- Add get_memusage and memusage test.
- Add "basic" test.
- Hack aclocal.m4 for openMP flags.
gptl1_5: - Fixed "install" target.
- Add NANOTIME (Linux only) option.
- Replace gettimeofday with generic utr (underlying timing routine)
option.
- Remove DIAG ifdef option.
- Interpret nested "start" calls as recursion.
- In NUMERIC mode, shift tag by 2 bits.
gptl1_4: - Added NUMERIC_TIMERS (configure --enable-numeric) for speedup.
If enabled, user args to start/stop MUST be numeric tags rather
than character strings. Intended primarily to be used with
automatic code instrumentation (g++ -finstrument-functions),
and tests/jr-resolve.pl to back out a name.
- Added "numeric" to tests/ in order to test --enable-numeric.
- Remove HASH as an option (it is now assumed) for readability.
- Moved some OMP code from threadutil.c to gptl.c strictly for
efficiency (inlining). Hated to do it because the code really
doesn't belong there.
- Added jr-resolve.pl to tests/ to back out symbol name from
address.
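
For readers unfamiliar with -finstrument-functions as used in gptl1_4: GCC inserts calls to two hooks, __cyg_profile_func_enter and __cyg_profile_func_exit, around every instrumented function and passes the function's address. The sketch below shows hooks that treat that address as a numeric tag. The trace array and counters are hypothetical and stand in for whatever the library's numeric start/stop entry points do.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EVENTS 256           /* hypothetical trace capacity */

static uintptr_t entered[MAX_EVENTS]; /* function addresses as numeric tags */
static int nentered = 0;
static int call_depth = 0;

/* GCC calls this on entry to every instrumented function. The attribute
 * keeps the hook itself from being instrumented (infinite recursion). */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void) call_site;
    if (nentered < MAX_EVENTS)
        entered[nentered++] = (uintptr_t) this_fn;
    ++call_depth;
}

/* GCC calls this just before an instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void) this_fn;
    (void) call_site;
    assert(call_depth > 0);
    --call_depth;
}
```

The recorded values are raw addresses, which is why a separate step such as the jr-resolve.pl script mentioned above is needed to map them back to symbol names.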
gpt1_2: Changed GPT to GPTL to avoid overlap with other utilities named GPT...
gpt1_1: Added README and man pages. Better commenting. Fixed bug in min
calcs in GPTstop.
gpt1_0: Added PAPI support. For whatever reason, negative increments
occasionally occur on IBM.
gpt0_3: Added overflow diagnostic novfl
Migrate threading ops to threadutil.c
Remove useless calcs from GPTstop
Added iteration option to depth test
overhead test is now a C code and tests potentially expensive
calcs embedded in the library
Removed exp, expensive, timetimeing tests
Reenabled pthreads
Fix SGI complaints about arg lists
Changed depth array to mitigate false sharing
Added DISABLE_TIMERS ifdef
autoconf bugfix
Change microsecond accumulator test from 1000000 to 10000000
gpt0_2: Remove unused components in Timer. Make "name" component a fixed
size for efficiency.
gpt0_1: Add hashtable for efficiency. Add some more tests. Add ChangeLog.
Bugfix for printing.