README

Paris/PPCG/Polyhedral/PRL Runtime Library (PRL)
===============================================

Compiling
----------
./autogen.sh
./configure
make

The paths to the OpenCL header files and to the OpenCL library
must be provided.  One way to do this is to pass
CFLAGS="-I<PATH_TO_OPENCL_HEADERS> -L<PATH_TO_OPENCL_LIB>"
to configure.  For example:

./configure CFLAGS="-I/opt/AMDAPP/include/ -L/opt/AMDAPP/lib/x86_64/"

Usage
-----

Either use pencilcc/penciltool
 - or -
add -lprl_opencl to the linker line.  When compiling, add 'prl/include' to the header search path (or install the headers to the default header search path).

The library will initialize on its first use.


Profiling
---------

This version of the runtime library can profile PENCIL programs automatically.  Note that this implementation is not designed to be used from multiple threads.


### Ad-hoc Profiling (stats)

This can be enabled by setting the environment variables

	PRL_DUMP_CPU=1
	PRL_DUMP_GPU=1

Setting

	PRL_DUMP_ALL=1

enables both of them.  The statistics are written to stdout.

PRL_DUMP_CPU prints how long calls to the OpenCL API took on the CPU.  PRL_DUMP_GPU prints the durations of tasks on the GPU as reported by OpenCL itself.  The output is printed as a summary when the program ends.  PRL_TRACE_GPU prints the duration of every OpenCL queue item.


### Benchmarking (timings)

Profiling a single function call can be unreliable due to noise and one-time effects.  Therefore there is also a mechanism to measure a piece of code multiple times.  The easiest way to do this is to create a new program that calls the function

	prl_prof_benchmark(timed_func, user, init_callback, init_user, finit_callback, finit_user)

which does basically this:

	prl_prof_reset();

	for (auto i = 0; i < PRL_PROF_DRY_RUNS; ++i) {
		if (init_callback) (*init_callback)(init_user);
		(*timed_func)(user);
		if (finit_callback) (*finit_callback)(finit_user);
	}

	for (auto i = 0; i < PRL_PROF_RUNS; ++i) {
		if (init_callback) (*init_callback)(init_user);
		prl_prof_start();
		(*timed_func)(user);
		prl_prof_stop();
		if (finit_callback) (*finit_callback)(finit_user);
	}

	prl_prof_dump();


PRL_PROF_RUNS and PRL_PROF_DRY_RUNS are environment variables that can be set.  The defaults are 1 for dry runs and 10 for timed runs.  Dry runs allow eliminating first-time effects like cold memory caches and compiling the OpenCL program.
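
A program that wants to read the same two variables itself could do so roughly as follows.  This is only a sketch: the helper names 'parse_run_count' and 'run_count_from_env' are hypothetical and not part of PRL, and the validation rules (non-negative, at most 1000000, fall back on anything unparsable) are assumptions rather than a description of how PRL parses these variables.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper, not part of PRL.
// Parses a run count from text and returns `fallback` if the text is
// missing, is not a non-negative integer, or is unreasonably large.
int parse_run_count(const char *text, int fallback) {
	assert(fallback >= 0);
	if (!text || *text == '\0')
		return fallback;

	errno = 0;
	char *end = nullptr;
	long value = std::strtol(text, &end, 10);

	// Reject overflow, trailing garbage, negative and huge values.
	if (errno != 0 || *end != '\0' || value < 0 || value > 1000000)
		return fallback;
	return static_cast<int>(value);
}

// Hypothetical helper, not part of PRL.
// Looks up an environment variable and parses it as a run count.
int run_count_from_env(const char *name, int fallback) {
	return parse_run_count(std::getenv(name), fallback);
}
```

With the defaults stated above, the calls would be 'run_count_from_env("PRL_PROF_DRY_RUNS", 1)' and 'run_count_from_env("PRL_PROF_RUNS", 10)'.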

The execution time between 'prl_prof_start()' and 'prl_prof_stop()' is measured.  Because the timed code is executed multiple times, the time of every execution is recorded.  The data is printed to stdout on 'prl_prof_dump()', which reports the median times and the relative standard deviation of all profiling items.
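
To make the two reported statistics concrete, the following sketch computes a median and a relative standard deviation from a list of recorded durations.  It is not taken from PRL: the function names are hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption, so PRL's own numbers may be computed differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty set of samples.  The vector is taken by value so
// that sorting does not disturb the caller's data.
double median(std::vector<double> samples) {
	assert(!samples.empty());
	std::sort(samples.begin(), samples.end());
	const std::size_t n = samples.size();
	// Odd count: middle element.  Even count: mean of the two middle ones.
	return n % 2 ? samples[n / 2]
	             : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Relative standard deviation: sample standard deviation divided by the
// mean.  Assumes at least two samples and a non-zero mean.
double relative_stddev(const std::vector<double> &samples) {
	assert(samples.size() >= 2);
	double sum = 0.0;
	for (double s : samples)
		sum += s;
	const double mean = sum / samples.size();

	double squared_diffs = 0.0;
	for (double s : samples)
		squared_diffs += (s - mean) * (s - mean);

	return std::sqrt(squared_diffs / (samples.size() - 1)) / mean;
}
```

The median is less sensitive than the mean to a single slow run, which is one plausible reason to report it for benchmark timings.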

It might not be possible to use the 'prl_prof_benchmark' function in your application.  In this case, one can do it manually by following the structure of the snippet above.
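
The manual approach could look roughly like the sketch below.  The four 'prl_prof_*' functions are replaced by counting stand-ins so that the example compiles without PRL.  Their 'void' return types are an assumption, and 'App', 'process_frame' and 'benchmark_manually' are hypothetical names for illustration.  In a real program, the stand-ins would be removed in favour of the PRL header and -lprl_opencl.

```cpp
#include <cassert>

// --- Stand-ins so that this sketch compiles without PRL. ---------------
// Assumption: the real functions take no arguments, as in the snippet
// above.  The return types are guessed.  Remove this section when
// building against PRL.
static int g_starts = 0, g_stops = 0, g_dumps = 0;
void prl_prof_reset() { g_starts = g_stops = g_dumps = 0; }
void prl_prof_start() { ++g_starts; }
void prl_prof_stop() { ++g_stops; }
void prl_prof_dump() { ++g_dumps; }
// --- End of stand-ins. --------------------------------------------------

// Hypothetical application state.
struct App {
	int frames_processed = 0;
};

// Hypothetical workload; in a real application this would run the
// PENCIL/OpenCL code that should be measured.
void process_frame(App &app) { ++app.frames_processed; }

// Follows the structure of the benchmarking snippet: untimed dry runs
// first, then timed runs bracketed by start/stop, then one dump.
void benchmark_manually(App &app, int dry_runs, int runs) {
	assert(dry_runs >= 0 && runs >= 0);
	prl_prof_reset();

	for (int i = 0; i < dry_runs; ++i)
		process_frame(app);

	for (int i = 0; i < runs; ++i) {
		prl_prof_start();
		process_frame(app);
		prl_prof_stop();
	}

	prl_prof_dump();
}
```

Any per-run setup or cleanup belongs outside the start/stop pair, in the same places where the snippet calls init_callback and finit_callback, so that it is not included in the measured time.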