https://github.com/tareqmalas/girih
Tip revision: 0c126788937d189147be47115703b752235e585c authored by Tareq Malas on 18 August 2016, 17:16:46 UTC
fixed a minor typo
fixed a minor typo
Tip revision: 0c12678
README.md
Girih
============
This is a pre-alpha release. Contact me if you have issues or questions at tareq.m.malas at gmail.com
(Inspired from http://en.wikipedia.org/wiki/Girih)
#### Introduction
This tool serves as a test harness for different optimization techniques to
improve stencil computations performance in shared and distributed memory
systems. Spatial and temporal blocking techniques are implemented across all
implemented stencil operators. The tool is mainly used to develop and analyze
the performance of Multi-core Wavefront Diamond (MWD) tiling techniques, which
are used to perform temporal blocking. Naive, halo-first, and diamond tiling
options are implemented for MPI communication in distributed memory setup. The
project supports star stencil operators at different orders in space and time
and with constant/variable coefficients and different coefficients' symmetry
options. It provides means to verify the results' correctness of the optimized
stencil kernels using the results of unoptimized serial reference
implementations. Timing routines are inserted manually in the code to measure
the performance of the main components of the code, including computation,
communication, and idle time.
LIKWID performance tool is used to perform measurements through hardware
counters at regions of interest in the code. This enables the user to accurately
measure the memory bandwidth, energy consumption, TLB misses, and other
supported groups at LIKWID.
#### Installation & compilation
Use conf/make.conf file to set the Makefile variables to the
desired compiler binaries and flags. The main make targets are:
* dp: Used to set the problem data to double-precision.
* debug: Used for debugging and verification builds.
The make command creates build directory according to the selected precision for
the executable "mwd_kernels" and the object files. dp make target creates "build_dp"
directory and the default build creates "build" directory.
#### Usage
The list of available parameters can be printed by passing the argument --help.
To see all available stencil operators and optimization techniques, pass the
argument --list.
#### Examples
##### Run 7-point constant-coefficient stencil using a grid of size 512^3 and 500 time steps, using the relaxed-synchronzation wavefront MWD algorithm (Auto-tuning may take time, given that no tile size nor thread group size are selected)
export OMP_NUM_THREADS=12; ./build_dp/mwd_kernel --nx 512 --ny 512 --nz 512
--nt 500 --target-kernel 1 --mwd-type 2 --target-ts 2
#### Performance output summaries
Manual timing routines (using MPI_Wtime) are used to collect the time spent in:
1) Computation
2) Communication (including waiting time)
3) Waiting at MPI_Wait/MPI_Waitall and MPI_Barrier
4) Total time of the time stepper, including the waiting time at the barrier
that is used to synchronize with other MPI processes after completing the
time stepper work
5) Other parts of the code, where this value =total-computation-communication.
This value is useful to check if any significant time is spent elsewhere
Additional timing measurements are printed at the MWD implementation. They
include details about the time distribution of each thread and thread group
during the runtime.
A successful experiment prints out statistics about the timing results above.
The maximum, minimum, and mean values are printed along with Rank0 timing
results.
#### Helpful scripts
Several Python scripts (available under scripts/) are used to run experiments
for performance measurements and verification regression tests.
verification/run_verification.py 0 1 2
Runs a regression test over the standard and MWD implementations. It
runs a combination of topology sizes and domain sizes and other
parameters with verification
verification/run_short_verification_0to1.py
Runs a short version of the regression test over the naive and halo-first
implementations
verification/run_short_verification_intra_diamond.py
Runs a short version of the regression test over the MWD implementations
parse.py exper-file-1 exper-file-2 ...exper-file-n
This script takes multiple files, each contains the output of individual
experiment. The parsed results are written in CSV format in summary.csv
Experiment submission scripts
Most of the remaining scripts are used to run experiments at systems
with job scheduler (for example, Shaheen at KAUST). These scripts should
be executed from the root directory of the project
For questions and inquiries please contact tareq.m.malas@gmail.com