https://github.com/CNugteren/CLTune
Raw File
Tip revision: 8a56a4a314be7ccef56ad8f55e8a34a37dda0545 authored by Cedric Nugteren on 12 December 2022, 08:07:03 UTC
Merge pull request #60 from trixirt/for-fedora
Tip revision: 8a56a4a
CHANGELOG

Next version (work in progress)
- Global and local thread size dividers now perform a ceiled division by default
- Verbose mode now always prints the parameter configuration before compiling and running
- Added additional OpenCL information printing to screen and to JSON

Version 2.7.0
- CLTune now automatically ensures global size is a multiple of the local workgroup size
- Added GetBestResult() to the tuner's API to retrieve the best parameters programmatically
- Changed std::initalizer_list in the AddParameters API to std::vector
- Fixed a bug in the simulated annealing search method

Version 2.6.0
- Changed timing measurements to now also include the (varying) kernel launch overhead
- It is now possible to set OpenCL compiler options through the env variable CLTUNE_BUILD_OPTIONS
- Added support for compilation under Visual Studio 2013 (MSVC++ 12.0)
- Added an option to build a static version of the library

Version 2.5.0
- Updated to version 8.0 of the CLCudaAPI header
- Made it possible to configure the number of times each kernel is run (to average results)

Version 2.4.0
- Made it possible to run the unit-tests independently of the provided OpenCL kernel samples
- Added an option to compile in verbose mode for additional diagnostic messages (-DVERBOSE=ON)
- Now using version 6.0 of the CLCudaAPI header
- Fixed the RPATH settings on OSX
- Added Appveyor continuous integration and increased coverage of the Travis builds

Version 2.3.1
- Fixed a bug where an output buffer could not be used as input at the same time
- Fixed computing the validation error for half-precision fp16 data-types

Version 2.3.0
- Added support for 'short' and 'cl_half' data-types as kernel buffer and scalar arguments
- Fixed a bug where failed results would still show up in the tuning results
- Made MSVC link the run-time libraries statically

Version 2.2.0
- Added two new simpler samples of using the tuner (vector-add and convolution)
- Updated the general documentation
- Added API documentation
- Now using version 5.0 of the CLCudaAPI header

Version 2.1.0
- Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
- Added command-line OpenCL platform selection in the examples (thanks to William J Shipman)

Version 2.0.0
- Added support for machine learning models. These models can be trained on a small fraction of the
  tuning configurations and can be used to predict the remainder. Two models are supported:
  * Linear regression
  * A 3-layer neural network
- Now using version 4.0 of the CLCudaAPI header (previously known as Claduc)
- Added experimental support for CUDA kernels
- Added support for MSVC (Visual Studio) 2015
- Using Catch instead of GTest for unit-testing
- Various minor fixes

Version 1.7.1
- Added additional device properties to JSON-output

Version 1.7.0
- Now using the Claduc C++11 interface to OpenCL
- Added a method to print all tuning results in JSON-format to file

Version 1.6.4
- Reduced the requirements from GCC 4.8.0 to 4.7.0
- Fixes various warnings on Clang

Version 1.6.3
- Reduced the requirements from GCC 4.9.0 to 4.8.0
- Minor updates to the CMake file

Version 1.6.2
- Fixed another exception-related bug
- Further improved reporting of failed runs
- Updated C++11 OpenCL API

Version 1.6.1
- Fixed a couple of issues related to exceptions
- Improved reporting of failed runs

Version 1.6.0
- Much cleaner API due to Pimpl idiom: only cltune.h header is now required
- Replaced Khronos' cl.hpp with a custom C++11 version tailored for CLTune
- Code clean-up / reorganisation
- Added an option to add fixed defines to reference kernels
- Added an option to load a kernel from string instead of from file
- Added support for size_t OpenCL buffers

Version 1.5.1
- Improved the GEMM example to support the Intel MIC (Xeon Phi) accelerators
- Updated compiler check and compiler flags
- Adds support for multiple OpenCL kernel files at once (e.g. when wanting to include a header file)
- Adds support for the std::complex data-types
- Fixed some compilation warnings regarding size_t conversions
- Updated the FindOpenCL.cmake file

Version 1.5.0
- OpenCL local work size and memory size constraints are now automatically handled
- Greatly improved the new 2D convolution example:
  * Filter coefficients are now dynamic
  * Added support for local memory padding
  * In-lined the convolution header into the kernels and host code
  * Fixed various bugs
- Moved the examples to separate subfolders
- Uses chrono timers as seed in favor of random device
- Bugfix for simulated annealing when 2 variables can only change together.

Version 1.4.1
- Added 2D convolution as an example
- Added command-line arguments to the GEMM search-method sample
- Fixed a CUDA 7 related bug in the GEMM kernel
- Fixed a logging bug in the PSO search technique

Version 1.4.0
- Added the particle swarm optimisation (PSO) search technique
- Updated the example GEMM kernel

Version 1.3.2
- Now prints OpenCL version when running on a device
- Added install targets to CMake
- Moved header files around and renamed the main include to "cltune.h"
- Catches OpenCL exceptions and skips those configurations

Version 1.3.1
- Fixed simulated annealing's random number generation
- Added new FindOpenCL CMake script
- Added option to print database-formatted output of best results

Version 1.3.0
- Allow users to select a search strategy through the API
- Added support for the simulated annealing search method
- Added a sample using simulated annealing
- Added an option to output the search process to file

Version 1.2.0
- Added the interface to customize search algorithms
- Initially added full-search and random-search as basic search algorithms

Version 1.1.0
- User-defined parameter constraints are now fully customizable by accepting arbitrary functions on
  an arbitrary combination of parameters.
- Re-factored the code to use more C++11 features: auto, smart pointers, constexpr, class enums, ...

Version 1.0.1
- Replaced one more occurrence of a pointer with an std::shared_ptr
- Re-added OpenCL class constructor exception test
- Updated license information

Version 1.0.0
- Initial release to GitHub
back to top