https://github.com/CNugteren/CLTune

Revision Message Commit Date
8a56a4a Merge pull request #60 from trixirt/for-fedora For fedora 12 December 2022, 08:07:03 UTC
96918d3 Improve install location handling. On Fedora the x86_64 library directory is lib64; on Ubuntu it is lib. Include GNUInstallDirs and use its variables to set the install locations. Signed-off-by: Tom Rix <trix@redhat.com> 11 December 2022, 13:49:27 UTC
6bb4014 Silence OpenCL version warning. Recent OpenCL headers cause this warning: /usr/include/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)’. While the default of 300/3.0 works, set it to the older 120/1.2 until there is a dependency on 3.0 features. Signed-off-by: Tom Rix <trix@redhat.com> 07 December 2022, 13:01:54 UTC
83c69e3 Version the shared library. Applications that build with the shared library need a versioned shared library so that cltune can be upgraded without risking runtime problems after the upgrade. Signed-off-by: Tom Rix <trix@redhat.com> 06 December 2022, 20:12:28 UTC
0bbf787 Merge pull request #59 from trixirt/fix_numeric_limits include limits header 21 November 2022, 16:12:32 UTC
9d8b2d0 include limits header. On Ubuntu 22.04 / g++ 11.3 there is this error: CLTune/src/ml_model.cc:58:21: error: ‘numeric_limits’ is not a member of ‘std’ (line 58: auto min = std::numeric_limits<T>::max();). numeric_limits is defined in the limits header, so include it. Signed-off-by: Tom Rix <trix@redhat.com> 19 November 2022, 14:36:28 UTC
6167b30 Merge pull request #58 from afshinarefi/master Add explicit header for std::stringstream 04 February 2022, 08:05:32 UTC
5b38c67 Add explicit header for std::stringstream 03 February 2022, 15:47:03 UTC
6df6ac6 Merge pull request #56 from Knutakir/missing-cuda-functions Add missing functions for CUDA 17 June 2020, 18:20:04 UTC
022f4c4 Add missing functions for CUDA 16 June 2020, 16:03:31 UTC
be5f37a Fixed AppVeyor issues 19 November 2018, 19:53:19 UTC
050ab01 Merge pull request #53 from rtrembecky/master 2D convolution sample - fix correctness and redundant loads 19 November 2018, 19:50:14 UTC
63ba4db fix redundant loads 18 November 2018, 14:20:26 UTC
c5246f3 fix correctness - fix global offset for input data access 18 November 2018, 14:20:38 UTC
45078d9 fix correctness - fix input padded data initialization 18 November 2018, 14:20:26 UTC
7ca2c87 Added reference to KTT 30 August 2018, 19:42:20 UTC
06721eb Fixed an issue with the NVIDIA compute capability not being retrieved properly 16 September 2017, 16:29:48 UTC
1ba3595 Removed development builds from README 14 September 2017, 20:02:35 UTC
7c47f44 Removed development builds from README 14 September 2017, 20:01:25 UTC
e0484dc Added a guard against missing AMD and NVIDIA extensions 14 September 2017, 20:00:48 UTC
6d4b506 Added additional OpenCL information printing to screen and to JSON 10 September 2017, 13:39:18 UTC
7c5a580 Added printing of the parameter configuration in verbose mode; prints parameters to stdout before compiling and running a kernel (with -DVERBOSE=ON only) 05 September 2017, 18:21:47 UTC
fc28b36 Global and local thread size dividers now perform a ceiled division by default 25 July 2017, 18:48:49 UTC
6b7c50b Merge branch 'development' 26 June 2017, 19:11:53 UTC
3c577cc Updated to version 2.7.0 26 June 2017, 19:11:34 UTC
3225b69 Fixed AppVeyor settings related to a recent change in the Khronos repos 26 June 2017, 18:51:20 UTC
ba07c79 The tuner now automatically ensures global size is a multiple of the local size 26 June 2017, 18:44:04 UTC
2b49667 Fixed a bug in the annealing method and added some extra sanity checks 20 February 2017, 21:20:11 UTC
8eb3df1 Added a function to return the best tuning parameters to the user; factored out the logic to get the best tuning results 20 February 2017, 21:04:05 UTC
115e14b Changed std::initializer_list in the AddParameters API to std::vector 17 February 2017, 20:22:24 UTC
35de111 Merge pull request #46 from CNugteren/development Update to version 2.6.0 23 October 2016, 13:41:14 UTC
dc1cb0b Updated to version 2.6.0 23 October 2016, 13:29:58 UTC
a8c6871 Added support for pkg-config installation on Linux 23 October 2016, 13:29:27 UTC
73ed6c3 Added an option to compile a static library 22 October 2016, 14:42:10 UTC
bdbf353 Fixed a const/constexpr issue caused by the previous commit 12 October 2016, 19:43:01 UTC
083a5e2 Added support for compilation under Visual Studio 2013 (MSVC++ 12.0) 12 October 2016, 19:36:20 UTC
0ed56a1 It is now possible to set the OpenCL compiler options through an environment variable 02 October 2016, 11:44:46 UTC
5219183 Execution time measurement is no longer based on events but uses CPU timers instead, so that it also includes the (varying) kernel launch overhead and other overheads (if any) 02 October 2016, 11:38:45 UTC
d0ec5a1 Merge pull request #45 from CNugteren/development Update to version 2.5.0 27 September 2016, 19:04:58 UTC
82dd234 Updated to version 2.5.0 27 September 2016, 18:53:32 UTC
2a56722 Made the number of runs for averaging a setting configurable by the user 27 September 2016, 18:49:47 UTC
bb4ba83 Updated to version 8.0 of CLCudaAPI 27 September 2016, 18:48:10 UTC
492c362 Updated Travis CI to use the system OpenCL instead of compiling our own OpenCL library 03 August 2016, 18:21:59 UTC
68cb1d4 Updated to version 7.0 of the CLCudaAPI header 03 August 2016, 18:17:41 UTC
a6cb325 Merge pull request #44 from williamjshipman/development Fix bug in Kernel::LocalMemUsage on Intel CPU runtime 03 August 2016, 18:00:03 UTC
d8318a5 Fix bug in Kernel::LocalMemUsage where the Intel CPU runtime returns a size of 0 in the first call to clGetKernelWorkGroupInfo. The cause seems to be an ambiguity in the OpenCL standard. 30 July 2016, 23:31:22 UTC
86dbb2e Merge pull request #42 from CNugteren/development Update to version 2.4.0 29 June 2016, 17:50:22 UTC
a001605 Minor fix to the AppVeyor CI build 29 June 2016, 16:21:25 UTC
45b2c52 Updated to version 2.4.0 29 June 2016, 16:10:46 UTC
0526f9d Made it possible to run some of the GEMM kernels using CUDA (those without shared memory) 29 June 2016, 16:08:10 UTC
6177c14 Updated to version 6.0 of the CLCudaAPI header 29 June 2016, 15:50:12 UTC
609ea4c Removed building of tests for AppVeyor CI 29 June 2016, 15:49:52 UTC
fca2ad1 Added Appveyor CI and added OS X compilation for Travis 29 June 2016, 15:03:42 UTC
48719a2 Fixed the RPATH settings for OSX 16 June 2016, 18:20:44 UTC
b516ef7 Added a VERBOSE option to CMake to get additional diagnostic messages 16 June 2016, 18:18:49 UTC
e95c158 Unit-tests are now based on string-kernels instead of external-file-kernels to make it possible to run the unit test executables anywhere 31 May 2016, 18:37:17 UTC
f1b0900 Merge pull request #39 from CNugteren/development Update to version 2.3.1 25 May 2016, 11:03:26 UTC
ebb3085 Updated to version 2.3.1 (bug-fix release) 25 May 2016, 10:14:35 UTC
53a05ba Fixed computing the validation error for half-precision fp16 data-types 24 May 2016, 09:58:26 UTC
e9f43b5 Fixed a bug where an output buffer could not be used as input at the same time 24 May 2016, 09:56:15 UTC
b887e1e Merge pull request #38 from CNugteren/development Update to version 2.3.0 22 May 2016, 15:05:41 UTC
ae12ebe Updated to version 2.3.0 22 May 2016, 15:01:18 UTC
921271c Fixed CMake to compare strings properly; made MSVC link the runtime libraries statically 22 May 2016, 15:00:41 UTC
f923a17 Fixed a bug where failed results would still show up in the JSON files 22 May 2016, 14:41:10 UTC
86d701c Fixed a bug where failed results would still show up in the final results 16 May 2016, 10:14:18 UTC
ccf5ce2 Added support for short integers and cl_half fp16 as kernel arguments 14 May 2016, 15:59:17 UTC
cba89a4 Merge pull request #37 from CNugteren/development Update to version 2.2.0 27 April 2016, 09:03:34 UTC
2da8be1 Updated to version 2.2.0 27 April 2016, 08:56:54 UTC
acc110a Made the new samples work for CUDA as well 27 April 2016, 08:55:13 UTC
5f645e8 Fixed a typo in the API documentation 27 April 2016, 08:42:47 UTC
8b76ad1 Added API documentation to the repository 27 April 2016, 08:39:18 UTC
122cbb9 Minor fixes related to the newly added samples 27 April 2016, 07:59:27 UTC
9801a1e Added two much simpler examples to improve documentation 25 April 2016, 00:13:47 UTC
eac490c Updated the documentation 24 April 2016, 02:59:02 UTC
54df67a Updated headers to version 5.0 of the CLCudaAPI 24 April 2016, 02:58:00 UTC
8752c44 Updated Travis to reflect the latest Travis and Khronos changes 24 April 2016, 02:45:10 UTC
b306cf1 Merge pull request #36 from williamjshipman/development Only use OpenCL 2.x functions on OpenCL 2.x devices 03 April 2016, 22:41:46 UTC
bf1821b Add VersionNumber function for querying the device OpenCL version number as an integer (e.g. 120 for OpenCL 1.2); clean up the OpenCL 2.0 check in the Queue constructor. 03 April 2016, 01:20:33 UTC
33ba3ef Merge pull request #2 from CNugteren/development Development 02 April 2016, 19:37:23 UTC
da97040 Prepared the changelog for the next release 31 March 2016, 04:11:58 UTC
ad94a3d Merge branch 'development' 31 March 2016, 04:09:37 UTC
5802148 Updated to version 2.1.0 31 March 2016, 04:08:35 UTC
0110efc Merge branch 'development' of https://github.com/williamjshipman/CLTune into development 26 March 2016, 13:53:07 UTC
bccd8ac Add runtime check for OpenCL 2 before using OpenCL 2 function. 26 March 2016, 13:37:30 UTC
4698d4a Add runtime check for OpenCL 2 before using OpenCL 2 function. 26 March 2016, 13:21:37 UTC
0dc2a99 Updated the README 21 March 2016, 21:27:54 UTC
1ad3bb2 Merge branch 'development' of github.com:CNugteren/CLTune into development 21 March 2016, 21:15:24 UTC
0b90c0c Fixes for minor warnings under Visual Studio 21 March 2016, 19:57:35 UTC
1d3c159 Added dllexport to be able to build a DLL under Windows 21 March 2016, 19:56:35 UTC
b170354 Merge pull request #35 from williamjshipman/development Add command line parameter for platform index to conv and gemm samples in line with description in README. 31 January 2016, 17:38:09 UTC
59faefa Updated the README to show that the platform ID is one of the command line parameters and updated the samples so that the order of the parameters matches all parts of the README. 30 January 2016, 23:07:16 UTC
b5a3a8b Samples now support a platform parameter in their command lines, in addition to the device number. 30 January 2016, 22:48:10 UTC
dcddd80 Updated FindOpenCL for Intel Linux OpenCL paths 23 January 2016, 15:08:14 UTC
d643731 Prepared the changelog for the next release 22 November 2015, 11:19:55 UTC
9e401f4 Merge pull request #33 from CNugteren/development Added machine learning, new CLCudaAPI, CUDA, Catch, and MSVC support 22 November 2015, 11:18:09 UTC
8bc6684 Updated to version 2.0.0 22 November 2015, 11:16:46 UTC
b22dce2 Updated the readme 22 November 2015, 11:15:35 UTC
a21d4a5 Merge pull request #32 from CNugteren/catch_tests Replaced GTest with Catch unit testing 21 November 2015, 13:33:50 UTC
400752b Updated changelog and readme 21 November 2015, 13:29:27 UTC
8757c9e Updated the 'KernelInfo' class to use Catch 21 November 2015, 13:27:45 UTC