2038542 | Malte Brunn | 05 November 2020, 11:48:01 UTC | removed unsupported compute capabilities and added new ones | 05 November 2020, 11:48:01 UTC |
133a585 | brunnme | 17 August 2018, 13:01:44 UTC | error in GPU_float fixed | 17 August 2018, 13:01:44 UTC |
5cb08a4 | Amir Gholami | 04 May 2018, 23:08:13 UTC | Merge pull request #17 from pkestene/mpi_deprecated avoid datatype MPI::BOOL (deprecated) | 04 May 2018, 23:08:13 UTC |
8552f6e | Pierre Kestener | 04 May 2018, 10:19:04 UTC | replace MPI::BOOL (MPI c++ bindings deprecated) by a custom MPI_Datatype (from dtypes.h) | 04 May 2018, 10:19:04 UTC |
0d734b8 | Amir Gholami | 23 February 2018, 21:56:48 UTC | Merge pull request #14 from alalazo/fixes/odr_violation_gpu Fix odr violation on GPUs + don't override user nvcc flags | 23 February 2018, 21:56:48 UTC |
409e830 | Massimiliano Culpo | 21 February 2018, 15:28:19 UTC | Fixed build failures due to unsupported architectures. When working on a machine that doesn't support the architectures that are appended unconditionally, the current make file incurs a build failure. This should be fixed by making the defaults kick in only if the user didn't specify anything else on the command line. | 21 February 2018, 15:28:19 UTC |
416518e | Massimiliano Culpo | 21 February 2018, 15:25:50 UTC | Fixed odr violation on explicit template instantiation. A few templates are instantiated in a header file that is included multiple times in different compilation units. This causes a build failure due to odr violation. Moving the explicit instantiation to the single cpp file where they are needed solves the issue. | 21 February 2018, 15:25:50 UTC |
632d249 | Amir Gholami | 15 February 2018, 19:38:41 UTC | Merge pull request #13 from alalazo/fixes/linking_shared_libraries Fixed linking of step* executables when producing shared libraries | 15 February 2018, 19:38:41 UTC |
4134eab | Massimiliano Culpo | 11 February 2018, 09:43:05 UTC | Fixed linking of step* executables when producing shared libraries. Before this commit the library was not able to link the 'step*' executables when giving '-DBUILD_SHARED=true' (undefined references to a few symbols). | 11 February 2018, 09:43:05 UTC |
2199051 | Amir Gholami | 20 May 2017, 23:16:49 UTC | removed unnecessary operation in transpose | 20 May 2017, 23:16:49 UTC |
babb821 | Amir Gholami | 20 May 2017, 20:34:15 UTC | added support for outplace transpose | 20 May 2017, 20:34:15 UTC |
4475f3d | Amir Gholami | 20 May 2017, 02:21:23 UTC | added restrict keyword to transpose functions | 20 May 2017, 02:21:41 UTC |
113b3db | Amir Gholami | 10 April 2017, 15:20:01 UTC | removed debug info | 10 April 2017, 15:20:01 UTC |
22cf750 | Amir Gholami | 10 April 2017, 03:10:57 UTC | merged | 10 April 2017, 03:10:57 UTC |
f00f3f9 | Amir Gholami | 09 April 2017, 23:49:20 UTC | changed the layout for fast spectral operators so the FFT is performed on contiguous direction | 09 April 2017, 23:49:20 UTC |
c4b6ccf | Amir Gholami | 05 April 2017, 20:27:07 UTC | added self exec timer to gradient and divergence operators | 05 April 2017, 20:27:07 UTC |
2140179 | Amir Gholami | 03 April 2017, 05:29:13 UTC | removed unused variables | 03 April 2017, 05:29:13 UTC |
1eb9f32 | Amir Gholami | 03 April 2017, 05:16:47 UTC | added verbose2 definition to transpose | 03 April 2017, 05:16:47 UTC |
f6e0c4a | Amir Gholami | 02 April 2017, 22:54:10 UTC | guard against int overflow | 02 April 2017, 22:54:10 UTC |
97d17e6 | Amir Gholami | 29 March 2017, 00:12:44 UTC | removed unnecessary barriers | 29 March 2017, 00:12:44 UTC |
f70f053 | Amir Gholami | 28 March 2017, 22:15:35 UTC | removed barriers from operators.txx | 28 March 2017, 22:15:35 UTC |
d80b5d2 | Amir Gholami | 24 March 2017, 21:34:01 UTC | moved operator allocations to the memory manager | 24 March 2017, 21:34:01 UTC |
2a40c78 | Amir Gholami | 23 March 2017, 16:12:20 UTC | moved timings[5] to timings[6] for operators. t[5] is now empty | 23 March 2017, 16:12:20 UTC |
b005a97 | Amir Gholami | 23 March 2017, 04:06:47 UTC | added timers for operators, note that now you should pass a field of size 6 for the timings | 23 March 2017, 04:06:47 UTC |
f2eb48f | Amir Gholami | 23 March 2017, 02:42:36 UTC | added memcpy time to total transpose time for spectral operators | 23 March 2017, 02:42:36 UTC |
028f084 | Amir Gholami | 22 March 2017, 20:19:58 UTC | fix for a corner case in spectral operators | 22 March 2017, 20:19:58 UTC |
154b031 | Amir Gholami | 22 March 2017, 01:32:28 UTC | temporary switch to slow operators | 22 March 2017, 01:32:28 UTC |
7c0cb84 | Amir Gholami | 17 March 2017, 17:22:51 UTC | added plan templates for operators | 17 March 2017, 17:22:51 UTC |
02fefd2 | Amir Gholami | 16 March 2017, 21:11:49 UTC | added templated version of single and double precision | 16 March 2017, 21:11:49 UTC |
11c1d1b | Amir Gholami | 11 March 2017, 23:22:21 UTC | pushed the previous fix for single precision and gpu | 11 March 2017, 23:22:21 UTC |
0b0c207 | Amir Gholami | 11 March 2017, 01:41:10 UTC | fix for operators using irregular sizes | 11 March 2017, 01:41:10 UTC |
aa9870c | Amir Gholami | 28 February 2017, 11:57:02 UTC | fixed compile issue with gcc | 28 February 2017, 11:57:02 UTC |
9c37a08 | Amir Gholami | 27 February 2017, 23:36:42 UTC | debugged fast spectral solvers | 27 February 2017, 23:36:42 UTC |
8094869 | Amir Gholami | 21 February 2017, 17:28:07 UTC | fixed gcc compilation issue | 21 February 2017, 17:28:07 UTC |
dd02c1b | Amir Gholami | 21 February 2017, 01:40:01 UTC | code cleanup | 21 February 2017, 01:40:01 UTC |
fe62538 | Amir Gholami | 20 February 2017, 22:01:16 UTC | added fast versions of spectral operators | 20 February 2017, 22:01:16 UTC |
4cd49f4 | Amir Gholami | 15 February 2017, 16:47:19 UTC | fixed build issue for GPU | 15 February 2017, 16:47:19 UTC |
bcac4ee | Amir Gholami | 15 February 2017, 16:23:43 UTC | cleaned the code + filter nyquist in inverse laplace operator | 15 February 2017, 16:23:43 UTC |
ae56b40 | Amir Gholami | 12 January 2017, 22:38:51 UTC | corrected order of fftw library linking | 12 January 2017, 22:38:51 UTC |
5d73849 | Amir Gholami | 25 November 2016, 17:33:08 UTC | added header to soperators.txx | 25 November 2016, 17:33:08 UTC |
1c1bcf5 | Amir Gholami | 08 October 2016, 21:42:45 UTC | code cleanup | 08 October 2016, 21:42:45 UTC |
93cf04e | Amir Gholami | 07 October 2016, 19:17:04 UTC | minor change to step1 CMakeList | 07 October 2016, 19:17:04 UTC |
7d60a0c | Amir Gholami | 18 September 2016, 17:30:12 UTC | cleanup | 18 September 2016, 17:30:12 UTC |
e72664f | Amir Gholami | 30 June 2016, 03:10:29 UTC | changed create_comm message | 30 June 2016, 03:10:29 UTC |
eb337d4 | Amir Gholami | 26 June 2016, 01:51:13 UTC | fixed compiler warning for malloc | 26 June 2016, 01:51:13 UTC |
5c6064e | Amir Gholami | 26 June 2016, 01:38:00 UTC | fixed compiler warnings | 26 June 2016, 01:38:00 UTC |
4430898 | Amir Gholami | 26 June 2016, 00:39:57 UTC | Cmake find pnetcdf quietly, it is an optional package | 26 June 2016, 00:39:57 UTC |
10df740 | Amir Gholami | 14 June 2016, 04:27:16 UTC | out-of-order send for inverse | 14 June 2016, 04:27:16 UTC |
004249a | Amir Gholami | 14 June 2016, 04:12:35 UTC | mpi_waitall for v and v_h, also removed omp_parallel_for | 14 June 2016, 04:12:35 UTC |
9cfc60e | Amir Gholami | 14 June 2016, 03:52:50 UTC | MPI_STATUSES_NULL > MPI_STATUSES_IGNORE | 14 June 2016, 03:52:50 UTC |
00bd8d8 | Amir Gholami | 14 June 2016, 03:39:05 UTC | Merge pull request #8 from jeffhammond/fix-ulong fix ulong -> unsigned long | 14 June 2016, 03:39:05 UTC |
c45f68d | Amir Gholami | 14 June 2016, 03:38:29 UTC | Merge pull request #7 from jeffhammond/use-waitall replace loop over wait with waitall | 14 June 2016, 03:38:29 UTC |
ec90098 | Amir Gholami | 14 June 2016, 03:34:34 UTC | Merge pull request #6 from jeffhammond/remove-nilpotent-expressions Remove nilpotent expressions | 14 June 2016, 03:34:34 UTC |
0c85ca9 | Amir Gholami | 14 June 2016, 03:32:26 UTC | Merge pull request #5 from jeffhammond/use-calloc Use calloc | 14 June 2016, 03:32:26 UTC |
f0687ff | Jeff Hammond | 10 May 2016, 19:28:17 UTC | fix ulong -> unsigned long | 10 May 2016, 19:28:17 UTC |
df99207 | Jeff Hammond | 10 May 2016, 13:35:56 UTC | replace loop over wait with waitall 1) merge send and recv request vectors 2) set odd (even) values of request vector to send (recv) requests 3) call MPI_Waitall once instead of looping over two calls to MPI_Wait This should have a nontrivial impact when nproc is large and messages are completed by the network out-of-order w.r.t. the wait loop. (comment #2 may have send and recv backwards) | 10 May 2016, 13:35:56 UTC |
cb75449 | Jeff Hammond | 10 May 2016, 13:12:40 UTC | remove loop with trip count of 1 | 10 May 2016, 13:12:40 UTC |
49460fa | Jeff Hammond | 10 May 2016, 13:11:47 UTC | remove 0*nprocs | 10 May 2016, 13:11:47 UTC |
ae075be | Jeff Hammond | 10 May 2016, 13:11:12 UTC | replace 0*MPI_Wtime with 0 #2 | 10 May 2016, 13:11:12 UTC |
179be93 | Jeff Hammond | 10 May 2016, 13:09:37 UTC | replace 0*MPI_Wtime with 0 | 10 May 2016, 13:09:37 UTC |
b196a29 | Jeff Hammond | 10 May 2016, 09:38:34 UTC | eliminate unnecessary repeat log2() | 10 May 2016, 09:38:34 UTC |
0256c4a | Jeff Hammond | 10 May 2016, 09:36:03 UTC | use calloc=malloc+memset(0) #2 | 10 May 2016, 09:36:03 UTC |
b488850 | Jeff Hammond | 10 May 2016, 09:35:34 UTC | BUGFIX: ptrdiff_t often wider than int, so this only zeros the lower half of the array | 10 May 2016, 09:35:34 UTC |
c032f80 | Jeff Hammond | 10 May 2016, 09:34:44 UTC | use calloc=malloc+memset(0) | 10 May 2016, 09:34:44 UTC |
abb66aa | Amir Gholami | 04 May 2016, 01:59:35 UTC | made pnetcdf optional | 04 May 2016, 01:59:35 UTC |
77c3124 | Amir Gholami | 13 April 2016, 20:17:44 UTC | minor fix in debug mode print | 13 April 2016, 20:17:44 UTC |
4f320ad | Amir Gholami | 22 March 2016, 17:16:22 UTC | add c_comm to write_pnetcdf, and isize,istart to accfft_plan | 22 March 2016, 17:16:22 UTC |
d78abc1 | Amir Gholami | 07 March 2016, 23:04:11 UTC | bug fix for inverse laplace and biharmonic operators | 07 March 2016, 23:04:11 UTC |
18a869b | Amir Gholami | 07 March 2016, 19:03:42 UTC | gpu cleanup part1 | 07 March 2016, 19:03:42 UTC |
4b5d09c | Amir Gholami | 02 March 2016, 03:12:25 UTC | Merge branch 'master' of github.com:amirgholami/accfft | 02 March 2016, 03:12:25 UTC |
4ab0ba5 | Amir Gholami | 02 March 2016, 03:10:41 UTC | added double precision inverse laplace and inverse biharmonic operators. | 02 March 2016, 03:11:33 UTC |
5325f67 | Amir Gholami | 02 March 2016, 03:10:41 UTC | . | 02 March 2016, 03:10:41 UTC |
ad42521 | Amir Gholami | 01 March 2016, 18:44:34 UTC | no need to use issend | 01 March 2016, 18:44:34 UTC |
12b6ca1 | Amir Gholami | 01 March 2016, 18:43:39 UTC | bug fix | 01 March 2016, 18:43:39 UTC |
7143762 | Amir Gholami | 23 February 2016, 21:21:53 UTC | added biharmonic spectral operator | 23 February 2016, 21:21:53 UTC |
7d2b50f | Amir Gholami | 22 February 2016, 01:35:16 UTC | cleanup | 22 February 2016, 01:35:16 UTC |
316cf7d | Amir Gholami | 21 February 2016, 22:15:37 UTC | code cleanup 2 | 21 February 2016, 22:15:37 UTC |
ddb163c | Amir Gholami | 21 February 2016, 22:09:43 UTC | code cleanup | 21 February 2016, 22:09:43 UTC |
1f41020 | Amir Gholami | 17 February 2016, 03:29:10 UTC | removed c++11 features for compatibility with older compilers | 17 February 2016, 03:29:10 UTC |
094e279 | Amir Gholami | 17 February 2016, 03:12:17 UTC | save details during setup phase | 17 February 2016, 03:12:17 UTC |
228be48 | Amir Gholami | 16 February 2016, 17:01:48 UTC | compatibility with older compilers due to bitlist list initialization | 16 February 2016, 17:01:48 UTC |
a76e35e | Amir Gholami | 16 February 2016, 04:01:19 UTC | more inclusive setup phase, fixed a compile error on cray machines | 16 February 2016, 04:01:19 UTC |
d33318b | Amir Gholami | 30 January 2016, 21:49:02 UTC | clean up setup function, better communication selector | 30 January 2016, 21:49:02 UTC |
cde1002 | Amir Gholami | 02 January 2016, 00:41:58 UTC | step1 cleanup | 02 January 2016, 00:41:58 UTC |
1dd1b4d | Amir Gholami | 02 January 2016, 00:39:04 UTC | compatibility with new cmake versions | 02 January 2016, 00:39:04 UTC |
c454aca | Amir Gholami | 14 December 2015, 02:45:08 UTC | corrected timings for step5 and step6 | 14 December 2015, 02:45:08 UTC |
35e816f | Amir Gholami | 14 December 2015, 01:32:48 UTC | changed operator timings to the default style (see doxygen) | 14 December 2015, 01:32:48 UTC |
5ef46db | Amir Gholami | 06 December 2015, 22:51:26 UTC | clean up, added pnetcdf_io for floats | 06 December 2015, 22:51:26 UTC |
6aa2172 | Amir Gholami | 03 December 2015, 20:52:36 UTC | working on step6 (unfinished) | 03 December 2015, 20:52:36 UTC |
107ec29 | Amir Gholami | 03 December 2015, 20:52:00 UTC | timing for operators | 03 December 2015, 20:52:00 UTC |
b95b5d9 | Amir Gholami | 02 December 2015, 03:08:13 UTC | added step5: spectral operators for CPU and GPU | 02 December 2015, 03:08:13 UTC |
2f8ba07 | Amir Gholami | 02 December 2015, 03:07:24 UTC | doxygen for spectral operators + code cleanup | 02 December 2015, 03:07:24 UTC |
40860ec | Amir Gholami | 01 December 2015, 22:32:42 UTC | gpu spectral operators completed | 01 December 2015, 22:32:42 UTC |
39d28f8 | Amir Gholami | 30 November 2015, 23:18:05 UTC | initial commit for gpu spectral operators (unfinished) | 30 November 2015, 23:18:05 UTC |
d0c1bf4 | Amir Gholami | 29 November 2015, 23:15:27 UTC | cpu spectral operators finished | 29 November 2015, 23:15:27 UTC |
11102d6 | Amir Gholami | 26 November 2015, 23:43:51 UTC | adding spectral operators (uncompleted) | 26 November 2015, 23:43:51 UTC |
6bfd26f | Amir Gholami | 25 November 2015, 22:00:10 UTC | single precision support added | 25 November 2015, 22:00:10 UTC |
65eb680 | Amir Gholami | 25 November 2015, 03:15:30 UTC | added single precision versions of steps 1,2,3 | 25 November 2015, 03:15:30 UTC |
93229b6 | Amir Gholami | 24 November 2015, 23:00:00 UTC | single precision support, part iv added float fft srcs | 24 November 2015, 23:00:00 UTC |
0551a19 | Amir Gholami | 23 November 2015, 21:21:59 UTC | single precision support part iii, modified accfft.cpp | 23 November 2015, 21:21:59 UTC |