https://github.com/Microsoft/CNTK
Revision 7d66c47f613cf2101b72652671e9bb502d3a03cd authored by KeDengMS on 07 September 2017, 06:18:49 UTC, committed by KeDengMS on 07 September 2017, 06:18:49 UTC

Sparse gradient aggregation on GPU
This change avoids a costly sparse-to-dense conversion before gradient aggregation when the embedding vocabulary size is huge.
It is currently enabled in the GPU build when training on a GPU with non-quantized data-parallel SGD; for other distributed learners and for the CPU build, it is disabled by default.
It can be turned off manually in Python by calling `cntk.cntk_py.use_sparse_gradient_aggregation_in_data_parallel_sgd(False)`.
Note that in the rare case of running distributed training on the CPU device with a GPU build, you must turn it off manually to avoid an unimplemented-feature exception.
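A minimal sketch of where this switch might sit in a CNTK 2.x training script. Only the `use_sparse_gradient_aggregation_in_data_parallel_sgd` call is taken from this commit; the model, learner, and vocabulary size are hypothetical scaffolding for illustration:

```python
import cntk as C

# The switch documented in this commit. Disabling it is required when
# running distributed training on the CPU device with a GPU build; in all
# other CPU-build or non-data-parallel configurations it is already off.
C.cntk_py.use_sparse_gradient_aggregation_in_data_parallel_sgd(False)

# Hypothetical embedding model over a large sparse vocabulary
# (illustrative sizes only).
features = C.sequence.input_variable(300000, is_sparse=True)
model = C.layers.Embedding(128)(features)

# Wrap a local SGD learner in a non-quantized (32-bit) data-parallel
# learner -- the configuration for which the optimization is enabled
# by default on GPU builds.
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
local_learner = C.learners.sgd(model.parameters, lr)
distributed_learner = C.train.distributed.data_parallel_distributed_learner(
    local_learner, num_quantization_bits=32)
```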
1 parent 765707a
File Mode Size
Dependencies
Documentation
Examples
Manual
PretrainedModels
Scripts
Source
Tests
Tools
Tutorials
bindings
.clang-format -rw-r--r-- 931 bytes
.gitattributes -rw-r--r-- 3.3 KB
.gitignore -rw-r--r-- 7.2 KB
.gitmodules -rw-r--r-- 211 bytes
CNTK.Common.props -rw-r--r-- 1.6 KB
CNTK.Cpp.props -rw-r--r-- 11.7 KB
CNTK.sln -rw-r--r-- 213.2 KB
CONTRIBUTING.md -rw-r--r-- 210 bytes
CppCntk.vssettings -rw-r--r-- 10.0 KB
LICENSE.md -rw-r--r-- 4.6 KB
Makefile -rw-r--r-- 57.6 KB
README.md -rw-r--r-- 5.0 KB
configure -rwxr-xr-x 34.5 KB
