https://github.com/Microsoft/CNTK
Revision 7d66c47f613cf2101b72652671e9bb502d3a03cd authored by KeDengMS on 07 September 2017, 06:18:49 UTC, committed by KeDengMS on 07 September 2017, 06:18:49 UTC
This change avoids a costly sparse-to-dense conversion before gradient aggregation when the embedding vocabulary size is huge.
It is currently enabled in the GPU build when training on GPU with non-quantized data-parallel SGD. For other distributed learners and for the CPU build, it is disabled by default.
It can be turned off manually in Python by calling `cntk.cntk_py.use_sparse_gradient_aggregation_in_data_parallel_sgd(False)`.
Note that in the rare case of running distributed training on a CPU device with a GPU build, you need to turn it off manually to avoid an unimplemented exception.
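Below is a minimal sketch of where this setting applies, assuming the CNTK 2.x Python API; the toggle itself is quoted from this commit, while the toy embedding model, vocabulary size, and hyperparameters are illustrative assumptions.

```python
import cntk as C

# Turn the optimization off manually, e.g. for the rare case of distributed
# training on a CPU device with a GPU build (per the commit message above).
C.cntk_py.use_sparse_gradient_aggregation_in_data_parallel_sgd(False)

# Illustrative setup where the optimization would otherwise apply: a sparse
# input over a huge vocabulary feeding an embedding, trained with
# non-quantized data-parallel SGD (num_quantization_bits=32 = no quantization).
x = C.input_variable(1000000, is_sparse=True)  # huge-vocabulary sparse input
z = C.layers.Embedding(300)(x)                 # embedding whose gradient is sparse
learner = C.sgd(z.parameters,
                lr=C.learning_rate_schedule(0.1, C.UnitType.sample))
dist_learner = C.train.distributed.data_parallel_distributed_learner(
    learner, num_quantization_bits=32)
```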
1 parent 765707a
Sparse gradient aggregation on GPU
CONTRIBUTING.md
Do you want to contribute to CNTK? We're really excited to work together!

Please follow the steps in the documentation:

https://docs.microsoft.com/en-us/cognitive-toolkit/contributing-to-cntk

Your CNTK team.