https://github.com/Microsoft/CNTK
Revision 5905b2f9fd84fabd36cb680d725eda54302e9744 authored by Aghagolzadeh on 25 April 2018, 18:43:13 UTC, committed by GitHub on 25 April 2018, 18:43:13 UTC
updated reset function for BMUF to support half precision
1 parent adf4826
Raw File
Tip revision: 5905b2f9fd84fabd36cb680d725eda54302e9744 authored by Aghagolzadeh on 25 April 2018, 18:43:13 UTC
Update BlockMomentumDistributedLearner.h
Tip revision: 5905b2f
current_iteration.md
# CNTK Current Iteration

## Efficient group convolution
The implementation of group convolution in CNTK has been updated. The updated implementation moves away from creating a sub-graph for group convolution (using slicing and splicing), and instead uses cuDNN7 and MKL2017 APIs directly. This improves the experience both in terms of performance and model size. 

As an example, for a single group convolution op with the following attributes:

- Input tensor (C, H, W) = (32, 128, 128)
- Number of output channels = 32 (channel multiplier is 1)
- Groups = 32 (depth wise convolution)
- Kernel size = (5, 5)

The comparison numbers for this single node are as follows:

| First Header  | GPU exec. time (in millisec., 1000 run avg.) | CPU exec. time (in millisec., 1000 run avg.) | Model Size (in KB, CNTK format)
| ------------- | ------------- | ------------- | ------------- |
| Old implementation  | 9.349  | 41.921  | 38  |
| New implementation  | 6.581  | 9.963  | 5  |
| Speedup/savings	Approx.  | 30%	Approx.  | 65-75%	Approx.  | 87% |

## Operators


## Bug fixes


## ONNX
### Updates
- Updated CNTK's ONNX BatchNormalization op export/import to latest spec.

### Bug or minor fixes:


## Misc

back to top