BatchNormalization
Note: CNTK's implementation of batch normalization relies on cuDNN and is fully implemented only for the GPU; on the CPU, only inference is supported.
`BatchNormalization` implements the technique described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Sergey Ioffe, Christian Szegedy). In short, it normalizes layer outputs for every minibatch, for each output (feature) independently, and applies an affine transformation to preserve the representational power of the layer. That is, for layer input `input`:
```
m = mean(input)
var = variance(input)
input_norm = (input - m) / sqrt(var)
output = gamma * input_norm + beta
```
where `gamma` and `beta` are trainable parameters (each represented as a Parameter).
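As a tiny worked example (plain arithmetic from the formulas above, with made-up values): for a single feature whose minibatch values are (1, 2, 3), and with gamma = 2, beta = 1:

```
m          = 2
var        = 2/3                      # population variance
input_norm = (-1.2247, 0, 1.2247)     # (input - m) / sqrt(var)
output     = (-1.449, 1, 3.449)       # gamma * input_norm + beta
```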
`BatchNormalization` has the following syntax (a usage sketch follows the parameter list below):

```
BatchNormalization(input, scale, bias, runMean, runInvStdDev, spatial,
                   normalizationTimeConstant = 0, blendTimeConstant = 0,
                   epsilon = 0.00001,
                   useCntkEngine = true, imageLayout='cudnn', tag='')
```
Where:

- `input` is the input of the batch normalization node
- `scale` is a Parameter that stores the scale vector (the `gamma` term in the equation above)
- `bias` is a Parameter that stores the bias vector (the `beta` term). `scale` and `bias` must have the same dimensions, which must equal the `input` dimensions when `spatial = false`, or the number of output convolution feature maps when `spatial = true`
- `runMean` is the running mean, which is used during the evaluation phase and might be used during training as well. It is represented as a Parameter with the same dimensions as `scale` and `bias`
- `runInvStdDev` is the running inverse square root of the variance (that is, `InvStdDev = 1 / sqrt(var + epsilon)`). It is represented as a Parameter with the same dimensions as `scale` and `bias`
- `spatial` is a flag that specifies whether to compute the mean/variance for each feature in a minibatch independently (`spatial = false`) or, in the case of convolutional layers, per feature map (`spatial = true`)
- `normalizationTimeConstant` is the time constant used to compute the running average of the mean and variance. The value `0` (default) means no exponential smoothing: the running mean/variance always holds the values computed for the last seen minibatch. The value `1#INF` (infinity) means the running values are "frozen" (i.e. never updated). Depending on the dataset and network configuration, different values are appropriate; for example, for the MNIST dataset you can set it to 1024, and for speech datasets to the number of frames corresponding to a 24-hour period. The constant can also be set globally (in the .cntk config file) using the `batchNormalizationTimeConstant` parameter, for example: `batchNormalizationTimeConstant=0:1024` (one value per epoch: 0 for the first epoch, 1024 afterwards)
- `blendTimeConstant` is the time constant that specifies how much of the running mean/variance is blended into the mean/variance of the current minibatch. The value `0` (default) means no blending: only the current minibatch statistics are used. The value `1#INF` (infinity) means only the running mean/variance is used (this is the case, for example, during the evaluation phase). You can, for example, start with 0, then set it to half the minibatch size, and then set it to infinity after several epochs. This can be done with the `batchNormalizationBlendTimeConstant` option in the .cntk config file: `batchNormalizationBlendTimeConstant=0:32*10:1#INF`
- `epsilon` is a conditioner constant used in computing `InvStdDev`
- `useCntkEngine` is a boolean flag that selects which batch normalization implementation to use: the CNTK-native engine (`true`) or the cuDNN-based one (`false`)
- `imageLayout` is the image layout. Only `cudnn` is supported.
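Putting this together, below is a minimal usage sketch in a CNTK network description. It is illustrative only: the input node `h`, the dimension `outDim`, the initialization values, and the `learningRateMultiplier = 0` flag (used here to keep the running statistics out of gradient updates) are assumptions, and the exact Parameter syntax may differ across CNTK versions:

```
# Sketch only, not verbatim from this page.
outDim = 256                       # assumed feature dimension of the layer output h
scale        = Parameter(outDim, 1, init = "fixedValue", value = 1)   # gamma
bias         = Parameter(outDim, 1, init = "fixedValue", value = 0)   # beta
# The running statistics are updated by the node itself, not by SGD;
# learningRateMultiplier = 0 (an assumption here) keeps them out of gradient updates.
runMean      = Parameter(outDim, 1, init = "fixedValue", value = 0, learningRateMultiplier = 0)
runInvStdDev = Parameter(outDim, 1, init = "fixedValue", value = 0, learningRateMultiplier = 0)
bn = BatchNormalization(h, scale, bias, runMean, runInvStdDev, spatial = false,
                        normalizationTimeConstant = 1024, blendTimeConstant = 0,
                        epsilon = 0.00001, useCntkEngine = true, imageLayout = "cudnn")
```

For a convolutional layer you would instead set `spatial = true` and size the four Parameters by the number of output feature maps.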
For more information about time constants and exponential smoothing, see https://en.wikipedia.org/wiki/Exponential_smoothing#Time_Constant.

Note that for the evaluation stage CNTK sets the time constants automatically; users do not have to change anything to switch between stages.
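To relate the time constant to the per-minibatch update, here is standard exponential smoothing written in the same style as the formulas above. This is a sketch of the general technique, not necessarily the exact smoothing factor CNTK computes:

```
alpha   = 1 - exp(-N / T)    # N = samples in the minibatch, T = normalizationTimeConstant
runMean = (1 - alpha) * runMean + alpha * m
runVar  = (1 - alpha) * runVar  + alpha * var   # runInvStdDev = 1 / sqrt(runVar + epsilon)
```

The limits match the parameter description above: as T approaches 0, alpha approaches 1 (only the last seen minibatch counts), and as T approaches infinity, alpha approaches 0 (the running values are frozen).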