Enable batch normalization beta #656

Open
Ttl opened this issue May 24, 2018 · 1 comment

Ttl (Contributor) commented May 24, 2018

Currently the training code sets the batch normalization center parameter to False, which disables the learnable beta bias parameters. This forces every batch norm output plane to have zero mean, which limits the internal representation of the network.
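For illustration, a minimal sketch of the kind of change meant here, assuming the training code builds its blocks with TF 1.x-style tf.layers calls (the helper name and exact arguments are hypothetical, not the project's actual code):

  import tensorflow as tf

  def conv_bn_relu(x, training):
      # Hypothetical helper for illustration only.
      y = tf.layers.conv2d(x, filters=64, kernel_size=3, padding="same",
                           use_bias=False, data_format="channels_first")
      y = tf.layers.batch_normalization(
          y,
          axis=1,        # channels_first
          center=True,   # learnable beta; the training code currently passes False
          scale=False,   # gamma stays disabled, as today
          training=training)
      return tf.nn.relu(y)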

Theoretical benefits of enabling beta are that it lets the network control the fraction of output values that are clipped by ReLU, and that it allows the network to learn an internal representation that works better with the zero padding used in convolutions.

With zero mean, about half of the output values are negative and get clipped. A learnable beta lets the optimizer adjust how much nonlinearity is applied by adding a bias to the values.
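A toy illustration of that effect with random zero-mean activations (not project code):

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.standard_normal(1_000_000)      # zero-mean batch-normalized outputs

  for beta in (-1.0, 0.0, 1.0):
      clipped = np.mean(x + beta < 0)     # fraction that ReLU zeroes out
      print(f"beta={beta:+.1f}: {clipped:.1%} clipped")
  # roughly 84%, 50% and 16% respectively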

Convolutional layers use zero padding for values outside the edges. Padding with any other constant value can be emulated by adding a bias before the convolution and another after it. The current architecture can't take advantage of this and is stuck with zero padding because there are no biases.
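A toy check of that equivalence (not project code): convolving over an input padded with a constant c matches shifting the input by -c, zero padding, convolving, and adding c times the kernel sum afterwards.

  import numpy as np
  from scipy.signal import convolve2d

  rng = np.random.default_rng(0)
  x = rng.standard_normal((8, 8))
  w = rng.standard_normal((3, 3))
  c = 0.7                                   # desired padding constant

  # Reference: convolution over the input padded with the constant c.
  ref = convolve2d(np.pad(x, 1, constant_values=c), w, mode="valid")

  # Emulation: a bias before the conv shifts the data so the zero padding
  # effectively "becomes" c, and a bias after the conv restores the offset.
  emulated = convolve2d(np.pad(x - c, 1), w, mode="valid") + c * w.sum()

  assert np.allclose(ref, emulated)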

The good news is that they can be enabled in a completely backwards-compatible way. Only the training code needs to be changed, and the change is already implemented in LZGo (leela-zero/leela-zero@5a2aca1).
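The backwards compatibility comes from the fact that, without a scale parameter, a learned beta can be folded into the batchnorm means that the weights file already stores, so the file format and the engines don't have to change. A toy check of that identity (my sketch of the folding, not necessarily how the LZGo commit exports it):

  import numpy as np

  def bn(x, mean, var, beta=0.0, eps=1e-5):
      # Batch norm without gamma, as in the current networks.
      return (x - mean) / np.sqrt(var + eps) + beta

  x = np.random.default_rng(0).standard_normal(1000)
  mean, var, beta = 0.1, 2.0, -0.4

  folded_mean = mean - beta * np.sqrt(var + 1e-5)
  assert np.allclose(bn(x, mean, var, beta), bn(x, folded_mean, var))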

Getting any benefit from enabling them probably requires retraining the network from scratch. It might be a good idea to enable them before the network size is raised again. In LZGo the beta parameters are consistently negative for almost every layer and have relatively large magnitudes, suggesting that the network does use them and that the current zero values are not optimal.

The batch norm scale parameters are also probably not redundant, due to the residual path, but they are harder to enable in a backwards-compatible way.

killerducky (Collaborator) commented:

I think this comment in transforms.cc and network_cudnn.cc means lc0 still supports this.

  // Biases are not calculated and are typically zero but some networks might
  // still have non-zero biases.
  // Move biases to batchnorm means to make the output match without having
  // to separately add the biases.
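That move is the same kind of folding in the other direction: a bias b added before batch norm is equivalent to subtracting b from the stored mean. A rough sketch of the identity (not lc0 code):

  import numpy as np

  def bn(x, mean, var, eps=1e-5):
      return (x - mean) / np.sqrt(var + eps)

  x = np.random.default_rng(1).standard_normal(1000)
  b, mean, var = 0.3, 0.1, 2.0

  # Adding the bias to the input matches shifting the batchnorm mean by -b.
  assert np.allclose(bn(x + b, mean, var), bn(x, mean - b, var))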
