Deep learning visualizer

Normalization Visualizer

See how normalization stabilizes activations and improves neural network training.

Method: BatchNorm | Batch size: 4 | Features: 6 | Epsilon: 1e-5
Activation matrix: 4 x 6 (examples x features)

Batch Normalization: Normalize Each Feature Across the Batch

Original Activations

    2  1  6  3  5  4
    3  0  4  2  6  5
    5  2  5  4  7  3
    7  3  2  5  4  6

Standardized Values

    -1.17  -0.45   1.18  -0.45  -0.45  -0.45
    -0.65  -1.34  -0.17  -1.34   0.45   0.45
     0.39   0.45   0.51   0.45   1.34  -1.34
     1.43   1.34  -1.52   1.34  -1.34   1.34

Scaled & Shifted Output

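    -1.17  -0.45   1.18  -0.45  -0.45  -0.45
    -0.65  -1.34  -0.17  -1.34   0.45   0.45
     0.39   0.45   0.51   0.45   1.34  -1.34
     1.43   1.34  -1.52   1.34  -1.34   1.34

The output equals the standardized values here, which corresponds to gamma = 1 and beta = 0.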
Selected cell calculation (row 1, feature 3 of 6): (6 - 4.25) / sqrt(2.19 + 1e-5) = 1.18
Here 4.25 is the mean and 2.19 the variance (dividing by the batch size, 4) of feature 3 across the batch.
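
The selected cell calculation can be reproduced for the whole matrix with a few lines of NumPy. This is a minimal sketch, assuming the 4 x 6 activation matrix shown above and the biased variance (dividing by the batch size); the variable names are illustrative and not taken from the visualizer.

```python
import numpy as np

# Activation matrix from the visualizer: 4 examples (rows) x 6 features (columns).
x = np.array([
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
], dtype=np.float64)
eps = 1e-5

# Per-feature statistics: reduce over the batch axis (axis 0).
mean = x.mean(axis=0)
var = x.var(axis=0)  # biased variance (divides by N), as in the cell calculation

# Standardize every cell with the statistics of its own column.
x_hat = (x - mean) / np.sqrt(var + eps)

# Row 1, feature 3 is index [0, 2] with zero-based indexing.
print(f"mean={mean[2]:.4f} var={var[2]:.4f} x_hat={x_hat[0, 2]:.4f}")
```

Rounded to two decimals, `x_hat` should agree with the standardized values listed above.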

Before: Original Activation Distribution

mu = 3.92, var = 3.33, computed over all 24 activations (axis: activation value, 0 to 7)

After: Standardized Distribution

mu = 0, var = 1 (axis: standardized value, -1.5 to 1.4)

What gets normalized?

Batch Normalization

Each feature across examples. Normalization is computed down each column of the activation matrix.

Common in CNNs

Layer Normalization

All features within each example. Normalization is computed across each row of the activation matrix.

Common in Transformers
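
The difference between the two methods comes down to the axis along which statistics are computed. The sketch below illustrates this with NumPy on the same 4 x 6 matrix; `standardize` is a hypothetical helper written for this example, and the learnable scale and shift are left out.

```python
import numpy as np

x = np.array([
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
], dtype=np.float64)
eps = 1e-5


def standardize(a, axis):
    """Standardize `a` along `axis` using the biased variance."""
    mean = a.mean(axis=axis, keepdims=True)
    var = a.var(axis=axis, keepdims=True)
    return (a - mean) / np.sqrt(var + eps)


batch_norm = standardize(x, axis=0)  # down each column: one mean/variance per feature
layer_norm = standardize(x, axis=1)  # across each row: one mean/variance per example

print(batch_norm.mean(axis=0))  # column means are approximately 0
print(layer_norm.mean(axis=1))  # row means are approximately 0
```

The same cell generally gets a different value under each method, because it is compared against a different set of neighbours: its column for batch normalization, its row for layer normalization.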

How normalization works

  1. Compute the mean and variance along the normalization axis.
  2. Standardize: subtract the mean and divide by sqrt(variance + epsilon).
  3. Scale by learnable gamma and shift by learnable beta.
  4. During inference, BatchNorm uses running estimates of the mean and variance collected during training instead of batch statistics.
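
The four steps above can be sketched as a small NumPy class. This is an illustration under stated assumptions, not a reference implementation: `SimpleBatchNorm` is a hypothetical name, the momentum of 0.1 is an assumed update rate, gamma and beta are held fixed rather than trained, and the running variance is updated from the biased batch variance for simplicity (libraries may differ; PyTorch, for instance, uses the unbiased estimate for its running variance).

```python
import numpy as np


class SimpleBatchNorm:
    """Illustrative batch normalization for inputs of shape (batch, features)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum            # assumed update rate for running estimates
        self.gamma = np.ones(num_features)  # scale (learnable in practice, fixed here)
        self.beta = np.zeros(num_features)  # shift (learnable in practice, fixed here)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training):
        if training:
            # Step 1: statistics along the batch axis.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Keep exponential moving averages for use at inference time.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Step 4: inference uses the running estimates instead.
            mean, var = self.running_mean, self.running_var
        # Step 2: standardize.
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        # Step 3: scale and shift.
        return self.gamma * x_hat + self.beta


# First two features of the visualizer's matrix, as a small input.
x = np.array([[2.0, 1.0], [3.0, 0.0], [5.0, 2.0], [7.0, 3.0]])
bn = SimpleBatchNorm(num_features=2)
train_out = bn.forward(x, training=True)   # uses batch statistics
eval_out = bn.forward(x, training=False)   # uses running estimates
```

After a single training step the running estimates are still far from the batch statistics, so `train_out` and `eval_out` differ noticeably; they converge only as more batches are seen.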

Implementation

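    import torch.nn as nn

    bn = nn.BatchNorm1d(num_features=6)    # normalizes each feature across the batch
    ln = nn.LayerNorm(normalized_shape=6)  # normalizes all features within each example

    bn.train()  # forward passes use batch statistics and update the running estimates
    bn.eval()   # forward passes use the running statistics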
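
As a usage sketch, the modules can be applied to the visualizer's 4 x 6 matrix. This assumes PyTorch is installed and relies on the default settings (eps of 1e-5, scale initialized to 1, shift initialized to 0); the tensor and output names are illustrative.

```python
import torch
import torch.nn as nn

# Activation matrix from the visualizer: 4 examples x 6 features.
x = torch.tensor([
    [2.0, 1.0, 6.0, 3.0, 5.0, 4.0],
    [3.0, 0.0, 4.0, 2.0, 6.0, 5.0],
    [5.0, 2.0, 5.0, 4.0, 7.0, 3.0],
    [7.0, 3.0, 2.0, 5.0, 4.0, 6.0],
])

bn = nn.BatchNorm1d(num_features=6)
ln = nn.LayerNorm(normalized_shape=6)

bn.train()
out_train = bn(x)  # normalized with this batch's mean and variance

bn.eval()
with torch.no_grad():
    out_eval = bn(x)  # normalized with the running mean and variance

out_ln = ln(x)  # LayerNorm normalizes each row, in training and evaluation alike

print(out_train[0, 2].item())  # row 1, feature 3
```

In training mode the result for row 1, feature 3 should match the selected cell calculation to two decimals. The evaluation-mode output differs because the running estimates have only been updated once.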