Deep learning visualizer

Normalization Visualizer

See how normalization stabilizes activations and improves neural network training.

Method: BatchNorm | Batch size: 4 | Features: 6 | Epsilon: 1e-5

1. Method
2. Dataset preset
3. Activation matrix (4 x 6)
4. Batch size: 4
5. Features: 6
6. Epsilon (epsilon)
7. Gamma (gamma): 1.0
8. Beta (beta): 0.0
9. Learnable gamma and beta
10. Mode

Batch Normalization: Normalize Each Feature Across the Batch

Original Activations

Rows are the 4 examples in the batch; columns are the 6 features.

2  1  6  3  5  4
3  0  4  2  6  5
5  2  5  4  7  3
7  3  2  5  4  6

Standardized Values

-1.17  -0.45   1.18  -0.45  -0.45  -0.45
-0.65  -1.34  -0.17  -1.34   0.45   0.45
 0.39   0.45   0.51   0.45   1.34  -1.34
 1.43   1.34  -1.52   1.34  -1.34   1.34

Scaled & Shifted Output

-1.17  -0.45   1.18  -0.45  -0.45  -0.45
-0.65  -1.34  -0.17  -1.34   0.45   0.45
 0.39   0.45   0.51   0.45   1.34  -1.34
 1.43   1.34  -1.52   1.34  -1.34   1.34

With gamma = 1.0 and beta = 0.0, the output equals the standardized values.

Selected cell calculation (row 1, feature 3): (6 - 4.25) / sqrt(2.19 + 1e-5) = 1.18, where 4.25 and 2.19 are the mean and variance of feature 3 across the batch.
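The selected cell calculation can be reproduced by hand. The following is a minimal sketch in plain Python; the helper name `batch_norm_cell` is hypothetical, and it assumes the visualizer uses the biased (divide-by-N) variance, which is consistent with the 2.19 shown above.

```python
import math

# Activation matrix from the visualizer: 4 examples (rows) x 6 features (columns).
ACTIVATIONS = [
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
]
EPS = 1e-5


def batch_norm_cell(matrix, row, col, eps=EPS):
    """Standardize one cell using the mean and biased variance of its column.

    Hypothetical helper for illustration; not part of any library.
    """
    column = [r[col] for r in matrix]
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    z = (matrix[row][col] - mean) / math.sqrt(var + eps)
    return mean, var, z


# "Row 1, feature 3" in the visualizer's 1-based labels is index (0, 2) here.
mean, var, z = batch_norm_cell(ACTIVATIONS, row=0, col=2)
print(f"mean={mean:.2f} var={var:.2f} z={z:.2f}")  # mean=4.25 var=2.19 z=1.18
```

Changing `row` and `col` should reproduce any other entry of the standardized matrix.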

Feature 3 of 6

Before: Original Activation Distribution (over all 24 activations: mu = 3.92, var = 3.33; x-axis: activation value, 0 to 7)

After: Standardized Distribution (over all 24 standardized values: mu = 0, var = 1; x-axis: standardized value, -1.5 to 1.4)

What gets normalized?

Batch Normalization

Each feature across examples: normalization is computed down each column of the activation matrix.

Common in CNNs

Layer Normalization

All features within each example: normalization is computed across each row of the activation matrix.

Common in Transformers
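The difference between the two methods comes down to which axis supplies the statistics. The sketch below, in plain Python, applies one shared helper down the columns (batch norm) and across the rows (layer norm). The function names `standardize`, `batch_norm`, and `layer_norm` are hypothetical, and gamma and beta are left at 1 and 0 so that only the standardization step is shown.

```python
import math

# Activation matrix from the visualizer: 4 examples (rows) x 6 features (columns).
ACTIVATIONS = [
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
]


def standardize(values, eps=1e-5):
    """Subtract the mean and divide by sqrt(biased variance + eps)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(matrix, eps=1e-5):
    """Normalize each feature (column) across the examples in the batch."""
    columns = [standardize(list(col), eps) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]  # transpose back to rows


def layer_norm(matrix, eps=1e-5):
    """Normalize all features within each example (row)."""
    return [standardize(row, eps) for row in matrix]


bn_out = batch_norm(ACTIVATIONS)
ln_out = layer_norm(ACTIVATIONS)

# First row under batch norm matches the visualizer's standardized values.
print([round(v, 2) for v in bn_out[0]])  # [-1.17, -0.45, 1.18, -0.45, -0.45, -0.45]
# First row under layer norm uses that row's own mean and variance instead.
print([round(v, 2) for v in ln_out[0]])
```

Batch norm output depends on the other examples in the batch, while layer norm output for a row depends only on that row.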

How normalization works

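1. Compute the mean and variance along the normalization axis.
2. Standardize: subtract the mean and divide by the standard deviation, sqrt(var + epsilon).
3. Scale by the learnable gamma and shift by the learnable beta.
4. During inference, BatchNorm uses running estimates of the mean and variance collected during training instead of the current batch statistics.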
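The steps above, including the switch to running estimates at inference, can be sketched in plain Python. `TinyBatchNorm` is a hypothetical class written for illustration, not a library API. It assumes an exponential moving average with momentum 0.1 and running statistics initialized to mean 0 and variance 1, mirroring PyTorch's defaults. As a simplification, it updates the running variance with the biased batch variance, whereas PyTorch uses the unbiased estimate for that update.

```python
import math


class TinyBatchNorm:
    """Hypothetical per-feature batch norm with running statistics (illustrative only)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.gamma = [1.0] * num_features  # scale, initial value
        self.beta = [0.0] * num_features   # shift, initial value
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features
        self.training = True

    def __call__(self, batch):
        out_columns = []
        for j, column in enumerate(zip(*batch)):
            if self.training:
                # Step 1: mean and biased variance along the batch axis.
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                # Update running estimates as an exponential moving average.
                m = self.momentum
                self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
                self.running_var[j] = (1 - m) * self.running_var[j] + m * var
            else:
                # Step 4: inference reuses the stored running estimates.
                mean, var = self.running_mean[j], self.running_var[j]
            # Step 2: standardize. Step 3: scale by gamma and shift by beta.
            std = math.sqrt(var + self.eps)
            out_columns.append(
                [self.gamma[j] * (x - mean) / std + self.beta[j] for x in column]
            )
        return [list(row) for row in zip(*out_columns)]


batch = [
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
]
bn = TinyBatchNorm(num_features=6)
train_out = bn(batch)   # training mode: batch statistics
bn.training = False
eval_out = bn(batch)    # inference mode: running estimates
print(round(train_out[0][2], 2))  # 1.18
```

After a single training step the running estimates are still close to their initial values, so the inference output differs noticeably from the training output. The two converge only as the running estimates settle over many batches.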

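Implementation

import torch.nn as nn

bn = nn.BatchNorm1d(num_features=6)    # normalizes each of the 6 features across the batch
ln = nn.LayerNorm(normalized_shape=6)  # normalizes the 6 features within each example

bn.train()  # training mode: uses batch statistics and updates the running estimates
bn.eval()   # inference mode: uses the running statistics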
