Deep learning visualizer
Normalization Visualizer
See how normalization stabilizes activations and improves neural network training.
Method: BatchNorm, Batch size: 4, Features: 6, Epsilon: 1e-5
Batch Normalization: Normalize Each Feature Across the Batch
Original Activations
Rows are the 4 examples in the batch; columns are the 6 features.
    2  1  6  3  5  4
    3  0  4  2  6  5
    5  2  5  4  7  3
    7  3  2  5  4  6
Standardized Values
    -1.17  -0.45   1.18  -0.45  -0.45  -0.45
    -0.65  -1.34  -0.17  -1.34   0.45   0.45
     0.39   0.45   0.51   0.45   1.34  -1.34
     1.43   1.34  -1.52   1.34  -1.34   1.34
Scaled & Shifted Output
    -1.17  -0.45   1.18  -0.45  -0.45  -0.45
    -0.65  -1.34  -0.17  -1.34   0.45   0.45
     0.39   0.45   0.51   0.45   1.34  -1.34
     1.43   1.34  -1.52   1.34  -1.34   1.34
Selected cell calculation (row 1, feature 3): (6 - 4.25) / sqrt(2.19 + 1e-5) = 1.18
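The selected-cell arithmetic can be reproduced in a few lines of plain Python. This is a minimal sketch, not the visualizer's own code: the values 6, 4, 5, 2 are feature 3 read from the activation matrix, the variable names are illustrative, and the variance is assumed to be the biased (divide-by-N) estimate, which is what reproduces the 2.19 shown.

```python
import math

# Feature 3 (third column) of the 4 x 6 activation matrix, one value per example.
feature_3 = [6.0, 4.0, 5.0, 2.0]
eps = 1e-5

mean = sum(feature_3) / len(feature_3)
# Biased variance: divide by N, not N - 1 (assumption that matches the page).
var = sum((x - mean) ** 2 for x in feature_3) / len(feature_3)

# Standardize the selected cell (row 1, feature 3).
z = (feature_3[0] - mean) / math.sqrt(var + eps)

print(f"mean={mean:.2f} var={var:.2f} z={z:.2f}")  # mean=4.25 var=2.19 z=1.18
```

Epsilon barely changes the result here; it matters when a feature's variance is close to zero.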
Before: Original Activation Distribution
mu = 3.92, var = 3.33 (x-axis: activation value, 0 to 7)
After: Standardized Distribution
mu = 0, var = 1 (x-axis: standardized value, -1.5 to 1.4)
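The before and after statistics appear to be taken over all 24 values of the matrix rather than over a single feature. The sketch below checks that reading under two assumptions: the matrix is the 4 x 6 one shown under Original Activations, and variances use the biased (divide-by-N) estimate. The helper `mean_var` is a hypothetical name introduced for illustration.

```python
import math

# 4 x 6 activation matrix: rows are examples, columns are features.
X = [
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
]
eps = 1e-5

def mean_var(values):
    """Return the mean and biased (divide-by-N) variance of a sequence."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

# "Before" panel: statistics over all 24 raw activations.
before_mu, before_var = mean_var([v for row in X for v in row])

# Standardize each feature (column) with its own batch statistics.
Z_cols = []
for col in zip(*X):
    m, v = mean_var(col)
    Z_cols.append([(x - m) / math.sqrt(v + eps) for x in col])

# "After" panel: statistics over all 24 standardized values.
after_mu, after_var = mean_var([z for col in Z_cols for z in col])

print(f"before: mu={before_mu:.2f} var={before_var:.2f}")  # before: mu=3.92 var=3.33
print(f"after:  mu={after_mu:.2f} var={after_var:.2f}")
```

Because every column is centered and scaled separately, the pooled mean lands at zero up to floating-point error, and the pooled variance sits just under 1 because of epsilon.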
What gets normalized?
Batch Normalization
Each feature across examples: normalization is computed down each column.
Common in CNNs
Layer Normalization
All features within each example: normalization is computed across each row.
Common in Transformers
How normalization works
- Compute mean and variance along the normalization axis.
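- Standardize: subtract the mean and divide by the standard deviation, sqrt(var + epsilon).
- Scale by a learnable gamma and shift by a learnable beta.
- During inference, BatchNorm uses running estimates of the mean and variance accumulated during training instead of the current batch's statistics.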
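The first three steps can be sketched in plain Python for both normalization axes. This is an illustration under stated assumptions, not a library implementation: `normalize` and its `mode` argument are hypothetical names, gamma and beta are single scalars (real layers learn one value per feature), and the variance is the biased estimate.

```python
import math

def normalize(X, mode, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a 2-D list of activations.

    mode="batch": statistics per feature (down each column), as in BatchNorm.
    mode="layer": statistics per example (across each row), as in LayerNorm.
    """
    if mode not in ("batch", "layer"):
        raise ValueError("mode must be 'batch' or 'layer'")
    # Collect the groups of values that share one mean and variance.
    groups = [list(c) for c in zip(*X)] if mode == "batch" else [list(r) for r in X]
    out = []
    for g in groups:
        mean = sum(g) / len(g)                              # step 1: mean
        var = sum((x - mean) ** 2 for x in g) / len(g)      # step 1: biased variance
        z = [(x - mean) / math.sqrt(var + eps) for x in g]  # step 2: standardize
        out.append([gamma * v + beta for v in z])           # step 3: scale and shift
    # In batch mode the groups were columns, so transpose back to rows.
    return [list(r) for r in zip(*out)] if mode == "batch" else out

X = [
    [2, 1, 6, 3, 5, 4],
    [3, 0, 4, 2, 6, 5],
    [5, 2, 5, 4, 7, 3],
    [7, 3, 2, 5, 4, 6],
]

bn_out = normalize(X, mode="batch")
ln_out = normalize(X, mode="layer")
print([round(v, 2) for v in bn_out[0]])  # [-1.17, -0.45, 1.18, -0.45, -0.45, -0.45]
```

With gamma = 1 and beta = 0 the scaled and shifted output equals the standardized values, which is consistent with the two identical matrices shown earlier.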
    import torch.nn as nn
    bn = nn.BatchNorm1d(num_features=6)
    ln = nn.LayerNorm(normalized_shape=6)
    bn.train()  # uses batch statistics
    bn.eval()   # uses running statistics
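To make the difference between the two modes concrete, here is a rough single-feature sketch of running statistics in plain Python. It assumes the exponential moving average update that PyTorch documents for BatchNorm, with the default momentum of 0.1 and the unbiased variance stored in the running estimate. The `RunningStats` class is hypothetical and exists only for illustration.

```python
import math

class RunningStats:
    """Hypothetical single-feature tracker mimicking BatchNorm's running estimates."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0  # initial values matching PyTorch's defaults
        self.running_var = 1.0

    def train_step(self, batch):
        """Normalize with batch statistics and update the running estimates."""
        n = len(batch)  # needs n > 1 for the unbiased variance below
        mean = sum(batch) / n
        var = sum((x - mean) ** 2 for x in batch) / n  # biased, used to normalize
        unbiased = var * n / (n - 1)                   # unbiased, stored in the running estimate
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * mean
        self.running_var = (1 - m) * self.running_var + m * unbiased
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]

    def eval_step(self, batch):
        """Normalize with the stored running estimates; nothing is updated."""
        scale = math.sqrt(self.running_var + self.eps)
        return [(x - self.running_mean) / scale for x in batch]

stats = RunningStats()
train_out = stats.train_step([6.0, 4.0, 5.0, 2.0])  # feature 3 of the activation matrix
eval_out = stats.eval_step([6.0])                   # works for a single example
print(stats.running_mean, stats.running_var)
```

After one training step the running mean has moved only a tenth of the way toward the batch mean, so the evaluation-mode output for the same input differs from the training-mode output until the estimates have converged.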