Backpropagation and Architecture Basics
Part 1) Backpropagation
GitHub link to the Neural Network Training Excel sheet
Major Steps in NN Training
Suppose we have the network below:
1) Initialization of Neural Network
Randomly initialize the weights [w1, …, w8] and choose a learning rate ɳ (lr).
2) In each iteration, the following steps are performed:
A) Forward Propagation
For the above network, the outputs are calculated using the formulae below:
h1 = w1*i1 + w2*i2
h2 = w3*i1 + w4*i2
a_h1 = σ(h1) = 1/(1+exp(-h1))
a_h2 = σ(h2) = 1/(1+exp(-h2))
o1 = w5*a_h1 + w6*a_h2
o2 = w7*a_h1 + w8*a_h2
a_o1 = σ(o1) = 1/(1+exp(-o1))
a_o2 = σ(o2) = 1/(1+exp(-o2))
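As a quick cross-check, the same forward pass can be written in a few lines of NumPy. The input and weight values below are illustrative placeholders, not the ones from the Excel sheet:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative inputs and weights; the sheet uses its own values.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# Hidden layer: weighted sums followed by sigmoid activations.
h1 = w1 * i1 + w2 * i2
h2 = w3 * i1 + w4 * i2
a_h1, a_h2 = sigmoid(h1), sigmoid(h2)

# Output layer.
o1 = w5 * a_h1 + w6 * a_h2
o2 = w7 * a_h1 + w8 * a_h2
a_o1, a_o2 = sigmoid(o1), sigmoid(o2)
```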
B) Backpropagation
I) The error is calculated from the generated output and the target.
For the above network, the squared (L2) error is calculated as shown below:
E1 = ½*(t1-a_o1)²
E2 = ½*(t2-a_o2)²
E_total = E1 + E2
II) Each weight in the network is updated as follows:
i) The error gradient with respect to the weight (∂E_total/∂wi) is calculated. The negative of this gradient gives the direction in which updating the weight reduces the error.
Example:- For w5
∂E_total/∂w5 = ∂(E1+E2)/∂w5 = ∂E1/∂w5 = ∂E1/∂a_o1 * ∂a_o1/∂o1 * ∂o1/∂w5
∂E1/∂a_o1 = -1*(t1-a_o1) = a_o1-t1
∂a_o1/∂o1 = ∂(σ(o1))/∂o1 = σ(o1)*(1-σ(o1)) = a_o1*(1-a_o1)
∂o1/∂w5 = a_h1
Combining the three terms (a numerical check of this result follows step ii below):
∂E_total/∂w5 = (a_o1-t1) * a_o1*(1-a_o1) * a_h1
ii) The weight is then updated using:
wi(j+1) = wi(j) - ɳ * ∂E_total/∂wi(j)
where i is the weight index and j is the iteration number.
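As promised above, the chain-rule result for w5 can be sanity-checked against a finite-difference estimate. All numeric values in this sketch are illustrative, not those from the sheet:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values only.
i1, i2, t1, t2 = 0.05, 0.10, 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w6, w7, w8 = 0.45, 0.50, 0.55

def total_error(w5):
    a_h1 = sigmoid(w1 * i1 + w2 * i2)
    a_h2 = sigmoid(w3 * i1 + w4 * i2)
    a_o1 = sigmoid(w5 * a_h1 + w6 * a_h2)
    a_o2 = sigmoid(w7 * a_h1 + w8 * a_h2)
    return 0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2, a_h1, a_o1

w5 = 0.40
E, a_h1, a_o1 = total_error(w5)

analytic = (a_o1 - t1) * a_o1 * (1 - a_o1) * a_h1   # chain-rule result derived above
eps = 1e-6
numeric = (total_error(w5 + eps)[0] - E) / eps      # finite-difference estimate
print(analytic, numeric)                            # the two should agree closely
```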
For the above network, the error gradients with respect to all the weights are derived in the same way using the chain rule.
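Putting the pieces together, one full training iteration (forward pass, all eight gradients via the chain rule, and the weight update) can be sketched in vectorized NumPy. The inputs, targets, initial weights, and learning rate are again illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i = np.array([0.05, 0.10])                    # inputs [i1, i2]
t = np.array([0.01, 0.99])                    # targets [t1, t2]
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])   # rows [w1, w2], [w3, w4]
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])   # rows [w5, w6], [w7, w8]
lr = 0.5                                      # illustrative learning rate

for step in range(200):
    # Forward propagation
    a_h = sigmoid(W1 @ i)                     # [a_h1, a_h2]
    a_o = sigmoid(W2 @ a_h)                   # [a_o1, a_o2]
    E_total = 0.5 * np.sum((t - a_o) ** 2)

    # Backpropagation (chain rule)
    d_o = (a_o - t) * a_o * (1 - a_o)         # ∂E/∂o1, ∂E/∂o2
    dW2 = np.outer(d_o, a_h)                  # ∂E/∂w5 .. ∂E/∂w8
    d_h = (W2.T @ d_o) * a_h * (1 - a_h)      # ∂E/∂h1, ∂E/∂h2
    dW1 = np.outer(d_h, i)                    # ∂E/∂w1 .. ∂E/∂w4

    # Gradient-descent update: w <- w - lr * ∂E_total/∂w
    W1 -= lr * dW1
    W2 -= lr * dW2

print(E_total)  # E_total shrinks steadily over the iterations
```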
Error Graphs for various learning rates
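Graphs like these can be regenerated by rerunning the loop above for several learning rates and plotting E_total per iteration. The rates swept below are illustrative choices, not necessarily the ones used for the original graphs:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i = np.array([0.05, 0.10])
t = np.array([0.01, 0.99])

for lr in [0.1, 0.2, 0.5, 1.0, 2.0]:          # illustrative sweep
    W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
    W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
    history = []
    for step in range(200):
        a_h = sigmoid(W1 @ i)
        a_o = sigmoid(W2 @ a_h)
        history.append(0.5 * np.sum((t - a_o) ** 2))
        d_o = (a_o - t) * a_o * (1 - a_o)
        d_h = (W2.T @ d_o) * a_h * (1 - a_h)   # uses pre-update W2
        W2 -= lr * np.outer(d_o, a_h)
        W1 -= lr * np.outer(d_h, i)
    plt.plot(history, label=f"lr={lr}")

plt.xlabel("Iteration")
plt.ylabel("E_total")
plt.legend()
plt.show()
```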
Part 2) Architecture Basics
| Code Link |
| --- |
| GitHub |
| Google Colab |
Best/Final Test Accuracy: 99.41%
Model Architecture
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 16, 26, 26] 144
ReLU-2 [-1, 16, 26, 26] 0
BatchNorm2d-3 [-1, 16, 26, 26] 32
Dropout-4 [-1, 16, 26, 26] 0
Conv2d-5 [-1, 24, 24, 24] 3,456
ReLU-6 [-1, 24, 24, 24] 0
BatchNorm2d-7 [-1, 24, 24, 24] 48
Dropout-8 [-1, 24, 24, 24] 0
Conv2d-9 [-1, 10, 24, 24] 240
MaxPool2d-10 [-1, 10, 12, 12] 0
Conv2d-11 [-1, 14, 10, 10] 1,260
ReLU-12 [-1, 14, 10, 10] 0
BatchNorm2d-13 [-1, 14, 10, 10] 28
Dropout-14 [-1, 14, 10, 10] 0
Conv2d-15 [-1, 16, 8, 8] 2,016
ReLU-16 [-1, 16, 8, 8] 0
BatchNorm2d-17 [-1, 16, 8, 8] 32
Dropout-18 [-1, 16, 8, 8] 0
Conv2d-19 [-1, 16, 6, 6] 2,304
ReLU-20 [-1, 16, 6, 6] 0
BatchNorm2d-21 [-1, 16, 6, 6] 32
Dropout-22 [-1, 16, 6, 6] 0
AvgPool2d-23 [-1, 16, 1, 1] 0
Conv2d-24 [-1, 16, 1, 1] 256
Conv2d-25 [-1, 32, 1, 1] 512
Linear-26 [-1, 10] 330
================================================================
Total params: 10,690
Trainable params: 10,690
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.90
Params size (MB): 0.04
Estimated Total Size (MB): 0.94
----------------------------------------------------------------
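A PyTorch module consistent with the summary above might look like the sketch below. The 3×3 kernels without bias are inferred from the parameter counts; the dropout rate and the absence of a final softmax are assumptions:

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, dropout=0.05):          # dropout rate is an assumption
        super().__init__()

        def block(cin, cout):                  # Conv -> ReLU -> BN -> Dropout, as in the summary
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, bias=False),
                nn.ReLU(),
                nn.BatchNorm2d(cout),
                nn.Dropout(dropout),
            )

        self.features = nn.Sequential(
            block(1, 16),                      # 28x28 -> 26x26
            block(16, 24),                     # 26x26 -> 24x24
            nn.Conv2d(24, 10, 1, bias=False),  # 1x1 conv squeezes channels to 10
            nn.MaxPool2d(2),                   # 24x24 -> 12x12
            block(10, 14),                     # 12x12 -> 10x10
            block(14, 16),                     # 10x10 -> 8x8
            block(16, 16),                     # 8x8 -> 6x6
            nn.AvgPool2d(6),                   # global average pool -> 1x1
            nn.Conv2d(16, 16, 1, bias=False),
            nn.Conv2d(16, 32, 1, bias=False),
        )
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        x = self.features(x)
        return self.fc(x.view(x.size(0), -1))
```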
Training and Testing
Logs
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 1
Train: Loss=0.3845 Batch_id=937 Accuracy=80.10: 100%|██████████| 938/938 [00:39<00:00, 23.63it/s]
Test set: Average loss: 0.0605, Accuracy: 9807/10000 (98.07%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 2
Train: Loss=0.1039 Batch_id=937 Accuracy=95.97: 100%|██████████| 938/938 [00:31<00:00, 30.12it/s]
Test set: Average loss: 0.0582, Accuracy: 9806/10000 (98.06%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 3
Train: Loss=0.3055 Batch_id=937 Accuracy=96.72: 100%|██████████| 938/938 [00:31<00:00, 29.44it/s]
Test set: Average loss: 0.0334, Accuracy: 9907/10000 (99.07%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 4
Train: Loss=0.0642 Batch_id=937 Accuracy=97.17: 100%|██████████| 938/938 [00:33<00:00, 28.37it/s]
Test set: Average loss: 0.0316, Accuracy: 9901/10000 (99.01%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 5
Train: Loss=0.0445 Batch_id=937 Accuracy=97.35: 100%|██████████| 938/938 [00:31<00:00, 29.76it/s]
Test set: Average loss: 0.0306, Accuracy: 9905/10000 (99.05%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 6
Train: Loss=0.2260 Batch_id=937 Accuracy=97.62: 100%|██████████| 938/938 [00:31<00:00, 30.21it/s]
Test set: Average loss: 0.0259, Accuracy: 9916/10000 (99.16%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 7
Train: Loss=0.1017 Batch_id=937 Accuracy=97.69: 100%|██████████| 938/938 [00:31<00:00, 29.83it/s]
Test set: Average loss: 0.0261, Accuracy: 9918/10000 (99.18%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 8
Train: Loss=0.0116 Batch_id=937 Accuracy=97.85: 100%|██████████| 938/938 [00:32<00:00, 29.27it/s]
Test set: Average loss: 0.0247, Accuracy: 9914/10000 (99.14%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 9
Train: Loss=0.0053 Batch_id=937 Accuracy=97.93: 100%|██████████| 938/938 [00:31<00:00, 29.97it/s]
Test set: Average loss: 0.0233, Accuracy: 9924/10000 (99.24%)
Adjusting learning rate of group 0 to 1.0000e-02.
Epoch 10
Train: Loss=0.1205 Batch_id=937 Accuracy=97.92: 100%|██████████| 938/938 [00:31<00:00, 30.02it/s]
Test set: Average loss: 0.0205, Accuracy: 9929/10000 (99.29%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 11
Train: Loss=0.1305 Batch_id=937 Accuracy=98.36: 100%|██████████| 938/938 [00:31<00:00, 29.66it/s]
Test set: Average loss: 0.0191, Accuracy: 9935/10000 (99.35%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 12
Train: Loss=0.1472 Batch_id=937 Accuracy=98.39: 100%|██████████| 938/938 [00:31<00:00, 29.53it/s]
Test set: Average loss: 0.0194, Accuracy: 9937/10000 (99.37%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 13
Train: Loss=0.1252 Batch_id=937 Accuracy=98.42: 100%|██████████| 938/938 [00:31<00:00, 30.06it/s]
Test set: Average loss: 0.0198, Accuracy: 9933/10000 (99.33%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 14
Train: Loss=0.0487 Batch_id=937 Accuracy=98.42: 100%|██████████| 938/938 [00:32<00:00, 29.13it/s]
Test set: Average loss: 0.0185, Accuracy: 9938/10000 (99.38%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 15
Train: Loss=0.0031 Batch_id=937 Accuracy=98.48: 100%|██████████| 938/938 [00:31<00:00, 29.94it/s]
Test set: Average loss: 0.0178, Accuracy: 9936/10000 (99.36%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 16
Train: Loss=0.0247 Batch_id=937 Accuracy=98.54: 100%|██████████| 938/938 [00:31<00:00, 29.47it/s]
Test set: Average loss: 0.0195, Accuracy: 9931/10000 (99.31%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 17
Train: Loss=0.0082 Batch_id=937 Accuracy=98.49: 100%|██████████| 938/938 [00:31<00:00, 29.69it/s]
Test set: Average loss: 0.0179, Accuracy: 9939/10000 (99.39%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 18
Train: Loss=0.0014 Batch_id=937 Accuracy=98.49: 100%|██████████| 938/938 [00:31<00:00, 30.16it/s]
Test set: Average loss: 0.0189, Accuracy: 9932/10000 (99.32%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 19
Train: Loss=0.0914 Batch_id=937 Accuracy=98.50: 100%|██████████| 938/938 [00:32<00:00, 29.14it/s]
Test set: Average loss: 0.0177, Accuracy: 9937/10000 (99.37%)
Adjusting learning rate of group 0 to 1.0000e-03.
Epoch 20
Train: Loss=0.0597 Batch_id=937 Accuracy=98.47: 100%|██████████| 938/938 [00:31<00:00, 29.40it/s]
Test set: Average loss: 0.0176, Accuracy: 9941/10000 (99.41%)
Adjusting learning rate of group 0 to 1.0000e-04.
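The "Adjusting learning rate" lines (1e-2 for epochs 1–10, 1e-3 for 11–20) and the 938 batches per epoch (60,000 MNIST images at batch size 64) suggest a setup roughly like the sketch below. The optimizer choice, transforms, and loss function are assumptions inferred from the logs, not confirmed details:

```python
import torch
import torch.nn.functional as F
from torch import optim
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# batch_size=64 matches the 938 batches per epoch seen in the logs.
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST("./data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=64, shuffle=True)

model = Net().to(device)                            # Net from the architecture sketch above
optimizer = optim.SGD(model.parameters(), lr=0.01)  # momentum, if any, is unknown
# StepLR(step_size=10, gamma=0.1) reproduces the 1e-2 -> 1e-3 -> 1e-4 schedule in the logs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(1, 21):
    model.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)  # loss function is an assumption
        loss.backward()
        optimizer.step()
    scheduler.step()
```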
Visualization