MNIST-trained Model with 99.4% test accuracy in < 8K Parameters and <15 epochs
Trial Summary
S.No. | File Name | Highlight | Targets | Results | Analysis | File Link |
---|---|---|---|---|---|---|
1 | S7_File1 | Basic Skeleton Creation | Create a basic skeleton with less than 8K parameters that can reach 99% test accuracy in fewer than 15 epochs. The skeleton is based on the expand-and-squeeze architecture. | <ul><li>Best Train Accuracy - 99.59%</li><li> Best Test Accuracy - 99.27%</li><li>Total Parameters - 8000</li></ul> | Good starting model, but it overfits: best train accuracy (99.59%) is above best test accuracy (99.27%). | Open |
2 | S7_File2 | Improving the Basic Model (Reducing Overfitting) | Reduce overfitting in the basic model by adding dropout of 0.05. When trained for around 40 epochs, the basic model reached 99.4% accuracy, which shows that it has the capacity to reach 99.4%. After adding dropout, I added a step LR schedule that starts at 0.1 and multiplies the learning rate by 0.1 every 4 epochs. Both values were found by experimentation. | <ul><li>Best Train Accuracy - 98.54%</li><li> Best Test Accuracy - 99.38%</li><li>Total Parameters - 8000</li></ul> | Overfitting was reduced, and the model consistently held about 99.3% test accuracy from epoch 5 onward. Raising the initial learning rate to 0.1 helped reach higher accuracy sooner, and multiplying it by 0.1 every 4 epochs gave stable results. Providing more training samples could further improve learning. | Open |
3 | S7_File3 | Improving the Model (Image Augmentation, Batch Size(Sweet Spot), Regularization) | Improve the model learning by:<ol><li>Adding image augmentation</li><li>Reducing batch size</li><li>Adding regularization at correct position with reduced batch size</li></ol> | <ul><li>Best Train Accuracy - 98.55%</li><li> Best Test Accuracy - 99.42%</li><li>Total Parameters - 8000</li></ul> | <ol><li>Adding image augmentation (scaling, translation and rotation) made training harder for the model, which improved test accuracy.</li><li>Reducing batch size from 512 to 128 improved the model's generalization on the test dataset and brought test accuracy to the 99.4% threshold. A batch size of 128 is the sweet spot for this model; below it, test accuracy degrades. The improvement is attributed to the “noise” in small-batch training: each small batch is a “noisy” representation of the entire dataset, and seeing many of them creates a “tug-and-pull” dynamic that keeps the network from overfitting the training set and therefore from performing badly on the test set.</li><li>Moving the step LR decay to every 8 epochs instead of every 4. This reduced the number of epochs needed to reach 99.4% test accuracy consistently.</li></ol>With the above experimentation, I was able to achieve 99.4% test accuracy consistently. | Open |
Final Model Architecture
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 8, 26, 26] 72
ReLU-2 [-1, 8, 26, 26] 0
BatchNorm2d-3 [-1, 8, 26, 26] 16
Dropout-4 [-1, 8, 26, 26] 0
Conv2d-5 [-1, 14, 24, 24] 1,008
ReLU-6 [-1, 14, 24, 24] 0
BatchNorm2d-7 [-1, 14, 24, 24] 28
Dropout-8 [-1, 14, 24, 24] 0
Conv2d-9 [-1, 10, 24, 24] 140
MaxPool2d-10 [-1, 10, 12, 12] 0
Conv2d-11 [-1, 14, 10, 10] 1,260
ReLU-12 [-1, 14, 10, 10] 0
BatchNorm2d-13 [-1, 14, 10, 10] 28
Dropout-14 [-1, 14, 10, 10] 0
Conv2d-15 [-1, 16, 8, 8] 2,016
ReLU-16 [-1, 16, 8, 8] 0
BatchNorm2d-17 [-1, 16, 8, 8] 32
Dropout-18 [-1, 16, 8, 8] 0
Conv2d-19 [-1, 20, 6, 6] 2,880
ReLU-20 [-1, 20, 6, 6] 0
BatchNorm2d-21 [-1, 20, 6, 6] 40
Dropout-22 [-1, 20, 6, 6] 0
AvgPool2d-23 [-1, 20, 1, 1] 0
Conv2d-24 [-1, 16, 1, 1] 320
Conv2d-25 [-1, 10, 1, 1] 160
================================================================
Total params: 8,000
Trainable params: 8,000
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.56
Params size (MB): 0.03
Estimated Total Size (MB): 0.60
----------------------------------------------------------------
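The parameter total in the summary can be re-derived by hand. The sketch below assumes, consistent with the per-layer counts above, that every convolution is bias-free and that each BatchNorm2d layer has one learnable scale and one learnable shift per channel. The layer list and helper names are illustrative and are not taken from the original notebook.

```python
# Re-derive the parameter counts reported in the model summary.
# Assumptions: convolutions have no bias (e.g. 72 = 3*3*1*8) and
# BatchNorm2d has 2 learnable parameters per channel.

def conv_params(kernel, c_in, c_out):
    """Weights of a bias-free kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def bn_params(channels):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * channels


# (kernel, in_channels, out_channels, followed_by_batchnorm)
CONV_LAYERS = [
    (3, 1, 8, True),     # Conv2d-1
    (3, 8, 14, True),    # Conv2d-5
    (1, 14, 10, False),  # Conv2d-9  (1x1 transition)
    (3, 10, 14, True),   # Conv2d-11
    (3, 14, 16, True),   # Conv2d-15
    (3, 16, 20, True),   # Conv2d-19
    (1, 20, 16, False),  # Conv2d-24 (after average pooling)
    (1, 16, 10, False),  # Conv2d-25 (class scores)
]


def total_params(layers):
    """Sum convolution and batch-norm parameters over the layer list."""
    total = 0
    for kernel, c_in, c_out, has_bn in layers:
        total += conv_params(kernel, c_in, c_out)
        if has_bn:
            total += bn_params(c_out)
    return total


print(total_params(CONV_LAYERS))  # prints 8000
```

ReLU, Dropout and the pooling layers add no parameters, which is why they are left out of the list.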
Block | Layer | Input Size | Output Size | Receptive Field |
---|---|---|---|---|
Input Block | Conv2D(3x3) | 28x28x1 | 26x26x8 | 3x3 |
Convolution Block 1 | Conv2D(3x3) | 26x26x8 | 24x24x14 | 5x5 |
Transition Block 1 | Conv2D(1x1) | 24x24x14 | 24x24x10 | 5x5 |
Transition Block 1 | Max Pool(2x2) | 24x24x10 | 12x12x10 | 6x6 |
Convolution Block 2 | Conv2D(3x3) | 12x12x10 | 10x10x14 | 10x10 |
Convolution Block 2 | Conv2D(3x3) | 10x10x14 | 8x8x16 | 14x14 |
Convolution Block 2 | Conv2D(3x3) | 8x8x16 | 6x6x20 | 18x18 |
Output Block | GAP | 6x6x20 | 1x1x20 | 28x28 |
Output Block | Conv2D(1x1), used as FC | 1x1x20 | 1x1x16 | 28x28 |
Output Block | Conv2D(1x1), used as FC | 1x1x16 | 1x1x10 | 28x28 |
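The output sizes and receptive fields in the table follow from the usual recurrences: the output size is `(size - kernel) // stride + 1`, the receptive field grows by `(kernel - 1) * jump`, and the jump is multiplied by the stride. The sketch below is a minimal check of those numbers, assuming no padding, stride 1 for every convolution, stride 2 for the max pool, and treating the global average pool as a 6x6 kernel. The function and variable names are illustrative.

```python
# Trace spatial size and receptive field through the layer stack.
# Assumptions: no padding, stride-1 convolutions, stride-2 max pooling,
# and global average pooling modelled as a 6x6 kernel over a 6x6 map.

def trace(layers, size=28):
    """Return (name, output_size, receptive_field) for each layer."""
    receptive_field, jump = 1, 1
    rows = []
    for name, kernel, stride in layers:
        size = (size - kernel) // stride + 1
        receptive_field += (kernel - 1) * jump
        jump *= stride
        rows.append((name, size, receptive_field))
    return rows


# (name, kernel, stride)
LAYERS = [
    ("Conv2D(3x3)", 3, 1),
    ("Conv2D(3x3)", 3, 1),
    ("Conv2D(1x1)", 1, 1),
    ("MaxPool(2x2)", 2, 2),
    ("Conv2D(3x3)", 3, 1),
    ("Conv2D(3x3)", 3, 1),
    ("Conv2D(3x3)", 3, 1),
    ("GAP(6x6)", 6, 6),
]

for name, size, receptive_field in trace(LAYERS):
    print(f"{name:13s} output {size}x{size}  receptive field {receptive_field}x{receptive_field}")
```

After the max pool the jump doubles, so each later 3x3 convolution adds 4 to the receptive field instead of 2, and the final pooling step brings it to the full 28x28 input.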
Final Results:
- Best Train Accuracy - 98.55%
- Best Test Accuracy - 99.42%
- Total Parameters - 8000
Trials Details
S7_File1: Basic Skeleton Creation
Targets:
Create a basic skeleton with less than 8K parameters that can reach 99% test accuracy in fewer than 15 epochs. The skeleton is based on the expand-and-squeeze architecture.
Results:
- Best Train Accuracy - 99.59%
- Best Test Accuracy - 99.27%
- Total Parameters - 8000
Analysis:
Good starting model, but it overfits: best train accuracy (99.59%) is above best test accuracy (99.27%).
Train/Test Logs
Epoch 1
Train: Loss=0.0996 Batch_id=117 Accuracy=85.17: 100%|██████████| 118/118 [00:21<00:00, 5.46it/s]
Test set: Average loss: 0.1631, Accuracy: 9509/10000 (95.09%)
Epoch 2
Train: Loss=0.0364 Batch_id=117 Accuracy=98.00: 100%|██████████| 118/118 [00:21<00:00, 5.46it/s]
Test set: Average loss: 0.0541, Accuracy: 9833/10000 (98.33%)
Epoch 3
Train: Loss=0.1695 Batch_id=117 Accuracy=98.56: 100%|██████████| 118/118 [00:21<00:00, 5.38it/s]
Test set: Average loss: 0.0441, Accuracy: 9854/10000 (98.54%)
Epoch 4
Train: Loss=0.0391 Batch_id=117 Accuracy=98.78: 100%|██████████| 118/118 [00:21<00:00, 5.48it/s]
Test set: Average loss: 0.0378, Accuracy: 9893/10000 (98.93%)
Epoch 5
Train: Loss=0.0258 Batch_id=117 Accuracy=99.02: 100%|██████████| 118/118 [00:22<00:00, 5.31it/s]
Test set: Average loss: 0.0381, Accuracy: 9889/10000 (98.89%)
Epoch 6
Train: Loss=0.0136 Batch_id=117 Accuracy=99.01: 100%|██████████| 118/118 [00:21<00:00, 5.60it/s]
Test set: Average loss: 0.0357, Accuracy: 9879/10000 (98.79%)
Epoch 7
Train: Loss=0.0276 Batch_id=117 Accuracy=99.21: 100%|██████████| 118/118 [00:20<00:00, 5.68it/s]
Test set: Average loss: 0.0299, Accuracy: 9908/10000 (99.08%)
Epoch 8
Train: Loss=0.0548 Batch_id=117 Accuracy=99.24: 100%|██████████| 118/118 [00:20<00:00, 5.70it/s]
Test set: Average loss: 0.0287, Accuracy: 9908/10000 (99.08%)
Epoch 9
Train: Loss=0.0077 Batch_id=117 Accuracy=99.28: 100%|██████████| 118/118 [00:20<00:00, 5.70it/s]
Test set: Average loss: 0.0318, Accuracy: 9897/10000 (98.97%)
Epoch 10
Train: Loss=0.0552 Batch_id=117 Accuracy=99.36: 100%|██████████| 118/118 [00:21<00:00, 5.49it/s]
Test set: Average loss: 0.0251, Accuracy: 9916/10000 (99.16%)
Epoch 11
Train: Loss=0.0032 Batch_id=117 Accuracy=99.43: 100%|██████████| 118/118 [00:22<00:00, 5.24it/s]
Test set: Average loss: 0.0248, Accuracy: 9920/10000 (99.20%)
Epoch 12
Train: Loss=0.0199 Batch_id=117 Accuracy=99.53: 100%|██████████| 118/118 [00:21<00:00, 5.47it/s]
Test set: Average loss: 0.0271, Accuracy: 9926/10000 (99.26%)
Epoch 13
Train: Loss=0.0038 Batch_id=117 Accuracy=99.55: 100%|██████████| 118/118 [00:21<00:00, 5.45it/s]
Test set: Average loss: 0.0308, Accuracy: 9912/10000 (99.12%)
Epoch 14
Train: Loss=0.0014 Batch_id=117 Accuracy=99.59: 100%|██████████| 118/118 [00:21<00:00, 5.48it/s]
Test set: Average loss: 0.0259, Accuracy: 9917/10000 (99.17%)
Epoch 15
Train: Loss=0.0230 Batch_id=117 Accuracy=99.58: 100%|██████████| 118/118 [00:21<00:00, 5.39it/s]
Test set: Average loss: 0.0247, Accuracy: 9927/10000 (99.27%)
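To quantify the overfitting visible in these logs, the train and test accuracies can be pulled out of the text and compared epoch by epoch. The following is a rough sketch that assumes the log format shown above; the regular expressions and the `parse_log` helper are illustrative, and the sample text is a shortened copy of the last two epochs with the progress bar trimmed.

```python
import re

# Patterns for the two kinds of log lines shown above.
TRAIN_RE = re.compile(r"Accuracy=([\d.]+)")
TEST_RE = re.compile(r"Accuracy: (\d+)/(\d+)")


def parse_log(text):
    """Return (train_acc, test_acc) lists in percent, one entry per epoch."""
    train, test = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Train:"):
            match = TRAIN_RE.search(line)
            if match:
                train.append(float(match.group(1)))
        elif line.startswith("Test set:"):
            match = TEST_RE.search(line)
            if match:
                correct, total = int(match.group(1)), int(match.group(2))
                test.append(100.0 * correct / total)
    return train, test


# Shortened excerpt of epochs 14 and 15 from the S7_File1 log.
SAMPLE = """\
Epoch 14
Train: Loss=0.0014 Batch_id=117 Accuracy=99.59: 100%| 118/118 [00:21<00:00, 5.48it/s]
Test set: Average loss: 0.0259, Accuracy: 9917/10000 (99.17%)
Epoch 15
Train: Loss=0.0230 Batch_id=117 Accuracy=99.58: 100%| 118/118 [00:21<00:00, 5.39it/s]
Test set: Average loss: 0.0247, Accuracy: 9927/10000 (99.27%)
"""

train_acc, test_acc = parse_log(SAMPLE)
# Positive gap: the model does better on training data than on test data.
gaps = [round(tr - te, 2) for tr, te in zip(train_acc, test_acc)]
print(gaps)
```

A gap that stays positive and grows over the later epochs is the overfitting pattern the analysis refers to.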
Train/Test Visualization
S7_File2: Improving the Basic Model (Reducing Overfitting)
Targets:
Reduce overfitting in the basic model by adding dropout of 0.05. When trained for around 40 epochs, the basic model reached 99.4% accuracy, which shows that it has the capacity to reach 99.4%. After adding dropout, I added a step LR schedule that starts at 0.1 and multiplies the learning rate by 0.1 every 4 epochs. Both values were found by experimentation.
Results:
- Best Train Accuracy - 98.54%
- Best Test Accuracy - 99.38%
- Total Parameters - 8000
Analysis:
Overfitting was reduced, and the model consistently held about 99.3% test accuracy from epoch 5 onward. Raising the initial learning rate to 0.1 helped reach higher accuracy sooner, and multiplying it by 0.1 every 4 epochs gave stable results. Providing more training samples could further improve learning.
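The step schedule described here can be written as a small function. This is a sketch of the arithmetic only, assuming the scheduler is stepped once at the end of each epoch; in PyTorch the equivalent would typically be `torch.optim.lr_scheduler.StepLR` with `step_size=4` and `gamma=0.1`, though which scheduler the notebook actually uses is an assumption. The function name and defaults are illustrative.

```python
# Learning rate per epoch under a step schedule:
# start at base_lr and multiply by gamma every step_size epochs.

def step_lr(epoch, base_lr=0.1, step_size=4, gamma=0.1):
    """Learning rate used while training the given 1-indexed epoch."""
    return base_lr * gamma ** ((epoch - 1) // step_size)


for epoch in range(1, 16):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.0e}")
```

With the defaults, epochs 1 to 4 train at 1e-01, epochs 5 to 8 at 1e-02, epochs 9 to 12 at 1e-03 and epochs 13 to 15 at 1e-04, which matches the "Adjusting learning rate" messages in the log. Calling `step_lr(epoch, step_size=8)` instead gives 1e-01 for epochs 1 to 8 and 1e-02 for epochs 9 to 15.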
Train/Test Logs
Adjusting learning rate of group 0 to 1.0000e-01.
Epoch 1
Train: Loss=0.1322 Batch_id=468 Accuracy=89.18: 100%|██████████| 469/469 [00:32<00:00, 14.39it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0502, Accuracy: 9838/10000 (98.38%)
Epoch 2
Train: Loss=0.1159 Batch_id=468 Accuracy=96.72: 100%|██████████| 469/469 [00:31<00:00, 14.85it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0327, Accuracy: 9890/10000 (98.90%)
Epoch 3
Train: Loss=0.0532 Batch_id=468 Accuracy=97.26: 100%|██████████| 469/469 [00:33<00:00, 14.17it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0286, Accuracy: 9917/10000 (99.17%)
Epoch 4
Train: Loss=0.0535 Batch_id=468 Accuracy=97.46: 100%|██████████| 469/469 [00:33<00:00, 13.80it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0318, Accuracy: 9899/10000 (98.99%)
Epoch 5
Train: Loss=0.1066 Batch_id=468 Accuracy=98.14: 100%|██████████| 469/469 [00:32<00:00, 14.22it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0202, Accuracy: 9933/10000 (99.33%)
Epoch 6
Train: Loss=0.0813 Batch_id=468 Accuracy=98.30: 100%|██████████| 469/469 [00:32<00:00, 14.54it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0199, Accuracy: 9935/10000 (99.35%)
Epoch 7
Train: Loss=0.0108 Batch_id=468 Accuracy=98.33: 100%|██████████| 469/469 [00:32<00:00, 14.34it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0200, Accuracy: 9934/10000 (99.34%)
Epoch 8
Train: Loss=0.0686 Batch_id=468 Accuracy=98.32: 100%|██████████| 469/469 [00:34<00:00, 13.68it/s]Adjusting learning rate of group 0 to 1.0000e-03.
Test set: Average loss: 0.0189, Accuracy: 9937/10000 (99.37%)
Epoch 9
Train: Loss=0.0230 Batch_id=468 Accuracy=98.50: 100%|██████████| 469/469 [00:32<00:00, 14.25it/s]Adjusting learning rate of group 0 to 1.0000e-03.
Test set: Average loss: 0.0193, Accuracy: 9936/10000 (99.36%)
Epoch 10
Train: Loss=0.0431 Batch_id=468 Accuracy=98.40: 100%|██████████| 469/469 [00:31<00:00, 14.76it/s]Adjusting learning rate of group 0 to 1.0000e-03.
Test set: Average loss: 0.0197, Accuracy: 9932/10000 (99.32%)
Epoch 11
Train: Loss=0.0999 Batch_id=468 Accuracy=98.38: 100%|██████████| 469/469 [00:31<00:00, 14.85it/s]Adjusting learning rate of group 0 to 1.0000e-03.
Test set: Average loss: 0.0190, Accuracy: 9936/10000 (99.36%)
Epoch 12
Train: Loss=0.1079 Batch_id=468 Accuracy=98.53: 100%|██████████| 469/469 [00:33<00:00, 14.07it/s]Adjusting learning rate of group 0 to 1.0000e-04.
Test set: Average loss: 0.0188, Accuracy: 9938/10000 (99.38%)
Epoch 13
Train: Loss=0.0809 Batch_id=468 Accuracy=98.54: 100%|██████████| 469/469 [00:32<00:00, 14.48it/s]Adjusting learning rate of group 0 to 1.0000e-04.
Test set: Average loss: 0.0196, Accuracy: 9935/10000 (99.35%)
Epoch 14
Train: Loss=0.0260 Batch_id=468 Accuracy=98.53: 100%|██████████| 469/469 [00:31<00:00, 14.93it/s]Adjusting learning rate of group 0 to 1.0000e-04.
Test set: Average loss: 0.0189, Accuracy: 9937/10000 (99.37%)
Epoch 15
Train: Loss=0.0853 Batch_id=468 Accuracy=98.45: 100%|██████████| 469/469 [00:31<00:00, 15.00it/s]Adjusting learning rate of group 0 to 1.0000e-04.
Test set: Average loss: 0.0198, Accuracy: 9933/10000 (99.33%)
S7_File3: Improving the Model (Image Augmentation, Batch Size(Sweet Spot), Regularization)
Targets:
Improve the model learning by:
i) Adding image augmentation
ii) Reducing batch size
iii) Adding regularization at correct position with reduced batch size
Results:
- Best Train Accuracy - 98.55%
- Best Test Accuracy - 99.42%
- Total Parameters - 8000
Analysis:
i) Adding image augmentation (scaling, translation and rotation) made training harder for the model, which improved test accuracy.
ii) Reducing batch size from 512 to 128 improved the model's generalization on the test dataset and brought test accuracy to the 99.4% threshold. A batch size of 128 is the sweet spot for this model; below it, test accuracy degrades. The improvement is attributed to the “noise” in small-batch training: each small batch is a “noisy” representation of the entire dataset, and seeing many of them creates a “tug-and-pull” dynamic that keeps the network from overfitting the training set and therefore from performing badly on the test set.
iii) Moving the step LR decay to every 8 epochs instead of every 4. This reduced the number of epochs needed to reach 99.4% test accuracy consistently.
With the above experimentation, I was able to achieve 99.4% test accuracy consistently.
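One concrete effect of the smaller batch size is the number of weight updates per epoch. The progress bars in the logs show 118 iterations per epoch for S7_File1 and 469 for the later runs, which is what batch sizes of 512 and 128 give on the 60,000-image MNIST training set. The sketch below reproduces that arithmetic, assuming the last partial batch is kept; the names are illustrative.

```python
import math

MNIST_TRAIN_SIZE = 60_000  # images in the standard MNIST training split
EPOCHS = 15


def batches_per_epoch(dataset_size, batch_size):
    """Iterations per epoch when the final partial batch is not dropped."""
    return math.ceil(dataset_size / batch_size)


for batch_size in (512, 128):
    per_epoch = batches_per_epoch(MNIST_TRAIN_SIZE, batch_size)
    print(f"batch size {batch_size}: {per_epoch} updates per epoch, "
          f"{per_epoch * EPOCHS} over {EPOCHS} epochs")
```

At batch size 128 the model gets roughly four times as many, noisier, gradient updates in the same 15 epochs.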
Train/Test Logs
Adjusting learning rate of group 0 to 1.0000e-01.
Epoch 1
Train: Loss=0.0786 Batch_id=468 Accuracy=87.56: 100%|██████████| 469/469 [00:33<00:00, 14.01it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0528, Accuracy: 9831/10000 (98.31%)
Epoch 2
Train: Loss=0.1295 Batch_id=468 Accuracy=96.28: 100%|██████████| 469/469 [00:25<00:00, 18.15it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0507, Accuracy: 9841/10000 (98.41%)
Epoch 3
Train: Loss=0.1093 Batch_id=468 Accuracy=97.08: 100%|██████████| 469/469 [00:26<00:00, 18.00it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0531, Accuracy: 9841/10000 (98.41%)
Epoch 4
Train: Loss=0.1115 Batch_id=468 Accuracy=97.30: 100%|██████████| 469/469 [00:27<00:00, 17.20it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0295, Accuracy: 9905/10000 (99.05%)
Epoch 5
Train: Loss=0.0523 Batch_id=468 Accuracy=97.48: 100%|██████████| 469/469 [00:26<00:00, 17.98it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0334, Accuracy: 9895/10000 (98.95%)
Epoch 6
Train: Loss=0.0316 Batch_id=468 Accuracy=97.72: 100%|██████████| 469/469 [00:25<00:00, 18.20it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0367, Accuracy: 9884/10000 (98.84%)
Epoch 7
Train: Loss=0.0307 Batch_id=468 Accuracy=97.87: 100%|██████████| 469/469 [00:26<00:00, 17.84it/s]Adjusting learning rate of group 0 to 1.0000e-01.
Test set: Average loss: 0.0242, Accuracy: 9919/10000 (99.19%)
Epoch 8
Train: Loss=0.0396 Batch_id=468 Accuracy=97.92: 100%|██████████| 469/469 [00:25<00:00, 18.06it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0372, Accuracy: 9878/10000 (98.78%)
Epoch 9
Train: Loss=0.0392 Batch_id=468 Accuracy=98.36: 100%|██████████| 469/469 [00:26<00:00, 17.79it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0190, Accuracy: 9936/10000 (99.36%)
Epoch 10
Train: Loss=0.0800 Batch_id=468 Accuracy=98.34: 100%|██████████| 469/469 [00:26<00:00, 17.44it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0186, Accuracy: 9942/10000 (99.42%)
Epoch 11
Train: Loss=0.0219 Batch_id=468 Accuracy=98.45: 100%|██████████| 469/469 [00:26<00:00, 17.87it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0189, Accuracy: 9942/10000 (99.42%)
Epoch 12
Train: Loss=0.0387 Batch_id=468 Accuracy=98.50: 100%|██████████| 469/469 [00:25<00:00, 18.15it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0182, Accuracy: 9940/10000 (99.40%)
Epoch 13
Train: Loss=0.0534 Batch_id=468 Accuracy=98.46: 100%|██████████| 469/469 [00:26<00:00, 18.03it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0186, Accuracy: 9940/10000 (99.40%)
Epoch 14
Train: Loss=0.0075 Batch_id=468 Accuracy=98.55: 100%|██████████| 469/469 [00:26<00:00, 17.68it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0184, Accuracy: 9941/10000 (99.41%)
Epoch 15
Train: Loss=0.0594 Batch_id=468 Accuracy=98.48: 100%|██████████| 469/469 [00:26<00:00, 17.49it/s]Adjusting learning rate of group 0 to 1.0000e-02.
Test set: Average loss: 0.0183, Accuracy: 9942/10000 (99.42%)
Train/Test Visualization