Label Smoothing
- In classification problems we usually have hard labels like 0 and 1, so the model's job is to return them exactly: even a prediction of 0.999 is not good enough when the label is 1. This pushes the model to keep updating its weights to get ever closer to that single "right" answer, which can lead to overfitting.
- The solution is to smooth the labels: replace 1 with a slightly smaller number and 0 with a slightly larger one. This encourages the model to be less confident, which helps it generalize better.
- Label smoothing can be expressed mathematically. A label of 0 becomes \(\frac{\epsilon}{N}\), where \(N\) is the number of classes and \(\epsilon\) is a parameter, usually 0.1 (it's like saying we're 10% less confident about the labels). A label of 1 becomes \(1-\epsilon + \frac{\epsilon}{N}\).
- In our Imagenette example, where we have 10 classes, the target corresponding to index 3 becomes (see the sketch below for how this vector is computed):
[0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
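As a quick check of the arithmetic, here is a minimal NumPy sketch (not part of fastai) that builds the smoothed target vector for the example above, assuming \(\epsilon = 0.1\) and 10 classes:

```python
import numpy as np

n_classes, eps, target_idx = 10, 0.1, 3  # values from the Imagenette example above

# Start with eps/N everywhere, then add the remaining 1 - eps mass to the true class
smoothed = np.full(n_classes, eps / n_classes)
smoothed[target_idx] += 1 - eps

print(smoothed)  # [0.01 0.01 0.01 0.91 0.01 0.01 0.01 0.01 0.01 0.01]
```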
- To use this in practice, we just have to change the loss function in our call to `Learner`:

```python
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
```
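For intuition about what a label-smoothing loss computes, here is a minimal PyTorch sketch of the same idea. The class name `LabelSmoothingCE` and the reduction details are my own simplification; fastai's actual `LabelSmoothingCrossEntropy` may differ in its internals:

```python
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingCE(nn.Module):
    "Minimal sketch of label-smoothing cross entropy (hypothetical; not fastai's exact code)."
    def __init__(self, eps=0.1):
        super().__init__()
        self.eps = eps

    def forward(self, output, target):
        # output: raw logits of shape (batch, n_classes); target: class indices
        log_probs = F.log_softmax(output, dim=-1)
        # Uniform part: spreads eps/N of the target mass over every class
        uniform = -log_probs.mean(dim=-1).mean()
        # True-class part: standard cross entropy, weighted by 1 - eps
        nll = F.nll_loss(log_probs, target)
        return self.eps * uniform + (1 - self.eps) * nll
```

With \(\epsilon = 0.1\) and 10 classes, this penalizes the model exactly as if it were trained against the smoothed target vector shown above.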