Huber loss, also called smooth L1 loss, aims to combine the strengths of both MAE and MSE. It incorporates an adjustable hyperparameter, δ, that acts as a transition point: for absolute errors at or below δ, Huber loss is quadratic (like MSE); for absolute errors greater than δ, it is linear (like MAE).
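The piecewise definition above can be sketched as a small NumPy function (a minimal illustration; the function name and signature are our own, not from any particular library):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Elementwise Huber loss: quadratic for |error| <= delta, linear beyond."""
    error = y_true - y_pred
    abs_error = np.abs(error)
    quadratic = 0.5 * error**2                     # MSE-like region
    linear = delta * (abs_error - 0.5 * delta)     # MAE-like region, shifted so the
                                                   # two pieces meet smoothly at delta
    return np.where(abs_error <= delta, quadratic, linear)
```

The `delta * (abs_error - 0.5 * delta)` offset in the linear branch makes the two pieces match in both value and slope at |error| = δ, which is what keeps the function fully differentiable.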
Huber loss thus offers a fully differentiable function that pairs MAE's robustness to outliers with MSE's ease of optimization through gradient descent. Because the linear region caps the gradient magnitude at δ, optimization is also less prone to exploding gradients than with MSE loss.
These benefits are tempered by the need to carefully tune δ, which adds complexity to model development. Huber loss is most appropriate when neither MSE nor MAE alone yields satisfactory results, such as when a model should be robust to outliers overall but still penalize errors quadratically up to some specific threshold.
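To see why δ must be chosen carefully, consider how the penalty on the same residual changes with the threshold (a toy sketch with illustrative values, using a scalar version of the loss):

```python
def huber(error, delta):
    """Scalar Huber loss for a single residual."""
    a = abs(error)
    if a <= delta:
        return 0.5 * error**2              # quadratic (MSE-like) region
    return delta * (a - 0.5 * delta)       # linear (MAE-like) region

# The same outlier residual of 10 under two different thresholds:
print(huber(10.0, delta=1.0))  # 9.5  -> small delta: mild, near-linear penalty
print(huber(10.0, delta=5.0))  # 37.5 -> larger delta: much harsher penalty
```

A small δ makes the loss behave mostly like MAE (robust, gentle on outliers), while a large δ pushes it toward MSE behavior, so the right value depends on how aggressively the model should punish large errors.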