Learning rate is important because it controls how much a model adjusts its parameters in response to each example it sees, and therefore how effectively it learns from its training data.
A low learning rate doesn’t let the model “learn” enough at each step. The model updates its parameters too slowly and takes too long to reach convergence. But that doesn’t mean a high learning rate is the answer.
With a high learning rate, the algorithm can fall victim to overshooting, where it goes too far in correcting its mistakes. In this case, the algorithm needs a smaller learning rate, but not one so small that learning becomes inefficient.
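Both failure modes are easy to see on a toy problem. The sketch below (my own minimal example, not from any particular library) runs plain gradient descent on f(x) = x², whose minimum is at x = 0, with three different learning rates:

```python
# Gradient descent on f(x) = x**2 (minimum at x = 0), a minimal sketch.
# The update rule is x <- x - lr * f'(x), and here f'(x) = 2 * x.

def descend(lr, x=10.0, steps=20):
    """Run `steps` gradient updates from a starting point and return the path."""
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x          # gradient of x**2 is 2*x
        path.append(x)
    return path

# Too small: after 20 steps the iterate is still far from the minimum.
slow = descend(lr=0.01)
# Too large: each step overshoots the minimum and the iterate blows up.
unstable = descend(lr=1.1)
# Well chosen: the iterate settles near the minimum quickly.
good = descend(lr=0.3)

print(f"lr=0.01 -> x = {slow[-1]:.4f}")
print(f"lr=1.1  -> x = {unstable[-1]:.1f}")
print(f"lr=0.3  -> x = {good[-1]:.6f}")
```

With lr=0.01 the iterate creeps toward zero but is nowhere near it after 20 steps; with lr=1.1 each correction overshoots and the error grows; with lr=0.3 it converges in a handful of steps.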
As an example, imagine an alien who has come to learn about life on Earth. The alien sees cats, dogs, horses, pigs and cows and concludes that all animals have four legs. Then, the alien sees a chicken. Is this creature also an animal? Depending on the alien’s learning rate, they will reach one of three conclusions:
At an optimal learning rate, the alien will conclude that chickens are also animals. If that is the case, leg count must not be a key determinant of whether something is an animal.
At a high learning rate, the alien will overcorrect. Now, it will conclude that because the chicken is an animal, and because the chicken has two legs, that all animals must have two legs. A high learning rate means that the model learns “too much” at once.
At a low learning rate, the alien will barely update its belief. It treats the chicken as a one-off exception and keeps assuming that all animals have four legs; only after seeing many more two-legged creatures would it change its mind. A low learning rate means that the model learns too little at once.
Different learning rates result in different learning outcomes. The best learning rate is one that allows the algorithm to adjust the model’s parameters in a timely manner without overshooting the point of convergence.
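One simple way to find such a rate is to sweep a range of values and see how long each takes to converge. The sketch below (a hypothetical helper of my own, on the same toy objective f(x) = x²) counts the updates each rate needs to get within a tolerance of the minimum:

```python
# A minimal learning-rate sweep on the toy objective f(x) = x**2,
# counting how many updates each rate needs to get within `tol` of the minimum.

def steps_to_converge(lr, x=10.0, tol=1e-3, max_steps=1000):
    for step in range(1, max_steps + 1):
        x = x - lr * 2 * x              # gradient of x**2 is 2*x
        if abs(x) < tol:
            return step
    return None                          # never converged: rate too small, or overshooting

for lr in (0.001, 0.01, 0.1, 0.5, 0.9, 1.1):
    print(f"lr={lr}: {steps_to_converge(lr)} steps")
```

Rates that are too small exhaust the step budget, a rate that overshoots never converges at all, and the rates in between trade off speed against stability.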