Background of Naive Bayes

You can apply the Naive Bayes classification algorithm quickly and easily because it is less complex than most classification algorithms.

Note: This feature is available starting from Db2® version 11.5.4.

Although its classification accuracy is not as high as that of more complex algorithms, you might get similar results in a fraction of the computation time. The Naive Bayes classifier uses Bayes' theorem to calculate the conditional probability of a class given the observed attribute values. This probability is called the posterior class probability.

The Naive Bayes classification algorithm looks as follows:

P(c | a1, a2, …, an) = P(a1, a2, …, an | c) · P(c) / P(a1, a2, …, an)

where c is a class and a1, a2, …, an are the observed attribute values.

The numerator is referred to as the Bayes numerator, and the denominator is referred to as the Bayes denominator. The calculation is based on the prior class probabilities P(c), which can be estimated directly from the data, and on the joint inverse conditional probability of the attribute values given the class:

P(a1, a2, …, an | c)
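The Bayes numerator and denominator can be sketched in a few lines of Python. This is an illustration only, not the Db2 implementation; the classes, priors, and likelihood table are made up for the example.

```python
def posterior(priors, likelihood, observed):
    """Compute the posterior P(c | attributes) for every class c."""
    # Bayes numerator for each class: P(attributes | c) * P(c)
    numerators = {c: likelihood(observed, c) * p for c, p in priors.items()}
    # Bayes denominator: P(attributes), the sum of the numerators over all classes
    denominator = sum(numerators.values())
    return {c: n / denominator for c, n in numerators.items()}

# Toy example with two classes and made-up likelihoods for one attribute value.
priors = {"yes": 0.6, "no": 0.4}
tables = {("sunny", "yes"): 0.2, ("sunny", "no"): 0.5}
post = posterior(priors, lambda a, c: tables[(a, c)], "sunny")
```

Because the denominator is the same for every class, it normalizes the numerators so that the posterior probabilities sum to 1.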

You can efficiently calculate the joint inverse conditional probability as the product of the per-attribute conditional attribute value probabilities:

P(a1, a2, …, an | c) = P(a1 | c) · P(a2 | c) · … · P(an | c)

You can estimate these per-attribute probabilities directly from the data, under the assumption that the attributes are conditionally independent given the class. In practice, this assumption might not hold, which is why the algorithm is called naive. While violations of this assumption result in incorrect probability calculations, they might not affect the accuracy of predictions: the predicted class can still be correct even if the probabilities used to obtain it are not. The Naive Bayes classifier has been observed to predict well in several domains where the independence assumption is not met.
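The naive product of per-attribute conditional probabilities can be sketched as follows. This is an illustrative sketch, not the Db2 implementation; the probability table and its keys are hypothetical.

```python
import math

def class_conditional(instance, attr_probs, c):
    """P(a1, ..., an | c) under the naive conditional-independence assumption:
    the product of the per-attribute conditional probabilities P(ai | c)."""
    return math.prod(attr_probs[(i, v, c)] for i, v in enumerate(instance))

# attr_probs maps (attribute index, value, class) -> estimated P(ai = v | c);
# the values below are made up for the example.
attr_probs = {
    (0, "sunny", "yes"): 0.2,
    (1, "hot", "yes"): 0.3,
}
p = class_conditional(("sunny", "hot"), attr_probs, "yes")  # 0.2 * 0.3
```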

Because the posterior probability is calculated as a product of per-attribute probabilities, the posterior probability of a class is also calculated as 0 if the estimated probability of one attribute value within that class is 0. If this condition is true for all classes, no prediction is possible. To prevent this problem, you can replace zero probabilities with sufficiently small positive numbers. If no instance in a class has a particular attribute value, the probability can be set to half of the probability that would be estimated if one such instance existed. A more refined approach to avoid the problem is to use a modified probability estimation technique that is known as the m-estimation. For this estimation, the per-class attribute value frequencies that are observed for the training instances are augmented by the a priori assumed frequencies of a few hypothetical instances. Without specific knowledge about the domain, these prior frequencies are assumed to be equal for all attribute values within particular classes. Additionally, the number of included hypothetical instances is assumed to be equal to the number of possible attribute values.
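Under the stated assumptions (uniform prior frequencies over the k possible attribute values, and k hypothetical instances), the m-estimate reduces to (n_cv + 1) / (n_c + k), which is never zero. The sketch below illustrates this; it is not the Db2 implementation, and the function name is made up.

```python
def m_estimate(count_cv, count_c, num_values):
    """m-estimated P(a = v | c), where count_cv is the number of training
    instances in class c with attribute value v, count_c the number of
    instances in class c, and num_values the number of possible values k.
    Assumes a uniform prior p = 1/k and m = k hypothetical instances,
    so the result reduces to (count_cv + 1) / (count_c + k)."""
    m = num_values          # assumed number of hypothetical instances
    p = 1.0 / num_values    # assumed uniform prior frequency
    return (count_cv + m * p) / (count_c + m)

# A value never observed in a class still gets a positive probability:
p_unseen = m_estimate(0, 10, 3)  # equals (0 + 1) / (10 + 3)
```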