For this dataset, the entropy is 0.94. This can be calculated by finding the proportion of days where “Play Tennis” is “Yes” (9/14) and the proportion of days where it is “No” (5/14), and plugging these values into the entropy formula above.
Entropy(Tennis) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
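As a quick check, this calculation can be reproduced with a few lines of Python (the `entropy` helper below is a small sketch written for this example, not part of any particular library):

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a label distribution given as raw counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# 9 "Yes" days and 5 "No" days, as above
print(round(entropy([9, 5]), 2))  # 0.94
```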
We can then compute the information gain for each attribute individually. For example, the information gain for the attribute “Humidity” would be the following:
Gain(Tennis, Humidity) = 0.94 - (7/14)(0.985) - (7/14)(0.592) = 0.151
As a recap:
- 7/14 is the proportion of days where Humidity equals “high” out of the 14 total days. In this dataset, the number of days where Humidity equals “high” happens to equal the number of days where it equals “normal”.
- 0.985 is the entropy when Humidity = “high”
- 0.592 is the entropy when Humidity = “normal”
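Putting these numbers together, here is a minimal sketch of the same calculation in Python. The 3/4 and 6/1 “Yes”/“No” splits behind the two humidity entropies are assumed here; the text above only states the resulting entropy values:

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a label distribution given as raw counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# Assumed label splits behind the entropies quoted above:
#   Humidity = "high":   3 "Yes", 4 "No"  -> entropy ~0.985
#   Humidity = "normal": 6 "Yes", 1 "No"  -> entropy ~0.592
gain = entropy([9, 5]) - (7/14) * entropy([3, 4]) - (7/14) * entropy([6, 1])
print(round(gain, 3))  # ~0.152, matching 0.151 above up to rounding
```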
Then, repeat the calculation of information gain for each attribute in the table above, and select the attribute with the highest information gain to be the first split point in the decision tree. In this case, Outlook produces the highest information gain. From there, the process is repeated for each subtree.
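To illustrate the selection step end to end, here is a minimal sketch in Python. The 14-row table is the classic Play Tennis dataset and is assumed to match the table referenced above; the helper names (`entropy`, `information_gain`) are hypothetical:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v|/|S|) * Entropy(S_v)."""
    parent = entropy(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / len(labels)) * entropy(subset)
    return parent - remainder

# Classic 14-day Play Tennis table (assumed to match the table referenced above)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [dict(zip(attributes, r[:-1])) for r in data]
labels = [r[-1] for r in data]

# Compute the gain for every attribute and pick the best split for the root
gains = {a: information_gain(rows, labels, a) for a in attributes}
best = max(gains, key=gains.get)
print(gains)  # Outlook has the largest gain (~0.25), Humidity ~0.15
print(best)   # "Outlook" becomes the first split point
```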