Chain Rule (Probability)
Give any of these word sequences, what is the probability of the next word?
 Premature optimization is the root of all ____ Donald Knuth
 A house divided against itself ____ ____ Abraham Lincoln
 The quick brown fox jumped over the ____ ____ _____ Wm. Shakespeare
 A friend to all is ____ ____ ____ ____ Aristotle
If you were able to complete these word sequences, it was likely from prior knowledge and exposure to the complete sequence.
Not all word sequences are this obvious. But for any given word sequence, it should be possible to compute the probability of the next word.
NGrams
Word sequences are given a formal name:
Unigram 
A sequence of one word WebSphere, Mobile, Coffee 
Bigram 
A sequence of two words: cannot stand, Lotus Notes 
Trigram 
A sequence of three words: Lazy yellow dog, friend to none, Rational Software Architect 
4Gram 
A sequence of four words: Play it again Sam 
5Gram 
A sequence of five words 
6Gram 
A sequence of six words (etc) 
What is the probability that "Sam" will occur after the trigram "Play it again"? The word sequence might well be "Play it again Sally", "Play it again Louise" or "Play it again and again", and so on. If we want to compute the probability of "Sam" occurring next, how do we do this?
The chain rule of probability:
P(W) = P(w4  w1, w2, w3)
This can be stated:
P(W) 
"A sequence of words" 
= 

P(w4  w1, w2, w3) 
"The conditional probability of word w4 given the sequence w1,w2,w3." 
So if we plug the values for "Play it again Sam" into this formula, we get
P(Sam  Play, it, again )
So given the word sequence { Play, it, again }, what is the probability of "Sam" being the fourth word in this sequence?
We can answer a question with a question.
 What is the probability that "it" will follow "play"?
 What is the probability that "again" will follow "play it"?
 What is the probability that "Sam" will follow "play it again"?
The probability of
P(A, B, C, D)
is
P(A) * P(B  A) * P(C  A, B) * P(D  A, B, C)
or with values in place:
P(Play, it, again, Sam)
is
P(Play) * P(it  Play) * P(again  Play, it ) * P(Sam  Play, it, again )
This will give us the joint probability of the entire sequence.
Chain Rule (general form)
This should be easy for Java/C programmers to understand.
The "product" designation can be taken as a "for loop"
String[] words = { "play", "it", "again", "Sam" };
int total = 0;
for (int i = 0; i < words.length(); i++) {
total += // probability of words[i] given
// the joint probability of words[0] through words[i  1]
}
The "product" designation simply acts as a mathematical shorthand describing a "for loop" iteration over all i.
[Step 1] So for each word at i, [2] calculate the probability of that word, [3] conditional on the probability of the sequence of words, [4] right up to that point
.
[Step 5] The joint probability of a sequence of words,
[2] is the probability over all i,
[3]of the probability of each word,
[4]times the prefix up until that point.
Markov Simplifying Assumption
The probability of each word, times the sequence up until that point, where the sequence starts at k, and k is assumed to be > 0.
Tags:
text
markov
probability
trigram
model
semantic
rule
assumption
web
unstructured
chain
language
bigram
ngram
markov_model
simplifying
watson