<p><i>The Chain Rule of Probability, by CraigTrim (IBM Connections Blogs, 2012-11-04)</i></p>
<h2>Chain Rule (Probability)</h2>
<p>Given any of these word sequences, what is the probability of the next word?</p>
<ol>
<li>Premature optimization is the root of all ____ -Donald Knuth</li>
<li>A house divided against itself ____ ____ -Abraham Lincoln</li>
<li>The quick brown fox jumped over the ____ ____ ____ (the typing pangram)</li>
<li>A friend to all is ____ ____ ____ ____ -Aristotle</li>
</ol>
<p>If you were able to complete these word sequences, it was likely from prior knowledge and exposure to the complete sequence.</p>
<p>Not all word sequences are this obvious. But for any given word sequence, it should be possible to compute the probability of the next word.</p>
<h3>N-Grams</h3>
<p>Word sequences of a given length have formal names:</p>
<table border="1" width="100%">
<tbody><tr>
<td width="30%">Unigram</td>
<td width="70%">A sequence of one word:<br /><i>WebSphere, Mobile, Coffee</i></td>
</tr>
<tr>
<td width="30%">Bigram</td>
<td width="70%">A sequence of two words:<br /><i>cannot stand, Lotus Notes</i></td>
</tr>
<tr>
<td width="30%">Trigram</td>
<td width="70%">A sequence of three words:<br /><i>Lazy yellow dog, friend to none, Rational Software Architect</i></td>
</tr>
<tr>
<td width="30%">4-Gram</td>
<td width="70%">A sequence of four words:<br /><i>Play it again Sam</i></td>
</tr>
<tr>
<td width="30%">5-Gram</td>
<td width="70%">A sequence of five words</td>
</tr>
<tr>
<td width="30%">6-Gram</td>
<td width="70%">A sequence of six words (etc)</td>
</tr>
</tbody></table>
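<p>To make the table concrete, here is a minimal sketch (not from the original post; the class and method names are my own) that extracts the n-grams of a sentence for any n:</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGrams {

    // Return every n-gram in the sentence, as a space-joined string.
    public static List<String> ngrams(String sentence, int n) {
        String[] words = sentence.split("\\s+");
        List<String> result = new ArrayList<>();
        // Slide a window of width n across the word array.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Play it again Sam";
        System.out.println(ngrams(sentence, 1)); // [Play, it, again, Sam]
        System.out.println(ngrams(sentence, 2)); // [Play it, it again, again Sam]
        System.out.println(ngrams(sentence, 3)); // [Play it again, it again Sam]
    }
}
```

<p>A sentence of m words yields m - n + 1 n-grams, so longer n-grams are always scarcer than shorter ones in any corpus.</p>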
<p>What is the probability that "Sam" will occur after the trigram "Play it again"? The word sequence might well be "Play it again Sally", "Play it again Louise" or "Play it again and again", and so on. If we want to compute the probability of "Sam" occurring next, how do we do this?</p>
<p>We can write this as a conditional probability:</p>
<blockquote>P(w4 | w1, w2, w3)</blockquote>
<p>This can be stated:</p>
<table border="1" width="70%">
<tbody><tr>
<td width="50%">P(W)</td>
<td width="50%">"The probability of a sequence of words W"</td>
</tr>
<tr>
<td width="50%">P(w4 | w1, w2, w3)</td>
<td width="50%">"The conditional probability of word w4, given that the sequence w1, w2, w3 precedes it"</td>
</tr>
</tbody></table>
<p>So if we plug the values for "Play it again Sam" into this notation, we get</p>
<blockquote>P(Sam | Play, it, again )</blockquote>
<p>So given the word sequence { Play, it, again }, what is the probability of "Sam" being the fourth word in this sequence?</p>
<p>We can break this question down into a chain of simpler questions:</p>
<a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-03.jpg" target="_blank"><img alt="image" src="https://dw1.s81c.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-03.jpg" style=" display:block; margin: 0 auto;text-align: center; position:relative;" /></a>
<ol>
<li>What is the probability that "it" will follow "play"?</li>
<li>What is the probability that "again" will follow "play it"?</li>
<li>What is the probability that "Sam" will follow "play it again"?</li>
</ol>
<p>By the chain rule, the probability of</p>
<blockquote>P(A, B, C, D)</blockquote>
<p>is</p>
<blockquote>P(A) * P(B | A) * P(C | A, B) * P(D | A, B, C)</blockquote>
<p>or with values in place:</p>
<blockquote>P(Play, it, again, Sam)</blockquote>
<p>is</p>
<blockquote>P(Play) * P(it | Play) * P(again | Play, it ) * P(Sam | Play, it, again )</blockquote>
<p>This will give us the joint probability of the entire sequence.</p>
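<p>With illustrative (entirely made-up) values for each factor, the joint probability is just the running product. A small sketch:</p>

```java
public class ChainRule {

    // Multiply the chain-rule factors together to get the joint probability.
    public static double joint(double... factors) {
        double probability = 1.0;
        for (double factor : factors) {
            probability *= factor;
        }
        return probability;
    }

    public static void main(String[] args) {
        // Hypothetical values for each factor:
        // P(Play), P(it | Play), P(again | Play, it), P(Sam | Play, it, again)
        double p = joint(0.01, 0.2, 0.3, 0.4);
        System.out.println(p); // approximately 2.4e-4
    }
}
```

<p>Note how quickly the product shrinks: even with generous conditional probabilities, the joint probability of a four-word sequence is already tiny, which is why real systems work with log probabilities.</p>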
<br />
<h3>Chain Rule (general form)</h3>
<a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-03.jpg" target="_blank"><img alt="image" src="https://dw1.s81c.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-01.jpg" style=" display:block; margin: 0 auto;text-align: center; position:relative;" /></a>
<p>This should be easy for Java/C programmers to understand.</p>
<p>The "product" notation can be read as a "for loop" that multiplies its terms together:</p>
<code>
String[] words = { "play", "it", "again", "Sam" };<br />
double probability = 1.0;<br />
for (int i = 0; i &lt; words.length; i++) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;// multiply in P(words[i] | words[0] ... words[i - 1]),<br />
&nbsp;&nbsp;&nbsp;&nbsp;// the probability of words[i] given the preceding sequence<br />
&nbsp;&nbsp;&nbsp;&nbsp;probability *= conditionalProbability(words, i); // hypothetical model lookup<br />
}
</code>
<p>So for each word at position i, we calculate the probability of that word, conditional on the sequence of words up to that point. The joint probability of the sequence is then the product, over all i, of these conditional probabilities.</p>
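<p>Where do the conditional probabilities themselves come from? The usual answer is relative counts in a corpus: P(Sam | play it again) is estimated as count("play it again Sam") divided by count("play it again"). A minimal sketch over a toy corpus (the class and method names are my own):</p>

```java
import java.util.HashMap;
import java.util.Map;

public class CountModel {

    private final Map<String, Integer> counts = new HashMap<>();

    // Count every n-gram (of every length) in the corpus.
    public CountModel(String[] sentences) {
        for (String sentence : sentences) {
            String[] words = sentence.split("\\s+");
            for (int i = 0; i < words.length; i++) {
                StringBuilder gram = new StringBuilder();
                for (int j = i; j < words.length; j++) {
                    if (j > i) gram.append(' ');
                    gram.append(words[j]);
                    counts.merge(gram.toString(), 1, Integer::sum);
                }
            }
        }
    }

    // P(word | prefix), estimated as count(prefix word) / count(prefix).
    public double conditional(String prefix, String word) {
        int prefixCount = counts.getOrDefault(prefix, 0);
        if (prefixCount == 0) return 0.0;
        return (double) counts.getOrDefault(prefix + " " + word, 0) / prefixCount;
    }

    public static void main(String[] args) {
        String[] corpus = {
            "play it again Sam",
            "play it again Sally",
            "play it again Sam",
            "play it louder"
        };
        CountModel model = new CountModel(corpus);
        // "play it again" occurs 3 times; "play it again Sam" occurs 2 times.
        System.out.println(model.conditional("play it again", "Sam")); // 2 of 3
    }
}
```

<p>The obvious problem with counting full prefixes is sparsity: most long word sequences never occur in any corpus, which motivates the simplifying assumption in the next section.</p>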
<h3>Markov Simplifying Assumption</h3>
<a href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-02.jpg" target="_blank"><img alt="image" src="https://dw1.s81c.com/developerworks/mydeveloperworks/blogs/nlp/resource/0180/chain-rule-02.jpg" style=" display:block; margin: 0 auto;text-align: center; position:relative;" /></a>
<p>Rather than conditioning each word on the entire preceding sequence, we condition it only on the previous k words, where k &gt; 0: the product, over all i, of the probability of each word given just the k words before it. For a bigram model, k = 1, so each word is conditioned only on the single word immediately before it.</p>
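<p>Under a bigram (k = 1) model, for example, P(play, it, again, Sam) is approximated by P(play) * P(it | play) * P(again | it) * P(Sam | again). A sketch with illustrative values (the class and method names are my own):</p>

```java
public class BigramModel {

    // Approximate the joint probability of a sequence using only bigram factors:
    // factors[0] stands in for the unigram probability P(words[0]),
    // factors[i] for P(words[i] | words[i - 1]).
    public static double markovJoint(double[] factors) {
        double probability = 1.0;
        for (double p : factors) {
            probability *= p;
        }
        return probability;
    }

    public static void main(String[] args) {
        // Illustrative values: P(play), P(it | play), P(again | it), P(Sam | again)
        double[] factors = { 0.01, 0.2, 0.25, 0.1 };
        System.out.println(markovJoint(factors)); // approximately 5e-5
    }
}
```

<p>The payoff is that a bigram table only needs counts of word pairs, which are plentiful in any corpus, instead of counts of ever-longer prefixes, which quickly become too sparse to estimate.</p>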