<h1 dir="ltr">Using Quantum Operations to Achieve Computing Objectives</h1>
<h2 dir="ltr">Introduction</h2>
<p dir="ltr">The purpose of this notebook article is to demonstrate using quantum operations to achieve computing objectives. The term <em>quantum operation</em> refers to a higher-level method that may be implemented with one or many low-level quantum gates. More generally, the goal is to demonstrate that quantum algorithms use quantum operations to implement constraints that coerce qubits from representing all possible outcomes to representing only the outcomes that satisfy the constraints.</p>
<p dir="ltr">Using John Preskill's terminology, we now have "noisy intermediate-scale quantum" (NISQ) computers that can obtain a desired outcome with high probability, i.e. where the desired outcome rises well above the noise that can occur within current early-stage quantum computing devices. In this notebook article, we will implement a classical computing algorithm in order to see how differently it is done in quantum computing, and we will see that the desired outcome occurs by far the most frequently. However, the emphasis is on understanding how the quantum operations achieve an easily understood result, so there is no quantum computing speedup in this case.</p>
<p dir="ltr">In this notebook article, we will create a <strong>quantum circuit</strong> that uses quantum operations to perform <strong>addition</strong> of two single-bit numbers. This problem reduces to developing quantum operation sequences that perform a classical 'XOR' operation to calculate the least significant bit and a classical 'AND' operation to calculate the most significant bit of the answer. This can be seen in the two columns of the expected answers below:</p>
<pre dir="ltr">
0+0=00
0+1=01
1+0=01
1+1=10</pre>
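<p dir="ltr">In classical terms, the two output columns above reduce to XOR for the low bit and AND for the high bit (the carry). A plain Python sketch, with a hypothetical helper name, reproduces the table:</p>

```python
# Classical single-bit addition: the low bit is XOR, the high bit is AND (the carry)
def add_one_bit(a, b):
    low = a ^ b    # least significant bit
    high = a & b   # most significant bit
    return high, low

for a in (0, 1):
    for b in (0, 1):
        high, low = add_one_bit(a, b)
        print(f"{a}+{b}={high}{low}")
```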
<h2 dir="ltr">Qiskit Installation and Import</h2>
<p dir="ltr">The Quantum Information Science development kit, or Qiskit, is a library and framework for either connecting to and running quantum computing programs on a real IBM Q quantum computer or simulating them on the user's classical computing environment. The first cell below contains code to run once to get Qiskit installed. The second cell below should be run any time the notebook starts to import the parts of Qiskit relevant to this notebook's operations.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[1] !pip install --user qiskit
[2] from qiskit import QuantumCircuit, ClassicalRegister, QuantumRegister
from qiskit import execute, IBMQ, Aer, QISKitError
from qiskit.backends.ibmq import least_busy</span></pre>
<h2 dir="ltr">The Memory Model for the Quantum Circuit</h2>
<p dir="ltr">This notebook uses qubits <em>q<sub>0</sub></em> and <em>q<sub>1</sub></em> for the inputs. Qubit <em>q<sub>2</sub></em> will be used for the least significant bit of the answer, and qubit <em>q<sub>3</sub></em> will be for the most significant bit.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[3] num_qubits = 4
q = QuantumRegister(num_qubits)
c = ClassicalRegister(num_qubits)</span></pre>
<h2 dir="ltr">Quantum Circuit Initialization and Input</h2>
<p dir="ltr">The following cell creates the quantum circuit with the quantum and classical registers. Then, it assigns the input to qubits <em>q<sub>0</sub></em> and <em>q<sub>1</sub></em>.</p>
<p dir="ltr">The ground state |0> is the default, so an X gate is used on qubits that must start in the excited state |1>. The X gate performs a <em>pi</em> radian rotation about the X-axis, which rotates |0> (a.k.a. |+z>) through the Y-axis to |1> (a.k.a. |-z>). The X gate is sometimes called a NOT gate, but note that it performs a <em>pi</em> radian rotation that happens to perform a classical NOT, or bit flip, only when the qubit is in |0> or |1> state. <strong>To change the input</strong>, comment out the X gate operation on any qubits that should be |0> and ensure the X gate is not commented on any qubits that should be initialized to |1>.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[4] qc = QuantumCircuit(q, c)
qc.x(q[0])
qc.x(q[1])</span></pre>
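<p dir="ltr">As a quick sanity check outside the circuit, the X gate can be written as a 2x2 matrix and applied to the ground state vector. This NumPy sketch is illustrative only and is not part of the Qiskit program:</p>

```python
import numpy as np

# The X gate as a matrix (a pi radian rotation about the X-axis)
X = np.array([[0, 1],
              [1, 0]])

ket0 = np.array([1, 0])   # |0>, the ground state
ket1 = X @ ket0           # X|0> = |1>, the excited state
print(ket1)               # [0 1]
```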
<h2 dir="ltr">Performing 'XOR' with Quantum Operations</h2>
<p dir="ltr">An 'XOR' can be performed with two quantum operations. The inputs of the 'XOR' come from qubits <em>q<sub>0</sub></em> and <em>q<sub>1</sub></em>, and the output of the 'XOR' will go to qubit <em>q<sub>2</sub></em>. The output qubit, <em>q<sub>2</sub></em>, starts in the ground state, |0>.</p>
<p dir="ltr">We first apply a controlled-not operation with <em>q<sub>2</sub></em> as the target of the control and with <em>q<sub>0</sub></em> as the source. The controlled-not is also called CNOT, or CX. This operation negates the target if the source is excited (|1>). By itself, this operation changes <em>q<sub>2</sub></em> from |0> to |1> if <em>q<sub>0</sub></em> is |1>, and it leaves <em>q<sub>2</sub></em> unchanged if <em>q<sub>0</sub></em> is |0>.</p>
<p dir="ltr">Next, we apply a CNOT with qubit <em>q<sub>2</sub></em> as the target and with <em>q<sub>1</sub></em> as the source. If <em>q<sub>1</sub></em> is |0>, then <em>q<sub>2</sub></em> is unchanged from the effect of the CNOT with <em>q<sub>0</sub></em>. Therefore, we have:</p>
<p dir="ltr"><em>q<sub>0</sub></em>=|0> <em>q<sub>1</sub></em>=|0> results in <em>q<sub>2</sub></em>=|0></p>
<p dir="ltr"><em>q<sub>0</sub></em>=|1> <em>q<sub>1</sub></em>=|0> results in <em>q<sub>2</sub></em>=|1></p>
<p dir="ltr">However, if <em>q<sub>1</sub></em> is |1>, then <em>q<sub>2</sub></em> is inverted relative to the effect of the CNOT with <em>q<sub>0</sub></em>. Therefore, we have:</p>
<p dir="ltr"><em>q<sub>0</sub></em>=|0> <em>q<sub>1</sub></em>=|1> results in <em>q<sub>2</sub></em>=|1></p>
<p dir="ltr"><em>q<sub>0</sub></em>=|1> <em>q<sub>1</sub></em>=|1> results in <em>q<sub>2</sub></em>=|0></p>
<p dir="ltr">This concludes the method for performing 'XOR' with quantum operations, which calculates the least significant bit of the single-bit addition result.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[5] qc.cx(q[0], q[2])
qc.cx(q[1], q[2])</span></pre>
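<p dir="ltr">Because the inputs here are classical basis states, the effect of the two CNOTs can be checked with a plain Python sketch. The helper names below are illustrative and are not part of Qiskit:</p>

```python
# Classical effect of a CNOT on basis states: the target is flipped iff the control is 1
def cnot(control, target):
    return control, target ^ control

def xor_via_cnots(q0, q1):
    q2 = 0                    # output qubit starts in |0>
    _, q2 = cnot(q0, q2)      # CNOT with q0 as control, q2 as target
    _, q2 = cnot(q1, q2)      # CNOT with q1 as control, q2 as target
    return q2

for q0 in (0, 1):
    for q1 in (0, 1):
        print(q0, q1, "->", xor_via_cnots(q0, q1))
```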
<h2 dir="ltr">Performing 'AND' with Quantum Operations</h2>
<p dir="ltr">An 'AND' can be performed with three quantum operations. The inputs of the 'AND' come from qubits <em>q<sub>0</sub></em> and <em>q<sub>1</sub></em>, and the output of the 'AND' will go to qubit <em>q<sub>3</sub></em>. The output qubit, <em>q<sub>3</sub></em>, starts in the ground state, |0>.</p>
<p dir="ltr"><strong>Operation 1.</strong> We target qubit <em>q<sub>3</sub></em> with a controlled-Hadamard operation that is controlled by the source qubit <em>q<sub>0</sub></em>. This changes the target <em>q<sub>3</sub></em> from |0> to |+x> if the source <em>q<sub>0</sub></em> is |1>. The operation looks like this on the Bloch sphere:</p>
<table border="1" dir="ltr" height="278" width="347">
<tbody>
<tr>
<td style="width: 210px;">
<p dir="ltr"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereI.png" target="_blank"><img alt="image" height="251" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereI.png" style="display: block; margin: 1em 1em 0pt 0pt; float: left;" width="284"></img></a></p>
</td>
</tr>
</tbody>
</table>
<p dir="ltr"><strong>Operation 2.</strong> Next, we target qubit <em>q<sub>3</sub></em> with a controlled-Z operation that is controlled by the source qubit <em>q<sub>1</sub></em>. This changes the phase of the target <em>q<sub>3</sub></em> by rotating <em>pi</em> radians around Z-axis if the source qubit <em>q<sub>1</sub></em> is |1>. The operation looks like this on the Bloch sphere:</p>
<table border="1" dir="ltr" style="width: 345px;">
<tbody>
<tr>
<td style="width: 333px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereII.png" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereII.png" style=" display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">The following are the results so far:</p>
<ul dir="ltr">
<li>For input <em>q<sub>1</sub></em> <em>q<sub>0</sub></em> = |0> |0>, <em>q<sub>3</sub></em> is not changed from |0></li>
<li>For input <em>q<sub>1</sub></em> <em>q<sub>0</sub></em> = |0> |1>, <em>q<sub>3</sub></em> is changed only to |+x></li>
<li>For input <em>q<sub>1</sub></em> <em>q<sub>0</sub></em> = |1> |0>, <em>q<sub>3</sub></em> remains |0>: <em>q<sub>0</sub></em> did not rotate it, and although <em>q<sub>1</sub></em> requests a Z-axis phase rotation, |0> lies along the Z-axis, so the rotation has no effect.</li>
<li>For input <em>q<sub>1</sub></em> <em>q<sub>0</sub></em> = |1> |1>, <em>q<sub>3</sub></em> is |-x> due to <em>pi</em> phase rotation from |+x></li>
</ul>
<p dir="ltr"><strong>Operation 3.</strong> Finally, we target <em>q<sub>3</sub></em> with a controlled-Hadamard operation that is controlled by the source qubit <em>q<sub>0</sub></em>. Note above that when input <em>q<sub>0</sub></em> is |0>, <em>q<sub>3</sub></em> is already in the correct state of |0>. Therefore, we only take a further action if <em>q<sub>0</sub></em> is |1>.</p>
<p dir="ltr">When <em>q<sub>0</sub></em> is |1>, then the controlled-Hadamard operation maps the X-axis to the Z-axis, so |+x> is converted to |+z>=|0> and |-x> is converted to |-z>=|1>. The operation looks like this on the Bloch sphere:</p>
<table border="1" dir="ltr" style="width: 350px;">
<tbody>
<tr>
<td style="width: 350px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereIII.png" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/ANDonBlochSphereIII.png" style=" display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">This concludes the method for performing 'AND' with quantum operations, which calculates the most significant bit of the single-bit addition result.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[6] qc.ch(q[0], q[3])
qc.cz(q[1], q[3])
qc.ch(q[0], q[3])</span></pre>
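<p dir="ltr">Since the control qubits remain in basis states throughout, the CH-CZ-CH sequence can be verified by tracking only the target qubit with NumPy. This is a sketch for checking the math, with hypothetical helper names:</p>

```python
import numpy as np

# Single-qubit H and Z matrices; the controls stay classical here,
# so we only need to track the state of the target qubit q3.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]])

def and_via_ch_cz_ch(q0, q1):
    q3 = np.array([1.0, 0.0])   # target starts in |0>
    if q0: q3 = H @ q3          # controlled-Hadamard, control q0
    if q1: q3 = Z @ q3          # controlled-Z, control q1
    if q0: q3 = H @ q3          # controlled-Hadamard, control q0
    return int(round(abs(q3[1]) ** 2))   # probability of measuring |1>

for q0 in (0, 1):
    for q1 in (0, 1):
        print(q0, q1, "->", and_via_ch_cz_ch(q0, q1))
```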
<h2 dir="ltr">Perform the Measurement</h2>
<p dir="ltr">Use this code to measure the state of the qubits, giving a classical computing answer.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[7] qc.measure(q, c)</span></pre>
<h2 dir="ltr">Simulate the Quantum Circuit</h2>
<p dir="ltr">On a simulator, use this code to execute the quantum circuit that defines the input, performs the processing, and measures the output. Then, render the output in the notebook user interface.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[8] simulator = Aer.get_backend('qasm_simulator')
job = execute(qc, simulator)
result = job.result()
print("Data Result:", result.get_data(qc))</span></pre>
<p dir="ltr">Based on the initialization in cell [4] above, the simulator always produces the result '10' (2) in qubits <em>q<sub>3</sub></em> and <em>q<sub>2</sub></em>.</p>
<h2 dir="ltr">Run the Experiment on a Real IBM Q Quantum Computer</h2>
<p dir="ltr">Now we will set up to run on a real IBM Q quantum computer. The first cell below contains code that only has to run once per Python run-time to get it to work with your IBM Q Experience account. The second cell should be run once per notebook session to load the user's IBM Q quantum computer access token.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[9] IBMQ.save_account('</span><span style="color:#8B4513;"><em>your API token</em></span><span style="color:#0000FF;">')
[10] IBMQ.load_accounts()</span></pre>
<p dir="ltr">Now this cell obtains access to a real IBM Q quantum computer on which to compile and execute the code.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[11] result_counts = {}
try:
    device = IBMQ.get_backend('ibmqx4')
    job_exp = execute(qc, device, shots=1024, max_credits=10)
    result_exp = job_exp.result()
    result_counts = result_exp.get_counts(qc)
except QISKitError:
    print("Device is currently unavailable.")</span></pre>
<p dir="ltr">In the results, the qubits are in the order <em>q<sub>3</sub></em>, <em>q<sub>2</sub></em>, <em>q<sub>1</sub></em>, and <em>q<sub>0</sub></em>, so we tally the outcomes based on the first two qubits as they are the output qubits.</p>
<pre dir="ltr">
<span style="color:#0000FF;">[12] result_freqs = {'00':0, '01':0, '10':0, '11':0 }
for key in result_counts.keys():
    freq_key = key[0:2]
    result_freqs[freq_key] = result_freqs[freq_key] + result_counts[key]</span>
<span style="color:#0000FF;"> import matplotlib.pyplot as plt
D = result_freqs
plt.bar(range(len(D)), list(D.values()), align='center')
plt.xticks(range(len(D)), list(D.keys()))
plt.show()</span></pre>
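<p dir="ltr">To illustrate the tallying step, here is the same collapse applied to a hypothetical counts dictionary; the numbers are made up for illustration, and real results vary by run:</p>

```python
# Hypothetical raw counts keyed by 'q3 q2 q1 q0' bit strings
result_counts = {'1011': 700, '0011': 150, '1111': 100, '0111': 74}

result_freqs = {'00': 0, '01': 0, '10': 0, '11': 0}
for key, count in result_counts.items():
    result_freqs[key[0:2]] += count   # keep only the output qubits q3 and q2

print(result_freqs)
```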
<p dir="ltr">There are four possible outcomes for the two output qubits: 00, 01, 10, and 11. The expected outcome is 10. With 1024 shots, uniformly distributed noise would produce on the order of 256 shots per possible outcome. In the frequency results shown below, one can see that the correct final state of the output qubits occurs far more frequently than all other possible outcomes combined. Such a most-frequent outcome is precisely what a quantum experimentalist would investigate first within a real-world application.</p>
<table border="1" dir="ltr" height="272" width="396">
<tbody>
<tr>
<td style="width: 500px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/QuantumAdditionBarChart.png" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/QuantumAdditionBarChart.png" style=" display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<h2 dir="ltr">Conclusion</h2>
<p dir="ltr">In this notebook, we have demonstrated how quantum algorithms use quantum operations to coerce qubits into representing the outcome or outcomes that satisfy the constraints of a problem. In the case of quantum addition of two qubits initialized with classical bit values, one output qubit had to satisfy the constraint of being excited if and only if the two input qubits differed, and a second output qubit had to satisfy the constraint of being excited if and only if both input qubits were excited. Not only did we simulate this quantum circuit, we ran it on a real IBM Q quantum computer. When we did, we witnessed the fact that in the NISQ era, one plus one is most probably two!</p>
<p dir="ltr">Finally, note that the quantum logical AND method we built above is also significant because one can append an X gate, which performs a logical NOT, resulting in a NAND operation. In classical computing, the NAND operation is a universal gate that can be used to build all other classical computing circuits. <strong>Therefore, any classical computing solution can be expressed... and we have only used 4 points of the Bloch sphere representing the total expressive power available to each qubit of a quantum computer.</strong></p>
<h3 dir="ltr">Related Links</h3>
<p dir="ltr"><a href="https://qiskit.org/" target="_blank">Qiskit</a>. An open-source quantum computing framework for leveraging today's quantum processors in research, education, and business.</p>
<h3 dir="ltr">Acknowledgements</h3>
<p dir="ltr">The author gratefully acknowledges the thorough reviews and feedback by Luuk Ament and Robert Loredo.</p>
<h1 dir="ltr">Data Analysis for Improving Machine Learning Models</h1>
<p dir="ltr">In a <a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/entry/Measuring_the_Quality_of_a_TensorFlow_Regression_Model" target="_blank">prior article</a>, we got a pretty good R squared quality metric for predicting house values with a linear regression model that we trained in an <a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/entry/An_IBM_Data_Science_Experience_with_TensorFlow" target="_blank">earlier article</a>. However, there still may be a lot of room for training a higher quality predictive model, even using the same housing data. One way to explore whether, or to what extent, this may be true is by visually analyzing the data set. Specifically, the variable being predicted (house value) will be examined against each predictor variable in isolation to see if any patterns stand out. This is especially important because the machine learning algorithm was <em>linear</em> regression, so if a clear <em>non-linear</em> data relation exists, then we will know there is room for creating an improved model that gets a higher R squared value.</p>
<p dir="ltr">As a prerequisite to doing data analysis on the housing data, we first read it in, and then we import a couple of libraries that will enable us to look at scatterplots of the dependent variable (house price) against each predictor variable:</p>
<pre dir="ltr">
<span style="color:#0000FF;">import pandas as pd
df_data_1 = pd.read_csv('cal_housing_data with headers.csv')
import numpy as np
import matplotlib.pyplot as plt</span></pre>
<p dir="ltr">Next we put the data into a numpy array and then isolate the data for the dependent variable into vector <em>y</em>:</p>
<pre dir="ltr">
<span style="color:#0000FF;">data = np.array([x for x in df_data_1.values])
y = np.delete(data, slice(0, 8), axis=1)</span></pre>
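<p dir="ltr">As an aside, that np.delete call keeps only the last column. This small sketch on a hypothetical 2x9 array shows it is equivalent to basic column slicing:</p>

```python
import numpy as np

# Hypothetical 2x9 array standing in for the housing data
data = np.arange(18, dtype=float).reshape(2, 9)

y_via_delete = np.delete(data, slice(0, 8), axis=1)  # drop columns 0..7, keep column 8
y_via_slice = data[:, 8:9]                           # equivalent basic slicing

print(y_via_delete.ravel())   # the last column of each row
```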
<p dir="ltr">Then, to get a scatterplot for any predictor variable <em>x</em>, we can use code like that which is below to plot that variable's data against the dependent variable <em>y</em>:</p>
<pre dir="ltr">
<span style="color:#0000FF;">x = np.delete(data, slice(1, 9), axis=1)
color = "#0000BF"
plt.scatter(x, y, c=color, s=1)
plt.title('Longitude vs. House Price')
plt.xlabel('Longitude')
plt.ylabel('House Price')
plt.show()</span></pre>
<p dir="ltr">Since <a href="https://github.com/john-boyer-phd/TensorFlow-Samples/blob/master/Regression/Housing%20Data%20Visual%20Analysis.ipynb" target="_blank">the code</a> is just one click away, I won't keep repeating minor variations of it for the other predictor variables. Instead, we'll analyze the data patterns now.</p>
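<p dir="ltr">One way to avoid repeating the plotting code is a small helper that takes a predictor column index; the column order and display names below are assumptions for illustration, not necessarily the dataset's actual headers:</p>

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')            # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Assumed predictor column order; the names are illustrative stand-ins
predictor_names = ['Longitude', 'Latitude', 'Median Age', 'Total Rooms',
                   'Total Bedrooms', 'Population', 'Households', 'Median Income']

def predictor_figure(data, y, col):
    """Scatter predictor column `col` against the house price vector y."""
    fig, ax = plt.subplots()
    ax.scatter(data[:, col], y, c="#0000BF", s=1)
    ax.set_title(predictor_names[col] + ' vs. House Price')
    ax.set_xlabel(predictor_names[col])
    ax.set_ylabel('House Price')
    return fig

# Demo on synthetic data shaped like the housing array (9 columns)
data = np.random.rand(100, 9)
fig = predictor_figure(data, data[:, 8], 0)
```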
<p dir="ltr">In the plot of house longitude versus price below, there is clearly a pattern to the data, and it is also not linear. In other words, a linear curve may be somewhat helpful (after all, we did get a decent R squared overall), but the pattern for this predictor variable is more reminiscent of a quadratic (parabolic) curve or an inverted quartic curve, which is a degree four polynomial with two concave down humps. Although it is not often immediately evident why a particular pattern in data exists, in this case, it's fairly obvious that the two price 'humps' correspond to high value properties in the Los Angeles and Silicon Valley areas.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/1LongitudevsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/1LongitudevsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">When we examine the scatterplot below of latitude versus house prices, we see a similar looking pattern of two humps. This is because scanning northward toward increasing latitude also hits the same regions where house prices are highest. If you look a little closer, you can even see a more intricate pattern involving higher prices around San Diego, then Los Angeles, then tapering off until the San Jose / San Francisco area, then tapering off except more slowly because of places like Sacramento. Who knows how intricate a pattern we might discern if we look hard enough with our neural networks?</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/2LatitudevsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/2LatitudevsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">At the opposite end of the recognizable pattern spectrum, we have below the plot of house median age versus house price. The best we can say is that it looks like a hot mess. The linear regression model may be getting a tiny bit of R squared mileage out of a line, but adding virtually any variable can slightly boost R squared without really capturing any kind of useful relationship to the dependent variable. When a variable shows a plot that looks this much like randomness, it's worth testing whether it would be better to just leave it out and save compute resources for processing better predictors.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/3MedianAgevsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/3MedianAgevsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">The plot below for the total rooms square footage against the house prices shows an interesting and different pattern. It's quite reasonable to see the pattern as a steeply sloped line and hence that linear regression would be appropriate. And it's not that surprising a pattern: more living space, higher cost, simple as that. However, it may be possible to increase R squared with, say, a thin concave down parabola. Experimentation would be the only way to tell.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/4TotalRoomsvsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/4TotalRoomsvsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">Also not surprisingly, there is a similar pattern in the scatterplot below comparing total bedroom square footage (column 4) and housing prices. It's remotely possible, in this case, that a cubic relationship could do slightly better than a parabola, but more time-consuming experimentation would be needed to test this possibility. If only there were a way to automate the testing for such patterns... :-)</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/5TotalBedroomsvsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/5TotalBedroomsvsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">Although for different reasons, there is a similar looking pattern between population density and house prices in the plot below. It's easy to see a line with a sharp upward slope being reasonably reflective of this data just based on supply and demand, but again there are nuances suggesting that a possible quadratic or cubic curve might be a somewhat better fit. Still, it's not a high priority to do manual work to find a better fit when you see a pattern like this.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/6PopulationvsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/6PopulationvsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">As another measure of population density, the density of households shows a similar pattern with house prices in the plot below, so again, linear regression is a good model.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/7HouseholdsvsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/7HouseholdsvsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">The scatterplot for median income value relative to house price shows the nicest example of a linear relationship. One might easily assume that there would be a linearly increasing trend between earning more money and buying a more expensive house, but it's still best to look at the data to make sure it matches your assumptions. And still, as with any variable, it may be possible to do better with a polynomial, such as one that produces slight bends in the line. But if you have to do it manually, it would be the lowest priority when you see such a clearly linear pattern as this.</p>
<table border="1" dir="ltr" style="width: 410px;">
<tbody>
<tr>
<td style="width: 410px;"><a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/8MedianIncomevsHousePrice.jpg" target="_blank"><img alt="image" src="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/resource/BLOGS_UPLOADED_IMAGES/8MedianIncomevsHousePrice.jpg" style=" width:400px; display:block; margin: 1em 1em 0pt 0pt; float: left;"></img></a></td>
</tr>
</tbody>
</table>
<p dir="ltr">In summary, we've now seen that a number of the variables perform reasonably well against the assumption of linearity, and that helps explain why the linear regression model had a good R squared metric. And yet, we've also seen that some variables, especially the longitude and latitude, have clearly non-linear patterns, which suggests there is a better predictive model out there. In the upcoming work, we'll explore how to build it... stay tuned!</p>
<h1 dir="ltr">Incremental Training of a TensorFlow Neural Network</h1>
<p dir="ltr">Training a neural network typically involves using many epochs, each of which exposes the neural network to the full training data set, before the accuracy is no longer appreciably affected. For a lengthy overall training, it’s useful to save the training progress so that if there is an interruption for some reason, then training can be resumed rather than restarted.</p>
<p dir="ltr">After any epoch of training, the state of training can be saved using essentially the same technique as we did for saving a fully trained model and restoring it in a production environment for inferencing. To simulate a training interruption in my Bankloan sample, I broke the 3000 epoch training cell into two cells. The first training cell had the same training code except for stopping at 1500 epochs and for saving the progress at every 500 epochs using the following:</p>
<pre dir="ltr">
<span style="color:#0000FF;">if (epoch % 500) == 499:
    save_path = saver.save(training_session,
                           "../datasets/Neural Net2/Neural Net.ckpt",
                           <strong>global_step=epoch+1</strong>)
    print(epoch+1, " training progress saved to ", save_path)</span>
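<p dir="ltr">The modulo test above fires on every 500th epoch (epochs 499, 999, and 1499, counting from zero). Here is a framework-agnostic sketch of the pattern, with the saver call replaced by a hypothetical stand-in so the trigger points are easy to see:</p>

```python
# Sketch of "save every N epochs": the saver.save(...) call is replaced
# by appending to a list that records when a checkpoint would be written
saved_at = []

def maybe_save(epoch, every=500):
    if (epoch % every) == every - 1:
        saved_at.append(epoch + 1)   # stand-in for saver.save(...)

for epoch in range(1500):
    maybe_save(epoch)

print(saved_at)   # [500, 1000, 1500]
```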
<p dir="ltr">The next cell resumes the training with the same training code, except for only running the latter 1500 epochs after restoring the state of TensorFlow using the following code:</p>
<pre dir="ltr">
<span style="color:#0000FF;">with tf_training2.Session() as training2_session:</span>
<span style="color:#0000FF;"> inf_saver = tf_training2.train.import_meta_graph(
'../datasets/Neural Net2/<strong>Neural Net.ckpt-1500.meta</strong>')</span>
<span style="color:#0000FF;"> inf_saver.restore(training2_session,
tf_training2.train.latest_checkpoint('../datasets/Neural Net2/'))</span>
<span style="color:#0000FF;"> </span>
<span style="color:#0000FF;"> graph = tf_training2.get_default_graph() </span>
<span style="color:#0000FF;"><strong> training2_op = graph.get_operation_by_name("train/GradientDescent")</strong></span>
<span style="color:#0000FF;"> X2 = graph.get_tensor_by_name("X:0")</span>
<span style="color:#0000FF;"><strong> y2 = graph.get_tensor_by_name("y:0")</strong></span>
<span style="color:#0000FF;"><strong> accuracy2 = graph.get_tensor_by_name("test/accuracy:0")</strong></span>
<span style="color:#0000FF;"> outputs2 = graph.get_tensor_by_name("nn/nn_output:0")</span></pre>
<p dir="ltr">Relative to restoring a model for the purpose of inference, there are only a few small differences. First, the name of the file from which we read includes the epoch number. Second, we have to get values for a few more Python variables, like the <em><span style="font-family:Courier New, Courier, monospace">y</span></em> layer tensor to which we feed the correct output values during training, the training operation itself, and the accuracy testing tensor. Third, getting the training operation requires a slightly different call because it is an operation rather than a tensor. Fourth and last, getting the accuracy tensor requires that we give it a name during construction of the tensor flow because it cannot be extracted during restoration unless it has a name that was saved. This can be accomplished by simply adding an identity node to the front of the previous accuracy tensor, like this:</p>
<pre dir="ltr">
<span style="color:#0000FF;">with tf.name_scope("test"):
correct = tf.nn.in_top_k(tf.cast(outputs, tf.float32), y, 1)
accuracy = <strong>tf.identity</strong>(
tf.reduce_mean(tf.cast(correct, tf.float32)), <strong>"accuracy"</strong>) </span></pre>
<p dir="ltr">This gave the accuracy tensor the name ‘accuracy’ within the ‘test’ namescope. Reviewing the code, one may note that the training operation is not explicitly given a name. However, this is because the TensorFlow library itself assigns a default name of ‘GradientDescent’ to the operation during creation, which occurred within the ‘train’ namescope.</p>
<p dir="ltr">Speaking of the code, you can <a href="https://github.com/john-boyer-phd/TensorFlow-Samples/blob/master/Neural%20Net/Bankloan%20Split%20Training.ipynb" target="_blank">go here to download a copy of the notebook</a>. Instead of one cell for all the training, the training is split into two cells, with the latter cell [10] reloading and resuming where the former cell [9] left off. Finally, note that it is possible to fully simulate interrupted training by stopping the Python kernel after the first half of training. After restarting the kernel, simply rerun cells [1] to [4] to reload the training data, and then run the second half of training starting at cell [10]. The only difference will be a negligibly different accuracy result, relative to training all epochs in one kernel session, because the random number seed is regenerated when the Python kernel is restarted.</p>
<h2 dir="ltr">Measuring the Quality of a TensorFlow Regression Model</h2>
<p dir="ltr">John M. Boyer, 2018-03-20</p>
<p dir="ltr">In this article, we’ll cover how to measure the quality of the TensorFlow regression model covered in a <a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/entry/An_IBM_Data_Science_Experience_with_TensorFlow?lang=en">prior post</a>. As usual, the code for the quality measurements can be obtained from my <a href="https://github.com/john-boyer-phd/TensorFlow-Samples">TensorFlow Samples repository</a>, and you can use this code in <strong>IBM Data Science Experience / Watson Studio</strong>. The code is also written generically so that you can apply it to models built with other libraries, too.</p>
<p dir="ltr">A regression model solves a kind of problem that can’t be solved with a classification algorithm. A data scientist trains and uses a regression model when the variable being predicted is a continuous quantity or an ordinal quantity with a large value space. For example, if the input is an image of one of ten numeric digits, a classification model would predict which digit it is. Even though numbers are comparable, there’s nothing about an image of a two that makes the image less than an image of a three, any more than an image of a cat would be less than an image of a dog (as if!). On the other hand, a regression model would be used to predict a property value (essentially continuous) or the number of hours after a medical procedure that a patient will need to stay in an intensive care unit (ordinal with a high value space).</p>
<p dir="ltr">The model in the <a href="https://www.ibm.com/developerworks/community/blogs/JohnBoyer/entry/An_IBM_Data_Science_Experience_with_TensorFlow?lang=en">prior post</a> was a linear regression model that used matrix operations to determine a line of ‘best fit’ for the housing data. There were 9 variables, including the median house value that the model learned how to predict. So, the ‘best fit’ line is calculated to flow through 9-dimensional space in a way that is closest, overall, to all of the 9-dimensional data points in the housing data.</p>
<p dir="ltr">But how good is the fit of that ‘best fit’ line? Sometimes the ‘best fit’ line is not a good fit because the variables are not linearly related to the dependent variable. At other times, there might be a linear relationship at a statistically significant level, but the model is still not a great fit because the relationship, and hence the data, is noisy. So, how do we measure whether we have a good regression model, an excellent one, or a poor one?</p>
<p dir="ltr">The R squared metric is a ratio that indicates the amount of the data’s variance from the mean that is accounted for by the regression model’s predictions. Before we unpack the meaning of that statement, let’s just first have a look at the <strong>library method you’d normally use</strong> to get the measurement. The variable ‘predicted_values’ contains a one-dimensional array of predicted median house values generated using the trained linear regression model. To prepare for the R squared calculation, we flatten the actual median house prices into the same one-dimensional format, and then we use the scikit learn method that calculates R squared for us:</p>
<pre dir="ltr">
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">from sklearn.metrics import r2_score</span></span>
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">y_actual = np.ndarray.flatten(housing_target)</span></span>
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">R2 = r2_score(y_actual, predicted_values)</span></span></pre>
<p dir="ltr">The result in this case is a touch more than 0.637. One may have a rough sense that this is good because, well, more than half of the variance from the mean is explained or accounted for by the regression model. In other words, if you were given each house’s predictor variable values and you always answered the average price, then your answers would reflect a balance between sometimes being high and sometimes being low. The <strong>total variance</strong> of the actual house prices from your constantly mean answers (yes, I meant that) is called the total sum of squares, and you can calculate it yourself very easily like this:</p>
<pre dir="ltr">
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">y_bar = np.mean(y_actual)</span>
<span style="font-family: Courier New,Courier,monospace;">SStot = 0.0</span>
<span style="font-family: Courier New,Courier,monospace;">for y_i in y_actual:</span>
<span style="font-family: Courier New,Courier,monospace;"> diff = float(y_i - y_bar)</span>
<span style="font-family: Courier New,Courier,monospace;"> SStot += (diff * diff)</span></span></pre>
<p dir="ltr">On the other hand, <strong>residual variance</strong> is the variance that is unexplained or not accounted for by the regression model. In other words, it is the variance that’s left over if you use the regression model to predict values instead of using the mean. It is easily computed like this:</p>
<pre dir="ltr">
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">SSres = 0.0</span>
<span style="font-family: Courier New,Courier,monospace;">for i, f_i in enumerate(predicted_values):</span>
<span style="font-family: Courier New,Courier,monospace;"> diff = float(f_i - y_actual[i])</span>
<span style="font-family: Courier New,Courier,monospace;"> SSres += (diff * diff)</span></span></pre>
<p dir="ltr">The ratio of the residual to total variance is the portion of unexplained error, and subtracting that from 1 gives the portion of variance explained by the regression model, which is R squared and is calculated easily as follows:</p>
<pre dir="ltr">
<span style="color:#0000FF;"><span style="font-family: Courier New,Courier,monospace;">R_squared = 1.0 - SSres / SStot</span></span></pre>
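<p dir="ltr">As a sanity check, the two loops and the final ratio can be collapsed into vectorized NumPy equivalents. The actual and predicted values below are made up purely for illustration (the housing arrays from the article would be used in practice):</p>
<pre dir="ltr">

```python
import numpy as np

# Made-up actual and predicted values, for illustration only
y_actual = np.array([3.0, 4.0, 5.0, 6.0])
predicted_values = np.array([2.8, 4.2, 5.1, 5.9])

# Vectorized equivalents of the SStot and SSres loops above
SStot = np.sum((y_actual - y_actual.mean()) ** 2)   # total sum of squares
SSres = np.sum((predicted_values - y_actual) ** 2)  # residual sum of squares

# Portion of variance explained by the predictions
R_squared = 1.0 - SSres / SStot
```

</pre>
<p dir="ltr">For these toy values, SStot is 5.0, SSres is 0.1, and R squared comes out to 0.98, matching what <span style="font-family:Courier New, Courier, monospace">sklearn.metrics.r2_score</span> would return for the same inputs.</p>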
<p dir="ltr">The following illustration graphically depicts the difference that a linear regression model makes in accounting for variance. On the left, you see the results of constantly using the mean as the predicted value. Each data point is some distance from the mean line, and the square of that distance is the variance for that data point. The sum of the large reddish squares’ areas gives the <strong>total variance</strong> from the actual data values. On the right, you can see smaller blue squares of <strong>residual variance</strong> of the actual data points from the predicted values of the linear regression model.</p>
<p dir="ltr"><a href="https://commons.wikimedia.org/wiki/File%3ACoefficient_of_Determination.svg" title="By Orzetto (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons"><img alt="Coefficient of Determination" src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Coefficient_of_Determination.svg/512px-Coefficient_of_Determination.svg.png" width="512"></img></a></p>
<p dir="ltr">Now you have a rough intuitive sense that the housing price model was a good model based on an R squared of 0.637. But, the precision of the intuition is like a rare steak. It tastes good, but we all know that data scientists are people, and people shouldn’t eat undercooked meat.</p>
<p dir="ltr">So, what is good, fair, poor, or excellent for R squared? A number of sources will tell you that an R squared of 0.25 is a large effect size. However, that is large for detecting the effect of a treatment (e.g., a psychological technique, an educational module, or a medication). <strong>But a good R squared for a treatment's effect size is different from (and less than) the R squared that would correspond to a good predictive model.</strong></p>
<p dir="ltr">In a 2015 study, a group of medical researchers created a new regression model for predicting the required length of stay in intensive care after heart surgery. The benchmark model in use at the time had an R squared of 0.356. This is consistent with answers I received while interviewing a few data scientists, who indicated that R squared values in the 0.3’s and 0.4’s would correspond to serviceable predictive models. Since they also said they’d want to keep experimenting to get better results, it would be fair to say that 0.3’s and 0.4’s are ‘fair’ values for R squared for a predictive model.</p>
<p dir="ltr">The purpose of the 2015 study, though, was to present the researchers’ new regression model, which had a much-improved R squared of 0.535. The “delighted tone” (Lewis, 2016, p. 79) with which the researchers described the new model was due to the magnitude of improvement in R squared, and in that case it is reasonable to conclude that the new R squared deserves a qualitatively higher qualifier: it is a ‘good’ R squared value. More generally, 0.5’s and 0.6’s would be considered ‘good’ to ‘quite good’ according to the data scientists I interviewed.</p>
<p dir="ltr">When asked ‘what is a good R squared,’ the data scientists I interviewed did, of course, start with admittedly reasonable disclaimers like “It depends on what you’re doing” and “It depends on the current benchmark.” But the characterizations above and in this paragraph are based on not having answers for those dependencies. R squared values in the 0.7’s were generally regarded as excellent, and the 0.8’s as outstanding. That leaves the 0.9’s in the realm of the practically unachievable. Put another way, in real-world scenarios, an R squared in the 0.9’s will be practically as rare as the occasions on which one should eat undercooked meat.</p>
<p dir="ltr"><strong>References</strong></p>
<p dir="ltr">Lewis, N.D., 2016. <em>Deep Learning Step by Step with Python</em>. <a href="http://www.AusCov.com" target="_blank">www.AusCov.com</a></p>