Manipulate CSV data with Python and pureXML
Import and convert US census data to XML using Python in IBM DB2 Express-C
Before you start
This tutorial is geared towards developers who want to learn how to store data in XML format in a database, connect to DB2 from a Python application, and learn how to convert data from CSV files into XML documents. No prior knowledge of Python is assumed (you will learn how to install it in this tutorial), but it would be advantageous. This tutorial assumes that you use a Microsoft® Windows® operating system, but the code should work on other platforms without modification. When you complete this tutorial, you will have the skills to create powerful Python applications that can communicate and interact with an IBM DB2 database and harness the power that pureXML offers.
About this tutorial
The IBM DB2 database management system has long been a leading player in the area of relational data management. In recent years, however, there has been a significant rise in the requirement for data structures that are more flexible and document-oriented in nature. One of the more prominent examples of such data structure is XML.
While many relational database systems have rushed to incorporate some form of XML support in their database, IBM DB2 is the only such offering that allows XML to be stored natively in the database, unchanged and true to its original form. This is referred to as pureXML—a technology that allows DB2 developers and DBAs to manipulate and report on XML data alongside relational data, without negatively affecting the purity of the XML itself.
In this tutorial, you will develop a Python script that connects to the United States Census Bureau Web site and downloads a CSV file containing data about the population at a national, regional, and state-wide level—including the results of the 2000 Census and fluctuations based on estimates in each year since then. You will then learn how to process this data, converting it into an XML document. Rather than import this large document and rely on DB2 functions to slice and dice it into individual rows, you will then use Python to insert this data into DB2, with an XML document stored per each relevant row in the CSV file. Finally, you will create a command-line application that produces some useful reports on this data, showing a list of states, regions, or countries in the order of highest to lowest population.
To follow the steps in this tutorial, you will need to have the following software installed:
- IBM DB2 Express-C 9.5 or later
- Python Version 2.6 or any pre-3.0 version
See Related topics for the links to download these prerequisites. This tutorial assumes that you are using a Microsoft Windows operating system, preferably XP or later. In order to install Python and the IBM DB2 extension for Python, you will need administrative privileges on your computer.
Setting up the database
In this section, you will create a new IBM DB2 database using the DB2 Command Editor utility, before you create a series of tables that will store census population data in XML format. You will create three tables: country, region, and state. Each of these tables will store a unique ID for each row in the table, as well as an XML document that will house the census data that you will import from the U.S. Census Bureau's CSV files later in this tutorial.
Creating the database
Let's get started. Open the DB2 Command Editor (Start>Programs>IBM
DB2>[DB2 Instance Name]>Command Line Tools) and enter the
create database census using codeset UTF-8 territory
This process can take a minute or two to complete so be patient. When it is finished,
you will receive a response message like:
DB20000I The CREATE DATABASE command
Tip: You can quickly execute commands in Command Editor by pressing Ctrl+Enter.
Now, connect to the newly-created census database using the following:
Once again, you should receive a response from the DB2 server, this time something
along the lines of:
A JDBC connection to the target has succeeded.
The database is now created and you are ready to create the tables that will store the application's data.
Creating the database tables
You will load the population data into the database and store it in three separate tables: country, region, and state. Let's create these tables now in Listing 1.
Listing 1. DDL SQL statements to create tables
create table country ( id int not null generated by default as identity, data xml not null, primary key(id) ); create table region ( id int not null generated by default as identity, data xml not null, primary key(id) ); create table state ( id int not null generated by default as identity, data xml not null, primary key(id) );
Each of these tables store the same type of data—a unique identifier that will be automatically generated by DB2 each time a row is inserted, and an XML data column that will store an XML document for each row. Strictly speaking, you can use a single table here and create a type column on it to determine whether a row is a country, region, or state, but if you separate them into tables it you have more flexibility for future manipulation.
When you execute the above SQL statements, DB2 should return the following response
for each table:
DB20000I. The SQL command completed successfully.
With the database configured, you are now ready to install and configure Python and the ibm_db extension for Python.
Installing and configuring Python
Python is a high-level programming language that places a strong focus on the readability of code. Unlike many other programming languages, where code indentation and style is at the discretion of the developer, in Python you must use indentation to denote blocks of code (such as classes, if statements, and loops). Python is easy to learn, produces elegant and clean code, and is widely supported on a host of different platforms, making it an excellent choice for any number of different application development projects.
Although Python is generally pre-installed on Mac OS X and Linux® operating systems, the same cannot be said for Microsoft Windows. Fortunately, you can download Python from the Web and install it on Windows—and you will learn how to do so in this section. Before you start, however, it's worth mentioning that you have a number of options when it comes to downloading Python for Windows.
The first option is to use the open-source official binary installer, available for download from the official Python Web site. This option offers the most up-to-date version of Python and is provided on an open-source license. In this tutorial, you will work with this distribution of Python.
Alternatively, the commercial ActiveState Python offers some additional resources such as complete documentation, and additional Python extensions including Windows-specific extensions that facilitate the development of Win32 API-based applications using Python.
The first step in installing Python is to download it from the official Python Web site (see Related topics for a link). At the time of writing, the current production versions of Python are 2.6.4 and 3.1.1. This tutorial assumes you are using the 2.6.* version of Python. As version 3.0 and above is not backward-compatible, I highly recommend that you download the latest pre-3.0 version (version 2.x.x) offered for download. Save this file to your hard drive and when it has finished downloading, open the .msi file to launch the setup program.
When the installer launches, it will ask you if you want to install it for all users or just for you (this option is not available on Windows Vista®). Leave the default selection, Install for all users, and press Next to continue. You will now be asked to select a destination directory. The default should be C:\Python26\ or similar; again, accept this default and press Next to move forward. You will now be offered the opportunity to customize your Python installation by selecting which features you want to be installed. By default, everything is selected, so leave this as is and press Next to start the installation. The process will take a couple of minutes to complete, and when it is finished, you will see a window like the one in Figure 1.
Figure 1. Completing the Python 2.6.4 Installer window
Press Finish to exit the setup application. Before you move on, it's worth verifying that Python is installed and working correctly. You can use the shortcuts that were added to the Windows Start Menu if you wish, but I recommend that you launch Python from the command prompt as this is how you will run the scripts you create later in this tutorial.
First, open the Windows command prompt window through the Run dialog
(Start>Run, then enter
cmd) or navigate to
Start>Programs>Accessories>Command Prompt). At the
prompt, enter the command:
Figure 2. Python prompt
Note: If you see a message such as
python is not recognized as an internal or
external command, operable program or batch file, the Python directory
was not placed on your Windows Path. See Related
topics for information on how to set this up. To quit the Python prompt,
enter the following command:
You should return to the Windows command prompt after entering this command at the Python prompt. In the next section, you will learn how to install the ibm_db Python extension, which will allow you to connect to a DB2 database from Python.
Installing the ibm_db Python extension
The ibm_db extension for Python allows you to connect to and interact with an IBM DB2 database using Python code. To install this extension, you will first need to install the easy_install utility (setuptools). Navigate to the setuptools package page (see Related topics and find the file for your version of Python, which is 2.6 in my case). Download this file to your hard drive, and when it has completed, open it to install the easy_install.exe application into your Python Scripts directory (usually C:\Python26\Scripts).
Installing the ibm_db extension itself is very simple. Open a Windows command prompt
window (Start>Run>cmd) and enter the following command, and
if you installed Python to a different directory, change the reference accordingly:
This will search for, download, extract, and install the ibm_db extension automatically. When it is finished, you will be returned to the Windows command prompt, as in Figure 3. (See a text-only view of Figure 3.)
Figure 3. Successfully installed ibm_db extension
Next, you will verify that the ibm_db extension is working correctly by testing the connection to the DB2 database you created earlier in this tutorial.
Connecting to DB2 from Python
With the DB2 database created and Python and the ibm_db extension installed and configured, you are now ready to check that you can connect to DB2 from Python. Open a Windows command prompt window and issue the python command to launch the Python interpreter.
At the prompt, enter the following commands to connect to DB2 and count the number of rows in the country table. Please note that the Python prompt (>>> and ...) has been included here for illustrative purposes only and you should not type those in to the interpreter. Also, be sure to replace the credentials in the code in Listing 2 with your actual DB2 credentials.
Listing 2. Python code to connect to DB2
>>> import ibm_db >>> conn = ibm_db.connect("DATABASE=census;HOSTNAME=localhost;PORT=50000; PROTOCOL=TCPIP;UID=username;PWD=password;", "", "") >>> sql = "SELECT COUNT(*) FROM country" >>> stmt = ibm_db.exec_immediate(conn, sql) >>> result = ibm_db.fetch_both(stmt) >>> while result != False: ... print "Count: ", result ... result = ibm_db.fetch_both(stmt) ...
After you enter the final line above, press enter and the code will execute. You
should see a result (
Count: 0) as in Figure 4.
Figure 4. Result of DB2 connection test
If you have problems connecting to DB2 from Python, check that the ibm_db extension was installed correctly, and that you have already created the database as described earlier in this tutorial. Also verify that your credentials for connecting to DB2 are correct.
With the database set up and Python ready to get to work, you are now ready to start developing the main subject of this tutorial. In the next section, you will download, parse, and convert CSV data from the U.S. Census Bureau and save it as XML data in the DB2 database. You will then learn how to read this data from the database and display it to the user.
Downloading and converting the CSV data
In this section of the tutorial, you will learn how to create a Python script to pull a CSV file down from the United States Census Bureau Web site. Next, you will process this CSV data and convert it to XML that can be stored in the DB2 pureXML database you created earlier in the tutorial.
Before you start, you should create a folder somewhere on your hard disk where you will store the project files. I stored my data in a folder C:\pycensus, and I suggest that you do the same.
Downloading CSV files from the US Census Bureau Web site
The United States Census Bureau has a plethora of data available for download, in a variety of different formats. Unfortunately, the population data from Census 2000 and estimates for each year since then is only available in CSV format and not XML. That's not a problem, however, as you can use Python to pull this CSV file down from the Census Bureau Web site and convert it into XML that you can store in your DB2 pureXML database.
If you wanted, you can point your Web browser to the URL for the CSV file project folder. Instead, however, you will create a Python script to do this task. In your favorite text editor, create a new file and save it as download.py in your project folder (for example, C:\pycensus). Add the code from Listing 3 to this file.
Listing 3. download.py
import httplib conn = httplib.HTTPConnection("www.census.gov") conn.request("GET", "/popest/national/files/NST-EST2008-alldata.csv") response = conn.getresponse() f = open('data.csv', 'w') f.write(response.read()) f.close() conn.close()
In this script, you use the
httplib module to connect to the census.gov
Web site and issue a
GET request for the CSV file required. Then you
fetch the response and write it to a file named data.csv. To run this script, open
up the Windows Command Prompt and change to the project directory as follows:
Next, run the following command to run the Python script:
When the script has completed you will return to the prompt. You might wonder why there were no messages produced—don't worry, this is a good thing as it means no errors occurred. Open your project folder in Windows Explorer and you will now notice an extra file in the folder named data.csv. If you have Microsoft Excel®, this will be the default handler for this type of file, and opening it will produce a result like the one in Figure 5.
Figure 5. data.csv in Microsoft Excel
Warning: Do NOT save this file in Excel, as it may change the CSV file format to suit its own interpretation, and this might not be readable by Python. If Excel asks you to save the file, choose No. If you accidentally save the file, simply delete it and re-run the download.py Python script. In the next section, you will learn how to take this CSV file and convert it into XML.
Converting CSV data into XML documents
To convert the CSV data into XML, you must first be clear on how exactly you wish to store the data, whether different records should be stored differently, and check if some records should be discarded. In the example of the CSV file you just downloaded, you will notice that this contains three types of data: a single row of information for the entire country; four rows of data for the regions Northeast, Midwest, South, and West; fifty one rows of data for the fifty states of the USA plus District of Columbia; and a row for the Puerto Rico Commonwealth. The first row of the file is a header row that is to be used for column names.
The script you create in this section will take the header row and use this data to form the tag names for each element that a record should have in the XML document. The script will determine, based on the first four columns, whether the particular row refers to a country, region, or state, and will set the tag name accordingly to indicate what the XML document refers to. Finally, the script will choose to exclude the Puerto Rico Commonwealth record as it has some incomplete data.
In your text editor, create a new file and save it as convert.py. Add the code from Listing 4 to this file.
Listing 4. convert.py
import csv reader = csv.reader(open('data.csv'), delimiter=',', quoting=csv.QUOTE_NONE) print "<data>" for record in reader: if reader.line_num == 1: header = record else: innerXml = "" dontShow = False type = "" for i, field in enumerate(record): innerXml += "<%s>" % header[i].lower() + field + "</%s>" % header[i].lower() if i == 1 and field == "0": type = "country" elif type == "" and i == 3 and field == "0": type = "region" elif type == "" and i == 3 and field != "0": type = "state" if i == 1 and field == "X": dontShow = True if dontShow == False: xml = "<%s>" % type xml += innerXml xml += "</%s>" % type print xml print "</data>"
In this file, you use the csv library to read the data.csv file. You wrap the output
in an opening
<data> and closing
tag, as it is producing a single document output. Then you loop through each line of
the CSV file. If the current line is the first line of the file, you set that record
as the header. This will be used later in the script as the element name for each
field in a country, region, or state record. If the current line is not the header
record, you loop through each column in the record, and create an inner XML element
string whose name is driven from the heading record. You then check whether the row
in question is referring to a country, region, or state, and wrap the inner XML
element in an outer tag
<state> accordingly. Finally, you check if the record contains
X in a specific field, and if so, set a Boolean indicator to
True that will stop that particular row from being added to the XML
document. The first way you can run this script is the same as before, by issuing:
Running the script this way will produce a result like the one in Figure 6.
Figure 6. convert.py output
As you can see, the script has put the data directly to the screen. It would be far
more useful if this data were saved to a file. Rather than creating more Python code
to do this, you can simply change the command you issue as follows to tell the
command prompt to save the output to a file named data.xml:
This will create a new file in the project directory named data.xml. If you open this file in an application that reads and formats XML, such as Firefox, you might see a result like the one in Figure 7.
Figure 7. XML output in Mozilla Firefox
With the data stored in a file like this, you can import the XML into a DB2 database
using a .del file and the
IMPORT command. However, this results in the
entire XML data being stored in a single row in a DB2 table. Now, it is possible to
use XQuery to split up this data and store it into separate rows. But, as you are
already using Python to create the XML document, it is much easier to simply perform
a series of
INSERT statements directly in the convert.py script itself.
In the next section, you will modify the convert.py script to do just that.
Saving XML into DB2 with Python
Previously, you learned how to format the CSV data you downloaded from the U.S. Census Bureau into a large XML document. Now you will learn how to take the rows for country, regions, and states and insert them into a DB2 database. Make the changes listed in this section to the convert.py file you created in the previous section.
Including the ibm_db library
The first thing that you need to do is include the ibm_db library in your code. To do
this, change the first line of the covert.py file so that it now reads:
With a script like this, running it multiple times causes every row to be inserted
repeatedly, resulting in a mass of duplicate data. To prevent that, clear the
database tables at the start of the script so that each time it runs it will start
from fresh. Add Listing 5 just below the import statement from Listing 4, which you just modified (in other words add it before
reader = csv.reader... line in Listing 4).
Listing 5. Excerpt from convert.py—clearing down tables
connString = "DATABASE=census;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP; UID=username;PWD=password;" try: conn = ibm_db.connect(connString, "", "") except: print "Could not connect to DB2: ", ibm_db.conn_errormsg() else: print "Connected to DB2." sql = "DELETE FROM country" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from country table: ", ibm_db.stmt_errormsg() else: print "Country table emptied." sql = "DELETE FROM region" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from region table: ", ibm_db.stmt_errormsg() else: print "Region table emptied." sql = "DELETE FROM state" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from state table: ", ibm_db.stmt_errormsg() else: print "State table emptied."
You might remember the code to connect to the DB2 database from an earlier section of this tutorial, where you tested that the Python connection to DB2 was working. This time, you are performing three SQL statements—deleting all data from the country, region, and state tables, respectively. In each case, Python will output a message either confirming that the statement executed successfully or that an error occurred. If an error does occur, the DB2 error message is relayed to the user, making it easier to debug what went wrong.
Next, you need to delete a couple of print statements that output the outer XML
declaration for the single large document you created in the previous section. These lines are:
print "<data>" and
The former should be just beneath the
reader = csv.reader... line, and
the latter should be the last line of the file.
Finally, you need to change the convert.py file so that it doesn't print the XML code
for each row, but rather that it saves this as an XML document in the appropriate
DB2 table. You have already created code to determine if a particular line is a
country, region, or state, and to generate the XML for the row; so all you need to
do now is create the relevant
INSERT statement and execute it.
Find the line that currently reads
print xml. You need to replace this
line with the code from Listing 6. Keep in mind that Python is
very sensitive to code indentation, so be sure to line up your code correctly in
your text editor.
Listing 6. Excerpt from convert.py—saving rows to the DB2 database
if type == "country": sql = "INSERT INTO country(data) VALUES('%s')" % xml elif type == "region": sql = "INSERT INTO region(data) VALUES('%s')" % xml elif type == "state": sql = "INSERT INTO state(data) VALUES('%s')" % xml try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error adding row: ", ibm_db.stmt_errormsg() else: print "Row added to %s table" % type
The final code for convert.py will look like Listing 7. Again, indentation is hugely significant in Python, so ensure that it is correct or you might experience unexpected results.
Listing 7. convert.py
import csv, ibm_db connString = "DATABASE=census;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=jjlennon; PWD=DopGX240;" try: conn = ibm_db.connect(connString, "", "") except: print "Could not connect to DB2: ", ibm_db.conn_errormsg() else: print "Connected to DB2." sql = "DELETE FROM country" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from country table: ", ibm_db.stmt_errormsg() else: print "Country table emptied." sql = "DELETE FROM region" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from region table: ", ibm_db.stmt_errormsg() else: print "Region table emptied." sql = "DELETE FROM state" try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error deleting from state table: ", ibm_db.stmt_errormsg() else: print "State table emptied." reader = csv.reader(open('data.csv'), delimiter=',', quoting=csv.QUOTE_NONE) for record in reader: if reader.line_num == 1: header = record else: innerXml = "" dontShow = False type = "" for i, field in enumerate(record): innerXml += "<%s>" % header[i].lower() + field + "</%s>" % header[i].lower() if i == 1 and field == "0": type = "country" elif type == "" and i == 3 and field == "0": type = "region" elif type == "" and i == 3 and field != "0": type = "state" if i == 1 and field == "X": dontShow = True if dontShow == False: xml = "<%s>" % type xml += innerXml xml += "</%s>" % type if type == "country": sql = "INSERT INTO country(data) VALUES('%s')" % xml elif type == "region": sql = "INSERT INTO region(data) VALUES('%s')" % xml elif type == "state": sql = "INSERT INTO state(data) VALUES('%s')" % xml try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error adding row: ", ibm_db.stmt_errormsg() else: print "Row added to %s table" % type
Make sure you have saved this file and open a Windows command prompt. Change to the
project directory and run the convert.py script again, this time using the following
command (don't pipe the output to a file):
You should see a number of "Row added to state table" messages appearing one after the other, as in Figure 8.
Figure 8. Output from revised convert.py
Before you read this data from DB2 using Python, open the DB2 Command Editor and
check how this data looks in the database. Make sure you are connected to the census
database (issue the command
connect to census if required) and enter
the following SQL statement:
select * from state. This query should
produce 51 results, as in Figure 9.
Figure 9. Query Results view
Click on the more (...) button next to any one of the rows in the Query Results tab. This will open the XML Document Viewer, showing that particular row's associated XML document. This should look similar to the screen capture in Figure 10.
Figure 10. XML document viewer
Feel free to execute similar SQL statements to retrieve records from the country and region tables; you should get a single row result for the country table and four rows for the region table.
Next, you will learn how to read this data from DB2 into Python and present it to the user.
Reading XML from DB with Python
In this section, you will learn how to build a command-line Python application that will request user input to select one of three menu options. These options will allow the user to view a list of states, regions, or countries ordered by the population driven from the 2000 census.
To start, you'll connect to the DB2 database, print the list of menu options, and request the user's input. Create a new file called read.py and add the code from Listing 8 to it.
Listing 8. Excerpt from read.py—getting started
import ibm_db, locale, sys locale.setlocale(locale.LC_ALL, '') connString = "DATABASE=census;HOSTNAME=localhost;PORT=50000; PROTOCOL=TCPIP;UID=username;PWD=password;" try: conn = ibm_db.connect(connString, "", "") except: print "Could not connect to DB2: ", ibm_db.conn_errormsg() else: print "Connected to DB2." print "To view population information, please select one of the following options:" print "1.) List of states by population" print "2.) List of regions by population" print "3.) List of countries by population" print "4.) Exit the application" input = False while input == False: try: option = int(raw_input("Please enter a number from the options above to view that information: ")) if option not in [1,2,3,4]: raise IOError('That is not a valid option!') except: print "That is an invalid option." else: input = True
In Listing 8, you are first importing the ibm_db and locale libraries. The locale library is required to format the population number so that it is more readable (using thousand separators). You begin the application by setting the locale to the default setting on your machine. Next, you connect to the DB2 database, before printing information about the different menu options that will be available to the user.
The final section of code in Listing 8 requests that the user enter a value, and verifies that this is an integer, and is one of the four available options—1, 2, 3 or 4. If the value provided is not one of these values, it will keep asking for a value until a valid one is entered. The user can exit the program at any time by selecting option 4.
Now that the application has determined what data the user wants to see, it must build an appropriate SQL statement to retrieve this data. The code in Listing 9 does just that.
Listing 9. Excerpt from read.py—building the SQL
selected = "" if option == 1: sql = "select x.* from state s, xmltable('$d/state' passing s.data as \"d\"\ columns \ name varchar(50) path 'name', \ pop int path 'census2000pop') as x \ order by x.pop desc" selected = "state" elif option == 2: sql = "select x.* from region r, xmltable('$d/region' passing r.data as \"d\"\ columns \ name varchar(50) path 'name', \ pop int path 'census2000pop') as x \ order by x.pop desc" selected = "region" elif option == 3: sql = "select x.* from country c, xmltable('$d/country' passing c.data as \"d\"\ columns \ name varchar(50) path 'name', \ pop int path 'census2000pop') as x \ order by x.pop desc" selected = "country" elif option == 4: sys.exit()
In Listing 9, the if block checks whether the user's input selection was the value 1, 2, 3 or 4. If it detects a value between 1 and 3, it will create an SQL statement for viewing population data for states, regions, or countries. If it detects that 4 was entered, it will exit the program.
The SQL statement for each option is virtually the same, except that it looks at a
different table in each instance. It basically uses the
function to map XML elements in the data column of the table to different
relational-style columns. It then orders the data by the population value, from
highest number to lowest number.
The final part of the application is executing the SQL statement and looping through the result set to produce a table of results. Listing 10 shows this code.
Listing 10. Excerpt from read.py—Formatting results
try: stmt = ibm_db.exec_immediate(conn, sql) except: print "Error retrieving data: ", ibm_db.stmt_errormsg() else: res = ibm_db.fetch_both(stmt) print ".----------------------------------------------," print "| |" print "|", ("%s LIST BY POPULATION" % selected.upper()).center(44), "|" print "| |" print "|----------------------------------------------|" print "|", ("%s" % selected.upper()).center(21), " | ", "POPULATION".center(18), "|" print "|----------------------------------------------|" while res != False: print "|", res.ljust(21), " | ", locale.format("%d", res, grouping=True) .rjust(18), "|" res = ibm_db.fetch_both(stmt) print "'----------------------------------------------'"
In this code, you execute the SQL statement that was generated by the code in Listing 9, and print out a table that nicely formats the results. You are using a series of Python functions in this section that perform string manipulation such as left-justifying, centering, and right-justifying text and formatting the population value with thousand separators to make it easy to read.
With the code for reading from the database complete, you are now ready to execute
the script. From the Windows Command Prompt, make sure that you are in the project
directory and use the following command to start the program:
When the program executes, it will connect to DB2 and present you with the following list of menu options, which you can enter in the application:
- List of states by population
- List of regions by population
- List of countries by population
- Exit the application
Figure 11. Application menu
Figure 12. Invalid menu option error
Figure 13. Regional population data
As you can see, the application presents a table with a list of regions, with the region with the largest population displayed first. You should see a similar result for the other two menu options, except that option 1 will display 51 states and option 3 will display just the one country.
Be sure to try out the different menu options for yourself, and try to enhance the application by adding some more options and different views of the data.
In this tutorial, you learned how to create a DB2 database that features tables with native XML data columns. You then learned how to install Python and the ibm_db extensions for Python through the easy_install utility. Next, you verified that you can communicate with the DB2 database from the Python interpreter. You then developed a Python script that pulled down population data from the U.S. Census Bureau Web site, before you converted this CSV data into XML format and saved it in DB2 tables. Finally, you created a basic command-line application that provides tabular reports about national, regional, and state-wide population data.
With the information provided in this tutorial, you should have the knowledge required to further your Python and DB2 development skills.
- Create an alerts system using XMPP, SMS, pureXML, and PHP (Joe Lennon, developerWorks, November 2009): Follow along in this tutorial to import an XML file with Euro foreign exchange rates into an IBM DB2 database and use special XQuery and SQL/XML functions to split this XML into separate database rows.
- Build a Support Knowledge Base using DB2 pureXML and PHP (Joe Lennon, developerWorks, November 2009): Learn to create applications that use a hybrid of relational data and XML data thanks to the pureXML feature of IBM DB2 database servers.
- Leveraging pureXML in a Flex microblogging application (Joe Lennon, developerWorks, November 2009): In this three-part article series, discover how to create a microblogging app that takes advantage of pureXML, Web services, and Adobe Flex; and even allows you to publish your microblogging updates on Twitter.
- Getting Started with IBM DB2 Express-C: Visit the home page for DB2 Express-C.
- The Python Tutorial: Start learning Python.
- The Charming Python column (David Mertz, developerWorks, June 2000 - November 2009): Read the advanced articles in this series, some of the many developerWorks articles about Python.
- How do I run a Python program under Windows?: Find information on how to place the Python directory on your Windows Path.
- DB2 9 pureXML Guide: Read about the pureXML data store, hybrid database design, and administration in this IBM Redbook.
- DB2 9: pureXML Overview and Fast Start: Review this Redbook introduction to the hybrid XML data services in DB2 9 for Linux™, UNIX®, and Windows.
- DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server (Matthias Nicola and Pav Kumar-Chatterjee, IBM Press, August 2009): In this book by two leading experts from IBM, find practical solutions and proven code samples that database professionals need to build better XML solutions faster.
- Integrated Data Management: Managing data across its lifecycle (Holly Hayes, developerWorks, June 2009): In this overview article, explore both the vision and reality of Integrated Data Management and how you—whether a data architect, developer or tester, DBA, or data steward—can use IBM solutions today to respond quickly to emerging opportunities, improve quality of service, mitigate risk, and reduce costs.
- Developing with PL/SQL in IBM Data Studio 2.2 and Optim Development Studio 2.2 (Thomas Sharp, developerWorks, August 2009): Learn to use IBM Data Studio 2.2 and Optim Development Studio 2.2 to develop routines that use Oracle PL/SQL.
- Python: Download and get started with this powerful programming language. Version 2.6 is used in this tutorial.
- easy_install utility (setuptools): Install this utility and the ibm_db extension for Python so you can connect to and interact with an IBM DB2 database using Python code.
- IBM certification: Find out how you can become an IBM-Certified Developer.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM product evaluation versions: Get your hands on application development tools.