Before you start
This tutorial is aimed at developers who want to learn how to store data in XML format in a database, connect to DB2 from a Python application, and convert data from CSV files into XML documents. No prior knowledge of Python is assumed (the tutorial shows you how to install it), although some familiarity would be an advantage. The tutorial assumes that you use a Microsoft® Windows® operating system, but the code should work on other platforms without modification. When you complete this tutorial, you will be able to create Python applications that communicate with an IBM DB2 database and take advantage of its pureXML features.
The IBM DB2 database management system has long been a leading player in the area of relational data management. In recent years, however, there has been a significant rise in demand for data structures that are more flexible and document-oriented in nature. One of the more prominent examples of such a data structure is XML.
While many relational database systems have rushed to incorporate some form of XML support in their products, IBM DB2 is the only such offering that allows XML to be stored natively in the database, unchanged and true to its original form. This capability is referred to as pureXML, a technology that allows DB2 developers and DBAs to manipulate and report on XML data alongside relational data without altering the XML itself.
In this tutorial, you will develop a Python script that connects to the United States Census Bureau Web site and downloads a CSV file containing population data at the national, regional, and state levels, including the results of the 2000 Census and the estimated changes for each year since then. You will then learn how to process this data and convert it into XML. Rather than importing one large document and relying on DB2 functions to split it into individual rows, you will use Python to insert the data into DB2, storing one XML document for each relevant row in the CSV file. Finally, you will create a command-line application that produces some useful reports on this data, listing states, regions, or countries in order from highest to lowest population.
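To make the row-by-row conversion described above more concrete, here is a minimal sketch of turning each CSV row into its own small XML document using only the Python standard library. It is not the script developed later in this tutorial: the column names, the sample values, the `record` root element, and the `rows_to_xml` helper are all made up for illustration, and the real Census Bureau file has a different layout. Downloading the file and inserting the documents into DB2 are left out.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical sample data. The real Census Bureau CSV file uses
# different columns; these names and values are placeholders only.
SAMPLE_CSV = """NAME,CENSUS2000POP,POPESTIMATE2008
StateA,1000,1100
StateB,2000,1900
"""


def rows_to_xml(csv_text):
    """Return one XML string per data row, named after the header columns."""
    reader = csv.reader(csv_text.splitlines())
    header = next(reader)  # the first row holds the column names
    documents = []
    for row in reader:
        record = ET.Element("record")  # assumed root element name
        for name, value in zip(header, row):
            # Each column becomes a child element, e.g. <name>StateA</name>
            ET.SubElement(record, name.lower()).text = value
        documents.append(ET.tostring(record).decode("ascii"))
    return documents


docs = rows_to_xml(SAMPLE_CSV)
for doc in docs:
    print(doc)
```

Each string in `docs` is a complete XML document for one row, which is the shape of data you would store in an XML column, one document per database row.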
To follow the steps in this tutorial, you will need to have the following software installed:
- IBM DB2 Express-C 9.5 or later
- Python 2.6, or another Python release earlier than 3.0
See Resources for links to download these prerequisites. This tutorial assumes that you are using a Microsoft Windows operating system, preferably XP or later. To install Python and the IBM DB2 extension for Python, you will need administrative privileges on your computer.