Before you start
About this series
This tutorial series teaches basic to advanced SQL and basic XQuery topics and shows how to express commonly asked business questions as database queries by using SQL queries or XQueries. Developers and database administrators can use this tutorial to enhance their database query skills. Academic Initiative members can use this tutorial series as a part of their database curriculum.
All the examples in this document are based on Aroma, a sample database that contains sales data for coffee and tea products sold in stores across the United States. Each example consists of three parts:
- A business question, expressed in everyday language
- One or more example queries, expressed in SQL or XQuery
- A table of results returned from the database
This guide is designed to allow participants to learn the SQL language and XQuery. As with any learning, it is important to supplement it with hands-on exercises. This is facilitated by the table definitions and data.
For students using this as part of an academic class, obtain from your instructor the instructions to connect to the Aroma database and learn about any differences between the guide and your local set up.
This tutorial was written for DB2 Express-C 9 for UNIX®, Linux® and Windows® (formerly known as Viper).
About this tutorial
This tutorial introduces participants to the Aroma database, especially the retail group of tables, table definitions, sample data, and the relationship between tables.
These tables were created to show an environment where:
- There are few tables
- Column names use the analyst's vocabulary
- Columns are reflective of the natural dimension of the business
This is a typical, small data warehouse using a star schema. The other type of database is Online Transaction Processing (OLTP). OLTP databases contain transacations at the most granular level.
This tutorial also introduces the basics of SQL and suggests some typical business questions that IBM DB2 9 can be used to answer.
To use this tutorial to the fullest, you should have IBM DB2 9 installed. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications. You will also need to install the Aroma database, which can be downloaded from here.
Installing DB2 Express-C 9
To install DB2 9, follow these steps (default options should be used unless otherwise specified):
- Navigate to the folder containing the downloaded file - db2exc_viper_WIN_x86.zip (we will assume the file is downloaded to the C:\DownloadDirector folder).
- Unzip the file.
- Find setup.exe in the unzipped folder and double-click on it. The DB2 Setup Launchpad will appear.
- Within the DB2 Setup Launchpad, click on Install a Product from the panel on the left.
- Click on Install New under DB2 Express.
- In the installer, click Next to view the License Agreement. You will need to Accept the agreement to continue and click Next.
- Choose a Typical installation and click Next.
- Confirm that you plan to "Install DB2 Universal Database Express Edition on this computer and save my settings in a response file" and click Next.
- Verify the installation directory and click Next.
- In the "Configure DB2 instances" screen, click Next.
- In the "User Information" screen, set the Domain to None -- use local user account using the pull-down menu, use db2admin as the User name, and set a password.
(Optionally, you can select the LocalSystem account, but there are limitations introduced if this option is used. Click the Help button to learn more.)
- Make sure the checkbox Use the same user name and password for the remaining DB2 services is checked and click Next.
- Click Next to skip the "Prepare the DB2 tools catalog" screen.
- To simplify installation, uncheck Set up your DB2 server to send notifications and click Next.
- Verify that the Enable operating system security checkbox is checked. Leave the DB2 administrators group as the default value DB2ADMNS and leave DB2 users group as the default value DB2USERS. Click Next.
- Click Finish to start the DB2 product installation when you reach the "Start copying files and create response file" screen.
- When the installation is complete, a window with the message "Setup is complete" will appear. Click Finish to complete the installation procedure.
- A window entitled "DB2 First Steps" will now appear. Click Create profile. Your Web browser will open a page called DB2 First Steps. This tutorial contains many useful links to information on DB2, but it will not be used during this tutorial. Close your Web browser and continue with this tutorial.
Installing the Aroma database
- Ensure DB2 Express-C 9 is installed.
- Start the Command Window from the Start menu ==> all Programs ==> IBM DB2 ==> Command Line Tools ==> Command Window.
- Execute create_aroma.bat in the Command Window.
Note: Don't forget to move to the directory where the batch file is located using the
- Two files, create.txt and report.txt, will be created. Edit report.txt and compare to the expected results below:
XML and basic XQueries will be introduced in Part 7 of this series. To support XML, the aromadb database was created with the statement:
db2 create db aromadb using codeset utf-8 territory us
Note the specification of UTF-8 as the codeset for the database. The use of XML features is restricted to a database that is defined with codeset UTF-8 and has only one database partition. When creating your own databases in the future, don't forget to include this parameter or your database won't support XML features!
Aroma database group of tables
Most of the examples in this guide are based on data from the Aroma database, which tracks aggregated daily retail sales in stores owned by the Aroma Coffee and Tea Company. The Aroma database contains tables that are used to understand the sales for the Aroma Coffee and Tea Company. There are six tables, which are used to create different groupings and to understand the sales. The tables are: Sales, which contain the sales transactions or facts; Period, Product, Store, Promotion Class, and Market. These are referred to as the retail group of tables.
Another set of tables contain data to understand the purchases of Aroma Coffee and Tea Company. This set of tables is referred to as the purchasing group of tables and can be used to understand the purchases from suppliers. This group of tables is not used in any examples in this tutorial. It only exists in the Appendix to provide a more complete picture of the Aroma Coffee and Tea Company.
Figure 1. Retail group of tables
The crow's feet in this diagram indicate a one-to-many relationship between two tables. For example, each distinct value in the Perkey column of the Period table can occur only once in that table but many times in the Sales table. Column names in bold are primary-key columns. Column names in italics are foreign-key columns. Column names in bold italics are primary and foreign-key columns.
Sample data from each table is presented in the following sections so you can see how these primary-key to foreign-key relationships work.
A primary key is a unique key that is a part of the definition of the table. A table cannot have more than one primary key, and the columns of a primary key cannot contain null values. (Null values will be discussed later in detail. If a value is null, that means there is a place to store the data, but nothing is there - this is different from a blank.) Primary keys are optional and can be defined in CREATE TABLE or ALTER TABLE statements. Primary keys are important because there can only be one occurrence of each value; the same row will always be selected. There may be a couple of cities with the same name; however, selecting the record based on the primary key will ensure the same row is always selected.
A foreign key is specified in the definition of a referential constraint. A table can have zero or more foreign keys. The value of the composite foreign key is null if any component of the value is null. Foreign keys are optional and can be defined in CREATE TABLE statements or ALTER TABLE statements.
Basic Aroma tables
The tables and columns are named with familiar business terms, making them easy to understand and use. A well-designed set of tables provides the following benefits to application developers and end users:
- Business questions are easy to express as SQL queries
- Queries run fast and return consistent answers.
You will notice that the Aroma database does not record each transaction that takes place. The data has been aggregated to reflect the total sales at each store of a given product on a given day under a particular promotion.
Aggregated data is typical of data warehouses. The idea is to summarize data so there are not as many rows. Normally this is done by a database administrator (DBA) or computer program, and is commonly referred to as ETL (Extract, Transform, Load).
Period, Product, and Class dimensions
The following table displays the first few rows of the Period table. The primary-key column is the Perkey column:
Product and Class Tables
The following table displays the first few rows of the Product table. The primary key is a combination of the Classkey and Prodkey values:
|1||1||Xalapa Lapa||No pkg|
|1||11||Espresso XO||No pkg|
|1||12||La Antigua||No pkg|
|1||20||Lotta Latte||No pkg|
Note: recall that a primary key is a unique key that is a part of the definition of the table. Normally, a primary key is a single column in a table. In this case, the primary key is a combination of two columns, which means that there can only be one combination like this in a table.
The following table displays the first few rows of the Class table.
|1||Bulk_beans||Bulk coffee products|
|2||Bulk_tea||Bulk tea products|
|4||Pkg_coffee||Individually packaged coffee products|
|5||Pkg_tea||Individually packaged tea products|
|6||Pkg_spice||Individually packaged spice products|
Store, Market, and Promotion tables
The Store, Market, and Promotion tables contain descriptions that data analysts use as they query the database. For example, the Store table contains store names and addresses; the Product table contains product and packaging information; and the Period table contains month, quarter, and year values. Every table contains a primary key that consists of one or more columns; each row in a table is uniquely identified by its primary-key value or values.
Store and Market tables
The following table displays the first few rows of the Store table (some columns were truncated to fit on the page). The primary-key column is Storekey; Mktkey is a foreign-key reference to the Market table.
|1||14||Small||Roasters, Los Gatos||1234 University Ave||Los Gatos||CA||95032|
|2||14||Large||San Jose Roasting||5678 Bascom Ave||San Jose||CA||95156|
|3||14||Medium||Cupertino Coffee||987 DeAnza Blvd||Cupertino||CA||97865|
|4||3||Medium||Moulin Rouge||898 Main Street||New Orleans||LA||70125|
|5||10||Small||Moon Pennies||98675 University||Detroit||MI||48209|
|6||9||Small||The Coffee Club||9865 Lakeshore Bl||Chicago||IL||06060|
The following table displays the first few rows of the Market table.
|3||New Orleans||LA||New Orleans||South|
|5||New York||NY||New York||North|
The following table displays the first few rows of the Promotion table. The primary-key column is Promokey.
|1||100||Aroma catalog coupon||1.00||2004-01-01||2004-01-31|
|2||100||Aroma catalog coupon||1.00||2004-02-01||2004-02-29|
|3||100||Aroma catalog coupon||1.00||2004-03-01||2004-03-31|
|4||100||Aroma catalog coupon||1.00||2004-04-01||2004-04-30|
|5||100||Aroma catalog coupon||1.00||2004-05-01||2004-05-31|
The Value column identifies the cash value of a given promotion. The user can evaluate the cost of the promotions by summing these values.
The following table displays the first 20 rows of the Sales table.
The primary-key definition is a combination of values from five columns:
perkey, classkey, prodkey, storekey, promokey
About the Sales table
The Sales table contains data that is easily accessible throughout the business attributes defined in the table it references, and it stores large amounts of statistical information about those attributes. The Sales table is the largest table in this database.
The last column, Comments, stores information about customer feedback using the XML data type. XML is introduced in Part 7 of this series.
Multipart primary key
The Sales table contains a multipart primary key. Each of the key's five columns is a foreign-key reference to the primary key of another table:
perkey, classkey, prodkey, storekey, promokey
This primary key links the Sales data to the Period, Product, Store, and Promotion tables.
The Gold table contains information for "Gold Card" customers that allow them to enjoy various discounts, free gifts, and e-mail reminders of promotional activities. The Gold table will be created by the participant in Part 6 of this series, and will have the following columns: unique card ID, store key, first name, last name, e-mail, and status.
What is SQL?
Structured Query Language
Structured Query Language (SQL) is a standardized language for defining and manipulating data in a relational database. All relational database management systems (RDBMS) understand SQL.
Here is some more information about SQL:
- SQL was devised specifically as part of the relational theory. You need a language to talk to your database, to retrieve data from it, or to make changes to the data.
- SQL is a non-navigational language. In other words, with SQL, you don't have to tell DB2 to find a record, read its pointer, and follow that pointer to a related row in the table. Simply tell DB2 what to do and DB2 figures out how to handle the command. It's that simple!
- You can use SQL to modify aspects of a table without having to take the table offline, add new columns to an existing table, or add business rules to the definition of the table.
Commonly asked questions
Here is a sampling of business questions that you can use SQL to answer. With the answers, critical business decisions can be made in a timely manner.
- What were the weekly sales of Lotta Latte brand coffee in San Jose during last year?
- What were the average monthly sales of all coffee products in the West during each month of last year?
- How do the sales of Lotta Latte in San Jose compare with its sales in Los Angeles and New York?
- How has the monthly market share of Lotta Latte changed during the last two years in all markets?
- Which suppliers charge the most for bulk tea products?
- What was the most successful promotion last December in California?
Very challenging business questions, which DB2 9 can be used to provide answers.
- What were the running totals for Lotta Latte sales during each month of last year?
- What were the ratios of monthly sales to total sales (expressed as percentages) for Lotta Latte during the same period?
- Which ten cities had the worst coffee sales in 2004 with regard to dollar sales and quantities sold?
- Which Aroma stores fall into the top 25 percent in terms of sales revenue for the first quarter of 2005? Which stores fall into the middle 50 percent, and which fall into the bottom 25 percent?
Data types define acceptable values for constants, columns, host variables, functions, expressions, and special registers. In the most introductory form, data must be defined as either a numeric data type or a text data type. This is important for two main reasons. First, with numeric data types, mathematical functions can be applied, such as adding fields together or subtracting fields. Second, to ensure the right type and amount of storage is allocated, the size can be preset. How many times have you filled in an application and there was not enough space to write everything required? However, increasing all fields to the maximum leads to too much empty space in the database. This section describes the data types referred to in the examples. For a full list and complete description of other data types, refer to the SQL Reference Guide.
All numbers have a sign and a precision. The precision is the number of bits or digits, excluding the sign.
|SMALLINT||A SMALLINT (small integer) is a two-byte integer with a precision of 5 digits.|
|INTEGER||An INTEGER (large integer) is a four-byte integer with a precision of 10 digits.|
|BIGINT||A BIGINT (big integer) is an eight-byte integer with a precision of 19 digits.|
|REAL||A REAL (single-precision floating-point number) is a 32-bit approximation of a real number.|
|DOUBLE||A DOUBLE (double-precision floating-point number) is a 64-bit approximation of a real number. DOUBLE is also referred to as FLOAT.|
|DECIMAL(p,s)||A DECIMAL is a decimal number. The position of the decimal point is determined by the precision (p) and the scale (s) of the number. Precision is the total number of digits and has to be less than 32. Scale is the number of digits in the fractional part and is always smaller than or equal to the value of precision. The decimal value defaults to precision of 5 and scale of 0 if precision and scale are not specified.|
A character string is a sequence of bytes. The length of the string is the number of bytes in the sequence. If the length is zero, the value is called the empty string.
|Fixed-Length Character String||CHAR(x) is a fixed length string. The length attribute x must be between 1 and 254, inclusive.|
|Varying-Length Character String||Varying-length character strings are of three types: VARCHAR, LONG VARCHAR, and CLOB.|
VARCHAR(x) types are varying-length strings, so a string of length 9 can be inserted into VARCHAR(15) but will still have a string length of 9.
See Large Objects (LOBs) for details on CLOB.
A graphic string is a sequence of double-byte character data.
|Fixed-Length Graphic String||GRAPHIC(x) is a fixed length string. The length attribute x must be between 1 and 127, inclusive.|
|Varying-Length Graphic String||Varying-length graphic strings are of three types: VARGRAPHIC, LONG VARGRAPHIC, and DBCLOB. See Large Objects (LOBs) for details on DBCLOB.|
A binary string is a sequence of bytes. It is used to hold nontraditional data, such as pictures. Binary Large OBject (BLOB) is a binary string. See Large Objects (LOBs) for more information.
Datetime values are representations of dates, times, and timestamps (a character string of 14 digits that represents a valid date and time in the form yyyyxxddhhmmss). Datetime values can be used in certain arithmetic and string operations and are compatible with certain strings; however, they are neither strings nor numbers.
|Date||A date is a three-part value (year, month, and day).|
|Time||A time is a three-part value (hour, minute, and second) designating a time of day using a 24-hour clock.|
|Timestamp||A timestamp is a seven-part value (year, month, day, hour, minute, second, and microsecond) designating a date and time.|
The null value is a special value that is distinct from all non-null values. It means the absence of any other value for that column in the row. The null value exists for all data types.
DB2 9 is the industry's first hybrid data server for managing data from both relational and pureXML™ structures. DB2 9 has introduced an optimized data storage engine for XML data alongside the existing relational engine.
XML is self-describing in that the meaning of the data is included in the record. The Comments column uses the XML data type to store customer comments. See Part 7 of this series for more details.
Ways to run SQL
To talk to DB2, you use a tool, specific to a platform, to enter your SQL. These tools are included when you install the product. For the purpose of this series, use the Control Center or the Command Line Processor (CLP).
The Control Center
SQL statements, DB2 commands, and operating-system commands can be run from the Control Center. Open the Control Center from the Start menu ==> all Programs ==> IBM DB2 ==> General Administration Tools ==> Control Center. From the menu at the top of the interface, select Tools ==> Command Editor. A command editor window will pop up and allow you to enter SQL statements.
You can run these interactively or you can save your statements in a script. To execute these statements, you would click the Play icon in the upper-left corner of the graphic (circled in Figure 2, below).
Figure 2. Control Center
Command Line Processor
Statements can alternatively be run from the Command Line Processor (CLP). Open the CLP from the Start menu ==> all Programs ==> IBM DB2 ==> Command Line Tools ==> Command Line Processor.
Figure 3. Command Line Processor
Connecting to a database
You need to connect to a database before you can use SQL statements to query or manipulate data. The CONNECT statement associates a database connection with a user name.
To connect to the aromadb database, type the following command in the DB2 command line processor:
CONNECT TO aromadb USER userid USING password
If you are using this guide as part of a classroom, ask your instructor for the userid and password to connect to the database.
If no userid and password are required, or if you are using this in a self-study mode and using your own computer, simply type the following command:
CONNECT TO aromadb
The following message tells you that you have made a successful connection:
Database Connection Information Database server = DB2/NT 9.0.0 SQL authorization ID = USERID Local database alias = AROMADB
Once you are connected, you can start using the database.
DB2 Information Center
In addition to the standard set of documentation available for DB2 9, there is also a searchable online collection of DB2 information that simplifies the tasks associated with DB2 9. The DB2 Information Center allows a user to receive help on any number of issues.
The DB2 Information Center is accessible through a standard Web browser or through built-in facilities in a number of the graphical user interfaces that come with DB2 9. Currently Microsoft Explorer (version 5 and later) and Netscape Navigator (version 6.1 and later) are supported.
Contents of the DB2 Information Center are updated with each product or fixpak release. Be certain you know which version of DB2 you are running to ensure that you obtain the correct information. The DB2 Information Center becomes more helpful the more that you take advantage of it. As you become familiar with the options available, you can narrow your search more effectively and get the answers to your questions more efficiently.
- You can easily access the DB2 Information Center from the Start menu ==> all Programs ==> IBM DB2 ==> Information ==> Information Center.
- Alternatively, you could use this url: http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp
In Part 3 of this series you will encounter an error when you attempt to execute a query that incorporated mismatched datatypes. You will receive the following error message:
[IBM][CLI Driver][DB2/LINUX] SQL0401N The data types of the operands for the operation ">" are not compatible. SQLSTATE=42818
To better understand the meaning of this message, you may want to research the details of the error and learn the corrective action you should take.
- Navigate to the DB2 Information Center.
- Enter the error code from the error message (SQL0401N) in the Search box at the top left corner.
- Click GO.
- The result set displays two results. The first result is "100% SQL0401", indicating that it is 100% relevant, while the second result is only 8% relevant. Select the first one.
- Read the results of the search.
Metadata can be defined as data about the structure of the database itself. This data includes information such as lists of the tables, indexes, columns, and more included in the database.
You will often hear metadata called by other names, including catalog data or system tables. Metadata is automatically collected by DB2 9 whenever a user takes an action to create, delete, or change some aspect of the database.
The standards of storing data established for conventional SQL do not apply to the storage of metadata. Each database provider sets up unique methods to store and process the metadata. Knowing how to retrieve this information can be of tremendous assistance in working with a database.
Metadata, however, is considered by many database administrators to be somewhat confidential in nature. As a result, many users will not have access to the collection of metadata. You will have to learn at your individual location what your access to metadata will be.
DB2 9 stores metadata in nearly 100 system tables. They are all contained in the SYSIBM schema. This schema is created automatically each time a database is created. System tables also follow a standard naming convention. Each table name begins with SYS, as in SYSTABLES, SYSINDEXES, and SYSUSERAUTH. (For this reason, user-defined tables cannot begin with the letters SYS.)
A user retrieves metadata from system tables in much the same way as data from regular tables. SQL statements behave the same in both cases. Knowledge of the structure of the tables is all that is needed. Given the appropriate authorizations, a user can obtain a list of all of the system tables holding metadata by querying the database.
SELECT name FROM sysibm.systables WHERE name LIKE 'SYS%';
You may or may not be able to execute the above query, depending on permissions you have been granted by the System Manager and/or Database Administrator.
If you execute the above query without the WHERE clause, you will obtain a list of all tables in the database. The result set will be ordered by Creator (schema) and alphabetically by table name within each Creator. The column name "Creator" assumes that the schema identified for each table has the same name as its creator.
SELECT name, creator FROM sysibm.systables;
By constraining on the name of the schema, you can limit the result set to those tables that are part of a given schema.
SELECT name, creator FROM sysibm.systables
Similarly, you can list information about the columns in a specific table by accessing the sysibm.syscolumns system table.
Examine the contents of this table by executing the command. There are many more columns in the result set than shown here.
This tutorial introduced the retail group of tables and suggested some typical business question that IBM DB2 9 can be used to answer. These tables were created to show an environment where:
- There are few tables
- Column names use the analyst's vocabulary
- Columns are reflective of the natural dimension of the business.
The bulk of this series consists of detailed examples that show how to write SQL queries to answer business questions. Most of these examples are based on the Aroma retail group of tables. The few exceptions are where very small tables are introduced to demonstrate specific concepts.
The most commonly used data types were described in this tutorial:
|Numbers||SMALLINT, INTEGER, BIGINT, REAL, DOUBLE, DECIMAL|
|Character String||Fixed-Length Character String (CHAR), Varying-Length Character String (VARCHAR)|
|Graphic String||Fixed-Length Graphic String (GRAPHIC), Varying-Length Graphic String (VARGRAPHIC, LONG VARGRAPHIC, and DBCLOB)|
|Binary String||Binary Large OBject (BLOB)|
|Datetime Values||Date, Time, Timestamp|
- Read the "DB2 XML evaluation guide" (developerWorks, June 2006), a step-by-step tutorial introducing the reader to the DB2 Viper data server on Windows platforms using the XML storage and searching (SQL/XML, XQuery) capabilities available to support next-generation applications.
- Check out this article and "Get off to a fast start with DB2 Viper" (developerWorks, March 2006).
- View this article series' "Appendix A" (developerWorks, August 2006).
- Learn how to "Query DB2 XML data with XQuery" (developerWorks, April 2006).
- Learn how to "Query DB2 XML data with SQL" (developerWorks, March 2006).
- Read the IBM Systems Journal and celebrate 10 years of XML.
- Refer to the SQL Reference, Vol 1 for additionanl information.
- Refer to the SQL Reference, Vol 2 for additionanl information.
- Refer to the DB2 information Center for troubleshooting.
Get products and technologies
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.