Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 4

Select data with a query

The Apache Derby software provides a powerful, open source database that can be used as a persistent store for a wide range of database applications. One of the main reasons for this popularity is Apache Derby's query support, which lets you selectively extract columns from specific rows across one or more tables that satisfy some Boolean condition. Learn about Apache Derby's query capabilities and how to use the SELECT statement to perform complex queries.

Robert Brunner, NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign

Robert J. BrunnerRobert J. Brunner is a research scientist at the National Center for Supercomputing Applications and an assistant professor of astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics.



06 June 2006

Database queries

The previous article in this series, Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 3: Running scripts and inserting data, ended by running a script that inserted ten rows and then displayed these rows for validation. That article didn't discuss how the rows were selected for display because it focused on inserting data into an Apache Derby database. The topic of this article is the selection and extraction of data from an Apache Derby database. But before you can perform a database query, you must create a database that contains several tables populated with relevant data.

Rather than assuming you have such a database or requesting that you complete the steps listed in the previous articles in this series, this article provides a SQL script file called derby5.build.sql bundled in a .zip file called derby5.zip (see the Download section later in this article). This SQL script file first creates a database and then creates two tables within their own schema, inserts ten rows into each of these tables, and displays the contents of both tables for validation. To run the commands in this script file you can use any of the three methods discussed in the previous article in this series or follow along with the commands shown in Listing 1.

Listing 1. Initializing your Derby workspace
rb$ mkdir derbyWork
rb$ cd derbyWork/
rb$ unzip ../derby5.zip 
Archive:  ../derby5.zip
  inflating: derby.build.sql         
rb$ ls
derby.build.sql
rb$ java org.apache.derby.tools.ij < ../derby.build.sql 
ij version 10.1
ij>ij> ERROR 42Y07: Schema 'BIGDOG' does not exist
ij> ERROR 42Y07: Schema 'BIGDOG' does not exist
ij> 0 rows inserted/updated/deleted
ij> 0 rows inserted/updated/deleted
ij> 10 rows inserted/updated/deleted
ij> 10 rows inserted/updated/deleted
ij> ITEMNUMBER |PRICE   |STOCKDATE |DESCRIPTION                             
------------------------------------------------------------------------
1          |19.95   |2006-03-31|Hooded sweatshirt                       
2          |99.99   |2006-03-29|Beach umbrella                          
3          |0.99    |2006-02-28|                                        
4          |29.95   |2006-02-10|Male bathing suit, blue                 
5          |49.95   |2006-02-20|Female bathing suit, one piece, aqua    
6          |9.95    |2006-01-15|Child sand toy set                      
7          |24.95   |2005-12-20|White beach towel                       
8          |32.95   |2005-12-22|Blue-striped beach towel                 
9          |12.95   |2006-03-12|Flip-flop                               
10         |34.95   |2006-01-24|Open-toed sandal                        
    
10 rows selected
ij> ITEMNUMBER |VENDORNUMB&|VENDORNAME                    
------------------------------------------------------
1          |1          |Luna Vista Limited            
2          |1          |Luna Vista Limited            
3          |1          |Luna Vista Limited            
4          |2          |Mikal Arroyo Incorporated     
5          |2          |Mikal Arroyo Incorporated     
6          |1          |Luna Vista Limited            
7          |1          |Luna Vista Limited            
8          |1          |Luna Vista Limited            
9          |3          |Quiet Beach Industries        
10         |3          |Quiet Beach Industries        
    
10 rows selected
ij> rb$

The commands shown in Listing 1 create and change into a work directory, called derbyWork in this example, expand the .zip file that contains the SQL build commands needed for the rest of this article, and execute the SQL commands within the script file using the ij Apache Derby interactive SQL tool. Although you don't have to perform all of these commands, you do need to process the derby.build.sql script file because it creates two tables and fills them with data.

In this example, you may get one of several errors. For example, you may get a database exists error or, as shown in Listing 1, a schema does not exist error. Both of these errors can be safely ignored. If you get a different error, or if you don't get the list of rows with the 10 rows selected message, something went wrong that must be addressed. For more information on possible problems, see the first article in this series, Developing with Apache Derby -- Hitting the Trifecta: Introduction to Apache Derby, or the Apache Derby Web site (see the Resources section at the end of this article for a link).


Select data

In the SQL programming language, the task of performing a query falls to the SELECT statement. To provide all the query functionality required by database applications, the SELECT statement's capabilities are extensive. The rest of this article covers the basics of the SELECT statement, which let you build powerful queries for your database-enabled applications. First, the following section introduces the formal syntax of SELECT.

The SELECT statement syntax

Formally, the syntax for the SELECT statement is simple, as shown in Listing 2. The basic format is SELECT ... FROM ... WHERE; you select the columns of interest from rows in a table or tables where certain conditions are satisfied. Of course, things can become considerably more complex. This article covers the basic features of SELECT and defers the more advanced issues to subsequent articles.

Listing 2. The formal syntax for the SELECT statement
SELECT [ DISTINCT | ALL ] SelectItem [ , SelectItem ]*
FROM clause
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause ]

From the syntax in Listing 2, you can see that a basic SELECT statement requires only a SELECT and a FROM; you must specify what data to select and indicate the location of the data of interest. Everything else is optional (as indicated by the square brackets). The DISTINCT and ALL keywords are optional qualifiers to indicate that either rows with unique values or all rows should be selected, respectively. By default, ALL is implicitly assumed, and you can use only one DISTINCT qualifier per SELECT statement.

A SELECT statement can have multiple columns listed following the SELECT keyword. Apache Derby currently limits you to 1,012 elements following the SELECT keyword -- meaning this is one limit that you'll probably never need to worry about! Multiple elements (or, more generally, column names) are separated by commas. For example, SELECT a, b, c selects the three columns a, b, and c. To select all columns from a table, you can use the asterisk character (*) as a shorthand for all columns. An important point to remember is that the result of any SELECT statement is an Apache Derby table, and you can use it in many of the same ways you use a more permanent table.

The FROM component of a SELECT statement indicates from which table (or multiple tables) the data will be extracted. This section focuses on selecting data from a single table; the last section in this article covers table joins and selecting data from multiple tables. In this case, the fully qualified name of the table to query must follow the FROM keyword.

The rest of the SELECT statement is optional. Before you build your first query, however, you should know the order in which Apache Derby evaluates the SELECT statement components. When Apache Derby processes a query, the order of evaluation is:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause

When you break down the process Apache Derby follows when processing a query, this order is intuitive. First you must locate the data to be analyzed, after which you filter out the rows of interest. The next steps are to group related rows and, finally, to select the actual columns of interest.

Select rows from a table

To demonstrate a SELECT statement, you can extract all the columns from the products table located in the bigdog schema, as shown in Listing 3.

Listing 3. Using the SELECT statement to extract rows from an Apache Derby table
rb$ java org.apache.derby.tools.ij    
ij version 10.1
ij> connect 'jdbc:derby:test' ;
ij> SELECT * FROM bigdog.products ;
ITEMNUMBER |PRICE   |STOCKDATE |DESCRIPTION                             
------------------------------------------------------------------------
1          |19.95   |2006-03-31|Hooded sweatshirt                       
2          |99.99   |2006-03-29|Beach umbrella                          
3          |0.99    |2006-02-28|                                        
4          |29.95   |2006-02-10|Male bathing suit, blue                 
5          |49.95   |2006-02-20|Female bathing suit, one piece, aqua    
6          |9.95    |2006-01-15|Child sand toy set                      
7          |24.95   |2005-12-20|White beach towel                       
8          |32.95   |2005-12-22|Blue-striped beach towel                 
9          |12.95   |2006-03-12|Flip-flop                               
10         |34.95   |2006-01-24|Open-toed sandal                        

10 rows selected
    
ij> SELECT * FROM products ;
ERROR 42X05: Table 'PRODUCTS' does not exist.
    
ij> SELECT price, itemNumber, description FROM bigdog.products ;
PRICE   |ITEMNUMBER |DESCRIPTION                                        
------------------------------------------------------------------------
19.95   |1          |Hooded sweatshirt                                  
99.99   |2          |Beach umbrella                                     
0.99    |3          |                                                   
29.95   |4          |Male bathing suit, blue                            
49.95   |5          |Female bathing suit, one piece, aqua               
9.95    |6          |Child sand toy set                                 
24.95   |7          |White beach towel                                  
32.95   |8          |Blue-striped beach towel                            
12.95   |9          |Flip-flop                                          
34.95   |10         |Open-toed sandal                                   

10 rows selected
ij>

When shortcuts go bad

Listing 3 uses the asterisk character to select all columns from the bigdog.products table without listing them explicitly. This can be a useful shortcut, especially when you're developing database applications, but it isn't a recommended practice. By using the shortcut, you don't explicitly specify the database column names or their order. In a database application, if you always assume that the column names and their order in a table are fixed, you may end up with subtle bugs if someone else modifies the database tables on which your application depends. You should always explicitly name the database columns in your SELECT statements and list the order you require.

Before you can query a database in Apache Derby, you must establish a database connection. This requires that you start the ij tool and issue the appropriate connect command. In this example, the first query uses the * shorthand to select all columns from the bigdog.products table. This is the exact statement used in the SQL script file you executed at the start of this article. In that case, the SELECT statement verified that the tables were created and loaded properly. The second SQL statement tries to perform the exact same query but without specifying the fully qualified name for the products table. Because Apache Derby can't locate the table, an error is issued.

The final SQL statement explicitly lists three columns -- price, itemNumber, and description -- after the SELECT keyword. This demonstrates that you can pull out only those columns of interest, and you can also extract them from the table in a different order than they exist within the database. Explicitly listing the columns is a best practice (see the sidebar for more details).


The WHERE clause

Up to this point, you have only selected columns for all rows in a single table. This can be expensive in terms of query performance, especially if you only want a subset of the rows from a large table. A more efficient technique is to filter database rows by placing conditions in the WHERE clause, which is evaluated immediately after the tables are specified within the FROM clause. The rest of this section discusses some of the basic features that are enabled by using the WHERE clause, including the ability to select rows that satisfy Boolean conditions as well as join multiple tables to perform more complex queries.

Filter rows

The simplest and most common use of the WHERE clause is to filter the rows from a table before selecting any columns, as demonstrated in Listing 4.

Listing 4. Filtering rows in a query using the WHERE clause
ij> SELECT p.itemNumber, p.price FROM bigdog.products AS p      
        WHERE p.price > 30.00 ;
ITEMNUMBER |PRICE   
--------------------
2          |99.99   
5          |49.95   
8          |32.95   
10         |34.95   

4 rows selected
     
ij> SELECT * FROM bigdog.products 
        WHERE price > 30.00 AND stockDate < '2006-01-01' ;
ITEMNUMBER |PRICE   |STOCKDATE |DESCRIPTION                             
------------------------------------------------------------------------
8          |32.95   |2005-12-22|Blue-striped beach towel                 

1 row selected
    
ij>

The first query shown in this example selects the itemNumber and price columns from the bigdog.products table for all rows where the price column has a value greater than $30.00. The second query extends this same query to select only those columns whose price column has a value more than $30.00 and whose stockDate column has a value less than January 1, 2006. These two query restrictions are combined in this query by using the Boolean AND operator.

You can perform a number of different Boolean operations within a WHERE clause. Table 1 lists and provides examples of the basic SQL Boolean operations that you can use in a query.

Table 1. Basic SQL Boolean operators
OperatorExampleDescription
=p.price = 29.95Test if any built-in type is equal to a specified value.
<p.price < 29.95Test if any built-in type is less than a specified value.
>p.price > 29.95Test if any built-in type is greater than a specified value.
<=p.price <= 29.95Test if any built-in type is less than or equal to a specified value.
>=p.price >= 29.95Test if any built-in type is greater than or equal to a specified value.
<>p.price <> 29.95Test if any built-in type is not equal to a specified value.
IS NULLp.description IS NULLTest if an expression or value is null.
IS NOT NULLp.description IS NOT NULLTest if an expression or value is not null.
AND(p.price > 29.92) AND (p.itemNumber > 5)Test if two expressions are both true or evaluate as nonzero.
OR(p.price > 29.92) OR (p.itemNumber > 5)Test if one or both of two expressions are true or evaluate as nonzero.
NOTNOT v.vendorNumber = 1Test if an expression is false or evaluates as zero.
BETWEENp.price BETWEEN 29.95 AND 39.95Test if a value lies inclusively between two other values (example is equivalent to 29.95 <= p.price <= 39.95).
LIKEv.vendorName LIKE 'Lun%'Test if a character expression matches a pattern, with the percent character (%) matching zero or more arbitrary characters and the underscore character (_) matching exactly one arbitrary character.

The first query also introduces the AS clause, which you can use to create a table synonym. In these examples, you define a synonym p for the fully qualified table name bigdog.products. By defining a synonym, you can refer to table quantities by using a shorter notation. This may not seem important when only one table is being referenced in a query, but the next section shows how to join multiple tables together within a query; in that case, providing table synonyms is very useful. You can also use an AS clause to name the selected columns in a query. Doing so lets you control how the results are displayed, which is also demonstrated in the next section.


Join tables

The second major function performed by a WHERE clause is to join multiple tables together into a single table that can be queried more easily. Joining multiple tables is a powerful technique, and it can be complex when you're dealing with several large tables. Tables can be joined either explicitly, by using the JOIN keyword, or implicitly, by using a WHERE clause.

You join two tables by using an inner join or an outer join. An inner join is essentially the intersection of two tables, where the tables are matched by comparing the values of a key column, such as itemNumber. The resulting table is composed of only rows that were matched between the two tables. An outer join is more like a union of two tables, where the tables are matched by comparing the values of a key column, but nonmatching rows are still included in the resulting table and filled with NULL values as appropriate. Writing SQL queries that use these more advanced table joins will be addressed in future articles.

In the current simple scheme, the process is simple; Listing 5 performs an implicit inner join of the bigdog.products table and the bigdog.vendors table.

Listing 5. Querying two tables using a table join
ij> SELECT p.price, p.description AS "Item", 
        v.vendorName AS "Vendor"
        FROM bigdog.products AS p, bigdog.vendors AS v
        WHERE p.itemNumber = v.itemNumber ;
PRICE   |Item                                    |Vendor                        
--------------------------------------------------------------------------------
19.95   |Hooded sweatshirt                       |Luna Vista Limited            
99.99   |Beach umbrella                          |Luna Vista Limited            
0.99    |                                        |Luna Vista Limited            
29.95   |Male bathing suit, blue                 |Mikal Arroyo Incorporated     
49.95   |Female bathing suit, one-piece, aqua    |Mikal Arroyo Incorporated     
9.95    |Child sand toy set                      |Luna Vista Limited            
24.95   |White beach towel                       |Luna Vista Limited            
32.95   |Blue-striped beach towel                 |Luna Vista Limited            
12.95   |Flip-flop                               |Quiet Beach Industries        
34.95   |Open-toed sandal                        |Quiet Beach Industries        

10 rows selected
ij>

This query may seem complex, due primarily to its length. But by breaking it down line by line you can easily follow what's happening. First, you select two columns from the bigdog.products table and one column from the bigdog.vendors table and use an AS clause to name these columns for the display with the ij tool. Because the query joins these two tables (by using an implicit inner join), you can select columns from both tables. In the FROM clause, you list both tables and provide aliases for them to simplify the full SQL statement. In the WHERE clause, you provide the logic for joining the two tables, by explicitly instructing the Derby database engine to only select rows from the two tables that have matching values in their respective itemNumber columns. In processing this query, the Derby database engine first pulls all rows out of the first (left) table in the query -- bigdog.products in this example -- and finds the row with a matching value in the itemNumber column in the second (right) table in the query -- in this case, bigdog.vendors.


Summary

This article introduced the SELECT statement and demonstrated how to use it properly with Apache Derby to select and extract data from a database. The full functionality of the SELECT statement is complex, but the basic concepts let you perform sophisticated queries that can extract multiple columns from one or more tables joined using an implicit inner join. The next article will discuss some of the more advanced functionality provided by the SELECT statement, which lets you compute quantities within a SQL statement, change data types, and order the resulting data.


Download

DescriptionNameSize
Derby SQL scripts for this articlederby.build.zip1KB

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Open source on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source, Information Management
ArticleID=125619
ArticleTitle=Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 4
publish-date=06062006