Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 5

Write more powerful queries

The SQL SELECT statement lets you perform queries against an Apache Derby database to select data from multiple tables where certain conditions are valid. This article builds on previous articles in this series to develop even more powerful SELECT statements. Learn how to use the ORDER BY clause and DISTINCT keywords to modify the rows of data selected by a query; how to include basic mathematical operators in a query; and how to work with the primary built-in SQL functions to convert data from one data type to another, to compute aggregate quantities, to perform mathematical operations, to work with date and time data, and to work with character strings.

Robert Brunner, NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign

Robert J. BrunnerRobert J. Brunner is a Research Scientist at the National Center for Supercomputing Applications and an Assistant Professor of Astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics. You can reach him at rb@ncsa.uiuc.edu.



11 July 2006

Also available in Russian Japanese

Database queries

The previous article, "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 4," introduced the fundamental concepts related to selecting data from an Apache Derby database. These included the SELECT, FROM, and WHERE clauses, and the concept of joining two tables together to facilitate more advanced data selection. That article also discussed how to initialize your Derby workspace by using the included SQL script file to build and populate the Apache Derby database demonstrated in the example code listings. If you haven't already done so, you should read the previous article and initialize your Derby workspace by running the SQL script available for download with that article.

This article builds on these basic SQL concepts to facilitate more advanced data-selection operations, including the use of basic mathematical operations and SQL functions within a query. You can use these operations and functions in the WHERE clause to strengthen the data restrictions in your query or apply them in the SELECT clause to extract computed values. First, however, the article discusses several additional SQL query keywords that you can use to modify the results of your query. Start Derby's ij tool, connect to your test database, and follow along to learn more about writing SQL queries with the Apache Derby database.


Modify the selection of data

All the queries presented in the previous article were straightforward, selecting a set of columns from all rows in a table, from a simple join of two tables, or from a query that included a simple WHERE clause. Although useful, this functionality corresponds to performing a data dump. Fortunately, SQL provides a number of techniques to filter both the rows selected from a table and the data extracted. The rest of this section discusses two basic techniques for controlling the data extracted by a query: the DISTINCT keyword and the ORDER BY clause.

Extract distinct rows

By default, when data is selected by using an SQL query, all rows that satisfy the WHERE clause are extracted from the database. In some cases, this may result in rows that have identical column values being returned. If you need to restrict your query so that only unique row values are returned, you can use the DISTINCT qualifier, as shown in Listing 1.

Listing 1. Using the DISTINCT qualifier in a query
rb$ java org.apache.derby.tools.ij    
ij version 10.1
ij> connect 'jdbc:derby:test' ;
ij> SELECT DISTINCT vendorNumber AS "Vendor #" FROM bigdog.vendors ;
Vendor #   
-----------
3          
2          
1          

3 rows selected

ij> SELECT DISTINCT vendorNumber AS "Vendor #", itemNumber as "Item #" 
    FROM bigdog.vendors WHERE itemNumber > 5 ;
Vendor #   |Item #     
-----------------------
1          |6          
1          |7          
1          |8          
3          |9          
3          |10         

5 rows selected

ij>

The 411 on DISTINCT

If you want to use the DISTINCT qualifier, it must be the first item listed in the SELECT clause, as shown in Listing 1, and you can have only one DISTINCT qualifier per SELECT clause. If the selected rows contain a column with NULL values, multiple NULL values are considered duplicates when identifying unique rows.

The first query in this listing uses the DISTINCT qualifier to restrict the output of the query to only distinct, or unique, values of the vendorNumber column, which is the only column listed in the SELECT clause. In the example schema that these articles use, there are only three vendors (with vendorNumber being restricted to 1, 2, or 3). Thus, when the DISTINCT qualifier is used in the query, only three rows are selected.

The DISTINCT qualifier, however, applies to the entire list of selected columns, so if multiple columns are listed following a DISTINCT keyword, only unique combinations of all the columns are selected. This is demonstrated in the second example, where both vendorNumber and itemNumber are listed in the SELECT clause. Because every item has a unique itemNumber, every combination of these two columns is unique, and all rows that satisfy the WHERE clause are selected -- in other words, the DISTINCT qualifier has no effect on the results.

One remaining point that you may have noticed from the examples in Listing 1 is that the selected rows were not in the same order. If the order of selected rows is important, you can easily control it by using an ORDER BY clause in your query, as shown in next section.

Ordering rows

In general, you can't assume that Apache Derby, or any database, will return rows from a query in a specific order. If the order is important, you can use the ORDER BY clause to have Apache Derby order the data that are returned by your query in a particular manner. Generally, you do so by specifying a column that should be used to provide the ordinal values for comparison, as shown in Listing 2.

Listing 2. Using the ORDER BY clause in a query
ij> SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
        p.price as "Price", p.itemNumber AS "Item #"
    FROM bigdog.products AS p, bigdog.vendors AS v
    WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
    ORDER by v.vendorNumber ;
Vendor #   |Vendor                        |Price   |Item #     
---------------------------------------------------------------
1          |Luna Vista Limited            |32.95   |8          
1          |Luna Vista Limited            |24.95   |7          
1          |Luna Vista Limited            |99.99   |2          
2          |Mikal Arroyo Incorporated     |49.95   |5          
2          |Mikal Arroyo Incorporated     |29.95   |4          
3          |Quiet Beach Industries        |34.95   |10         

6 rows selected

       
ij> SELECT v.vendorNumber AS "Vendor #", vendorName as "Vendor",
        p.price as "Price", p.itemNumber AS "Item #"
    FROM bigdog.products AS p, bigdog.vendors AS v
    WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
    ORDER BY v.vendorNumber ASC, p.price DESC ;
Vendor #   |Vendor                        |Price   |Item #     
---------------------------------------------------------------
1          |Luna Vista Limited            |99.99   |2          
1          |Luna Vista Limited            |32.95   |8          
1          |Luna Vista Limited            |24.95   |7          
2          |Mikal Arroyo Incorporated     |49.95   |5          
2          |Mikal Arroyo Incorporated     |29.95   |4          
3          |Quiet Beach Industries        |34.95   |10         

6 rows selected

ij>

Using column names: A best practice

Although using column numbers may seem like a handy shortcut, it generally isn't a good idea. To see why, consider what happens if you modify the columns listed in a SELECT clause or just modify their order. If you forget to modify the numbers used in the ORDER BY clause, the query will break -- or worse, return bad data. In general, it's a best practice to always be explicit and specify the column names directly, even if doing so means more typing.

In Listing 2, the first query uses the ORDER BY clause to list a subset of all the rows in the table that results from joining the bigdog.vendors table to the bigdog.products table. The rows are ordered by vendorNumber (the subset is constructed by applying the WHERE clause). An ORDER BY clause can take either a column name, as in this example, or a column number, which is taken from the order in which the columns are listed after the SELECT keyword.

You can also specify multiple columns to use during the sorting process and even specify ASC for ascending order, which is the default, or DESC for descending order. For example, if you used the ORDER BY 1 DESC, 4 DESC clause in the first query, the query would return the same rows, but they would be ordered by using the vendorNumber column as the primary sort column in descending order followed by the itemNumber column as the secondary sort column in descending order.


Do math in a query

Selecting columns from a database provides a number of useful benefits, but being able to compute and select quantities based on data in a table opens up even more possibilities. Apache Derby provides several mathematical operators, detailed in Table 1, that you can use in either a SELECT clause or a WHERE clause.

Table 1. Apache Derby mathematical operators
OperatorExampleDescription
unary ++1.0A noop, or no operation, as +4 = 4
unary --p.priceChanges the sign of the value to which it's applied
+p.itemNumber + 10Adds the second value to the first value
-p.itemNumber - 10Subtracts the second value from the first value
*p.price * 1.0825Multiplies the first value by the second value
/p.price / 100.0Divides the first value by the second value

Using these operators is straightforward because they generally behave exactly as you expect. For example, if the sales tax is 8.25%, you can return the price for an item both before and after sales tax has been applied by using SELECT price, price * 1.0825 FROM bigdog.products ;. As another example, if you have a column called numberItems that tracks the number of items purchased and another column called price that contains the price at which they're purchased, you can return the total amount paid for those items at a given price by using numberItems * price. Several of the queries shown in the code listings in the rest of this article provide additional examples of how to use these operators.

The only concern when using these operators arises from complications that result from using different data types, such as integer or floating-point, in a mathematical operation. If both operands are the same data type, the result type will be the same. If you're performing division, this can result in truncation (for example, if you're using two integer values), which might cause unexpected problems. You can handle this issue properly by using the CAST function described later in this article. On the other hand, if the two operands are different data types, the result type is promoted to the more complex type.


Use Derby's SQL functions

SQL is a powerful and expressive language that can be used to perform a wide range of actions. Part of the SQL language's power comes from its ability to directly interact with a variety of data types. This section discusses the SQL-92 standard built-in functions provided by Apache Derby, which you can use in your SQL statements to simplify many common data-manipulation tasks. For the purposes of this article, these functions are broken into five categories:

  • Type conversion
  • Aggregate
  • Math
  • Date and time
  • Character string

Type-conversion functions

The simplest type of function to use in Apache Derby is the type-conversion function. These functions can be used to explicitly convert one type of data, such as a character string, to another type of data, such as a date or integer, or vice versa. There are two main categories of type-conversion functions: the explicit type functions, detailed in Table 2, and the CAST function, which provides a more powerful approach to type conversion.

Table 2. Apache Derby type-conversion functions
FunctionExampleDescription
BIGINTBIGINT(123.45)Returns a 64-bit integer constant for a given character string or number.
CHARCHAR(123.45)Returns a fixed-length character representation, up to a maximum length of 254 bytes, of a given value, which must be one of the built-in Apache Derby types. An optional second argument can be supplied to specify the length of the character string.
DATEDate('2006-05-16')Returns a date representation of the input value.
DOUBLEDOUBLE(123.45)Returns a double-precision, floating-point representation of the input number or character string.
INTEGERINTEGER(123.45)Returns an integer constant for a given character string, date, time, or number.
SMALLINTSMALLINT(123.45)Returns a small integer constant for a given character string or number.
TIMETIME('12:24:30')Returns a time representation of the input value.
TIMESTAMPTIMESTAMP('2006-05-16','12:24:30')Returns a timestamp representation of the input value.
VARCHARVARCHAR(123.45)Returns a variable-length character representation, up to a maximum length of 32,672 bytes, of a given date, time, timestamp, or character-string value.

You can use the CAST function to change the length of a character string or convert one type of data to another, including data types that are not among those supported by the type-conversion functions listed in Table 2 (such as DECIMAL). In this manner, you can use CAST to make the columns selected by a query more readable or more compact or to preserve numerical precision in a mathematical expression.

Explicit type conversions: A best practice

So why bother, given that Derby often performs implicit type conversions automatically? The simple answer is that being explicit makes it easier to understand the logic of your code. When code is easier to understand, it's easier to maintain and is generally more robust against programming errors. Being explicit when writing SQL queries is always a best practice.

When you're extracting results from a database within a Java™ program, however, you also have the option of converting data types within the application. Both methods provide the same result, but you gain several performance benefits by having the database cast the values, because you can minimize the network traffic by having the database send only the actual data required and push any processing to the database engine and away from a potential thin client. This approach may overburden a database server if taken to an extreme, so you should use it only when warranted. Using these type-conversion functions is straightforward, as demonstrated in Listing 3.

Listing 3. Using the CAST operator
ij> SELECT p.price AS "Price", 
        DOUBLE(p.price * 1.0825) AS "Float Total", 
        CAST(p.price * 1.0825 AS DECIMAL(6,2)) AS "Total",
        CAST(p.description AS CHAR(20)) AS "Description"
    FROM bigdog.products AS p 
    WHERE p.price > 21.0 AND p.stockDate < Date('2006-01-21') ;
Price   |Float Total           |Total    |Description         
--------------------------------------------------------------
24.95   |27.008375             |27.00    |White beach towel   
32.95   |35.668375             |35.66    |Blue-striped beach to

2 rows selected

The single query shown in Listing 3 demonstrates how to properly use several type-conversion functions. The second line of this query converts the product of the selected price column and the numerical factor of 1.0875, which applies a tax of 8.75% to the price of an item, to a DOUBLE precision floating-point number. The third and fourth lines use the CAST function to first convert this same product to a DECIMAL type and then truncate the selected description column to display only 20 characters. Notice that the CAST function truncated the calculated price when converting the value to a DECIMAL data type. Finally, the DATE function converts a date character string to a DATE data type. Although this step isn't required, being explicit is a best practice (see the sidebar on explicit types).

This query also demonstrates the use of the AS clause to rename columns in a query. This is important because the query uses functions in place of column names. When Apache Derby doesn't have an explicit name for a column, it automatically uses numbers, which provide little insight into the nature of the data. By explicitly providing names for all columns, you avoid this problem. In addition, because these names include white space, you must enclose them in double quotation marks to indicate to the ij tool that they are character-string constants.

Compute aggregate values

Apache Derby also provides functions that operate on multiple rows, known as aggregate functions, which you can use to enhance a query. Aggregate functions -- also known as set functions in SQL-92 or, more informally, as column functions -- return a computed quantity from a column over a number of rows. Apache Derby supports five aggregate functions, which are detailed in Table 3.

Table 3. Apache Derby aggregate functions
FunctionExampleDescription
AVGAVG(p.price)Returns the average value of a column from all rows that satisfy an expression. Can only be used with built-in numeric data types. The precision of the returned value is defined by the precision of the column being evaluated.
COUNTCOUNT(p.price)Returns the number of rows that satisfy an expression, such as a query. Can be used with any data type.
MAXMAX(p.price)Returns the maximum value of a column from all rows that satisfy an expression. Can only be used with built-in data types.
MINMIN(p.price)Returns the minimum value of a column from all rows that satisfy an expression. Can only be used with built-in data types.
SUMSUM(p.price)Returns the sum of a column over all rows that satisfy an expression. Can only be used with built-in numeric data types.

These aggregate functions can often be used to quickly find useful information that might otherwise be difficult to identify, as shown in Listing 4.

Listing 4. Using aggregate functions in a query
ij> SELECT COUNT(p.itemNumber) AS Number, 
        CAST((AVG(p.price) + 0.005) AS DECIMAL(5,2)) AS Average,
        MIN(p.stockDate) AS "First Date", MAX(p.stockDate) AS "Last Date"
    FROM bigdog.products AS p ;
NUMBER     |AVERAGE    |First Date|Last Date 
---------------------------------------------
10         |31.66      |2005-12-20|2006-03-31

1 row selected

Listing 4 uses four of the five aggregate functions to get summary information about the data in the bigdog.products table. The COUNT function indicates that the table includes ten rows (because the query didn't use a WHERE clause to restrict the rows selected from the table). The AVG function calculates the average price of all items in the bigdog.products table. Because this calculation results in a floating-point number, the query explicitly casts the result back to the suitably sized DECIMAL data type. Unlike in Listing 3, this example adds 0.005 to force the CAST function to properly round the calculated value. Finally, the MIN and MAX functions extract the minimum and maximum dates from the bigdog.products table.

Math functions

Apache Derby provides three built-in functions, detailed in Table 4, that let you perform more complex mathematical operations in a SQL query, especially when used in concert with the mathematical operators detailed earlier in this article. These functions can be used in either the SELECT or the WHERE clause of a query.

Table 4. Apache Derby math functions
FunctionExampleDescription
ABS or ABSVALABS(-1.0)Returns the absolute value of an expression, which must be one of the built-in numeric types. The return type is the same as the argument.
MODMOD(1, 2)Returns the remainder when the first argument is divided by the second argument. The return type is the same as the argument with the biggest integer type (SMALLINT, INTEGER, or BIGINT). The sign of the result is determined solely from the sign of the first argument.
SQRTSQRT(0.5)Returns the square root of an expression, which must be a floating-point value. The return type is the same as the argument.

Date and time functions

Apache Derby provides considerable support for date and time data types. To simplify using these types in a database query, there are six date and time functions, which are detailed in Table 5. These functions often provide an alternative to using the type-conversion functions described earlier. For example, you can replace the p.stockDate < Date('2006-01-21') part of the WHERE clause in the query shown in Listing 3 with the YEAR(p.stockDate) < 2006.

Table 5. Apache Derby date and time functions
FunctionExampleDescription
DAYDAY(p.stockDate)Returns an integer that contains the day component of a date, timestamp, or character string that contains a valid date
HOURHOUR('12:21:30')Returns an integer that contains the hour component of a time, timestamp, or character string that contains a valid time
MINUTEMINUTE('12:21:30')Returns an integer that contains the minute component of a time, timestamp, or character string that contains a valid time
MONTHMONTH('2005-12-20')Returns an integer that contains the month component of a date, timestamp, or character string that contains a valid date
SECONDSECOND('12:21:30')Returns an integer that contains the second component of a time, timestamp, or character string that contains a valid time
YEARYEAR(p.stockDate) < 2006Returns an integer that contains the year component of a date, timestamp, or character string that contains a valid date

Character-string functions

The last category of functions in Apache Derby simplifies the task of working with character data. These eight functions, detailed in Table 6, can be used to count character values, convert character values to all uppercase or all lowercase, remove white space at the start or end of the string, find a specific substring, or select a substring.

The character values used by these functions can be database columns that are one of the character data types supported by the Apache Derby database, database columns that can be directly converted to a character data type, or constant character strings. For example, the query SELECT UCASE(p.description), UCASE('hello world'), UCASE(p.stockDate) FROM bigdog.products AS p ; returns three columns, which are all displayed as uppercase character strings.

Table 6. Apache Derby character-string functions
FunctionExampleDescription
||(p.description||v.vendorName)The concatenation operator combines two values into a new character string. If both values are of type CHAR, the result is also of type CHAR. If the values are of type VARCHAR or a numerical data type, which can be converted to type VARCHAR, the resulting type is VARCHAR.
LCASE, or LOWERLCASE(p.description)Returns a character string in which all alphabetical characters in the input value have been converted to lowercase.
LENGTHLENGTH(p.description)Returns the number of characters in the input value. Noncharacter data is implicitly converted to a character string.
LOCATELOCATE('beach',p.description)Returns the starting location for the first occurrence of a substring in a search string, or zero if the substring isn't found. The first argument is the substring, the second argument is the search string, and an optional starting position can be supplied as a third argument.
RTRIMRTRIM(p.description)Returns a character string in which all spaces at the end of the input value have been removed.
LTRIMLTRIM(p.description)Returns a character string in which all spaces at the start of the input value have been removed.
SUBSTRSUBSTR(p.description, 1, 4)Returns part of an input character string as a VARCHAR, starting at the location specified and continuing until the end of the character string, or to the location specified by the optional third argument. If the starting location is positive, it's relative to the beginning of the string; if it's negative, it's relative to the end of the string.
UCASE, or UPPERUCASE(p.description)Returns a character string in which all alphabetical characters in the input value have been converted to uppercase.

Summary

This article introduced several techniques for writing more powerful SQL queries. First, the DISTINCT keyword was used to select only unique rows in a query. Second, the ORDER BY clause was used to sort the selected rows in a query by one or more criteria. Third, you read about the basic math operators that can be used in a query. And finally, you learned about the basic built-in SQL functions in detail. These functions can often be used to simplify complex SQL queries. These techniques will be further demonstrated in future articles where you'll use them to selectively delete or modify data in an existing database.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Open source on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source, Information Management
ArticleID=145008
ArticleTitle=Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 5
publish-date=07112006