Ready to dive in? This article tackles several advanced database concepts, including grouping SQL query results, database views, and database indexes, which you'll use in conjunction with the Apache Derby database. To follow along and complete the examples in this article, you need a properly initialized Apache Derby test database. If you've been completing the examples from the previous articles in this series, you already have a test database. If not, or if you want a clean install -- which will guarantee you can follow along exactly with the examples in this article -- you can download and run the SQL script included with this article, as shown in Listing 1.
Listing 1. Setting up the Derby work environment
rb% mkdir derbyWork rb% cp derby8.zip derbyWork/ rb% cd derbyWork/ rb% unzip derby8.zip Archive: derby8.zip inflating: derby.build.sql rb% ls derby.build.sql derby8.zip rb% java org.apache.derby.tools.ij < derby.build.sql > derby.build.out 2> derby.build.err rb% java org.apache.derby.tools.ij ij version 10.1 ij> connect 'jdbc:derby:test' ; ij> |
Listing 1 starts by creating a new work directory, called derbyWork, where you'll create the test database. After
extracting the SQL script file and copying it into the derbyWork directory, you process the
derby.build.sql script file by using the ij tool as demonstrated in
"Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 3" (developerWorks, May 2006).
This example concludes by starting the ij tool and issuing the
connect command to establish a connection to the test database. If you
get an error, such as ERROR XJ004: Database 'test' not found., the SQL script file
must have failed. To diagnose the problem, look at both the derby.build.out and the
derby.build.err files, and consult either the second article in this series or the online Derby
documentation, listed in the Resources section of this article. If everything worked as expected,
you should be presented with the ij> prompt, and you can
advance to issuing SQL queries that return aggregations rather than individual rows.
In the last several articles in this series (see Resources for more details, including links), you constructed different types of SQL queries. These range from simple data dumps, where every column in a single table is listed, to queries that return values and the result of built-in functions from table joins. In all of these queries, however, the actual rows, or the results of executing built-in functions on the actual rows of one or more database tables, were returned by the SQL query.
In some cases, this result may not be what you want or need. For example, joining multiple database tables together can quickly produce large query results; or perhaps you don't want all the data in the table, but just a statistical
summary -- perhaps sales by month or expenditures by quarter. Or maybe you just want the
SQL query to group related rows and return the group results rather than all the
underlying rows. Using SQL, you can achieve this result by adding a GROUP BY clause to the end of your SQL query, along with one or more
columns that indicate how the underlying rows should be grouped.
This technique can be powerful, especially when combined with aggregate functions, because you can use it
to quickly generate summary statistics for groups of related rows. When this series first
covered SQL queries in Part
4, it discussed the order in which Apache Derby processes the various components of an SQL
query: FROM, WHERE, GROUP BY, HAVING, and finally, SELECT. Thus, before any columns or function results are selected in an
SQL query, the rows that satisfy the WHERE clause are grouped
together following the directions in the GROUP BY clause. Listing 2 provides
an explicit demonstration of these rules; it computes the
average price and number of items provided by each of the vendors that supply Bigdog's
Surf Shop.
Listing 2. Grouping results by using the GROUP BY and HAVING clauses
ij> SELECT COUNT(p.itemNumber) AS Number, AVG(p.price) AS Average
FROM bigdog.products AS p, bigdog.vendors AS v
WHERE p.itemNumber = v.itemNumber
GROUP BY v.vendorNumber ;
NUMBER |AVERAGE
-----------------------
6 |31.4633
2 |39.9500
2 |23.9500
3 rows selected
ij> SELECT v.vendorNumber,
COUNT(p.itemNumber) AS Number, AVG(p.price) AS Average
FROM bigdog.products AS p, bigdog.vendors AS v
WHERE p.itemNumber = v.itemNumber
GROUP BY v.vendorNumber
HAVING v.vendorNumber > 1 ;
VENDORNUMB&|NUMBER |AVERAGE
-----------------------------------
2 |2 |39.9500
3 |2 |23.9500
2 rows selected
ij> |
The first query in this example joins the bigdog.products table
with the bigdog.vendors table by using the itemNumber column. Before selecting any rows from the newly joined table,
however, you first group related rows by using the GROUP BY
v.vendorNumber clause. The GROUP BY clause doesn't simply
sort the rows that result from your query, it groups them together; and only the
grouped data can be returned or used in an aggregate function. In other words, you can only select
the columns listed in the GROUP BY clause, the values from
aggregate functions that operate on any column in the joined tables, constants, or special registers.
The second query extends the first by adding a HAVING clause. You can use a HAVING clause to restrict the operations of a GROUP BY clause, by limiting the allowed column values that can be used
to group related rows. In this example, only rows with vendorNumber greater than 1 are grouped, which results in only the two
groups shown. As is the case with the GROUP BY clause, a HAVING clause can take multiple columns; but any columns listed in the
HAVING clause must also be listed in the GROUP BY
clause, or an error is issued by the Apache Derby database engine.
When you work with a relational database like Apache Derby, the fundamental structure you
generally deal with is a database table, which is composed of rows of data. By using SQL queries,
you can insert, select, update, or delete data from one or more tables. Sometimes, these queries --
especially SELECT queries -- can become complicated and may be used
frequently. In this case, you may want to have the database treat the results of the query as a
new table, which lets you more easily work with the query results.
The simplest way to achieve this result is to create a new table and use the INSERT INTO SQL statement together with your query of interest, to
explicitly create the new table and populate it with the appropriate data, as shown in Part
6 of this series. Although this technique is straightforward, it has several disadvantages. First, you have duplicated data within your database, which requires more resources. Second, and perhaps more important, is that if the original data
changes, the duplicated data will become stale and will need to be updated. Given the importance
of this concept, there must be a better way.
Not surprisingly, there is a better way, and it is called a view. A view is a virtual table that is created by executing a query. Creating and deleting a view is simple, as shown in the formal syntax presented in Listing 3.
Listing 3. Formal syntax for working with a view
CREATE VIEW schema.viewName
[ ( columnName1 [, columnName2] * ) ]
AS sqlQuery ;
DROP VIEW schema.viewName ;
|
To create a view, you use the CREATE VIEW SQL statement. When creating
a new view, you can
either explicitly name the columns in the view (for example, columnName1, columnName2, and so on), or you can have the column names be
implicitly copied directly from the columns listed in the
SELECT clause of the SQL query used to construct the view. As was the
case with a table, a view should be assigned to a particular schema; otherwise it's assigned to
the default APP schema. To delete a view, you use the DROP VIEW statement along with the fully qualified name of the view. Both
of these view operations are demonstrated in Listing 4.
Listing 4. Creating and dropping a view in Apache Derby
ij> CREATE VIEW bigdog.vendorList (Name)
AS SELECT DISTINCT vendorName FROM bigdog.vendors ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.vendorList ;
NAME
------------------------------
Mikal Arroyo Incorporated
Quiet Beach Industries
Luna Vista Limited
3 rows selected
ij> DROP VIEW vendorList ;
ERROR X0X05: Table 'VENDORLIST' does not exist.
ij> DROP VIEW bigdog.vendorList ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.vendorList ;
ERROR 42X05: Table 'BIGDOG.VENDORLIST' does not exist.
ij> |
The first statement in Listing 4 creates a new view called vendorList in
the bigdog schema. This view consists of one column that is explicitly
named Name and that is populated by the distinct names in the vendorName column in the bigdog.vendors table.
Next, you issue a SELECT statement that queries the vendorList view, showing how you can use a view in many of the same ways you
would use a table.
Because you don't need this view any longer, you next try to drop it. The first SQL statement
you use doesn't include the fully qualified name of the view; in this case, the schema name
is missing, which results in an error. Once you properly apply the DROP
VIEW statement by using the fully qualified view name, bigdog.vendorList, the drop operation completes successfully. To
demonstrate that the view no longer exists, you try to re-execute the original SQL SELECT statement, which now fails because the view has been deleted.
Recall the definition of a view: It's a virtual table. So a view is nearly identical to a real table. For example, a view can be used in SQL queries, either directly or as part of a join. Because a view is virtual, it can't be used to directly modify the data it contains. However, any changes to the tables that hold the original data are automatically propagated to the data in the view. This last benefit of working with views is displayed in Listing 5.
Listing 5. Working with views in Apache Derby
ij> CREATE VIEW bigdog.inventory AS SELECT p.price, p.description AS "Item",
v.vendorName AS "Vendor"
FROM bigdog.products AS p, bigdog.vendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 40.00 ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.inventory ;
PRICE |Item |Vendor
--------------------------------------------------------------------------------
99.99 |Beach umbrella |Luna Vista Limited
49.95 |Female bathing suit, one piece, aqua |Mikal Arroyo Incorporated
2 rows selected
ij> UPDATE bigdog.products SET price = 44.95 WHERE itemNumber = 8 ;
1 row inserted/updated/deleted
ij> SELECT * FROM bigdog.inventory ;
PRICE |Item |Vendor
--------------------------------------------------------------------------------
99.99 |Beach umbrella |Luna Vista Limited
49.95 |Female bathing suit, one piece, aqua |Mikal Arroyo Incorporated
44.95 |Blue-stripe beach towel |Luna Vista Limited
3 rows selected
ij> DROP VIEW bigdog.inventory ;
0 rows inserted/updated/deleted
ij> |
This example first creates a new view, bigdog.inventory, that
results from a join between two tables. This new view contains an inventory listing for all
products that have a sales price of more than US$40. Notice that this view has columns that are
named according to the column names listed in the SELECT clause of the query
that creates the view. You then verify the contents of this new view by issuing a query that
extracts all rows from the view.
The next statement increases the price of one product in the bigdog.products table, which was one of the base tables used to create the
inventory view. By re-executing the SELECT
query, you now see that the view contains three items, which demonstrates the dynamic nature of a
view in Apache Derby. Finally, you drop the view from the database because it's no longer needed. If you
want to rename a view, you're out of luck -- it can't be done at this time in Apache Derby.
Instead, you must drop the original view and create a new view with the correct name.
As this example has demonstrated, a view is a powerful tool that can simplify the task of developing and maintaining database applications. The next section introduces indexes, which you can use to speed up query execution on both tables and views.
The simple query used in Listing 5, SELECT * FROM bigdog.inventory ;,
extracts all rows from the inventory view. To extract the necessary
data, this type of query performs what is known as a full table scan, which means every row in the
table (or view in this case) is scanned and processed. Scanning a table involves pulling the data
off the disks and into the computer memory where the row data is analyzed. For small tables like those
in this series of articles, this processing doesn't result in a significant performance
degradation. For large tables, however, scanning can become an expensive operation
that should be avoided whenever possible. You may think that including a WHERE clause will help, because it
results in a smaller data set being extracted; however, this still requires every row in the
table to be searched to find the relevant matching rows.
The problem is analogous to looking for specific passages or information in a book. You can start at the front and scan through the book, marking all relevant pages until you reach the end. Or, you can look in the book's index to quickly locate the relevant pages. It's usually much faster to look in the index. Fortunately, a similar construct is available in Apache Derby. The formal syntax for creating, dropping, and renaming an index is presented in Listing 6.
Listing 6. The formal syntax for Apache Derby SQL index operations
CREATE [UNIQUE] INDEX schema.indexName
ON schema.tableName ( columnName [ ASC | DESC ]
[ , columnName [ ASC | DESC ]] * ) ;
DROP INDEX schema.indexName ;
RENAME INDEX indexName TO newIndexName ;
|
These three statements all follow a simple formal syntax. First, the CREATE
INDEX statement creates a new index on a specific table. The index name is limited to 128
characters, and in Apache Derby this name must be unique within a given schema. When building a new
index, you can specify one or more
columns up to a maximum of 16, but any given column can be used only once per index. By default, the index is
built in ascending order for every column, but you can use the DESC keyword to specify that a particular column should
be used in descending order instead. The UNIQUE keyword specifies that
a table can't have multiple rows that each have the same index value. In other words, every index
value maps to a single table row, which can be used along with primary keys to enforce data
integrity within the table.
To remove an index, you use the DROP INDEX statement along with
the fully qualified index name. On the other hand, you can rename an index by using the RENAME INDEX statement only if it's in the
current schema. Thus you must either work in the default schema at all times or, more
likely, use the SET SCHEMA statement, as shown in Listing 7.
Listing 7. Working with indexes in Apache Derby
ij> SET SCHEMA bigdog ; 0 rows inserted/updated/deleted ij> CREATE INDEX productsIndex ON products(itemNumber) ; 0 rows inserted/updated/deleted ij> RENAME INDEX productsIndex TO pi ; 0 rows inserted/updated/deleted ij> DROP INDEX pi ; 0 rows inserted/updated/deleted ij> |
The first statement in this example, SET SCHEMA bigdog ;, sets the
default schema for the current database connection to
be the bigdog schema. As a result, you no longer have to include the
bigdog schema name as part of the fully qualified names. The next step
is to create a new index, called productsIndex, on the products table, which you do by using the
itemNumber column with the default ascending order. Next, you rename
the productsIndex index to pi. Finally, you
drop the index by using a DROP INDEX statement. Notice how these SQL
statements are easier to write, because you don't need to always specify the schema name.
It may seem odd, but this article hasn't demonstrated using an index. The reason is simple: Whether an index is used in an SQL query is decided by the database engine. If an index is expected to improve the performance of a given query, it's automatically used. Given the small size of the tables in the example schema, any performance benefit of an index is minimal. For these small tables, using an index will almost certainly be slower, due to the overhead of accessing the index, than not.
To understand why, think about the book analogy again, but this time assume the book has only a
few pages of text. In this case, it will be faster to look through the pages than to locate the
index, find the relevant values in the index, and then find the pages of interest. The same is true
for database indexes. If a table is small, or if a large number of rows (say, more than 25% of the
rows in a table) will be returned by a query, using an index can slow the
query due to the overhead of searching an index for the relevant key values and then
locating the appropriate rows in the database table. Another important fact is that an index can
improve the performance of a query only if the columns used when building the index are also
included in the query's WHERE clause. Otherwise, a full table scan
must still be performed.
This article introduced three new database concepts: grouping query results, views, and indexes. In their own way, they each offer advanced functionality to enable more powerful queries, to simplify application development, or to improve the performance of existing queries. At this point in the Developing with Apache Derby series, you're acquainted with a number of fundamental database concepts that you should master before developing database applications. The next article will change gears and begin developing Java applications that leverage the Apache Derby database.
| Description | Name | Size | Download method |
|---|---|---|---|
| Derby SQL script for this article | derby.build.sql | 2KB | HTTP |
Information about download methods
Learn
- Check out the other articles in this series:
- "Developing with Apache Derby -- Hitting the Trifecta: Introduction to Apache Derby" (developerWorks, February 2006) introduces the Apache Derby database and provides the foundation for many topics in this series.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 1" (developerWorks, March 2006) introduces the
ijtool and demonstrates how to use it to connect to an Apache Derby database. - "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part
2" (developerWorks, April 2006) introduces the concepts of a database schema and relational database tables and demonstrates how to create basic
database objects by using the SQL
CREATEstatement. - "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part
3" (developerWorks, May 2006) introduces the concept of executing SQL scripts with Apache Derby and demonstrates how to
insert data into tables in a Derby database by using the SQL
INSERTstatement. - "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part
4" (developerWorks, June 2006) introduces the concept of an SQL query and demonstrates how to extract data from a Derby database by using the SQL
SELECTstatement. - "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 5" (developerWorks, July 2006) builds on the fifth article in this series to demonstrate how to write more powerful SQL queries, including those that use the SQL functions supported by the Apache Derby database.
- "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 6" (developerWorks, August 2006) discusses how to delete and modify data within an existing table and how to modify the schema of an existing table.
- Access Apache Derby online manuals for more detailed information about how to use the Derby database.
- Learn how to
download and install Apache Derby.
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM® products.
- Check out the developerWorks Apache Derby project area for articles, tutorials, and other resources to help you get started with Derby today.
- Browse all the Apache articles and free Apache tutorials available in the developerWorks Open source zone.
- Browse for books on these and other technical topics at the Safari bookstore.
Get products and technologies
-
Download Apache Derby from the Apache Derby Project home page.
- Innovate your next open source development project with IBM trial software, available for download or on DVD.
Discuss
- Get involved in the developerWorks community by participating in developerWorks blogs.





