Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 7

Advanced database concepts

Robert Brunner (rb@ncsa.uiuc.edu), NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign
Robert J. Brunner
Robert J. Brunner is a research scientist at the National Center for Supercomputing Applications and an assistant professor of astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics.

Summary:  With this article you've reached an important point in mastering fundamental database concepts as they relate to the Apache Derby database. This article reviews how to modify an SQL query to group related rows together to provide summary statistics database information. Then it introduces the concept of a view, which can be used to simplify database application development by creating a virtual table that represents the results of an SQL query. Finally, you'll learn about database indexes, which you can use to locate specific table rows. After you've mastered these advanced database concepts, you'll be well positioned to begin developing Java™ database applications with Apache Derby.

View more content in this series

Date:  07 Nov 2006
Level:  Intermediate
Also available in:   Russian

Activity:  8584 views
Comments:  

Introduction

Ready to dive in? This article tackles several advanced database concepts, including grouping SQL query results, database views, and database indexes, which you'll use in conjunction with the Apache Derby database. To follow along and complete the examples in this article, you need a properly initialized Apache Derby test database. If you've been completing the examples from the previous articles in this series, you already have a test database. If not, or if you want a clean install -- which will guarantee you can follow along exactly with the examples in this article -- you can download and run the SQL script included with this article, as shown in Listing 1.


Listing 1. Setting up the Derby work environment
rb% mkdir derbyWork
rb% cp derby8.zip derbyWork/
rb% cd derbyWork/
rb% unzip derby8.zip 
Archive:  derby8.zip
  inflating: derby.build.sql         
rb% ls 
derby.build.sql derby8.zip
rb% java org.apache.derby.tools.ij < derby.build.sql > derby.build.out 2> derby.build.err 
rb% java org.apache.derby.tools.ij
ij version 10.1
ij> connect 'jdbc:derby:test' ;
ij> 

Listing 1 starts by creating a new work directory, called derbyWork, where you'll create the test database. After extracting the SQL script file and copying it into the derbyWork directory, you process the derby.build.sql script file by using the ij tool as demonstrated in "Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 3" (developerWorks, May 2006).

This example concludes by starting the ij tool and issuing the connect command to establish a connection to the test database. If you get an error, such as ERROR XJ004: Database 'test' not found., the SQL script file must have failed. To diagnose the problem, look at both the derby.build.out and the derby.build.err files, and consult either the second article in this series or the online Derby documentation, listed in the Resources section of this article. If everything worked as expected, you should be presented with the ij> prompt, and you can advance to issuing SQL queries that return aggregations rather than individual rows.


Grouping results

In the last several articles in this series (see Resources for more details, including links), you constructed different types of SQL queries. These range from simple data dumps, where every column in a single table is listed, to queries that return values and the result of built-in functions from table joins. In all of these queries, however, the actual rows, or the results of executing built-in functions on the actual rows of one or more database tables, were returned by the SQL query.

In some cases, this result may not be what you want or need. For example, joining multiple database tables together can quickly produce large query results; or perhaps you don't want all the data in the table, but just a statistical summary -- perhaps sales by month or expenditures by quarter. Or maybe you just want the SQL query to group related rows and return the group results rather than all the underlying rows. Using SQL, you can achieve this result by adding a GROUP BY clause to the end of your SQL query, along with one or more columns that indicate how the underlying rows should be grouped.

This technique can be powerful, especially when combined with aggregate functions, because you can use it to quickly generate summary statistics for groups of related rows. When this series first covered SQL queries in Part 4, it discussed the order in which Apache Derby processes the various components of an SQL query: FROM, WHERE, GROUP BY, HAVING, and finally, SELECT. Thus, before any columns or function results are selected in an SQL query, the rows that satisfy the WHERE clause are grouped together following the directions in the GROUP BY clause. Listing 2 provides an explicit demonstration of these rules; it computes the average price and number of items provided by each of the vendors that supply Bigdog's Surf Shop.


Listing 2. Grouping results by using the GROUP BY and HAVING clauses
    
ij> SELECT COUNT(p.itemNumber) AS Number, AVG(p.price) AS Average
        FROM bigdog.products AS p, bigdog.vendors AS v
        WHERE p.itemNumber = v.itemNumber
        GROUP BY v.vendorNumber ;
NUMBER     |AVERAGE    
-----------------------
6          |31.4633    
2          |39.9500    
2          |23.9500    

3 rows selected
ij> SELECT v.vendorNumber, 
        COUNT(p.itemNumber) AS Number, AVG(p.price) AS Average
        FROM bigdog.products AS p, bigdog.vendors AS v
        WHERE p.itemNumber = v.itemNumber
        GROUP BY v.vendorNumber
        HAVING v.vendorNumber > 1 ;
VENDORNUMB&|NUMBER     |AVERAGE    
-----------------------------------
2          |2          |39.9500    
3          |2          |23.9500    

2 rows selected
ij> 

Grouping NULL values

Sometimes, the rows selected by your query contain NULL values. In this case, you may wonder what happens when you try to group rows by using a column that contains NULL values. The answer is simple and exactly what you'd expect, because NULL values are considered equivalent for grouping purposes.

The first query in this example joins the bigdog.products table with the bigdog.vendors table by using the itemNumber column. Before selecting any rows from the newly joined table, however, you first group related rows by using the GROUP BY v.vendorNumber clause. The GROUP BY clause doesn't simply sort the rows that result from your query, it groups them together; and only the grouped data can be returned or used in an aggregate function. In other words, you can only select the columns listed in the GROUP BY clause, the values from aggregate functions that operate on any column in the joined tables, constants, or special registers.

The second query extends the first by adding a HAVING clause. You can use a HAVING clause to restrict the operations of a GROUP BY clause, by limiting the allowed column values that can be used to group related rows. In this example, only rows with vendorNumber greater than 1 are grouped, which results in only the two groups shown. As is the case with the GROUP BY clause, a HAVING clause can take multiple columns; but any columns listed in the HAVING clause must also be listed in the GROUP BY clause, or an error is issued by the Apache Derby database engine.


A good view is priceless

When you work with a relational database like Apache Derby, the fundamental structure you generally deal with is a database table, which is composed of rows of data. By using SQL queries, you can insert, select, update, or delete data from one or more tables. Sometimes, these queries -- especially SELECT queries -- can become complicated and may be used frequently. In this case, you may want to have the database treat the results of the query as a new table, which lets you more easily work with the query results.

The simplest way to achieve this result is to create a new table and use the INSERT INTO SQL statement together with your query of interest, to explicitly create the new table and populate it with the appropriate data, as shown in Part 6 of this series. Although this technique is straightforward, it has several disadvantages. First, you have duplicated data within your database, which requires more resources. Second, and perhaps more important, is that if the original data changes, the duplicated data will become stale and will need to be updated. Given the importance of this concept, there must be a better way.

Not surprisingly, there is a better way, and it is called a view. A view is a virtual table that is created by executing a query. Creating and deleting a view is simple, as shown in the formal syntax presented in Listing 3.


Listing 3. Formal syntax for working with a view
CREATE VIEW schema.viewName
    [ ( columnName1 [, columnName2] * ) ]
AS sqlQuery ;

DROP VIEW schema.viewName ;

To create a view, you use the CREATE VIEW SQL statement. When creating a new view, you can either explicitly name the columns in the view (for example, columnName1, columnName2, and so on), or you can have the column names be implicitly copied directly from the columns listed in the SELECT clause of the SQL query used to construct the view. As was the case with a table, a view should be assigned to a particular schema; otherwise it's assigned to the default APP schema. To delete a view, you use the DROP VIEW statement along with the fully qualified name of the view. Both of these view operations are demonstrated in Listing 4.


Listing 4. Creating and dropping a view in Apache Derby
ij> CREATE VIEW bigdog.vendorList (Name)
    AS SELECT DISTINCT vendorName FROM bigdog.vendors ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.vendorList ;
NAME                          
------------------------------
Mikal Arroyo Incorporated     
Quiet Beach Industries        
Luna Vista Limited            

3 rows selected
ij> DROP VIEW vendorList ;
ERROR X0X05: Table 'VENDORLIST' does not exist.
ij> DROP VIEW bigdog.vendorList ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.vendorList ;
ERROR 42X05: Table 'BIGDOG.VENDORLIST' does not exist.
ij>  

The first statement in Listing 4 creates a new view called vendorList in the bigdog schema. This view consists of one column that is explicitly named Name and that is populated by the distinct names in the vendorName column in the bigdog.vendors table. Next, you issue a SELECT statement that queries the vendorList view, showing how you can use a view in many of the same ways you would use a table.

Because you don't need this view any longer, you next try to drop it. The first SQL statement you use doesn't include the fully qualified name of the view; in this case, the schema name is missing, which results in an error. Once you properly apply the DROP VIEW statement by using the fully qualified view name, bigdog.vendorList, the drop operation completes successfully. To demonstrate that the view no longer exists, you try to re-execute the original SQL SELECT statement, which now fails because the view has been deleted.

Recall the definition of a view: It's a virtual table. So a view is nearly identical to a real table. For example, a view can be used in SQL queries, either directly or as part of a join. Because a view is virtual, it can't be used to directly modify the data it contains. However, any changes to the tables that hold the original data are automatically propagated to the data in the view. This last benefit of working with views is displayed in Listing 5.


Listing 5. Working with views in Apache Derby
ij> CREATE VIEW bigdog.inventory AS SELECT p.price, p.description AS "Item",
    v.vendorName AS "Vendor"
    FROM bigdog.products AS p, bigdog.vendors AS v
    WHERE p.itemNumber = v.itemNumber AND p.price > 40.00 ;
0 rows inserted/updated/deleted
ij> SELECT * FROM bigdog.inventory ;
PRICE   |Item                                    |Vendor                        
--------------------------------------------------------------------------------
99.99   |Beach umbrella                          |Luna Vista Limited            
49.95   |Female bathing suit, one piece, aqua    |Mikal Arroyo Incorporated     

2 rows selected
ij> UPDATE bigdog.products SET price = 44.95 WHERE itemNumber = 8 ;
1 row inserted/updated/deleted
ij> SELECT * FROM bigdog.inventory ;
PRICE   |Item                                    |Vendor                        
--------------------------------------------------------------------------------
99.99   |Beach umbrella                          |Luna Vista Limited            
49.95   |Female bathing suit, one piece, aqua    |Mikal Arroyo Incorporated     
44.95   |Blue-stripe beach towel                 |Luna Vista Limited            

3 rows selected
ij> DROP VIEW bigdog.inventory ;
0 rows inserted/updated/deleted
ij> 

This example first creates a new view, bigdog.inventory, that results from a join between two tables. This new view contains an inventory listing for all products that have a sales price of more than US$40. Notice that this view has columns that are named according to the column names listed in the SELECT clause of the query that creates the view. You then verify the contents of this new view by issuing a query that extracts all rows from the view.

The next statement increases the price of one product in the bigdog.products table, which was one of the base tables used to create the inventory view. By re-executing the SELECT query, you now see that the view contains three items, which demonstrates the dynamic nature of a view in Apache Derby. Finally, you drop the view from the database because it's no longer needed. If you want to rename a view, you're out of luck -- it can't be done at this time in Apache Derby. Instead, you must drop the original view and create a new view with the correct name.

As this example has demonstrated, a view is a powerful tool that can simplify the task of developing and maintaining database applications. The next section introduces indexes, which you can use to speed up query execution on both tables and views.


Going faster with indexes

The simple query used in Listing 5, SELECT * FROM bigdog.inventory ;, extracts all rows from the inventory view. To extract the necessary data, this type of query performs what is known as a full table scan, which means every row in the table (or view in this case) is scanned and processed. Scanning a table involves pulling the data off the disks and into the computer memory where the row data is analyzed. For small tables like those in this series of articles, this processing doesn't result in a significant performance degradation. For large tables, however, scanning can become an expensive operation that should be avoided whenever possible. You may think that including a WHERE clause will help, because it results in a smaller data set being extracted; however, this still requires every row in the table to be searched to find the relevant matching rows.

The problem is analogous to looking for specific passages or information in a book. You can start at the front and scan through the book, marking all relevant pages until you reach the end. Or, you can look in the book's index to quickly locate the relevant pages. It's usually much faster to look in the index. Fortunately, a similar construct is available in Apache Derby. The formal syntax for creating, dropping, and renaming an index is presented in Listing 6.


Listing 6. The formal syntax for Apache Derby SQL index operations
CREATE [UNIQUE] INDEX schema.indexName
ON schema.tableName ( columnName [ ASC | DESC ]
    [ , columnName [ ASC | DESC ]] * ) ;

DROP INDEX schema.indexName ;
    
RENAME INDEX indexName TO newIndexName ;  
  

These three statements all follow a simple formal syntax. First, the CREATE INDEX statement creates a new index on a specific table. The index name is limited to 128 characters, and in Apache Derby this name must be unique within a given schema. When building a new index, you can specify one or more columns up to a maximum of 16, but any given column can be used only once per index. By default, the index is built in ascending order for every column, but you can use the DESC keyword to specify that a particular column should be used in descending order instead. The UNIQUE keyword specifies that a table can't have multiple rows that each have the same index value. In other words, every index value maps to a single table row, which can be used along with primary keys to enforce data integrity within the table.

To remove an index, you use the DROP INDEX statement along with the fully qualified index name. On the other hand, you can rename an index by using the RENAME INDEX statement only if it's in the current schema. Thus you must either work in the default schema at all times or, more likely, use the SET SCHEMA statement, as shown in Listing 7.


Listing 7. Working with indexes in Apache Derby
ij> SET SCHEMA bigdog ;
0 rows inserted/updated/deleted
ij> CREATE INDEX productsIndex ON products(itemNumber) ;
0 rows inserted/updated/deleted
ij> RENAME INDEX productsIndex TO pi ;
0 rows inserted/updated/deleted
ij> DROP INDEX pi ;
0 rows inserted/updated/deleted
ij> 

The first statement in this example, SET SCHEMA bigdog ;, sets the default schema for the current database connection to be the bigdog schema. As a result, you no longer have to include the bigdog schema name as part of the fully qualified names. The next step is to create a new index, called productsIndex, on the products table, which you do by using the itemNumber column with the default ascending order. Next, you rename the productsIndex index to pi. Finally, you drop the index by using a DROP INDEX statement. Notice how these SQL statements are easier to write, because you don't need to always specify the schema name.

It may seem odd, but this article hasn't demonstrated using an index. The reason is simple: Whether an index is used in an SQL query is decided by the database engine. If an index is expected to improve the performance of a given query, it's automatically used. Given the small size of the tables in the example schema, any performance benefit of an index is minimal. For these small tables, using an index will almost certainly be slower, due to the overhead of accessing the index, than not.

To understand why, think about the book analogy again, but this time assume the book has only a few pages of text. In this case, it will be faster to look through the pages than to locate the index, find the relevant values in the index, and then find the pages of interest. The same is true for database indexes. If a table is small, or if a large number of rows (say, more than 25% of the rows in a table) will be returned by a query, using an index can slow the query due to the overhead of searching an index for the relevant key values and then locating the appropriate rows in the database table. Another important fact is that an index can improve the performance of a query only if the columns used when building the index are also included in the query's WHERE clause. Otherwise, a full table scan must still be performed.


Summary

This article introduced three new database concepts: grouping query results, views, and indexes. In their own way, they each offer advanced functionality to enable more powerful queries, to simplify application development, or to improve the performance of existing queries. At this point in the Developing with Apache Derby series, you're acquainted with a number of fundamental database concepts that you should master before developing database applications. The next article will change gears and begin developing Java applications that leverage the Apache Derby database.



Download

DescriptionNameSizeDownload method
Derby SQL script for this articlederby.build.sql2KB HTTP

Information about download methods


Resources

Learn

Get products and technologies

Discuss

About the author

Robert J. Brunner

Robert J. Brunner is a research scientist at the National Center for Supercomputing Applications and an assistant professor of astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source, Information Management
ArticleID=171435
ArticleTitle=Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 7
publish-date=11072006
author1-email=rb@ncsa.uiuc.edu
author1-email-cc=rb@ncsa.uiuc.edu

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers