Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 2

Schemas

Learn about several basic database concepts, including schemas, tables, and column data types, and get a simple introduction to Structured Query Language (SQL). This article -- focusing on the database developer role -- presents the basic data types you can use to store data in an Apache Derby database, and then you'll use them to create a simple schema with two tables in Apache Derby for a fictitious store. To view the schema contents of a database, you'll use the Apache Derby tool, dblook, to dump the contents of the database. The article wraps up with a brief discussion on dropping tables.

Robert Brunner, NCSA Research Scientist, Assistant Professor of Astronomy, University of Illinois, Urbana-Champaign

Robert J. BrunnerRobert J. Brunner is a Research Scientist at the National Center for Supercomputing Applications and an Assistant Professor of Astronomy at the University of Illinois, Urbana-Champaign. He has published several books and a number of articles and tutorials on a range of topics. You can reach him at rb@ncsa.uiuc.edu.



18 April 2006

Also available in Russian Japanese

The basics of relational database systems

Before you can begin to develop database applications, you need to understand the basic concepts. This section presents the data types available in Apache Derby and the rules that affect your ability to design and create useful Derby database applications.

Relational databases hold data. This data can be of different types, such as numbers, characters, or dates. Within the database, the data are organized into logical units called tables. A table is like a spreadsheet, because it contains rows of data. Each row is made up of a number of columns. The columns hold data of a specific data type, like integer values or strings of characters. In most cases, a database has more than one table. To relate the tables together, a database designer takes advantage of natural (or artificial) links between the tables. In a spreadsheet, you can link rows in different sheets by cell values. The same idea holds in a relational database, and the column used to make the link is referred to as a key column.

To make it easier to understand the purpose of a table or a particular column, you should select appropriate names. The naming conventions can vary between databases. For the Apache Derby database system, individual names:

  • Are case insensitive.
  • Are limited to 128 characters in length.
  • Must begin with a letter.
  • Must only contain Unicode letters, underscore characters, and Unicode digits.

You can escape these rules by placing the name in double quotation marks, which allows names to be case sensitive and to include additional characters (including spaces). Doing this, however, is generally a bad practice: It requires the name to always be enclosed in double quotation marks and can easily confuse someone else who may be maintaining your code.

Derby style

This series of articles follows a specific style: All SQL commands are presented entirely in uppercase, and item names use camelCase. In camelCase style, words are joined together, and the first letter of each word -- following the first one -- is capitalized, such as aLongIdentifier. Combining these two styles together, these articles write SQL commands using the following style: SELECT aLongIdentifier FROM bigdog.dataTable ;.

Related tables are often grouped together into a schema. You can think of a schema as a container for all the related structure definitions within a particular database. A table name must be unique within a given schema. Thus, by using schemas, you can have identically named objects (such as tables) enclosed within different schemas. When you're using the Apache Derby database, a table is always in a schema. If you don't specify a schema explicitly, Derby implicitly uses the built-in apps schema. A second built-in schema called sys is used to isolate system tables.

You can use the schema name to qualify a name. To do so, you specify the schema name followed by a period and then the table name. For example, bigdog.products references the products table in the bigdog schema. Without the relevant schema name, a table name is said to be unqualified, as in products. When the schema name and the table name are completely specified, as in bigdog.products, the name is said to be fully qualified.

In an abstract sense, these database concepts may seem confusing, but in practice they're fairly straightforward. For example, imagine you own a store called Bigdog's Surf Shop that sells a variety of items like sunglasses, shirts, and so on. If you want to be profitable, you must keep a close eye on your inventory so you can easily order additional inventory or change vendors to keep your overhead to a minimum. One simple method for tracking this information is to write entries in a table-like format, as shown in Figure 1.

Figure 1. A sample schema for Bigdog's Surf Shop
A sample schema for Bigdog's Surf Shop, showing a Products table and a Vendors table

From this simple visual design you can easily map the business logic straight into database tables. You have two database tables, Products and Vendors, which are naturally linked by the item number. The data types for the columns in each table are easy to determine. The rest of this article focuses on creating the sample schema for Bigdog's Surf Shop -- which consists of these two tables -- in a Derby database.


Work with a relational database: Structured Query Language

The SQL NULL type

Before you begin creating database tables, you must know what to do when no value is specified for a column. To illustrate this point, imagine that you've been asked to fill out a Web form. If you leave a particular column blank, what is inserted into the database? As you can imagine, this problem could be cumbersome if you had to track no value markers. Fortunately, SQL defines a special value, NULL, to indicate that a column has no value.

Database systems can be complex pieces of software, especially when they scale to support enterprise-level applications. As a result, you may expect that every database has its own application programming interface (API) and that these APIs may be different from one system to the next. When relational databases were first developed, this was the case; but, fortunately, a number of vendors agreed to develop a standard language for accessing and manipulating relational databases. This language is officially called Structured Query Language (or SQL, pronounced sea-quill). Several official standard versions have been produced, including one in 1992 that is referred to as SQL-92, and one in 1999 that is referred to as SQL-99. The Apache Derby database provides a nearly complete implementation of the SQL-92 standard, so applications developed with Derby can be easily transported to other database systems.

SQL has two main components: a Data Definition Language (DDL) and a Data Manipulation Language (DML). DDL commands are used to create, modify, or delete items (such as tables) in a database. DML commands are used to add, modify, delete, or select data from a table in the database. The rest of this article provides a basic introduction to the DDL components of SQL. Future articles will focus on the DML commands and more advanced DDL commands.

SQL data types

SQL, being a programming language in its own right, defines a rich data-type hierarchy. Persisting these data types is one of the most important responsibilities of the database. As databases have become more powerful, this type hierarchy has grown more complex. But most simple databases don't require the full range of allowed types, and often they need to store only numerical, character, and date or time data. In the interest of simplicity, Tables 1, 2, 3, and 4 present the basic SQL data types as implemented in Derby.

As shown in Table 1, Derby provides support for three different integer data types. These types are distinguished by the range of integers they can store and, therefore, by the amount of storage space they require in the database. One of the key issues to keep in mind when designing a database is to always try to minimize the amount of storage your table consumes. In general, smaller tables provide higher performance, but you must be able to store your data in the resulting table. For the record, 231 is equal to 2,147,483,648, and 263 is equal to 9,223,372,036,854,775,808, so using these data types lets you store very big integers!

Table 1. Basic integer data types in Derby
Data typeMinimum valueMaximum valueExampleDescription
SMALLINT-32768 (-215)32767 (215 - 1)itemNumber SMALLINT2-byte integer representation
INT-231231 - 1itemNumber INT4-byte integer representation
BIGINT-263263 - 1itemNumber BIGINT8-byte integer representation

Most numerical data can't be represented as integers. Derby provides support for real numbers in several formats: single-precision floating-point, double-precision floating-point, and an exact representation decimal, as presented in Table 2.

Table 2. Basic numerical data types in Derby
Data typeMinimum valueMaximum valueExampleDescription
REAL-3.402x10+383.402x10+38price REALIEEE floating-point number (4 bytes)
DOUBLE-1.79769x10+3081.79769x10+308price DOUBLEIEEE floating-point number (8 bytes)
DECIMAL31 (maximum precision)price DECIMAL(5,2)Exact decimal representation

If you've never encountered an exact precision data type, the distinction between a decimal and floating-point type may be confusing. The difference lies in the fact that floating-point data types used in computers can't hold every real number. This may seem odd, but recall that there are an infinite number of distinct real values. Most real numbers can't be stored in just a few bytes of memory. For some applications, this loss of precision is acceptable. In many cases, however, it isn't. For example, a financial application can't afford to lose money just because a particular number can't be stored in the computer.

The solution to this problem is to use the DECIMAL data type, which lets you control the total number of digits the computer stores (the precision) and the number of digits that follow the decimal point (the scale). To create a decimal type, you specify the precision and, optionally, the scale for the data to be stored. The storage required for a DECIMAL data type is potentially much larger than it is for a normal floating-point data type. Thus you should use this type carefully or the performance of your application may be reduced. By default, the scale for a DECIMAL type is 0, which means the DECIMAL data type mimics an integer type.

There are several synonyms for the numerical types. For instance, the DECIMAL data type can be shortened to DEC or referred to as NUMERIC. The DOUBLE type can also be referred to as DOUBLE PRECISION, although it isn't clear why you would want to type the extra word every time you need a double-precision number. A more general synonym is the FLOAT type, which has an arbitrary floating-point precision that you specify when declaring the data type as FLOAT(val). The precision must be a positive number less than 53; if it isn't, you get an error. If you specify the precision to be a value of 23 or less, the FLOAT(val) is equivalent to a REAL; if the precision is between 24 and 53, the FLOAT(val) is equivalent to a DOUBLE.

In addition to numerical data, the other most popular type of data to store in a database is character data. Examples of character data include an item description, a person's name, or address information. Derby provides two simple techniques for storing character data: the CHAR type and the VARCHAR type, which are detailed in Table 3. For both types, you can specify a length parameter, which defaults to 1 if no value is specified. There are two main differences between these two character data types. First, the CHAR type has a maximum length of 254 characters, whereas the VARCHAR type can hold up to 32,672 characters. The second difference is more subtle: The CHAR type is always the length you specify. If not enough characters are specified, extra spaces are inserted to fill the remaining spots. With a VARCHAR, the number of characters can vary, and no extra padding is performed.

Table 3. Basic character string data types in Derby
Data typeMaximum lengthExampleDescription
CHAR254description CHAR(128)Fixed-length character string
VARCHAR32,672description VARCHAR(128)Variable-length character string

Due to the varying widths, the VARCHAR type can be more efficient in terms of actual storage space but can also be less efficient when it comes to performance. The CHAR data type provides a performance boost because the database knows exactly how big every CHAR column will be and can make certain performance optimizations when reading or writing the data. The maximum length for a VARCHAR column may seem sufficiently large, but Derby provides larger character data types, which will be discussed in a future article.

The last simple class of data types Derby provides stores dates and times, as detailed in Table 4. The TIME data type stores hours, minutes, and seconds in 24-hour format (HH:MM:SS). The DATE data type stores the month, day, and year, which can be specified in a number of different formats, including the following:

  • yyyy-mm-dd
  • mm/dd/yyyy
  • dd.mm.yyyy
Table 4. Basic date and time data types in Derby
Data typeMinimum valueMaximum valueExampleDescription
TIME00:00:0024:00:00start TIMETime representation (second accuracy)
DATE0001-01-019999-12-31stockDate DATEDate representation (day accuracy)

Derby also provides a TIMESTAMP data type that combines the TIME and DATE data types into a single type for marking exact moments in time.


Create a table in Derby

So far, you've learned how to design a table, including mapping out the table columns and defining the data type for each column. After you've properly designed a table, the method for creating a table in SQL is straightforward. Listing 1 shows the formal syntax for creating a table in Derby.

Listing 1. CREATE TABLE syntax for Apache Derby

Click to see code listing

Listing 1. CREATE TABLE syntax for Apache Derby

-- Comment describing the purpose and layout of the table CREATE TABLE [schemaName.]tableName ( { <columnDefinition> | <tableLevelConstraint> } [, { <columnDefinition> | <tableLevelConstraint> } ]* ) ;

You may feel bewildered after looking at this syntax for the first time. But it's easy to follow once you have the basics down; and understanding the formal syntax is a must if you want to master Derby. The square brackets ([ and ]) enclose optional parameters. As you can see from the formal syntax, the schema name is optional, as are any column definitions or table-level constraints after the required initial one (it wouldn't make sense to create a table with no columns!).

You probably understand what is meant by a column definition, but you might not understand the idea of a constraint. Constraints come in two types: table-level constraints and column constraints. A constraint limits either a column or a table in some manner. For example, you can use a constraint to require that a column always be assigned an actual value (no NULL values), or that every entry in a column must be unique, or that a column is automatically assigned a default value. Constraints will be addressed in more detail in a future article.

The asterisk (*) after the last closing square bracket indicates that more than one of the enclosing items can be included. This implies that the table must have one or more columns or table-level constraints. The vertical line (|) indicates an either/or condition. In this syntax example, you must either define a new column or define a new table-level constraint. The curly brackets ({ and }) group related items together, and the parentheses (( and )) are required elements. Finally, the semicolon (;) indicates the end of a SQL statement.

Putting these rules into action is relatively straightforward. Listing 2 shows how to create the tables demonstrated earlier in Figure 1 using the ij tool that comes with Derby.

Listing 2. Creating a table in Apache Derby

Click to see code listing

Listing 2. Creating a table in Apache Derby

rb$ java org.apache.derby.tools.ij ij version 10.1 ij> connect 'jdbc:derby:test;create=true' ; ij> CREATE TABLE bigdog.products ( itemNumber INT NOT NULL, price DECIMAL(5, 2), stockDate DATE, description VARCHAR(128) ) ; 0 rows inserted/updated/deleted ij> CREATE TABLE bigdog.vendors ( itemNumber INT NOT NULL, vendorNumber INT NOT NULL, vendorName CHAR(64) ) ; 0 rows inserted/updated/deleted ij> exit ;

The easiest way to interact with a Derby database is to use the ij tool, which was described in the first article in this series, "Developing with Apache Derby -- Hitting the Trifecta: Introduction to Apache Derby" (developerWorks, February 2006). If you follow along with the actions shown in Listing 2, you first create a new database named test. You may get a warning message when you issue the connect statement if the test database already exists. This warning can safely be ignored. Next, you implicitly create a new schema named bigdog and explicitly create two new tables -- products and vendors -- that are stored inside the bigdog schema. The creation of the schema is implicit because you don't issue a CREATE SCHEMA statement.

The products table has four columns: itemNumber, price, stockDate, and description. The itemNumber column provides a unique identity for each item (or row) and has an attached column-level constraint that enforces a valid value to always be supplied (NOT NULL). Without this requirement, the itemNumber column isn't guaranteed to be unique because multiple columns could be assigned a NULL value. The price column is created as a DECIMAL data type with a precision of 5 and a scale of 2. This means the maximum price any item can have is $999.99. The last two columns are simple: The stockDate column is stored as a Date, and description is stored as a string of characters with a maximum length of 128 that's truncated to whatever length is provided.

The vendors table has three columns: itemNumber, vendorNumber, and vendorName. In this case, both the itemNumber and vendorNumber columns have attached column-level constraints (NOT NULL). In addition, the vendorName column is stored as a string of characters with a maximum length of 64. Because the vendorName column is stored using the CHAR data type, 64 character spaces are always reserved.

After creating a variety of items, you may wonder if there's an easy way to see what items are stored in a database. Fortunately, the answer is yes using the dblook tool. Running this tool, as shown in Listing 3, provides a detailed listing of the items that have been created in a particular database.

Listing 3. Viewing your schema with dblook

Click to see code listing

Listing 3. Viewing your schema with dblook

rb$ java org.apache.derby.tools.dblook -d jdbc:derby:test -- Timestamp: 2006-03-04 10:52:34.056 -- Source database is: test -- Connection URL is: jdbc:derby:test -- appendLogs: false -- ---------------------------------------------- -- DDL Statements for schemas -- ---------------------------------------------- CREATE SCHEMA "BIGDOG"; -- ---------------------------------------------- -- DDL Statements for tables -- ---------------------------------------------- CREATE TABLE "BIGDOG"."PRODUCTS" ( "ITEMNUMBER" INTEGER NOT NULL, "PRICE" DECIMAL(5,2), "STOCKDATE" DATE, "DESCRIPTION" VARCHAR(128)); CREATE TABLE "BIGDOG"."VENDORS" ( "ITEMNUMBER" INTEGER NOT NULL, "VENDORNUMBER" INTEGER NOT NULL, "VENDORNAME" CHAR(64));

The dblook tool is another Java class that enables you to easily dump the contents of a database to the console. You run it from a command prompt just as you would any Java program; the only addition is the use of the -d jdbc:derby:test argument, which specifies which database the dblook tool should query. If you can run the ij tool, the dblook class files are already in your CLASSPATH. If not, refer back to the first article in this series for more detailed instructions on properly setting up your CLASSPATH. As the output of the dblook tool shows, the test database contains the bigdog schema, which itself contains the products and vendors tables. In addition, the various columns for these two tables are completely detailed.


Delete a table in Derby

No one is perfect. What do you do when you incorrectly create a table or a table is no longer needed? The simple answer is to delete the table from the database and, if necessary, create a replacement table. Deleting a table is easy, which means, of course, that you should exercise great care when doing so -- no dialog box pops up and asks if you're sure you want to proceed!

The full syntax for deleting -- or, more formally, dropping -- a table from a database is:

DROP TABLE [schemaName.]tableName ;

The syntax is simple: You append the fully qualified name and a semicolon to the DROP TABLE SQL command, and you're finished. The process of dropping a table is demonstrated on a newly created temporary table in Listing 4.

Listing 4. Dropping a table from a Derby database

Click to see code listing

Listing 4. Dropping a table from a Derby database

rb$ java org.apache.derby.tools.ij ij version 10.1 ij> connect 'jdbc:derby:test' ; ij> CREATE TABLE temp ( aColumn INT ) ; 0 rows inserted/updated/deleted ij> DROP TABLE temp ; 0 rows inserted/updated/deleted ij> exit ;

Summary

You're well on your way to working with databases in Apache Derby. Now you have a grasp of basic database concepts, including schemas, tables, and columns, and you've seen demonstrations of these concepts using a fictitious business named Bigdog's Surf Shop. To work with a database like Derby, you know you need to learn SQL, the standard language for interacting with databases. This article also covered the basic data types you can use for storing data in a Derby database. Putting these concepts together, you learned how to create and drop tables by using Derby, and you used the Derby dblook tool to dump the schema contents of a database.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Open source on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Open source, Information Management
ArticleID=108004
ArticleTitle=Developing with Apache Derby -- Hitting the Trifecta: Database development with Apache Derby, Part 2
publish-date=04182006