The basics of relational database systems
Before you can begin to develop database applications, you need to understand the basic concepts. This section presents the data types available in Apache Derby and the rules that affect your ability to design and create useful Derby database applications.
Relational databases hold data. This data can be of different types, such as numbers, characters, or dates. Within the database, the data are organized into logical units called tables. A table is like a spreadsheet, because it contains rows of data. Each row is made up of a number of columns. The columns hold data of a specific data type, like integer values or strings of characters. In most cases, a database has more than one table. To relate the tables together, a database designer takes advantage of natural (or artificial) links between the tables. In a spreadsheet, you can link rows in different sheets by cell values. The same idea holds in a relational database, and the column used to make the link is referred to as a key column.
To make it easier to understand the purpose of a table or a particular column, you should select appropriate names. The naming conventions can vary between databases. For the Apache Derby database system, individual names:
- Are case insensitive.
- Are limited to 128 characters in length.
- Must begin with a letter.
- Must only contain Unicode letters, underscore characters, and Unicode digits.
You can escape these rules by placing the name in double quotation marks, which allows names to be case sensitive and to include additional characters (including spaces). Doing this, however, is generally a bad practice: It requires the name to always be enclosed in double quotation marks and can easily confuse someone else who may be maintaining your code.
Related tables are often grouped together into a schema. You can
think of a schema as a container for all the related structure definitions
within a particular database. A table name must be unique within a given
schema. Thus, by using schemas, you can have identically named objects
(such as tables) enclosed within different schemas. When you're using the
Apache Derby database, a table is always in a schema. If you don't specify
a schema explicitly, Derby implicitly uses the built-in
apps schema. A second built-in schema called
sys is used to isolate system tables.
You can use the schema name to qualify a name. To do so, you specify
the schema name followed by a period and then the table name. For example,
bigdog.products references the
products table in the
bigdog schema. Without the relevant schema
name, a table name is said to be unqualified, as in
products. When the schema name and the table
name are completely specified, as in
bigdog.products, the name is said to be
In an abstract sense, these database concepts may seem confusing, but in practice they're fairly straightforward. For example, imagine you own a store called Bigdog's Surf Shop that sells a variety of items like sunglasses, shirts, and so on. If you want to be profitable, you must keep a close eye on your inventory so you can easily order additional inventory or change vendors to keep your overhead to a minimum. One simple method for tracking this information is to write entries in a table-like format, as shown in Figure 1.
Figure 1. A sample schema for Bigdog's Surf Shop
From this simple visual design you can easily map the business logic straight into database tables. You have two database tables, Products and Vendors, which are naturally linked by the item number. The data types for the columns in each table are easy to determine. The rest of this article focuses on creating the sample schema for Bigdog's Surf Shop -- which consists of these two tables -- in a Derby database.
Work with a relational database: Structured Query Language
Database systems can be complex pieces of software, especially when they scale to support enterprise-level applications. As a result, you may expect that every database has its own application programming interface (API) and that these APIs may be different from one system to the next. When relational databases were first developed, this was the case; but, fortunately, a number of vendors agreed to develop a standard language for accessing and manipulating relational databases. This language is officially called Structured Query Language (or SQL, pronounced sea-quill). Several official standard versions have been produced, including one in 1992 that is referred to as SQL-92, and one in 1999 that is referred to as SQL-99. The Apache Derby database provides a nearly complete implementation of the SQL-92 standard, so applications developed with Derby can be easily transported to other database systems.
SQL has two main components: a Data Definition Language (DDL) and a Data Manipulation Language (DML). DDL commands are used to create, modify, or delete items (such as tables) in a database. DML commands are used to add, modify, delete, or select data from a table in the database. The rest of this article provides a basic introduction to the DDL components of SQL. Future articles will focus on the DML commands and more advanced DDL commands.
SQL data types
SQL, being a programming language in its own right, defines a rich data-type hierarchy. Persisting these data types is one of the most important responsibilities of the database. As databases have become more powerful, this type hierarchy has grown more complex. But most simple databases don't require the full range of allowed types, and often they need to store only numerical, character, and date or time data. In the interest of simplicity, Tables 1, 2, 3, and 4 present the basic SQL data types as implemented in Derby.
As shown in Table 1, Derby provides support for three different integer data types. These types are distinguished by the range of integers they can store and, therefore, by the amount of storage space they require in the database. One of the key issues to keep in mind when designing a database is to always try to minimize the amount of storage your table consumes. In general, smaller tables provide higher performance, but you must be able to store your data in the resulting table. For the record, 231 is equal to 2,147,483,648, and 263 is equal to 9,223,372,036,854,775,808, so using these data types lets you store very big integers!
|Data type||Minimum value||Maximum value||Example||Description|
|-32768 (-215)||32767 (215 - 1)||2-byte integer representation|
|-231||231 - 1||4-byte integer representation|
|-263||263 - 1||8-byte integer representation|
Most numerical data can't be represented as integers. Derby provides support for real numbers in several formats: single-precision floating-point, double-precision floating-point, and an exact representation decimal, as presented in Table 2.
|Data type||Minimum value||Maximum value||Example||Description|
|-3.402x10+38||3.402x10+38||IEEE floating-point number (4 bytes)|
|-1.79769x10+308||1.79769x10+308||IEEE floating-point number (8 bytes)|
|31 (maximum precision)||Exact decimal representation|
If you've never encountered an exact precision data type, the distinction between a decimal and floating-point type may be confusing. The difference lies in the fact that floating-point data types used in computers can't hold every real number. This may seem odd, but recall that there are an infinite number of distinct real values. Most real numbers can't be stored in just a few bytes of memory. For some applications, this loss of precision is acceptable. In many cases, however, it isn't. For example, a financial application can't afford to lose money just because a particular number can't be stored in the computer.
The solution to this problem is to use the
DECIMAL data type, which lets you control the
total number of digits the computer stores (the precision) and the
number of digits that follow the decimal point (the scale). To
create a decimal type, you specify the precision and, optionally, the
scale for the data to be stored. The storage required for a
DECIMAL data type is potentially much larger
than it is for a normal floating-point data type. Thus you should use this
type carefully or the performance of your application may be reduced. By
default, the scale for a
DECIMAL type is 0,
which means the
DECIMAL data type mimics an
There are several synonyms for the numerical types. For instance, the
DECIMAL data type can be shortened to
DEC or referred to as
type can also be referred to as
DOUBLE PRECISION, although it isn't clear why
you would want to type the extra word every time you need a
double-precision number. A more general synonym is the
FLOAT type, which has an arbitrary
floating-point precision that you specify when declaring the data type as
FLOAT(val). The precision must be a positive
number less than 53; if it isn't, you get an error. If you specify the
precision to be a value of 23 or less, the
FLOAT(val) is equivalent to a
REAL; if the precision is between 24 and 53,
FLOAT(val) is equivalent to a
In addition to numerical data, the other most popular type of data to store
in a database is character data. Examples of character data include an
item description, a person's name, or address information. Derby provides
two simple techniques for storing character data: the
CHAR type and the
VARCHAR type, which are detailed in Table 3.
For both types, you can specify a
parameter, which defaults to 1 if no value is specified. There are two
main differences between these two character data types. First, the
CHAR type has a maximum length of 254
characters, whereas the
VARCHAR type can hold
up to 32,672 characters. The second difference is more subtle: The
CHAR type is always the length you specify. If
not enough characters are specified, extra spaces are inserted to fill the
remaining spots. With a
VARCHAR, the number of
characters can vary, and no extra padding is performed.
|Data type||Maximum length||Example||Description|
|254||Fixed-length character string|
|32,672||Variable-length character string|
Due to the varying widths, the
VARCHAR type can
be more efficient in terms of actual storage space but can also be less
efficient when it comes to performance. The
CHAR data type provides a performance boost
because the database knows exactly how big every
CHAR column will be and can make certain
performance optimizations when reading or writing the data. The maximum
length for a
VARCHAR column may seem
sufficiently large, but Derby provides larger character data types, which
will be discussed in a future article.
The last simple class of data types Derby provides stores dates and times,
as detailed in Table 4. The
TIME data type
stores hours, minutes, and seconds in 24-hour format
data type stores the month, day, and year, which can be specified in a
number of different formats, including the following:
|Data type||Minimum value||Maximum value||Example||Description|
|00:00:00||24:00:00||Time representation (second accuracy)|
|0001-01-01||9999-12-31||Date representation (day accuracy)|
Derby also provides a
TIMESTAMP data type that
DATE data types into a single type for marking
exact moments in time.
Create a table in Derby
So far, you've learned how to design a table, including mapping out the table columns and defining the data type for each column. After you've properly designed a table, the method for creating a table in SQL is straightforward. Listing 1 shows the formal syntax for creating a table in Derby.
Listing 1. CREATE TABLE syntax for Apache Derby
You may feel bewildered after looking at this syntax for the first time.
But it's easy to follow once you have the basics down; and understanding
the formal syntax is a must if you want to master Derby. The square
enclose optional parameters. As you can see from the formal syntax, the
schema name is optional, as are any column definitions or table-level
constraints after the required initial one (it wouldn't make sense to
create a table with no columns!).
You probably understand what is meant by a column definition, but you
might not understand the idea of a constraint. Constraints come in
two types: table-level constraints and column constraints. A constraint
limits either a column or a table in some manner. For example, you can use
a constraint to require that a column always be assigned an actual value
NULL values), or that every entry in a
column must be unique, or that a column is automatically assigned a
default value. Constraints will be addressed in more detail in a future
The asterisk (
*) after the last closing square
bracket indicates that more than one of the enclosing items can be
included. This implies that the table must have one or more columns or
table-level constraints. The vertical line (
indicates an either/or condition. In this syntax example, you must either
define a new column or define a new table-level constraint. The curly
group related items together, and the parentheses
required elements. Finally, the semicolon (
indicates the end of a SQL statement.
Listing 2. Creating a table in Apache Derby
The easiest way to interact with a Derby database is to use the
ij tool, which was described in the first
article in this series, "Developing with Apache Derby -- Hitting the Trifecta: Introduction to
Apache Derby" (developerWorks, February 2006). If you follow along
with the actions shown in Listing 2, you first create
a new database named
test. You may get a
warning message when you issue the
statement if the
test database already exists.
This warning can safely be ignored. Next, you implicitly create a new
bigdog and explicitly create two
new tables --
vendors -- that are stored inside the
bigdog schema. The creation of the schema is
implicit because you don't issue a
CREATE SCHEMA statement.
products table has four columns:
itemNumber column provides a unique identity
for each item (or row) and has an attached column-level constraint that
enforces a valid value to always be supplied
NOT NULL). Without this requirement, the
itemNumber column isn't guaranteed to be unique
because multiple columns could be assigned a
NULL value. The
price column is created as a
DECIMAL data type with a precision of 5 and a
scale of 2. This means the maximum price any item can have is $999.99. The
last two columns are simple: The
column is stored as a
description is stored as a string of characters
with a maximum length of 128 that's truncated to whatever length is
vendors table has three columns:
vendorName. In this case, both the
vendorNumber columns have attached column-level
NOT NULL). In addition, the
vendorName column is stored as a string of
characters with a maximum length of 64. Because the
vendorName column is stored using the
CHAR data type, 64 character spaces are always
After creating a variety of items, you may wonder if there's an easy way to
see what items are stored in a database. Fortunately, the answer is yes
dblook tool. Running this tool, as
shown in Listing 3, provides a detailed listing of
the items that have been created in a particular database.
Listing 3. Viewing your schema with dblook
dblook tool is another Java class that
enables you to easily dump the contents of a database to the console. You
run it from a command prompt just as you would any Java program; the only
addition is the use of the
argument, which specifies which database the
dblook tool should query. If you can run the
ij tool, the
class files are already in your
not, refer back to the first article in this series for more detailed instructions on
properly setting up your
CLASSPATH. As the
output of the
dblook tool shows, the
test database contains the
bigdog schema, which itself contains the
tables. In addition, the various columns for these two tables are
Delete a table in Derby
No one is perfect. What do you do when you incorrectly create a table or a table is no longer needed? The simple answer is to delete the table from the database and, if necessary, create a replacement table. Deleting a table is easy, which means, of course, that you should exercise great care when doing so -- no dialog box pops up and asks if you're sure you want to proceed!
The full syntax for deleting -- or, more formally, dropping -- a table from a database is:
DROP TABLE [schemaName.]tableName ;
The syntax is simple: You append the fully qualified name and a semicolon
DROP TABLE SQL command, and you're
finished. The process of dropping a table is demonstrated on a newly
created temporary table in Listing 4.
Listing 4. Dropping a table from a Derby database
You're well on your way to working with databases in Apache Derby. Now you
have a grasp of basic database concepts, including schemas, tables, and
columns, and you've seen demonstrations of these concepts using a
fictitious business named Bigdog's Surf Shop. To work with a database like
Derby, you know you need to learn SQL, the standard language for
interacting with databases. This article also covered the basic data types
you can use for storing data in a Derby database. Putting these concepts
together, you learned how to create and drop tables by using Derby, and
you used the Derby
dblook tool to dump the
schema contents of a database.
- Read Developing with Apache Derby -- Hitting the Trifecta: Introduction to Apache Derby (developerWorks, February 2006), the first article in this series, for a gentle introduction to the Apache Derby database.
- Check out the manuals about how to use the Apache Derby database on the Apache Derby project site.
- Learn how to download and install Apache Derby.
- Learn how to properly verify your download.
- Check out the developerWorks Apache Derby project area for articles, tutorials, and other resources to help you get started with Derby today.
- Browse for books on these and other technical topics at the Safari bookstore.
- Browse all the Apache articles and free Apache tutorials available in the developerWorks Open source zone.
Get products and technologies
- Download Apache Derby from the Apache Derby project home page.
- Get involved in the developerWorks community by participating in developerWorks blogs.
Dig deeper into Open source on developerWorks
Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.
Experiment with new directions in software development.
Software development in the cloud. Register today to create a project.
Evaluate IBM software and solutions, and transform challenges into opportunities.