Skip to main content

Making DB2 Case-Insensitive

Blair AdamacheIBM Toronto Lab

Blair has been working in relational database development and service at IBM Toronto Lab since 1987. Thanks go to the following developers for their suggestions on this and other articles: Serge Rielau, Paul Huffman, Paul Bird, Rick Swagerman and Robert Begg.

Summary:  Read these tips on how you can easily allow your users to search DB2 for Linux, UNIX, and Windows data in a way that is natural for them-with no worries about case sensitivity.

Date:  08 May 2008 (Published 01 Mar 2002)
Level:  Introductory
Activity:  4289 views

Introduction

New capabilities with DB2 9.5 fixpack 1

Locale-aware Unicode collation algorithm-based collations were introduced in DB2 9.5 fixpack 1. These collations can be tailored to ignore case and/or accents. When a database is created with a collation that ignores case, all database operations become case insensitive. It is necessary to create a new Unicode database to use these collations.

To create a database with an English collation that is case insensitive, use the following statement:

CREATE DATABASE testdb 
   USING CODESET UTF-8 TERRITORY US 
   COLLATE USING UCA500R1_LEN_S2

Selecting from tables in this database ignores case.

SELECT lastname FROM employee 
   ORDER BY lastname

LASTNAME
-----------------
abrams
Adams
allison
Alomar

  4 record(s) selected.

Case insensitivity applies to all operations in the database.

SELECT lastname FROM employee 
   WHERE lastname LIKE 'AL%' 
   ORDER BY lastname

LASTNAME
-----------------
allison
Alomar

  2 record(s) selected.

I answer a lot of questions about DB2®, and one of the common ones I see is about how to make DB2 case-insensitive. The short answer is that DB2 for Linux®, UNIX®, and Windows® Version 7.2 is case sensitive: BLAIR is not the same as Blair. Case sensitivity is powerful (why else would Netscape give you the choice of 'Match case' when you do a search?), and a relational database can search more quickly if it knows that Greenland is not a match for greenland. However, in all probability, your users probably want your application to return 'MacInnis' if they submit a search on 'Macinnis'. Case insensitivity is important if you can't control the data coming into the database; basically, in any circumstance in which users enter data for themselves, such as their names and addresses, or when you are loading outside data. Allowing creation of a case-insensitive database in DB2 is an important requirement for future releases.

All samples in this article were tested with DB2 Version 7, FixPak 6 on Windows 2000. They will have the same behavior on UNIX as well.

Using functions to provide case insensitivity

However, there are things you can do to provide case-insensitive support at the database design level. All members of the DB2 family supply the following functions, which can all be used to fold lowercase data in CHAR and VARCHAR fields to uppercase:

  • TRANSLATE, which can also be used for substitution.
  • UCASE.
  • UPPER (an alternative syntax for UCASE).

See the SQL Reference manual for the syntax and options for these functions.

LCASE and LOWER fold uppercase data to lowercase.

Here are some examples of these functions in use:

     SELECT UPPER(lastname) FROM employee

1              
---------------
HAAS
THOMPSON
KWAN

     SELECT LCASE(lastname) FROM employee

1
---------------
haas
thompson
kwan

So, although your data may be stored in mixed case, you can present it in consistent uppercase or lowercase by using views. For example:

CREATE VIEW upperemp
         (empno, firstnme, midinit, lastname, workdept, phoneno,
             hiredeate, edlevel, birthdate, salary, bonus, comm)
AS SELECT UPPER(empno), UPPER(firstnme), UPPER(midinit), UPPER(lastname),
UPPER(workdept), UPPER(phoneno), hiredate, edlevel, birthdate,
      salary, bonus, comm
 FROM employee

If you select against the view, you can guarantee that results will be evaluated and returned in uppercase. This is important for sorting. Respecting mixed case means that lowercase a is sequenced before uppercase A, resulting in allison coming before Adams, not the way these would be ordered in an address book:

     SELECT lastname FROM employee ORDER BY lastname

LASTNAME
-----------------
abrams
allison
Adams
Alomar

  4 record(s) selected.

Selecting against the view provides the data in the desired order:

     SELECT lastname FROM upperemp ORDER BY lastname

COL
----------
ABRAMS
ADAMS
ALLISON
ALOMAR


What about INSERT, UPDATE and DELETE?

To keep things simple, use INSERT, UPDATE and DELETE on the base table, not the view. This is mainly because the SQL Reference specifies the conditions upon which views can be deletable, updatable or insertable:

  • INSERTs are not allowed.
  • You can update columns that are not based on the UPPER function.
  • You can delete through the view.

With these rules, operations like these can be performed against your view:

DELETE FROM upperemp WHERE lastname='GEYER'


UPDATE upperemp SET comm=50 WHERE lastname='KWAN'

The next release of DB2 on the Linux, Windows®, and UNIX® platforms is planned to contain two features that loosen these restrictions:

  • A view will be insertable if it contains any column that is updatable. Columns based on expressions are omitted in the column list and the base values are set to defaults.
  • Second, the release is planned to have INSTEAD OF triggers, like this for INSERT, which directs the update to go to the base table, thereby allowing the use of INSERTs on the view:
CREATE TRIGGER upper_emp_ins INSTEAD OF INSERT ON upperemp
REFERENCING NEW AS new DEFAULTS NULL FOR EACH ROW MODE DB2SQL
INSERT INTO employee
VALUES(new.empno, new.firstnme, new.midinit,
       new.lastname, new.workdept, new.phoneno,
       new.hiredate, new.edlevel, new.birthdate,
       new.salary, new.bonus, new.comm);

UPDATE could be handled through this sort of trigger to allow updates of the UPPER() columns through the view:

CREATE TRIGGER upper_emp_ins INSTEAD OF UPDATE ON upperemp
REFERENCING NEW AS new DEFAULTS NULL FOR EACH ROW MODE DB2SQL
UPDATE employee
SET (empno, firstnme, midinit, lastname, workdept, phoneno,
     hiredate, edlevel, birthdate, salary, bonus, comm)
= (new.empno, new.firstnme, new.midinit,
   new.lastname, new.workdept, new.phoneno,
   new.hiredate, new.edlevel, new.birthdate,
   new.salary, new.bonus, new.comm)
      WHERE old.empno = empno;


Selecting and ordering rows

All selects, including subselects within INSERT, UPDATE and DELETE statements, should reference the view to achieve consistent and expected behavior when ordering or searching on character data. For applications with embedded SQL, you can use a function, like UPPER, to fold the search arguments in WHERE clauses:

SELECT lastname FROM upperemp WHERE lastname = UPPER('Geyer')

You can also use the UPPER function to fold host variables to uppercase:

SELECT lastname FROM upperemp WHERE lastname = UPPER(:lstname)


Using indexes with data folded to uppercase

By using case functions and views, you can store, insert, update, and insert data in mixed case, but retrieve, order and search only in uppercase. There's one more thing to consider: when you use a case function, this normally forces a table scan. Generally, table scans are more expensive than using an index. One of DB2 Version 7's new features, generated columns, can be used to store data in both mixed and uppercase, and still allow the benefit of an index when searching on the data in uppercase. You wouldn't usually store all columns in both mixed and uppercase: just the ones that are frequently used in search predicates (most likely the same columns you want to create indexes on).

Here's how you do it:

  1. If the standard access method is a search on name, use a generated column to store the name in uppercase:

    CREATE TABLE employee (name VARCHAR(10),
          Name_up GENERATED ALWAYS AS (UPPER(name)))
    

  2. Now create an index over the uppercase version of the column:

    CREATE INDEX name_ind ON employee (name_up)
    

    Now, the following query can get the benefit of the index, and avoid a table scan:

    SELECT name FROM employee WHERE UPPER(name) = 'MACINNIS'
    

One small caveat: If you create the generated column using UPPER, you must code the SELECT with the UPPER function to get the benefit of an index scan. Similarly, if you go with UCASE in the generated column, the SELECT has to use UCASE as well. It makes good sense to choose which version of the functions you want to use (LOWER and UPPER vs. UCASE and LCASE) and be consistent in using them.


Case sensitivity and object names

All database object names (tables, views, columns and so on) are stored in the catalog tables in uppercase unless the identifier is delimited. If you use a delimited name to create the identifier, the exact case of the name is stored in the catalog tables.

An identifier, such as a column name or table name, is treated as case insensitive when used in an SQL statement unless it is delimited. For example, assume that the following statements are issued:

CREATE TABLE MyTable   (id INTEGER)
CREATE TABLE "YourTable" (id INTEGER)

Two tables -- MYTABLE and YourTable -- will exist.

Now, the following two statements are equivalent:

SELECT * FROM MyTable
SELECT * FROM MYTABLE

However, the second statement below will fail with TABLE NOT FOUND since there is no table named YOURTABLE:

SELECT * FROM "YourTable"    // executes without error
SELECT * FROM YourTable      // error, table not found


Conclusion

The approaches I describe in this article give you the ability to store data as you always have, including the ability to load mixed case data into a table without cleansing it first. These approaches also give DB2 the benefit of searching on data without having to equate upper and lowercase characters; any relational database that does case insensitive search has an algorithm somewhere folding all character data to one case for comparison and sorting. While this is convenient, it is not a free operation, and you're paying for it in processing time within the database software. The approach described in this article gives you the flexibility to decide which columns should be case-insensitive.


About the author

Blair has been working in relational database development and service at IBM Toronto Lab since 1987. Thanks go to the following developers for their suggestions on this and other articles: Serge Rielau, Paul Huffman, Paul Bird, Rick Swagerman and Robert Begg.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=14168
ArticleTitle=Making DB2 Case-Insensitive
publish-date=05082008
author1-email=
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers