Sorting of character data in OpenPages is different on Oracle and Db2 based systems

General Page

The OpenPages application relies on the database to perform sorting of data. The underlying character-based sort mechanics of Oracle and Db2 are different and the same data in Oracle and Db2 might be presented in a different order. The difference in these sort mechanics is fundamental to the underlying database and not a product defect.

When it comes to comparing sort mechanics between Oracle and Db2, there are two important aspects to consider.

The first aspect is around support for altering sort mechanics for individual users. For a deployment that uses the Oracle database, we perform locale-specific sorting by leveraging an Oracle feature that allows us to set that behavior within a database session. The application dynamically alters the database session to match the locale of the user. With DB2, there is no equivalent session level capability and altering sort mechanics for individual users is not possible. The sort mechanics for DB2 are determined by the database collation sequence and that collation is equally applied across all sessions. The collation sequence is set at database creation time and cannot be changed.

The second aspect to consider is what are the available Db2 collation sequences and which of those sequences are supported by the OpenPages product. For a Unicode database, there are three different collation sequences:

IDENTITY collation sequence
- The fastest collating sequence
- Not language aware
- Sorting is accomplished by a binary comparison of the encoded values
- Details on the IDENTITY collation can be found in the following documentation link:
  - IDENTITY Collation

Language aware collation sequence
- Fast collating sequence
- Language aware but not locale sensitive
- Sorting is accomplished by using a weight table of 256 code points
- Code points not in the weight table are sorted in binary order
- Details on the language aware collations can be found in the following documentation link:
  - Language Aware Collation

Locale-sensitive UCA-based collation sequence
- Full support of the Unicode Collation Algorithm
- Handles all Unicode code points
- Substantial performance penalty
- Details on the Locale-sensitive UCA-based collations can be found in the following documentation link:
  - Locale Sensitive Collation

The OpenPages product support for these collation sequences is as follows:

IDENTITY collation sequence = Supported
Language aware collation sequence = Supported
Locale-sensitive UCA-based collation sequence = Not supported

As mentioned in the documentation, the Locale-sensitive collation comes with a substantial performance penalty. The OpenPages team tested the Locale-sensitive collation for common product use cases and determined that the performance penalty associated to this collation was not supportable.

To help you visualize the sorting of character data for the supported collation sequences, here is a simple example of data in Db2 sorted with the IDENTITY collation.

ACE
ACES
ACe
Ace
BCE
BCe
Bce
aCE
aCe
ace
aces
bCe

And here is the same data in Db2 sorted with a language aware collation (SYSTEM_819):

ace
aces
aCe
aCE
Ace
ACe
ACE
ACES
bCe
Bce
BCe
BCE

For comparison, here is the same data from an Oracle system sorted with the OpenPages locale-specific collation for US English (BINARY_AI):

ace
ACe
ACE
Ace
aCE
aCe
ACES
aces
BCe
BCE
Bce
bCe

Now that we know which collation sequences are supported and reviewed an example of the sort mechanics, we need to determine how to influence the selection of the appropriate collation sequence for your deployment. As discussed in this documentation link, Db2 uses a combination of factors to determine which code set and collation sequence might be used:

Code set collation combinations

You can see in that document that there are three factors used to determine the code set and collating sequence for the database. Those factors are:

The collating sequence specified when the database is created

The code set specified when the database is created

The locale of the client when the database is created

For the first factor, we do not specify a specific collation in our database create script and rely on the Db2 default (SYSTEM).

For the second second factor, we do not explicitly specify a code set, which means that the code set is Unspecified.

These first two factors are the same for all OpenPages installs and if we review the chart in the documentation we see that we are left with the following two options:

Notice that in both of these cases, the database is created with a Unicode code set that can be confirmed by checking the database configuration for your specific deployment:

Database code set                                       = UTF-8

However, the collation that is used depends upon whether the client locale is Unicode or non-Unicode.

Here is an example of how the client locale impacts the collation sequence when the database is created. First, we specify the encoding in the LANG session variable to be Unicode:

$ echo $LANG
en_US.UTF-8
$

Creating the database with this setting produces a database with the IDENTITY collation:

$ db2 get db cfg for OPX | egrep 'Database code|Database coll'
 Database code page                                      = 1208
 Database code set                                       = UTF-8
 Database collating sequence                             = IDENTITY
$

However, if the LANG session variable does not specify Unicode:

$ echo $LANG
en_US
$

The database is created with a language aware collation (SYSTEM_819):

$ db2 get db cfg for OPX | egrep 'Database code|Database coll'
 Database code page                                      = 1208
 Database code set                                       = UTF-8
 Database collating sequence                             = SYSTEM_819
$

As mentioned earlier, the collation sequence is set at database creation time and cannot be changed. It is not possible to alter an existing database to use a different collation sequence. We strongly recommend you check this value immediately after creation of the database to confirm you have the behavior you want.

[{"Type":"MASTER","Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSFUEU","label":"IBM OpenPages with Watson"},"ARM Category":[{"code":"a8m50000000ClDPAA0","label":"Database"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Tips

Sorting of character data in OpenPages is different on Oracle and Db2 based systems

General Page

Was this topic helpful?

Document Information

UID

Share your feedback

Need support?