IBM Information Server, Version 8.1

Text File Basic Format

The text file comprises one or more rules, each on a separate line. Each rule is an ordered string of characters that starts with an anchor point. The anchor point is an absolute position that determines the order of the other characters in the rule. It has the format &character. For example, &a means that the character "a" is the anchor point, and everything else on that line is ordered relative to that letter. The following table lists the other symbols that you can use:

Symbol   Example    Description
<        a < b      Identifies a primary (base letter) difference between "a" and "b"
<<       a << ä     Signifies a secondary (accent) difference between "a" and "ä"
<<<      a <<< A    Identifies a tertiary difference between "a" and "A"
=        x = y      Signifies no difference between "x" and "y"
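
To make the structure of a rule line concrete, the following minimal Python sketch splits a rule into its anchor point and the ordered relations that follow it. It is an illustration only: the names parse_rule, parse_rules, and Rule are hypothetical and are not part of any IBM or ICU API, the label "equal" for the = symbol is our own, and the sketch handles only the basic symbols in the table above, not the options or advanced syntax.

```python
import re
from typing import List, NamedTuple, Tuple

# Readable names for the relation symbols listed in the table above.
# The label "equal" for "=" is our own naming (the table says "no difference").
STRENGTHS = {"<": "primary", "<<": "secondary", "<<<": "tertiary", "=": "equal"}

# Longest operators come first so "<<<" is not read as "<" followed by "<<".
_OPERATOR = re.compile(r"\s*(<<<|<<|<|=)\s*")


class Rule(NamedTuple):
    """Hypothetical container for one parsed rule line."""
    anchor: str
    relations: List[Tuple[str, str]]  # (strength, character), in file order


def parse_rule(line: str) -> Rule:
    """Parse one basic-format rule such as '&a < g' or '&A<<<G'."""
    text = line.strip()
    if not text.startswith("&"):
        raise ValueError("a rule must start with an anchor point, e.g. '&a'")
    # Splitting on a pattern with one capture group yields
    # [anchor, operator, character, operator, character, ...].
    parts = _OPERATOR.split(text[1:])
    anchor = parts[0].strip()
    if not anchor:
        raise ValueError("missing anchor character after '&'")
    relations = []
    for operator, chars in zip(parts[1::2], parts[2::2]):
        if not chars:
            raise ValueError(f"nothing follows the operator {operator!r}")
        relations.append((STRENGTHS[operator], chars))
    return Rule(anchor, relations)


def parse_rules(text: str) -> List[Rule]:
    """Parse a whole file body: one rule per non-empty line."""
    return [parse_rule(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(parse_rule("&a < g"))
    # Rule(anchor='a', relations=[('primary', 'g')])
```

The sketch only describes what a rule says. It does not perform any sorting, which remains the job of the collation engine that reads the file.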

For example, the rule &a < g has the following sorting consequences:

Without Rule    With Rule
apple           apple
Abernathy       Abernathy
bird            green
Boston          bird
green           Boston
Graham          Graham

If you add the rule &A<<<G, the sort order becomes:

With Additional Rule
apple
Abernathy
green
Graham
bird
Boston

You can also specify options in the file and use more advanced syntax elements. These are described in full at:

http://oss.software.ibm.com/icu/userguide/Collate_Customization.html

For details of the Unicode Collation Algorithm (UCA) rules, see:

http://www.unicode.org/unicode/reports/tr10/


This topic is also in the IBM WebSphere DataStage and QualityStage National Language Support Guide.

Last updated: 2008-09-30