- CUN4BOPR_Version - set by caller
- Specifies the version of the parameter area. This field must
be initialized for the first call to stub routine CUN4LCOL using the
constant CUN4BOPR_Ver, which is supplied by the interface definition
file CUN4BOID.
To exploit the new collation features
(UCA versions UCA400R1, UCA410, UCA600, and tailoring
features), CUN4BOPR_Version must be set to CUN4BOPR_Ver2 (collation parameter area version 2). For
backward compatibility, the default value is CUN4BOPR_Ver.
- CUN4BOPR_Length - set by caller
- Specifies the length of the parameter area. HLASM users must initialize
this field for the first call to CUN4LCOL using the constant CUN4BOPR_Len
which is supplied by the interface definition file CUN4BOID.
- CUN4BOPR_Src1_Buf_Ptr - set by caller, updated by service
- Specifies the beginning address of the string of Unicode characters
to be processed. No write operations are performed on this string. The string
has the length specified in the CUN4BOPR_Src1_Buf_Len parameter.
Note: The source
buffer pointed to by CUN4BOPR_Src1_Buf_Ptr must contain characters in UTF-16 BE
format only; otherwise, the results of the collation service are
unpredictable.
- CUN4BOPR_Src1_Buf_ALET - set by caller
- Specifies the ALET to be used if the source 1 buffer addressed
by CUN4BOPR_Src1_Buf_Ptr resides in a different data space. If the buffer
resides in the primary address space, use the default value 0.
- CUN4BOPR_Src1_Buf_Len - set by caller
- Specifies the length in bytes of the string in the source buffer,
addressed by CUN4BOPR_Src1_Buf_Ptr, to be collated.
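The source buffer rules above (UTF-16 BE data only, length counted in bytes) can be checked off-platform before the parameter area is filled in. The following Python sketch is illustrative only: `make_source_buffer` is a hypothetical helper, not part of the CUN4LCOL interface, and it simply shows which bytes and which byte count correspond to CUN4BOPR_Src1_Buf_Ptr and CUN4BOPR_Src1_Buf_Len.

```python
from __future__ import annotations


def make_source_buffer(text: str) -> tuple[bytes, int]:
    """Hypothetical helper: encode text as UTF-16 BE without a byte-order
    mark and return the buffer together with its length in bytes."""
    buf = text.encode("utf-16-be")  # big-endian, no BOM
    return buf, len(buf)


# BMP characters take 2 bytes each; a supplementary character takes
# 4 bytes (a surrogate pair), so the byte length is not always 2 * len(text).
buf, length = make_source_buffer("r\u00f4le")
print(buf.hex(), length)                          # 007200f4006c0065 8
print(make_source_buffer("\U0001F600")[0].hex())  # d83dde00
```

The same preparation applies to the source 2 buffer and CUN4BOPR_Src2_Buf_Len.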
- CUN4BOPR_Src2_Buf_Ptr - set by caller, updated by service
- Specifies the beginning address of the string of Unicode characters
to be processed. No write operations are performed on this string. The string
has the length specified in the CUN4BOPR_Src2_Buf_Len parameter.
Note: The source
buffer pointed to by CUN4BOPR_Src2_Buf_Ptr must contain characters in UTF-16 BE
format only; otherwise, the results of the collation service are
unpredictable.
- CUN4BOPR_Src2_Buf_ALET - set by caller
- Specifies the ALET to be used if the source 2 buffer addressed
by CUN4BOPR_Src2_Buf_Ptr resides in a different data space. If the buffer
resides in the primary address space, use the default value 0.
- CUN4BOPR_Src2_Buf_Len - set by caller
- Specifies the length in bytes of the string in the source buffer,
addressed by CUN4BOPR_Src2_Buf_Ptr, to be collated.
- CUN4BOPR_Targ1_Buf_Ptr - set by caller, updated by service
- This variable has two primary functions:
- Binary comparison - A comparison requires two strings (to do a logical comparison). For this reason, CUN4BOPR_Targ1_Buf_Ptr
must specify the beginning address of the target 1 buffer, and its related fields (CUN4BOPR_Targ1_Buf_ALET
and CUN4BOPR_Targ1_Buf_Len) must also be set.
- Sort key vector generation - If you need to generate a sort key
vector and you set CUN4BOPR_Src1_Buf_Ptr, you
also need to set up its related fields (CUN4BOPR_Src1_Buf_ALET and
CUN4BOPR_Src1_Buf_Len).
In both cases, it is important that
you set up this field correctly. For more information, see Target buffer length considerations and Sort key vector format.
- CUN4BOPR_Targ1_Buf_ALET - set by caller
- Specifies the ALET to be used if the target 1 buffer addressed
by CUN4BOPR_Targ1_Buf_Ptr resides in a different data space. If the
buffer resides in the primary address space, use the default value 0.
- CUN4BOPR_Targ1_Buf_Len - set by caller, updated by service
- Specifies the length in bytes of the target buffer addressed by
CUN4BOPR_Targ1_Buf_Ptr. Certain conditions apply, dependent upon
the collation level and the need for a sort key vector. See Target buffer length considerations for more information.
- CUN4BOPR_Targ2_Buf_Ptr - set by caller, updated by service
- This variable has two primary functions:
- Binary comparison - A comparison requires two strings (to do a logical comparison). For this reason, CUN4BOPR_Targ2_Buf_Ptr
must specify the beginning address of the target 2 buffer, and its related fields (CUN4BOPR_Targ2_Buf_ALET
and CUN4BOPR_Targ2_Buf_Len) must also be set.
- Sort key vector generation - If you need to generate a sort key
vector and you set CUN4BOPR_Src2_Buf_Ptr, you
also need to set up its related fields (CUN4BOPR_Src2_Buf_ALET and
CUN4BOPR_Src2_Buf_Len).
In both cases, it is important that
you set up this field correctly. For more information, see Target buffer length considerations and Sort key vector format.
- CUN4BOPR_Targ2_Buf_ALET - set by caller
- Specifies the ALET to be used if the target 2 buffer addressed
by CUN4BOPR_Targ2_Buf_Ptr resides in a different data space. If the
buffer resides in the primary address space, use the default value 0.
- CUN4BOPR_Targ2_Buf_Len - set by caller, updated by service
- Specifies the length in bytes of the target buffer addressed by
CUN4BOPR_Targ2_Buf_Ptr. Certain conditions apply, dependent upon
the collation level and the need for a sort key vector. See Target buffer length considerations for more information.
- CUN4BOPR_Coll_Handle - set by caller, updated by service
- Specifies the handle to the collation tables. If a handle is
present, it is used; otherwise, a new handle is returned
in CUN4BOPR_Coll_Handle. Subsequent calls to stub routine CUN4LCOL
that request the same collation properties are faster, because
the handle is reused and CUN4BOPR_Coll_Type does not need to be recomputed.
Note: For
the first call to stub routine CUN4LCOL, CUN4BOPR_Coll_Handle must
be set to binary zeros (X'00').
- CUN4BOPR_Coll_Level - set by caller
- Specifies the collation level as defined by the following constants
(defined in the interface definition file CUN4BOID):
- CUN4BOPR_PRIMARY
- CUN4BOPR_SECONDARY
- CUN4BOPR_TERTIARY
- CUN4BOPR_QUATERNARY
- CUN4BOPR_QUINARY (Supported by UCA400R1 and higher)
- CUN4BOPR_IDENTICAL (Supported by UCA400R1 and higher)
Note: - CUN4BOPR_QUINARY and CUN4BOPR_IDENTICAL have exactly the same
behavior; both names are provided to cover different naming conventions for this collation level.
- Collation levels are also called "collation strength". See the CUN4BOPR_Collation_Keyword
field description.
- CUN4BOPR_Wrk1_Buf_Ptr - set by caller, updated by service
- Specifies the beginning address of the work 1 buffer. This buffer
is used mainly for internal purposes; however, CUN4BOPR_Wrk1_Buf_Ptr
must always be set. See Work buffer length considerations for
more information.
- CUN4BOPR_Wrk1_Buf_ALET - set by caller, updated by service
- Specifies the ALET to be used if the work 1 buffer addressed by
CUN4BOPR_Wrk1_Buf_Ptr resides in a different data space. If the buffer
resides in the primary address space, use the default value 0.
- CUN4BOPR_Wrk1_Buf_Len - set by caller, updated by service
- Specifies the length in bytes of the work 1 buffer addressed by
CUN4BOPR_Wrk1_Buf_Ptr. The required length depends on the collation
rules, including the collation level. See Work buffer length considerations for
more information.
- CUN4BOPR_Wrk2_Buf_Ptr - set by caller, updated by service
- Specifies the beginning address of the work 2 buffer. This buffer
is used mainly for internal purposes; however, CUN4BOPR_Wrk2_Buf_Ptr
must always be set. See Work buffer length considerations for
more information.
- CUN4BOPR_Wrk2_Buf_ALET - set by caller, updated by service
- Specifies the ALET to be used if the work 2 buffer addressed by
CUN4BOPR_Wrk2_Buf_Ptr resides in a different data space. If the buffer
resides in the primary address space, use the default value 0.
- CUN4BOPR_Wrk2_Buf_Len - set by caller, updated by service
- Specifies the length in bytes of the work 2 buffer addressed by
CUN4BOPR_Wrk2_Buf_Ptr. The required length depends on the collation
rules, including the collation level. See Work buffer length considerations for
more information.
- CUN4BOPR_DDA_Buf_Ptr - set by caller
- Specifies the beginning address of an area of storage that collation
needs internally as a dynamic data area.
Note: CUN4BOPR_DDA_Buf_Ptr
must point to a doubleword boundary.
- CUN4BOPR_DDA_Buf_ALET - set by caller
- Specifies the ALET to be used if the dynamic data area addressed
by CUN4BOPR_DDA_Buf_Ptr resides in a different address or data
space. If the area resides in the primary address space, use the default value 0.
- CUN4BOPR_DDA_Buf_Len - set by caller
- Specifies the length in bytes of the dynamic data area addressed
by CUN4BOPR_DDA_Buf_Ptr. The required length is defined by constant
CUN4BOPR_DDA_Req, which is provided in the interface definition file
(CUN4BOID).
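A caller that sub-allocates the dynamic data area from a larger work area has to honor both the doubleword boundary and the CUN4BOPR_DDA_Req length. The Python sketch below only models that address arithmetic; `align_up` and `carve_dda` are hypothetical names, and because the real value of CUN4BOPR_DDA_Req comes from CUN4BOID, the numbers in the usage line are made up.

```python
DOUBLEWORD = 8  # a doubleword is 8 bytes


def align_up(address: int, boundary: int = DOUBLEWORD) -> int:
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


def carve_dda(area_start: int, area_len: int, dda_req: int) -> int:
    """Return a doubleword-aligned start address for a DDA of dda_req bytes
    inside the caller's work area, or raise ValueError if it does not fit."""
    start = align_up(area_start)
    if start + dda_req > area_start + area_len:
        raise ValueError("work area too small for an aligned DDA")
    return start


print(hex(carve_dda(0x1F003, 0x200, 0x100)))  # 0x1f008
```

Rounding up, rather than down, keeps the DDA inside the area the caller owns; the price is up to 7 bytes of padding, which is why the fit check uses the aligned start.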
- CUN4BOPR_Flag1 - set by caller
Bit position | Name |
---|---|
1xxx xxxx | CUN4BOPR_Inv_Handle |
x1xx xxxx | CUN4BOPR_Get_New_Handle |
xx1x xxxx | CUN4BOPR_Page_Fix |
- CUN4BOPR_Inv_Handle
- Specifies the action to be taken when the collation handle is
invalid.
- 0: Indicates that the collation is to be terminated with
an error.
- 1: Indicates that the collation is to be done with a new
handle created by the collation service and put into CUN4BOPR_Coll_Handle.
- CUN4BOPR_Get_New_Handle
- Specifies the action to be taken with the new collation handle.
- 0: Get and use the new handle and continue with the service.
- 1: Get the new handle and return to the caller.
- CUN4BOPR_Page_Fix
- If the requested conversion is not currently loaded in memory,
this flag indicates if it should be loaded in page-fixed memory.
- 0: Indicates use of system storage management (default).
- 1: Indicates use of page fixing.
Note: CUN4BOPR_Page_Fix applies only to callers running in keys
0 through 7. Callers running in other keys (8 through F) cannot exploit
page-fixed storage in the Unicode data space.
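The bit-position table can be turned into mask values directly. The following Python sketch assumes, as the "1xxx xxxx" notation suggests, that the leftmost position is the high-order bit of the byte; the function names are hypothetical and the sketch does not call the service.

```python
# Masks derived from the CUN4BOPR_Flag1 bit-position table
# (assumption: the leftmost position in "1xxx xxxx" is the high-order bit).
INV_HANDLE = 0x80      # 1xxx xxxx  CUN4BOPR_Inv_Handle
GET_NEW_HANDLE = 0x40  # x1xx xxxx  CUN4BOPR_Get_New_Handle
PAGE_FIX = 0x20        # xx1x xxxx  CUN4BOPR_Page_Fix


def build_flag1(inv_handle: bool = False, get_new_handle: bool = False,
                page_fix: bool = False) -> int:
    """Combine the three documented flags into one byte value."""
    flag = 0
    if inv_handle:
        flag |= INV_HANDLE      # 1: collate with a new handle if the handle is invalid
    if get_new_handle:
        flag |= GET_NEW_HANDLE  # 1: get the new handle and return to the caller
    if page_fix:
        flag |= PAGE_FIX        # 1: request page-fixed storage
    return flag


def page_fix_effective(flag1: int, psw_key: int) -> bool:
    """Page fixing can be exploited only by callers running in keys 0-7."""
    return bool(flag1 & PAGE_FIX) and 0 <= psw_key <= 7


flag1 = build_flag1(inv_handle=True, page_fix=True)
print(f"{flag1:08b}")                # 10100000
print(page_fix_effective(flag1, 8))  # False
```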
- CUN4BOPR_Mask - set by caller
- This parameter is two bytes long and, together with CUN4BOPR_Coll_Level,
defines the collation rules. The default value is MASK_DEFAULT.
The following table shows the format and description of the subfields.
Table 1. Collation mask subfield descriptions
Subfield |
Description |
---|
CUN4BOPR_Variable_Opt |
Specifies whether operations with
variable collation elements must be performed. The options are:
0 - Shifted (SHIFTED)
1 - Blanked (BLANKED)
2 - Non-Ignored (NIGNORED)
3 - Shift-Trimmed (STRIMMED)
4 - No Variable Behavior (NAVARIABLECE)
|
CUN4BOPR_Cmp_Order |
Specifies the comparison order:
0 - Forward (FORWARD) (default)
1 - Backward (BACKWARD) (French behavior)
|
CUN4BOPR_SKey_Opt |
Specifies either a binary comparison
or a sort key:
0 - Do not get a sort key (SKOFF); perform a binary comparison (default)
1 - Get a sort key (SKON); do not
perform a binary comparison
|
CUN4BOPR_Norm_Type |
Specifies the normalization form
according to the following values:
0 - Do not apply normalization (NNORM) (default)
1 - Apply NFD (NFD)
2 - Apply NFC (NFC)
3 - Apply NFKD (NFKD)
4 - Apply NFKC (NFKC)
|
CUN4BOPR_GenSKey_and_Cmp |
Specifies whether a binary comparison is also performed when a sort key is
requested:
0 - Do not perform binary comparison (default)
1 - Perform binary comparison
Note: This bit flag is meaningful only if all of the following are set:
- CUN4BOPR_Version = CUN4BOPR_Ver2
- CUN4BOPR_SKey_Opt = SKON
- CUN4BOPR_UCA_Ver = CUN4BOPR_UCA400R1 (or higher)
Collation version 3.0.1 could either perform a binary comparison or
generate a sort key, but not both.
From UCA400R1 onward, it is possible
to generate a sort key and perform a binary comparison at the same time.
|
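Table 1 gives the numeric option values of each subfield, but this section does not give their bit positions inside the two-byte mask, so the following Python sketch models only the values and the condition under which CUN4BOPR_GenSKey_and_Cmp is meaningful. The enum class names and `gen_skey_and_cmp_meaningful` are hypothetical, and the version and UCA constants are represented by their names as strings because their numeric values are defined in CUN4BOID, not here.

```python
from enum import IntEnum


class VariableOpt(IntEnum):
    """CUN4BOPR_Variable_Opt values from Table 1."""
    SHIFTED = 0
    BLANKED = 1
    NIGNORED = 2
    STRIMMED = 3
    NAVARIABLECE = 4


class CmpOrder(IntEnum):
    """CUN4BOPR_Cmp_Order values (BACKWARD gives the French behavior)."""
    FORWARD = 0
    BACKWARD = 1


class SKeyOpt(IntEnum):
    """CUN4BOPR_SKey_Opt values."""
    SKOFF = 0  # binary comparison, no sort key (default)
    SKON = 1   # sort key, no binary comparison


class NormType(IntEnum):
    """CUN4BOPR_Norm_Type values."""
    NNORM = 0
    NFD = 1
    NFC = 2
    NFKD = 3
    NFKC = 4


# UCA versions named in this section that count as "UCA400R1 or higher".
UCA400R1_OR_HIGHER = {"CUN4BOPR_UCA400R1", "CUN4BOPR_UCA410", "CUN4BOPR_UCA600"}


def gen_skey_and_cmp_meaningful(version: str, skey_opt: SKeyOpt, uca_ver: str) -> bool:
    """True when CUN4BOPR_GenSKey_and_Cmp has an effect, per the Table 1 note."""
    return (version == "CUN4BOPR_Ver2"
            and skey_opt == SKeyOpt.SKON
            and uca_ver in UCA400R1_OR_HIGHER)


print(gen_skey_and_cmp_meaningful("CUN4BOPR_Ver2", SKeyOpt.SKON, "CUN4BOPR_UCA410"))  # True
print(gen_skey_and_cmp_meaningful("CUN4BOPR_Ver", SKeyOpt.SKON, "CUN4BOPR_UCA410"))   # False
```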
- CUN4BOPR_RESULT - updated by service
- Specifies the result of the binary comparison between the strings addressed by CUN4BOPR_Src1_Buf_Ptr
and CUN4BOPR_Src2_Buf_Ptr.
The result can be evaluated according
to the following values (Src1 and Src2 denote the strings addressed by CUN4BOPR_Src1_Buf_Ptr and CUN4BOPR_Src2_Buf_Ptr):
-1 if Src1 < Src2
0 if Src1 = Src2
1 if Src1 > Src2
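Because CUN4BOPR_RESULT follows the usual -1/0/1 comparator convention, a caller can feed it straight into a generic sort. The Python sketch below illustrates that convention only: `relation`, `collation_sorted`, and `casefold_compare` are hypothetical, and `casefold_compare` is a stand-in that does not implement UCA; in a real program the comparator would wrap a CUN4LCOL call and return CUN4BOPR_RESULT.

```python
from __future__ import annotations

from functools import cmp_to_key
from typing import Callable, Iterable

RELATION = {-1: "<", 0: "=", 1: ">"}


def relation(result: int) -> str:
    """Translate a CUN4BOPR_RESULT value into the relation between Src1 and Src2."""
    if result not in RELATION:
        raise ValueError(f"unexpected CUN4BOPR_RESULT value: {result}")
    return RELATION[result]


def collation_sorted(items: Iterable[str], compare: Callable[[str, str], int]) -> list[str]:
    """Sort with any comparator that returns -1, 0, or 1 like CUN4BOPR_RESULT."""
    return sorted(items, key=cmp_to_key(compare))


def casefold_compare(a: str, b: str) -> int:
    """Stand-in comparator for this example only; it is NOT UCA collation."""
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


print(relation(-1))                                         # <
print(collation_sorted(["b", "A", "a"], casefold_compare))  # ['A', 'a', 'b']
```

Python's sort is stable, so strings that compare equal (result 0) keep their input order.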
- CUN4BOPR_RC_RS - set by service
- A structure that can be used to access CUN4BOPR_Return_Code and
CUN4BOPR_Reason_Code as one unit.
- CUN4BOPR_Return_Code - set by service
- Specifies the return code.
- CUN4BOPR_Reason_Code - set by service
- Specifies the reason code.
- CUN4BOPR_UCA_VER - set by caller
- Specifies the Unicode Collation Algorithm (UCA) version,
which also identifies the specific Unicode Standard
character repertoire.
Note: This field is referenced only if the collation parameter
area is set with CUN4BOPR_Version = CUN4BOPR_Ver2; otherwise, its content
is ignored.
- CUN4BOPR_Case_Options - set by caller
- Specifies CASE options.
- CUN4BOPR_Case_First - set by caller
- Specifies whether uppercase characters collate before lowercase
characters:
- 0 - Default (the default value depends on the locale; most
locales use Lower First as the default)
- 1 - Upper First
- 2 - Lower First
- CUN4BOPR_Case_Options_Flags - set by caller
- Setting CUN4BOPR_Case_Level to ON with CUN4BOPR_Coll_Level = CUN4BOPR_PRIMARY
ignores accents but not case:
- 0 - Default
- 1 - Ignore accents but not case under primary collation
Note: These fields are referenced only if the collation parameter
area is set with CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER
is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600;
otherwise, their content is ignored.
- CUN4BOPR_Special - set by caller
- CUN4BOPR_Hiragana - set by caller
- Specifies whether to distinguish between Japanese Hiragana and
Katakana characters.
- 0 - Do not distinguish (default)
- 1 - Conform to the Japanese JIS X 4061 standard and use the CUN4BOPR_Coll_Level
= CUN4BOPR_QUATERNARY collation.
Note: This field is referenced only if the collation parameter
area is set with CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER
is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600;
otherwise, its content is ignored.
- CUN4BOPR_Var_Top - set by caller
- Specifies the "highest" character weight (in UCA order) that
is to be considered ignorable. The Variable Top attribute is meaningful
only if the CUN4BOPR_Variable_Opt attribute is not set to Non-Ignored
(NIGNORED). In that case, it controls which characters count as ignorable.
For
example, if callers want white space to be ignorable but not any visible
characters, they would use the value CUN4BOPR_Var_Top=X'0020' (space).
All characters with the same primary weight are equivalent, so CUN4BOPR_Var_Top=X'3000' (ideographic
space) has the same effect as CUN4BOPR_Var_Top=X'0020'.
Note: - All code points must be specified in UTF-16 format.
- This field is referenced only if the collation parameter
area is set with CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER
is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600;
otherwise, its content is ignored.
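CUN4BOPR_Var_Top takes a UTF-16 code point, and the short path form uses the T key followed by hex digits, so the same value can be written two ways. This Python sketch derives both spellings from a character; it assumes the value is a single BMP code unit, as the X'0020' and X'3000' examples suggest, and `var_top_forms` is a hypothetical helper.

```python
from __future__ import annotations


def var_top_forms(ch: str) -> tuple[str, str]:
    """Return the (long path, short path) spellings of a Variable Top value.
    Assumption: the value is exactly one UTF-16 code unit (a BMP character)."""
    units = ch.encode("utf-16-be")
    if len(units) != 2:
        raise ValueError("expected exactly one BMP character (one UTF-16 code unit)")
    digits = units.hex().upper()
    return f"X'{digits}'", f"T{digits}"


print(var_top_forms(" "))       # ("X'0020'", 'T0020')
print(var_top_forms("\u3000"))  # ("X'3000'", 'T3000')
```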
- CUN4BOPR_Locale - set by caller
- Specifies a locale whose specific collation rules
modify the specified default Unicode collation table
(UCA400R1, UCA410, or UCA600; UCA301
does not support customization); collation then
behaves according to those rules. You set a locale by specifying
the following fields:
- CUN4BOPR_Locale_Language - set by caller
- Specifies the language of the desired locale.
- CUN4BOPR_Locale_Region - set by caller
- Specifies the region of the desired locale.
- CUN4BOPR_Locale_Variant - set by caller
- Specifies the variant of the desired locale.
Note: - For supported locale settings (language/region/variant), see Locales for collation and case support.
- If no locale information is provided, the default collation table of the specified UCA version
is used without any change.
- These fields are referenced only if the collation parameter
area is set with CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER
is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600;
otherwise, their content is ignored.
The Unicode locales repository data set SYS1.SCUNLOCL
contains a set of locales documented in Locales for collation and case support.
Each of those locales contains a section for collation rules.
Users
might want to copy locales, modify them as needed, and then provide
the locale name in the CUN4BOPR_Locale sub-fields. In that case, you must also provide
CUN4BOPR_DSName and CUN4BOPR_Collation_Rules_Vol if you
want to load the locales with the Unicode dynamic capabilities. If
the modified locale is already loaded in the Unicode
environment, there is no need to set the data set and volume information.
The
following example (CUNENUSX) shows what a locale looks like:
******************************************************************
* Licensed Materials - Property of IBM *
* *
* "Restricted Materials of IBM" *
* *
* (C) Copyright IBM Corp. 2006 *
* *
* Status = HUN7730 *
* *
******************************************************************
<version $revision: 1.19 $ = default>
<collation>
<rules>
&\u0061\u0065
<<\u00E6
<<<\u00C6
</rules>
</collation>
</version $revision: 1.19 $>
For
further information about Locales, see Locales for collation and case support.
For
further information about Collation rules syntax, see
CUN4BOPR_Collation_Rules_File field description.
In Locales for collation and case support, the value shown in column 2 is the value for the collation API field CUN4BOPR_Collation_Keyword,
that is, the "short path" setting. The following table shows examples
of equivalent "short path" and "long path" locale settings.
Table 2. Equivalencies between short path and long path locale settings
CUN4BOPR_Collation_Keyword | CUN4BOPR_Locale_Language | CUN4BOPR_Locale_Region | CUN4BOPR_Locale_Variant |
---|---|---|---|
LAF | AF | | |
LAR_RBH | AR | BH | |
LDE_RAT_VPREEURO | DE | AT | PREEURO |
LZH_VPINYIN | ZH | | PINYIN |
LEN_RUS_VPOSIX | EN | US | POSIX |
Locale information in CUN4BOPR_Collation_Keyword
uses the following prefixes: - Lxx - For language
- Ryy - For region
- Vzz - For variant
For CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region, and
CUN4BOPR_Locale_Variant, you can use exactly the same values without
the prefixes L, R, or V.
Note: IBM® does
not recommend using CUN4BOPR_Locale directly; instead, use the
sub-fields CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region, or CUN4BOPR_Locale_Variant.
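The prefix rule is mechanical, so a short path locale can be split into the long path sub-field values. The following Python sketch reproduces the rows of Table 2; `locale_fields_from_keyword` is a hypothetical helper, and it expects only the locale part of a keyword (without the UCA version prefix).

```python
from __future__ import annotations

LONG_PATH_FIELD = {
    "L": "CUN4BOPR_Locale_Language",
    "R": "CUN4BOPR_Locale_Region",
    "V": "CUN4BOPR_Locale_Variant",
}


def locale_fields_from_keyword(locale_part: str) -> dict[str, str]:
    """Split a short path locale such as 'LDE_RAT_VPREEURO' into long path
    sub-field values by removing the L, R, and V prefixes."""
    fields: dict[str, str] = {}
    for component in locale_part.split("_"):
        prefix, value = component[:1], component[1:]
        if prefix not in LONG_PATH_FIELD or not value:
            raise ValueError(f"not a locale component: {component!r}")
        fields[LONG_PATH_FIELD[prefix]] = value
    return fields


print(locale_fields_from_keyword("LZH_VPINYIN"))
# {'CUN4BOPR_Locale_Language': 'ZH', 'CUN4BOPR_Locale_Variant': 'PINYIN'}
```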
- CUN4BOPR_Collation_Keyword - set by caller
- Specifies the "short path" settings form, which is compatible with International
Components for Unicode (ICU). For collation callers using the UCA400R1, UCA410,
and UCA600 versions in the Collation API, IBM suggests that you use this field
instead of the "long path" settings. Set this field according to
the following table:
Table 3. Collation keyword descriptions
Attribute Name |
Key |
Possible Values |
Description |
---|
Locale |
L R V |
<locale> |
Provides a specific locale for collation rules
from the SYS1.SCUNLOCL repository. For supported locales, see Locales for collation and case support.
The locale value has the
following format:
Lxx_Ryy_Vzz,
where: - L means language
- R means region
- V means variant
Example: UCA400R1_LSV (Swedish) "Kypper" < "Köpfe"
For
the long path equivalent setting, see the CUN4BOPR_Locale description.
|
Strength |
S |
1, 2, 3, 4, I, D |
The Strength attribute determines whether
accents or case are taken into account when collating or matching
text (in UCA, this is called the collation level; see the CUN4BOPR_Coll_Level
description).
Example: UCA400R1_S1 role = Role = rôle
UCA400R1_S2 role = Role < rôle
UCA400R1_S3 role < Role < rôle
For the long path
equivalent setting, see the CUN4BOPR_Coll_Level description.
|
Case_Level |
K |
X, O, D |
The Case Level attribute is used to ignore
accents but not case. In that case, set Strength to Primary and Case_Level
to On.
In most locales, this setting is Off by default.
Example: UCA400R1_S1_KX role = Role = rôle
UCA400R1_S1_KO role = rôle < Role
For the long
path equivalent setting, see the CUN4BOPR_Case_Level description.
|
Case_First |
C |
X, L, U, D |
The Case First attribute is used to control
whether uppercase letters come before lowercase letters or vice versa
in the absence of other differences in the strings. The possible values
are Upper Case First (U) and Lower Case First (L), plus the standard
Default and Off. There is almost no difference between the Off and
Lower Case First options in terms of results, so typically users will
not use Lower Case First but only Off or Upper Case First.
Example: UCA400R1_CX or UCA400R1_CL "china" < "China" < "denmark" < "Denmark"
UCA400R1_CU "China" < "china" < "Denmark" < "denmark"
For
the long path equivalent setting, see the CUN4BOPR_Case_First description.
|
Alternate |
A |
N, S, D |
The Alternate attribute is used to control
the handling of the so-called variable characters in the UCA: white-space,
punctuation and symbols. If Alternate is set to Non-Ignorable (N),
then differences among these characters are of the same importance
as differences among letters.
If Alternate is set to Shifted
(S), then these characters are of only minor importance. The Shifted
value is often used in combination with Strength set to Quaternary.
In that case, white space, punctuation, and symbols are considered
when comparing strings, but only if all other aspects of the strings
(base letters, accents, and case) are identical.
If Alternate
is not set to Shifted, then there is no difference between a Strength
of 3 and a Strength of 4.
For more information and examples,
see Variable_Weighting in the UCA. The reason the Alternate values
are not simply On and Off is that additional Alternate values may
be added in the future. The UCA option Blanked is expressed
with Strength set to 3, and Alternate set to Shifted.
Example: UCA400R1_S3_AN di Silva < Di Silva < diSilva < U.S.A. < USA
UCA400R1_S3_AS di Silva = diSilva < Di Silva < U.S.A. = USA
UCA400R1_S4_AS di Silva < diSilva < Di Silva < U.S.A. < USA
For
the long path equivalent setting, see the CUN4BOPR_Variable_Opt description.
|
Variable_Top |
T |
<hex digits> |
The Variable Top attribute is meaningful only
if the Alternate attribute is not set to Non-Ignorable. In that
case, it controls which characters count as ignorable. The string
value specifies the "highest" character weight (in UCA order) that
is to be considered ignorable.
Thus, for example, if a user
wanted white space to be ignorable, but not any visible characters,
they would use the value Variable_Top="\u0020" (space). All characters
with the same primary weight are equivalent, so Variable_Top="\u3000"
(ideographic space) has the same effect as Variable_Top="\u0020".
Example: UCA400R1_S3_AN di Silva < diSilva < U.S.A. < USA
UCA400R1_S3_AS di Silva = diSilva < U.S.A. = USA
UCA400R1_S3_AS_T0020 di Silva = diSilva < U.S.A. < USA
For
the long path equivalent setting, see the CUN4BOPR_Var_Top description.
|
Normalization Checking |
N |
X, O, D |
The Normalization setting determines whether or not
text is thoroughly normalized for comparison (see also CUN4BOPR_Norm_Type).
Example: UCA400R1_NX ä = a + \u0308 < ä + \u0323 < ạ + \u0308
UCA400R1_NO ä = a + \u0308 < ä + \u0323 = ạ + \u0308
For
the long path equivalent setting, see the CUN4BOPR_Norm_Type description.
|
French |
F |
X, O, D |
The French attribute sorts strings with different accents
from the back of the string. This attribute is automatically set to
On for the French locales and a few others. Users normally do not
need to set this attribute explicitly. There is a string comparison
performance cost when it is set On, but sort key length is not affected
(see also CUN4BOPR_Cmp_Order).
Example: UCA400R1_FX cote < coté < côte < côté
UCA400R1_FO cote < côte < coté < côté
For
the long path equivalent setting, see the CUN4BOPR_Cmp_Order description.
|
Hiragana |
H |
X, O, D |
Compatibility with JIS X 4061 requires the
introduction of an additional level to distinguish Hiragana and Katakana
characters. If compatibility with that standard is required, then
this attribute should be set On, and the strength set to Quaternary.
This affects sort key length and string comparison
performance.
Example: UCA400R1_HX_S4 きゅう = キュウ < きゆう = キユウ
UCA400R1_HO_S4 きゅう < キュウ < きゆう < キユウ
For
the long path equivalent setting, see the CUN4BOPR_Hiragana description.
|
Valid values for collation keywords are listed in the following
table:
Table 4. Valid values for collation keywords
Value | Abbreviation |
---|---|
Default | D |
On | O |
Off | X |
Primary | 1 |
Secondary | 2 |
Tertiary | 3 |
Quaternary | 4 |
Identical | I |
Shifted | S |
Non-Ignorable | N |
Lower-First | L |
Upper-First | U |
These abbreviations allow a 'short path settings'
specification of a set of collation options, such as "UCA400R1_AS_LSV_S2",
which can be used to specify that the desired options are:
UCA version 4.0.1; ignore spaces, punctuation and symbols; use Swedish
linguistic conventions; compare case-insensitively.
A number
of attribute values are common across different attributes; these
include Default (abbreviated as D), On (O), and Off (X).
This
form is compatible with ICU 3.2. However, the content of this short path
field is mutually exclusive with the other collation configuration
fields (long path settings): this field is analyzed first, before
the content of the other collation fields.
Note: All
collation keyword sets must start with one of the following collation versions, followed by the desired
settings: - UCA400R1_...
- UCA410_...
- UCA600_...
If there is an invalid keyword or keyword value, Collation returns RC8/RS24 (CUN_RC_USER_ERR/
CUN_RS_INVALID_COLLATION_KEYWORD_VALUES). If a keyword
appears more than once, RC8/RS31 is returned (CUN_RC_USER_ERR/
CUN_RS_OVERLAYING_COLLATION_KEYWORD).
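The keyword rules in Table 3 and Table 4, the version prefix requirement, and the RC8/RS24 and RC8/RS31 conditions can be modeled as a small validator. This Python sketch is not the service's parser: `parse_keyword` and `KeywordError` are hypothetical, mapping an unknown version prefix to RS24 is an assumption, and locale values are checked only for being alphanumeric.

```python
from __future__ import annotations

import re

VERSIONS = ("UCA400R1", "UCA410", "UCA600")
# Allowed values per key, taken from Table 3 (abbreviations from Table 4).
ALLOWED = {
    "S": {"1", "2", "3", "4", "I", "D"},  # Strength
    "K": {"X", "O", "D"},                 # Case_Level
    "C": {"X", "L", "U", "D"},            # Case_First
    "A": {"N", "S", "D"},                 # Alternate
    "N": {"X", "O", "D"},                 # Normalization Checking
    "F": {"X", "O", "D"},                 # French
    "H": {"X", "O", "D"},                 # Hiragana
}
LOCALE_KEYS = {"L", "R", "V"}
HEX = re.compile(r"[0-9A-Fa-f]+")         # Variable_Top (T) digits


class KeywordError(ValueError):
    def __init__(self, rc: int, rs: int, msg: str):
        super().__init__(f"RC{rc}/RS{rs}: {msg}")
        self.rc, self.rs = rc, rs


def parse_keyword(keyword: str) -> tuple[str, dict[str, str]]:
    """Return (UCA version, {key: value}) or raise KeywordError."""
    version, _, rest = keyword.partition("_")
    if version not in VERSIONS:
        raise KeywordError(8, 24, f"unsupported version prefix {version!r}")  # assumed RS
    settings: dict[str, str] = {}
    for token in filter(None, rest.split("_")):
        key, value = token[0], token[1:]
        if key in ALLOWED:
            ok = value in ALLOWED[key]
        elif key in LOCALE_KEYS:
            ok = value.isalnum()
        elif key == "T":
            ok = bool(HEX.fullmatch(value))
        else:
            ok = False
        if not ok:
            raise KeywordError(8, 24, f"invalid keyword or value {token!r}")
        if key in settings:
            raise KeywordError(8, 31, f"keyword {key!r} appears more than once")
        settings[key] = value
    return version, settings


print(parse_keyword("UCA400R1_AS_LSV_S2"))
# ('UCA400R1', {'A': 'S', 'L': 'SV', 'S': '2'})
```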
- CUN4BOPR_DSName - set by caller
- Specifies the name of an alternative data set from which the
rules are to be loaded. It enables callers to load locales from a data set other than the official
Unicode repository (SYS1.SCUNLOCL), or to load User Collation Rules
files from private data sets (see CUN4BOPR_Collation_Rules_File).
- CUN4BOPR_Collation_Rules_File - set by caller
- Specifies the name of the member that contains the alternative collation rules.
You can use User Collation Rules (UCR) files for full collation customization. These
files can be considered a variation of collation rules
or locales, because UCR files and locales follow exactly the same collation
syntax.
Collation rules can be redefined using the following symbols:
Table 5. Collation rule symbols
Symbol | Example | Description |
---|---|---|
< | \u0061<\u0062 | Identifies a primary (base letter) difference between "a" and "b". |
<< | \u0061<<\u00E4 | Identifies a secondary (accent) difference between "a" and "ä". |
<<< | \u0061<<<\u0041 | Identifies a tertiary difference between "a" and "A". |
= | x = y | Signifies no difference between x and y. Note: x and y denote code points (CPs), not literal characters. |
& | &Z | The rules that follow are relative to Z but do not affect the position of Z itself. Note: Z denotes a code point, not a literal character. |
/ | æ/e | Expansion. Adds the collation element for 'e' to the collation element for 'æ'. After a reset, "&ae << æ" is equivalent to "&a << æ/e". |
\| | a\|b | Prefix processing. If 'b' is encountered and it follows 'a', the appropriate collation element is output. If 'b' follows any other letter, the normal collation element for 'b' is output. The collation element for 'a' is not affected. |
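To make the reset and relation operators concrete, the following Python sketch tokenizes a single rule line into (operator, characters) pairs, decoding the \uXXXX code point notation used in locale and UCR files. `parse_rule` is a hypothetical helper that covers only &, <, <<, <<<, and =; expansions (/), prefixes (|), option tags in square brackets, and comment lines are deliberately left out.

```python
from __future__ import annotations

import re

# Longest operators first so that "<<<" is not read as "<" followed by "<<".
TOKEN = re.compile(r"\s*(&|<<<|<<|<|=)\s*((?:\\u[0-9A-Fa-f]{4})+)")
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_cps(escaped: str) -> str:
    """Turn a run of \\uXXXX escapes into the characters they denote."""
    return "".join(chr(int(h, 16)) for h in ESCAPE.findall(escaped))


def parse_rule(rule: str) -> list[tuple[str, str]]:
    """Split a rule such as '&\\u0061\\u0065 <<\\u00E6' into (operator, text) pairs."""
    rule = rule.strip()
    pos, out = 0, []
    while pos < len(rule):
        m = TOKEN.match(rule, pos)
        if not m:
            raise ValueError(f"unsupported syntax at offset {pos}: {rule[pos:]!r}")
        out.append((m.group(1), decode_cps(m.group(2))))
        pos = m.end()
    return out


print(parse_rule(r"&\u0061\u0065 <<\u00E6 <<<\u00C6"))
# [('&', 'ae'), ('<<', 'æ'), ('<<<', 'Æ')]
```

Read left to right, the output says: reset at "ae", then "æ" sorts after it with a secondary difference, then "Æ" with a tertiary difference, which is the tailoring shown in the CUNENUSX locale example.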
The following tags can also be part of the collation rule syntax, as an easier way to set collation behavior (default values
are in bold italic):
Table 6. Collation syntax rules
Option |
Example |
Description |
---|
... ... |
See the CUN4BOPR_Locale parameter field description. |
Describes the start/end block of sets for a
locale. X.x and default denote a locale revision/version; however,
locale versions are not meaningful at this time. |
... ... |
Refer to your default Unicode locales repository
SYS1.SCUNLOCL and look for CUNAF locale. |
Describes the start/end block of sets for a
locale, where no revision and version are required, because default
UCA rules are part of this locale. |
... ... |
See the example that follows table "Collation syntax rules". |
Describes the start/end block of sets for a
User Collation Rules (UCR). Default denotes
a "UCR" version, which is not meaningful at this time. |
Alternate |
[alternate non-ignorable]
[alternate shifted]
|
Sets the default value for Alternate attribute.
If set to shifted, variable code points will be ignored on the primary
level. |
Backwards |
[backwards 2] |
Sets the default value for Backwards attribute.
If set to on, secondary level will be reversed. |
Variable top |
& X < [variable top] |
Sets the default value for Variable Top attribute.
All the code points with primary strengths less than variable top
will be considered variable. |
Normalization |
[normalization off]
[normalization on]
|
Turns on or off the Normalization attribute.
If set to on, a quick check and necessary normalization will be performed. |
Case Level |
[caseLevel off]
[caseLevel on]
|
Turns on or off the Case Level attribute. If
set to on, a level consisting only of case characteristics is
inserted in front of the tertiary level. To ignore accents but take case
into account, set strength to primary and case level to on. |
Case First |
[caseFirst off]
[caseFirst upper]
[caseFirst lower]
|
Sets the value for Case First attribute. If
set to upper, causes upper case to sort before lower case. If set
to lower, lower case sorts before upper case. Useful for locales
that already support an ordering but require a different order of
cases. Affects case and tertiary levels. |
Strength |
[strength 1]
[strength 2]
[strength 3]
[strength 4]
[strength 5]
[strength I]
|
Sets the default strength attribute. |
Hiragana |
[hiraganaQ off]
[hiraganaQ on]
|
Controls special treatment of Hiragana code
points on quaternary level. If turned on, Hiragana code points will
get lower values than all the other non-variable code points. Strength
must be greater than or equal to quaternary if you want this attribute
to take effect. Set UCOE_HIRAGANAQ. |
[before 1|2|3] |
&[before 1] a<?<à<?<á? |
Enables users to order characters before a given
character. In UCA 3.0, the example is equivalent to &?<?<à<?<á?
(?= \u3029, Hangzhou numeral nine) and makes accented 'a' letters
sort before 'a'. Accents are often used to indicate the intonations
in Pinyin. In this case, the non-accented letters sort after the accented
letters. |
[last non ignorable] |
&[last non ignorable]<\u4E9C |
Defines a list of CPs that will be positioned
right after the [last non-ignorable] CP. |
[last regular] |
&[last regular]<\u4E9C |
Equivalent to [last non-ignorable]. |
[suppressContractions [FromCP-ToCP]] |
&[suppressContractions [\u0400-\u045F]] |
Suppresses all contractions defined in the range
FromCP - ToCP. After this rule, all of them are treated
as normal CPs. |
[last secondary ignorable] |
&[last secondary ignorable]<<<\u0020 |
All CPs after [last secondary ignorable] are
placed after the last secondary ignorable CP. |
The following is an example that can be used as a UCR
file:
******************************************************************
* Owner: My Name *
* Prof Description: User Collation Rules profile sample *
* *
* *
* *
* *
* *
* *
* *
* *
******************************************************************
<version $UCR$ = default>
<collation>
<rules>
[strength 1] * Collation Settings ...
[alternate non-ignorable]
[backwards 2]
[normalization on]
[caseLevel on]
[caseFirst off]
[hiraganaQ off]
&\u0061\u0065 * Modifying CPs
<<\u00E6
<<<\u00C6
&\u0062<\u0061
</rules>
</collation>
</version $UCR$ = default>
For collation rules files or locale
files, consider the following: - Use the asterisk "*" to mark a comment line, starting at column 1.
- All collation settings must be specified inside the tags <rules>
... </rules>.
- All collation tags and values are case-sensitive. Use exactly the same
tags and UTF-16 CP format as specified in this topic.
- Specify code points in UTF-16 format, for example, \u0061.
"\u" denotes a UTF-16 CP.
- Blanks are not allowed after each one of the following symbols:
For this new collation implementation (tailoring for UCA400R1
and higher; not available for UCA301), there are two ways to specify
collation settings in the Collation API. If more than one is specified in the Collation API, they are processed in the following order:
- Short path - This setting is based on the contents of CUN4BOPR_Collation_Keyword;
for example, "UCA400R1_LEN_RUS_VPOSIX".
- Long path - This setting is used when any of the following fields
are set; values are applied in the order in which they appear in the following
list:
- CUN4BOPR_Coll_Level
- CUN4BOPR_Variable_Opt
- CUN4BOPR_Cmp_Order
- CUN4BOPR_SKey_Opt
- CUN4BOPR_Norm_Type
- CUN4BOPR_Case_First
- CUN4BOPR_Case_Level
- CUN4BOPR_Hiragana
- CUN4BOPR_Var_Top
- CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region or CUN4BOPR_Locale_Variant
- CUN4BOPR_Collation_Rules_File
Note: For long path settings, collation API fields such as CUN4BOPR_Coll_Level,
CUN4BOPR_Variable ... CUN4BOPR_Var_Top override any collation settings
in locales (CUN4BOPR_Locale) or UCRs (CUN4BOPR_Collation_Rules_File).
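The override rule in the note above can be pictured as a dictionary merge: settings from a locale or UCR form the base, and explicitly set long path API fields win. In this Python sketch, `effective_settings` and the setting names are hypothetical, and representing "field not set" as None is an assumption made for illustration; this section does not describe how the service itself detects an unset field.

```python
from __future__ import annotations

from typing import Any, Optional


def effective_settings(tailoring: dict[str, Any],
                       api_fields: dict[str, Optional[Any]]) -> dict[str, Any]:
    """Start from the settings found in a locale or UCR, then let every long
    path API field that the caller actually set (not None) override them."""
    merged = dict(tailoring)
    merged.update({name: value for name, value in api_fields.items() if value is not None})
    return merged


ucr = {"strength": 1, "caseLevel": "on", "caseFirst": "off"}
api = {"strength": 3, "caseFirst": None}  # caseFirst left unset by the caller
print(effective_settings(ucr, api))
# {'strength': 3, 'caseLevel': 'on', 'caseFirst': 'off'}
```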
- CUN4BOPR_Collation_Rules_Vol - set by caller
- Specifies the volume of the data set specified by CUN4BOPR_DSName.