Custom column size for text data

InfoSphere® MDM stores text data in the database using the CHAR or VARCHAR data type.

The column size for this data is measured in number of bytes. Because of the nature of UTF-8 encoding, one text character may be encoded by as many as 4 bytes.

In the worst case scenario, the column size created by the installation can only accommodate number of text characters that is ¼ of the column size. Therefore, after the database is installed, you should analyze the type of data you expect to store in the database and adjust the column size accordingly. You should do this on a case by case basis and do this according to the business requirement.

For example, the LAST_NAME column in the PARTYAME table is used to store a person's family name. By default, the column size for this field is 30. Therefore, in the worst case scenario, only seven text characters can be stored. In order to store 30 text characters, you will need to increase this column size by four times. However, the default column size is more than enough if you only plan to store persons with ethnic Han Chinese family names, since ethnic Han Chinese family names are limited to two text characters in length.

If you expect to store non-ASCII characters in InfoSphere MDM, you should increase the length of some derived fields.

Note: Typically, derived fields are prefixed with U, and are defined in the file named Insensitive_search_enabled.sql.

When non-case-sensitive searches are enabled for DB2® for Linux®, UNIX, and Windows, all searchable fields, such as SERVICE_ORG_NAME, have a corresponding derived field, such as USERVICE_ORG_NAME, where the value is stored in upper case to facilitate searches. By default, the original and derived fields are of the same length.

However, variable-length character encoding, such as UTF-8 case mappings, can produce strings of different lengths than the original. For example, a value stored in the searchable field, such as Eßen, may take up more bytes than the original when converted to upper case, such as ESSEN. As a result, the length of these derived fields must be increased if they will contain non-ASCII characters.