LDAP_CODEPAGE

Use the LDAP_CODEPAGE APIs to manage the conversion of strings between UTF-8 (or Unicode) and a local code page.

  • ldap_xlate_local_to_utf8
  • ldap_xlate_utf8_to_local
  • ldap_xlate_local_to_unicode
  • ldap_xlate_unicode_to_local
  • ldap_set_locale
  • ldap_get_locale
  • ldap_set_iconv_local_codepage
  • ldap_get_iconv_local_codepage
  • ldap_set_iconv_local_charset
  • ldap_char_size

Synopsis

#include <ldap.h>


int  ldap_xlate_local_to_utf8(
       char           *inbufp,
       unsigned long  *inlenp,
       char           *outbufp,
       unsigned long  *outlenp);

int  ldap_xlate_utf8_to_local(
       char           *inbufp,
       unsigned long  *inlenp,
       char           *outbufp,
       unsigned long  *outlenp);

int  ldap_xlate_local_to_unicode(
       char           *inbufp,
       unsigned long  *inlenp,
       char           *outbufp,
       unsigned long  *outlenp);

int  ldap_xlate_unicode_to_local(
       char           *inbufp,
       unsigned long  *inlenp,
       char           *outbufp,
       unsigned long  *outlenp);

int  ldap_set_locale(
       const char           *locale);

char *ldap_get_locale( );

int  ldap_set_iconv_local_codepage(
       char           *codepage);

char *ldap_get_iconv_local_codepage( );

int  ldap_set_iconv_local_charset(
       char           *charset);

int  ldap_char_size(
       char           *p);

Input parameters

inbufp
A pointer to the address of the input buffer that contains the data to be translated.
inlenp
Length in bytes of the inbufp input buffer.
outbufp
A pointer to the address of the output buffer for translated data.
outlenp
Length in bytes of the outbufp output buffer.
Note: The output buffer must be three times as large as the input buffer if you want to translate the entire input buffer in a single call.
charset
Specifies the character set to be used when you convert strings between UTF-8 and the local code page. See IANA character sets supported by platform for the specific charset values that are supported for each operating system platform.
Note: The supported values for charset are the same values that are supported for the charset tag that is optionally defined in Version 1 LDIF files.
codepage
Specifies a code page or code set for overriding the active code page for the currently defined locale. See the system documentation for the code pages that are supported for a particular operating system.
locale
Specifies the locale to be used by LDAP when you convert to and from UTF-8 or Unicode. If the locale is not explicitly set, the LDAP library uses the default locale of the application. To force the LDAP library to use another locale, specify the appropriate locale string.
For applications that run on the Windows platform, supported locales are defined in ldaplocale.h. For example, the following code is an excerpt from ldaplocale.h and shows the available French locales:
/*      French - France                                 */
     #define LDAP_LOCALE_FRFR850             "Fr_FR" 
     #define LDAP_LOCALE_FRFRISO8859_1       "fr_FR" 

For applications that run on the AIX® operating system, see the locale definitions that are defined in the Understanding Locale section of AIX System Management Guide: Operating System and Devices. System-defined locales are in /usr/lib/nls/loc on the AIX operating system. For example, Fr_FR and fr_FR are two system-supported French locales.

For Solaris applications, see the system documentation for the set of system-supported locale definitions.

Note: The specified locale applies to all conversions by the LDAP library within the application's address space. Set or change the LDAP locale only when no other LDAP activity is occurring within the application on other threads.
p
A pointer to the character whose size is to be determined. ldap_char_size() returns the number of bytes constituting the character pointed to by p. For ASCII characters, it is 1. For other character sets, it can be greater than 1.

Output parameters

inbufp
A pointer to the address of the input buffer that contains the data to be translated
inlenp
Length in bytes of the inbufp input buffer
outbufp
A pointer to the address of the output buffer for translated data
outlenp
Length in bytes of the outbufp output buffer
Note: The output buffer must be three times as large as the input buffer if you want to translate the entire input buffer in a single call.
locale
When returned from the ldap_get_locale() API, locale specifies the currently active locale for LDAP. See the system documentation for the locales that are supported for a particular operating system. For applications that run in the Windows environment, see ldaplocale.h.
codepage
When returned from ldap_get_iconv_local_codepage() API, code page specifies the currently active code page, as associated with the currently active locale. See the system documentation for the code pages that are supported for a particular operating system.

Usage

These routines manage application-level conversion of data between the local code page and UTF-8, the encoding that LDAP uses when it communicates with an LDAP V3 compliant server. For more information about the UTF-8 standard, see "UTF-8, a Transformation Format of ISO 10646".

When connected to an LDAP V3 server, the LDAP APIs accept and return string data encoded as UTF-8, which is the default mode of operation. Alternatively, your application can rely on the LDAP library to convert LDAP V3 string data to and from UTF-8 by using the ldap_set_option() API to set the LDAP_OPT_UTF8_IO option to LDAP_UTF8_XLATE_ON. When the option is set, the following connection-based APIs, which accept an ld as input, expect string data to be supplied in the local code page and return string data to the application in the local code page. In other words, the following LDAP routines and related APIs automatically convert string data to and from the UTF-8 wire protocol:
  • ldap_add (and family)
  • ldap_bind (and family)
  • ldap_compare (and family)
  • ldap_delete (and family)
  • ldap_parse_reference
  • ldap_get_dn
  • ldap_get_values
  • ldap_modify (and family)
  • ldap_parse_result
  • ldap_rename (and family)
  • ldap_search (and family)
  • ldap_url_search (and family)
The following APIs are not associated with a connection, and always expect string data, for example, DNs, to be supplied and returned UTF-8 encoded:
  • ldap_explode_dn
  • ldap_explode_dns
  • ldap_explode_rdn
  • ldap_server_locate
  • ldap_server_conf_save
  • ldap_is_ldap_url
  • ldap_url_parse
  • ldap_default_dn_set
The code page APIs convert your application data to and from the local code page. There are several reasons for using these APIs:
  • The application uses one or more of the non-connection-oriented APIs and must convert strings from the local code page to UTF-8 before it calls them.
  • The application is designed to send and receive strings as UTF-8 when it uses the LDAP APIs, but must convert selected strings to the local code page before it presents them to the user. This might be the required approach when the directory contains heterogeneous data, that is, data obtained from multiple countries or locales.

If your application might be extracting string data from the directory that originated from other countries or locales, design the application with the following considerations:
  • Consider splitting your application into a presentation component, and an LDAP worker component.
    • The presentation component is responsible for obtaining data from external sources (for example, graphical user interfaces (GUIs), command lines, and files) and for displaying data to a GUI, standard output, or files. This component typically deals with string data that is represented in the local code page.
    • The LDAP worker component is responsible for interfacing directly with the LDAP programming interfaces. The LDAP worker component can be implemented to deal strictly in UTF-8 when you handle string data. The default mode of operation for the LDAP library is to handle strings that are encoded as UTF-8.
    • String conversion between UTF-8 and the local code page occurs when data is passed to and from the presentation component and the LDAP worker component.
    Consider the following scenario:

    The LDAP worker component issues an LDAP search, and returns a list of entries from the directory. To ensure that no data is lost, the default mode is used and the LDAP library does not convert string data. In this case, it means the DNs of the entries that are returned from the search are represented in UTF-8.

    The application wants to display this list of DNs on a panel. This display can help the user to select the required entry, and the application then retrieves more attributes for the selected DN. Since the DN is represented in UTF-8, it must be converted to the local code page before display.

    The converted DN might not be a faithful representation of the UTF-8 DN. For example, if the DN was created in China, it can contain Chinese characters. If the application is running in a French locale, certain Chinese characters might not be converted correctly, and are replaced with a replacement character.

    The application can display the converted DN, but certain characters might be displayed as blobs. Assuming there is enough information for the user to select the wanted DN, the application accesses the LDAP directory with the selected DN for more information, for example, a jpeg image so that it can display the user's photograph. Since jpeg images might be large, the application is designed to obtain the jpeg attribute only after the user selects a specific DN.

    For the search that gets the jpeg attribute to work, it must be done with the original UTF-8 version of the selected DN, not with the version of the DN that was converted to the local code page. This implies that the application maintains a correlation between the original UTF-8 version of the DN and the version that was converted to the local code page.

  • If the application accepts user input, generates one or more LDAP searches, and then displays the information without passing the results back into the LDAP library, the application can be designed to let the LDAP library perform the conversions, even though some data loss might theoretically occur. Automatic conversion of string data for a specific ld can be enabled by using ldap_set_option() with the LDAP_OPT_UTF8_IO option set to LDAP_UTF8_XLATE_ON.
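The correlation between the original UTF-8 DN and its displayed form, as described in the scenario above, could be kept in a simple table. The following is a minimal sketch only; the struct dn_pair type and the dn_for_followup_search() function are hypothetical names that are not part of the LDAP library.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only: one row per entry shown to the user. */
struct dn_pair {
    const char *utf8_dn;   /* DN exactly as returned by the LDAP library (UTF-8) */
    const char *local_dn;  /* converted copy, possibly lossy, used for display only */
};

/*
 * Return the original UTF-8 DN for the row that the user selected,
 * or NULL if the index is out of range. The follow-up search is issued
 * with this value, never with local_dn.
 */
const char *dn_for_followup_search(const struct dn_pair *pairs,
                                   size_t count, size_t selected)
{
    if (pairs == NULL || selected >= count)
        return NULL;
    return pairs[selected].utf8_dn;
}
```

The presentation component shows local_dn to the user, and the LDAP worker component receives only the value that dn_for_followup_search() returns, so a lossy display conversion never reaches the directory.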

ldap_char_size returns the number of bytes constituting the character pointed to by p. For ASCII characters, it is 1. For other character sets, it can be greater than 1.
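To illustrate why a character can occupy more than one byte, the following sketch derives the byte length of a UTF-8 character from its lead byte. It is an illustration for UTF-8 input only and is not the implementation of ldap_char_size(); the utf8_char_size() name is hypothetical, and this section does not state which encoding ldap_char_size() itself inspects.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in, UTF-8 only: return the number of bytes in the
 * character that starts at p, based on the lead byte. Returns 0 if the
 * byte cannot start a character. Overlong or truncated sequences are
 * not validated in this sketch.
 */
size_t utf8_char_size(const char *p)
{
    unsigned char lead = (unsigned char)*p;

    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation byte or invalid lead byte */
}
```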

Translate local code page to UTF-8

The ldap_xlate_local_to_utf8() API is used to convert a string from the local code page to a UTF-8 encoding. The output string from the conversion process can be larger than the input string. Therefore, the output buffer must be at least twice as large as the input buffer. LDAP_SUCCESS is returned if the conversion is successful.
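The output buffer can be sized before the conversion call. The following sketch allocates three times the input length, which matches the note in the parameter descriptions and also satisfies the "at least twice" requirement stated here. The alloc_xlate_outbuf() helper is a hypothetical name, not part of the LDAP library, and the extra byte for a terminating NUL is an assumption made for convenience.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a zero-filled output buffer three times
 * the size of the input, plus one byte for a terminating NUL.
 * On success, *outlen receives the usable size (3 * inlen) to pass as
 * the output length. Returns NULL on overflow or allocation failure.
 */
char *alloc_xlate_outbuf(unsigned long inlen, unsigned long *outlen)
{
    char *buf;

    /* Reject lengths for which inlen * 3 + 1 would overflow. */
    if (inlen > (ULONG_MAX - 1) / 3 || inlen > (SIZE_MAX - 1) / 3)
        return NULL;

    buf = calloc((size_t)inlen * 3 + 1, 1);
    if (buf != NULL)
        *outlen = inlen * 3;  /* excludes the reserved NUL byte */
    return buf;
}
```

The caller frees the buffer with free() when the converted string is no longer needed.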

Translate UTF-8 to local code page

The ldap_xlate_utf8_to_local() API is used to convert a UTF-8 encoded string to the local code page encoding. The output string from the conversion process can be larger than the input string. Therefore, the output buffer must be at least twice as large as the input buffer. LDAP_SUCCESS is returned if the conversion is successful.
Note: Translation of strings from a UTF-8 encoding to local code page can result in loss of data. This loss is possible when one or more characters in the UTF-8 encoding cannot be represented in the local code page. When this translation occurs, a substitution character replaces any UTF-8 characters that cannot be converted to the local code page.

Translate local code page to Unicode

The ldap_xlate_local_to_unicode() API is used to convert a string from the local code page to the UCS-2 encoding as defined by ISO/IEC 10646-1. This same set of characters is also defined in the UNICODE standard. The output string from the conversion process can be larger than the input string. Therefore, the output buffer must be at least twice as large as the input buffer. LDAP_SUCCESS is returned if the conversion is successful.

Translate Unicode to local code page

The ldap_xlate_unicode_to_local() API is used to convert a UCS-2-encoded string to the local code page encoding. The output string from the conversion process can be larger than the input string. Therefore, the output buffer must be at least twice as large as the input buffer. LDAP_SUCCESS is returned if the conversion is successful.
Note: Translation of strings from a UCS-2 (UNICODE) encoding to local code page can result in loss of data. This loss is possible when one or more characters in the UCS-2 encoding cannot be represented in the local code page. When this translation occurs, a substitution character replaces any UCS-2 characters that cannot be converted to the local code page.

Set locale

The ldap_set_locale() API is used to change the locale that is used by LDAP for conversions between the local code page and UTF-8 (or Unicode). Unless explicitly set with the ldap_set_locale() API, LDAP uses the default locale of the application. To force the LDAP library to use another locale, specify the appropriate locale string. For UNIX systems, see the system documentation for the locale definitions. For Windows operating systems, see ldaplocale.h.

Get locale

The ldap_get_locale() API is used to obtain the active LDAP locale. Values that can be returned are system-specific.

Set code page

The ldap_set_iconv_local_codepage() API is used to override the code page that is associated with the active locale. See the system documentation for the code pages that are supported for a particular operating system.

Get code page

The ldap_get_iconv_local_codepage() API is used to obtain the code page that is associated with the active locale. See the system documentation for the code pages that are supported for a particular operating system. For the charset values that ldap_set_iconv_local_charset() accepts, see IANA character sets supported by platform for the specific values that are supported for each operating system platform. The supported values for charset are the same values that are supported for the charset tag that is optionally defined in Version 1 LDIF files.

Japanese and Korean currency considerations

The generally accepted convention for converting the backslash character (\) (single-byte X'5C') from the Japanese or Korean locale into Unicode is to convert the X'5C' character to one of the following characters:
  • the Unicode yen for Japanese
  • the Unicode won for Korean

To change the default behavior, set the LDAP_BACKSLASH environment variable to YES before you use any of the LDAP APIs. When LDAP_BACKSLASH is set to YES, the X'5C' character is converted to the Unicode backslash (\), instead of the Japanese yen or Korean won.
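An application can also set the variable for its own process at startup rather than relying on the invoking shell. The following is a minimal sketch, assuming a POSIX system where setenv() is available; the request_backslash_conversion() name is hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: set LDAP_BACKSLASH=YES for the current process.
 * Call it before the first LDAP API call. Returns 0 on success and
 * -1 on failure, following setenv() semantics.
 */
int request_backslash_conversion(void)
{
    /* setenv() is POSIX; on Windows, _putenv_s() is the usual equivalent. */
    return setenv("LDAP_BACKSLASH", "YES", 1);
}
```

Calling the helper at the top of main(), before any LDAP handle is created, keeps the ordering requirement explicit in the code.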

Errors

The LDAP code page APIs that return an LDAP return code return a nonzero value if an error occurs. See LDAP_ERROR for more details.

See also

ldap_error