mblen() — Calculate length of multibyte character

Standards

Standards / Extensions                 C or C++   Dependencies
ISO C                                  both
XPG4
XPG4.2
C99
Single UNIX Specification, Version 3

Format

#include <stdlib.h>

int mblen(const char *string, size_t n);

General description

Determines the length in bytes of the multibyte character pointed to by string. A maximum of n bytes is examined.

The behavior of this function is affected by the LC_CTYPE category of the current locale. Changing the LC_CTYPE category invalidates the internal shift state, and undefined results can occur.

If the current locale supports EBCDIC DBCS characters, the shift state is updated where applicable. (See “Conforming to ANSI Standards” in z/OS XL C/C++ Language Reference.) The length returned can be up to 4 (the shift-out character, a 2-byte code, and the shift-in character). If string is a NULL pointer, the function resets its internal shift state to the initial state.

The function maintains an internal shift state, which subsequent calls can alter.
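
The following sketch is not part of the original reference. It illustrates how the n limit and the internal shift state work together when a whole string is scanned. The helper name count_mb_chars() is hypothetical, and the sketch assumes that the string is null-terminated and that the caller has already selected the intended locale with setlocale().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: counts the multibyte characters in s under the
 * current LC_CTYPE locale.  Returns -1 if an invalid sequence is found.
 */
long count_mb_chars(const char *s)
{
    size_t remaining = strlen(s);   /* bytes left before the terminator */
    long count = 0;

    (void)mblen(NULL, 0);           /* reset to the initial shift state */

    while (remaining > 0) {
        /* Never let mblen() examine more bytes than the string holds. */
        int len = mblen(s, remaining);

        if (len < 0)
            return -1;              /* not a valid multibyte character  */
        if (len == 0)
            break;                  /* null character reached           */

        s += len;                   /* len may include shift bytes      */
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

Because each call advances by the returned length, any shift-out or shift-in bytes that mblen() reports as part of a character are skipped along with it. In the default "C" locale, count_mb_chars("abc") returns 3.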

Returned value

If string is NULL, mblen() returns:
  • Nonzero when DBCS-host code (EBCDIC systems) is used
  • Nonzero if multibyte encodings are state-dependent
  • Zero otherwise
If string is not NULL, mblen() returns:
  • Zero if string points to the null character
  • The number of bytes comprising the multibyte character
  • The value -1 if string does not point to a valid multibyte character
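
As a minimal sketch of how a caller might tell these cases apart, the code below wraps both forms of the call. The names encoding_is_state_dependent(), classify_mblen(), and the mblen_kind labels are hypothetical and are introduced here only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical labels for the three outcomes of a non-NULL call. */
enum mblen_kind { MBLEN_INVALID, MBLEN_END, MBLEN_CHAR };

/*
 * Returns nonzero if the multibyte encoding of the current locale is
 * state-dependent.  Side effect: the NULL call also resets the
 * internal shift state to the initial state.
 */
int encoding_is_state_dependent(void)
{
    return mblen(NULL, 0) != 0;
}

/*
 * Classifies the result of mblen(string, n).  string must not be NULL
 * here; the NULL case is handled by encoding_is_state_dependent().
 */
enum mblen_kind classify_mblen(const char *string, size_t n)
{
    int len = mblen(string, n);

    if (len < 0)
        return MBLEN_INVALID;   /* -1: not a valid multibyte character */
    if (len == 0)
        return MBLEN_END;       /*  0: string points to the null char  */
    return MBLEN_CHAR;          /* >0: length of the character in bytes */
}
```

In the default "C" locale, classify_mblen("a", 1) yields MBLEN_CHAR and classify_mblen("", 1) yields MBLEN_END. Code that needs the actual byte count would call mblen() directly and test the sign of the result in the same order.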

Example

 #include <locale.h>
 #include <stdlib.h>
 #include <stdio.h>

 int main(void)
 {
     const char *mbs = "a"         /* string literal: never modified */
                       "\x0E"      /* shift out */
                       "\x44\x66"  /* <j0158> */
                       "\x44\x76"  /* <j0159> */
                       "\x42\x4e"  /* <j0160> */
                       "\x0F"      /* shift in */
                       "b";
     char *loc = setlocale(LC_ALL, "JA_JP.IBM-939");
     int n;

     if (!loc)   /* setlocale() failure */
     {
         exit(8);
     }

     printf("We're in the %s locale.\n", loc);

     n = mblen(NULL, MB_CUR_MAX);
     /******************************************************************/
     /* n is nonzero, indicating state-dependent encoding; mblen() has */
     /* forced the internal shift state to "initial".                  */
     /******************************************************************/
     printf("n = mblen(NULL, MB_CUR_MAX);     ===> n = %s\n",
         n ? "NONZERO" : "ZERO");

     n = mblen(mbs, MB_CUR_MAX);
     /******************************************************************/
     /* n is 1, 'a' is a multibyte character of length 1, internal     */
     /* shift state remains at "initial".                              */
     /******************************************************************/
     printf("n = mblen(mbs, MB_CUR_MAX);      ===> n = %d\n", n);

     n = mblen(mbs + 1, MB_CUR_MAX);
     /******************************************************************/
     /* n is 3, 'shift out' plus two byte character '<j0158>'.  The    */
     /* internal state changes to "shift out".                         */
     /******************************************************************/
     printf("n = mblen(mbs + 1, MB_CUR_MAX);  ===> n = %d\n", n);

     n = mblen(mbs + 4, MB_CUR_MAX);
     /******************************************************************/
     /* n is 2, two byte character '<j0159>'.  The internal shift      */
     /* state remains "shift out".                                     */
     /******************************************************************/
     printf("n = mblen(mbs + 4, MB_CUR_MAX);  ===> n = %d\n", n);

     n = mblen(mbs + 6, MB_CUR_MAX);
     /******************************************************************/
     /* n is 3, two byte character '<j0160>' plus 'shift in'.  The     */
     /* internal shift state returns to "initial".                     */
     /******************************************************************/
     printf("n = mblen(mbs + 6, MB_CUR_MAX);  ===> n = %d\n", n);

     n = mblen(mbs + 9, MB_CUR_MAX);
     /******************************************************************/
     /* n is 1, 'b' is a multibyte character of length 1, internal     */
     /* shift state remains at "initial".                              */
     /******************************************************************/
     printf("n = mblen(mbs + 9, MB_CUR_MAX);  ===> n = %d\n", n);

     n = mblen(mbs + 10, MB_CUR_MAX);
     /******************************************************************/
     /* n is 0 (end of multibyte character string).                    */
     /******************************************************************/
     printf("n = mblen(mbs + 10, MB_CUR_MAX); ===> n = %d\n", n);

     return 0;
 }

Output

 We're in the JA_JP.IBM-939 locale.
 n = mblen(NULL, MB_CUR_MAX);     ===> n = NONZERO
 n = mblen(mbs, MB_CUR_MAX);      ===> n = 1
 n = mblen(mbs + 1, MB_CUR_MAX);  ===> n = 3
 n = mblen(mbs + 4, MB_CUR_MAX);  ===> n = 2
 n = mblen(mbs + 6, MB_CUR_MAX);  ===> n = 3
 n = mblen(mbs + 9, MB_CUR_MAX);  ===> n = 1
 n = mblen(mbs + 10, MB_CUR_MAX); ===> n = 0

Related information