Single-source, dual-path version optimized for single-byte code sets
The term single-source dual-path refers to an application that contains two processing paths, one of which is chosen at run time according to the current locale setting. The locale determines whether the code set in use is single-byte or multibyte.
If a program can keep its performance without increasing its executable file size too much, the single-source dual-path method is the preferred choice. Evaluate the growth in executable file size on a per-command or per-utility basis.
The MB_CUR_MAX macro, defined in <stdlib.h>, gives the maximum number of bytes in a multibyte character in the current locale. In the single-source dual-path method, use it at run time to choose between the single-byte and the multibyte processing path. Because its value depends on the locale, evaluate it only after calling setlocale(), and record the result in a Boolean flag, for example:
int mbcodeset;
/* After setlocale(LC_ALL,"") is done, determine the path to
** be chosen.
*/
if(MB_CUR_MAX == 1)
    mbcodeset = 0;
else
    mbcodeset = 1;
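This code checks whether the current code set is a multibyte code set and sets the mbcodeset flag accordingly. Testing this flag costs less than evaluating MB_CUR_MAX repeatedly, because MB_CUR_MAX is not a compile-time constant: its value depends on the current locale, and in many implementations the macro expands to a function call or a lookup in locale data.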
if(mbcodeset){
    /* Multibyte code sets (also supports single-byte
    ** code sets)
    */
    /* Use multibyte or wide character processing
    ** functions
    */
}else{
    /* single-byte code sets */
    /* Process accordingly */
}
The preceding approach is appropriate when globalization affects only a small part of a module. Testing the flag too often in order to maintain dual paths can itself degrade performance, so place the test at a level where it runs rarely, for example once per file or once per buffer rather than once per character.
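A minimal sketch of placing the test at a coarse level, assuming a stateless encoding and a buffer that holds only complete characters: the hypothetical helper is_multibyte_codeset() is meant to be called once after setlocale(), and the hypothetical count_chars() tests the resulting flag once per buffer, so the per-character conversion loop runs only on the multibyte path. Counting each null or undecodable byte as one character is an assumed policy for illustration, not part of the method described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: call once, after setlocale(LC_ALL, ""),
 * and keep the result instead of re-evaluating MB_CUR_MAX. */
static int is_multibyte_codeset(void)
{
    return MB_CUR_MAX > 1;
}

/* Hypothetical helper: count the characters in buf[0..len).
 * The code-set flag is tested once per call (once per buffer),
 * not once per byte. Assumes a stateless encoding and a buffer
 * that does not end in the middle of a character. */
static size_t count_chars(const char *buf, size_t len, int mbcodeset)
{
    size_t count = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    if (!mbcodeset)
        return len;                 /* single-byte path: one byte per character */

    (void) mbtowc(NULL, NULL, 0);   /* reset the internal conversion state */
    while (i < len) {
        int n = mbtowc(NULL, buf + i, len - i);

        if (n <= 0)
            n = 1;                  /* null or undecodable byte: count one byte (assumed policy) */
        i += (size_t) n;
        count++;
    }
    return count;
}
```

A caller would compute the flag once at startup, for example mbcodeset = is_multibyte_codeset();, and then call count_chars(buf, n, mbcodeset) for each buffer it reads, so the single-byte path never pays for wide-character conversion.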
The following version of the my_example utility produces a single object, yet at run time it chooses the path that optimizes performance for the current code set. It distinguishes only between single-byte and multibyte code sets.
/*
* COMPONENT_NAME:
*
* FUNCTIONS: my_example
*
* The following code shows how to count the number of bytes and
* the number of characters in a text file.
*
* This example is for illustration purposes only. Performance
* improvements may still be possible.
*
*/
#include <stdio.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>     /* memmove() */
#include <fcntl.h>      /* open(), O_RDONLY */
#include <unistd.h>     /* read(), close() */
#include <nl_types.h>   /* nl_catd, catopen(), catgets(), catclose() */
#include "my_example_msg.h"
#define MSGSTR(Num,Str) catgets(catd,MS_MY_EXAMPLE,Num,Str)
/*
 * NAME: my_example
 *
 * FUNCTION: Counts the number of bytes and characters in a file.
 *
 */
int main(int argc, char **argv)
{
    int bytesread;      /* number of bytes read */
    int leftover;       /* bytes of a partial character carried over */
    int i;
    int mbcnt;          /* number of bytes in a character */
    int f;              /* file descriptor */
    int mb_cur_max;
    int bytect;         /* byte count */
    int charct;         /* character count */
    char *curp, *cure;  /* current and end pointers into buffer */
    char buf[BUFSIZ+1];
    nl_catd catd;
    wchar_t wc;
    /* flag to indicate if current code set is a
    ** multibyte code set
    */
    int multibytecodeset;
    /* Obtain the current locale */
    (void) setlocale(LC_ALL,"");
    /* after setting the locale, open the message catalog */
    catd = catopen(MF_MY_EXAMPLE,NL_CAT_LOCALE);
    /* Parse the arguments if any */
    /*
    ** Obtain the maximum number of bytes in a character in the
    ** current locale.
    */
    mb_cur_max = MB_CUR_MAX;
    if(mb_cur_max > 1)
        multibytecodeset = 1;
    else
        multibytecodeset = 0;
    i = 1;
    if(argc <= i){
        /* Usage text is illustrative; no catalog entry is assumed */
        fprintf(stderr, "usage: my_example file\n");
        exit(2);
    }
    /* Open the specified file and issue error messages if any */
    f = open(argv[i], O_RDONLY);
    if(f < 0){
        fprintf(stderr,MSGSTR(CANTOPEN, /*MSG*/
            "my_example: cannot open %s\n"), argv[i]); /*MSG*/
        exit(2);
    }
    /* Initialize the variables for the count */
    bytect = 0;
    charct = 0;
    /* Start count of bytes and characters */
    leftover = 0;
    if(multibytecodeset){
        /* Full globalization */
        /* Handles supported multibyte code sets */
        (void) mbtowc(NULL, NULL, 0);   /* reset conversion state */
        for(;;) {
            bytesread = read(f, buf+leftover, BUFSIZ-leftover);
            /* issue any error messages here, if needed */
            if(bytesread <= 0)
                break;
            bytect += bytesread;
            buf[leftover+bytesread] = '\0';
            curp = buf;
            cure = buf + leftover + bytesread;
            leftover = 0;   /* No more leftover */
            while(curp < cure){
                /* Convert to wide character; never look past cure */
                mbcnt = mbtowc(&wc, curp,
                    (cure - curp < mb_cur_max) ? cure - curp : mb_cur_max);
                if(mbcnt == 0){
                    /* Null character: occupies one byte */
                    mbcnt = 1;
                }else if(mbcnt < 0){
                    if(cure - curp >= mb_cur_max){
                        /* Invalid character: count one byte */
                        mbcnt = 1;
                    }else{
                        /* Possibly a character split by read() (partial
                        ** read): move it to the start of buf and read
                        ** more data
                        */
                        leftover = cure - curp;
                        memmove(buf, curp, leftover);
                        break;
                    }
                }
                curp += mbcnt;
                charct++;
            }
        }
        /* At end of file, count any carried-over bytes: complete
        ** characters normally, undecodable bytes one at a time.
        */
        curp = buf;
        cure = buf + leftover;
        while(curp < cure){
            mbcnt = mbtowc(&wc, curp, cure - curp);
            if(mbcnt <= 0)
                mbcnt = 1;
            curp += mbcnt;
            charct++;
        }
    }else {
        /* Code specific to single-byte code sets that
        ** avoids conversion to widechars and thus optimizes
        ** performance for single-byte code sets.
        */
        for(;;) {
            bytesread = read(f, buf, BUFSIZ);
            /* issue any error messages here, if needed */
            if(bytesread <= 0)
                break;
            bytect += bytesread;
            charct += bytesread;
        }
    }
    /* print number of chars and bytes */
    printf(MSGSTR(BYTECNT, "number of bytes:%d\n"), bytect);
    printf(MSGSTR(CHARCNT, "number of characters:%d\n"), charct);
    close(f);
    catclose(catd);
    exit(0);
}
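The example uses mbtowc(), which cannot tell an invalid byte sequence from one that is merely cut off at the end of the buffer, so it compares the remaining byte count with MB_CUR_MAX and copies leftover bytes back to the start of the buffer. As an alternative sketch (a swapped-in technique, not the method of the example above), the restartable function mbrtowc() returns (size_t)-2 for an incomplete character and keeps the partial character in an mbstate_t object across calls, so no bytes need to be copied between reads. The function name count_chunk, and the policy of counting each invalid byte or null character as one byte, are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in one chunk of input.
 * *state carries a partial multibyte character across chunks, so a
 * character split by read() is completed on the next call without
 * copying bytes back to the start of the buffer. */
static size_t count_chunk(const char *buf, size_t len, mbstate_t *state)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = mbrtowc(NULL, buf, len, state);

        if (n == (size_t) -2)       /* incomplete: all len bytes were absorbed into *state */
            break;
        if (n == (size_t) -1) {     /* invalid sequence: count one byte as one character */
            memset(state, 0, sizeof *state);   /* state is unspecified after an error */
            n = 1;
        } else if (n == 0) {        /* null character: assumed to occupy one byte */
            n = 1;
        }
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

A caller would zero an mbstate_t once, call count_chunk() for each buffer returned by read(), and at end of file treat a non-initial state (!mbsinit(&state)) as a trailing incomplete character, for example by counting it as one more character.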