String externalization practices and considerations for UNIX shell scripts

Enable your product's shell scripts for a global audience

In this article, we provide practical "How-Tos" and experiences on externalizing shell script messages in a product. Also, we provide suggestions on what to consider before and during translation enablement from a globalization perspective.
The target audience is product developers who would like to enable their shell scripts for translation. After reading this article, readers can understand the considerations for externalizing shell script messages, realize the end-to-end process of string extraction and translation, and be aware of some known issues and their solutions.

Naomi YM Wu (naomiwu@tw.ibm.com), Software Engineer, IBM

Naomi has worked at IBM as a globalization testing lead for 5 years. She has led globalization verification test (GVT) projects across various IBM product lines such as ECM, Rational, and Industry Solution. Naomi has also been working on Globalization Assessment and Globalization Enablement projects as a globalization engineer, and translatability enablement is one of her main focuses.



Andy MT Wu (andymt@tw.ibm.com), Software Engineer, IBM

Andy has worked at IBM as a software developer and a project manager for 10 years. He has experience with cross-brand projects including IBM WebSphere, IBM Tivoli, Business Analytics, Industry Solutions, and Systems and Technology Group. He is proficient in globalization enablement, testing, and service-oriented architecture (SOA) development.



Zach TL Lee (zachlee@tw.ibm.com), Software Engineer, IBM

Zach is a globalization testing lead from the Globalization Shared Services Center (GSSC), CDL, IBM. He is responsible for GVT and translatability verification test (TVT) for IBM Tivoli storage products. Zach has also been working on Globalization Assessment and Globalization Enablement projects as a globalization engineer, focusing on globalization enablement and translatability enablement.



24 December 2013


1. Why enable translations for shell environments?

Shell scripts contain sequences of commands that run tasks automatically on UNIX®-like operating systems, which makes them essential for system administrators and developers. However, shell scripts are usually written to display English messages only, which can be unfriendly to non-English users. Enabling translation for your product's shell scripts benefits non-English users worldwide. Table 1 defines some commonly used terms in this article.

Table 1. Terminology
Term: Definition
String externalization / String extraction: The process of separating translatable messages from the source code for translation.
Locale: A setting that identifies language or geography and determines user preferences such as language of messages, cultural format, and so on.
Globalization (G11n): The provision of a single software solution that has multicultural support, and is available in one or more languages.
Program Integrated Information (PII): User-visible text that is contained within a software program and is integral to the execution of that program. This includes user interface text and messages.

1.1 String externalization approach

Before translators can translate the messages, developers need to make the software translatable. GNU gettext is a commonly used library for retrieving translations. It provides a framework and a set of tools for multilingual support, with implementations for Java, C/C++, shell script, Perl, and many other programming languages. In this article, we describe how to use gettext to externalize PII strings from shell scripts.

1.2 String externalization considerations

Message ID: Use the English message or a unique ID?

One English word or phrase might have multiple meanings or parts of speech, and each of them might need a different translation in other languages. Because a message ID can exist only once in a PII file, using the English message as the ID forces all contexts to share a single translation. Using a unique ID instead allows the translator to apply a different translation to each occurrence.

Hard-coded messages

Developers often hard-code English messages in the source code. Separating translatable messages from the executable code is the fundamental rule for translatability. It allows translators to add newly supported languages or modify existing translation files without modifying the source code.

Concatenation

Translatable messages must be complete sentences. Constructing sentences from fragmented PII messages forces the translation to follow the same word order, which may violate the grammar of some languages. Embedding variables in a complete sentence lets translators rearrange the word order.
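As a minimal sketch of the difference, the following shell fragment contrasts a sentence glued together from fragments with a single complete sentence that carries named placeholders. The variable names (file_count, dir_name), the message texts, and the render helper are hypothetical and only serve the illustration. In a real script, the substitution would be done by eval_gettext rather than by sed.

```shell
#!/bin/sh
# Hypothetical values, used only for illustration.
file_count=3
dir_name=/tmp/demo

# Discouraged: fragments glued together in English word order.
# A translator would see "Copied" and "files to" separately and could not reorder them.
part1="Copied"
part2="files to"
fragmented="$part1 $file_count $part2 $dir_name."

# Preferred: one complete sentence; the placeholders travel with the text.
template_en='Copied $file_count files to $dir_name.'
# A hypothetical translation is free to move the placeholders around.
template_reordered='To $dir_name, $file_count files were copied.'

render () {
    # Simplified stand-in for eval_gettext: replaces the two known placeholders.
    # It assumes the values contain neither "|" nor "&".
    printf '%s\n' "$1" |
        sed -e 's|\$file_count|'"$file_count"'|' -e 's|\$dir_name|'"$dir_name"'|'
}

render "$template_en"
render "$template_reordered"
```

Both templates render correctly even though the placeholders appear in a different order, which is not possible when the sentence is assembled from fragments.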

Multiple paragraphs

Avoid using multiple paragraphs in a single PII message. The suggested approach is to limit each PII message to one paragraph. A PII message is the minimal translation unit, so if one message contains multiple paragraphs, translators might need to proofread the whole message again just because one English word changed, which is time-consuming.


2. Shell script translation enablement steps

In this section, we demonstrate the detailed steps to enable translation for shell scripts. The process starts with string extraction and ends with translation integration. For the string extraction phase, GNU gettext commands and portable object (.po) files are used to make the messages translatable.

As discussed in the String externalization considerations section, there is some debate on whether to use an English message or a unique ID as the PII message identifier. Each approach has its advantages and disadvantages. However, to fulfill the G11n requirement of context-aware translation for identical English messages, we used gettext in an alternative way and marked each PII message with a unique ID instead of an English message. With this approach, messages can be uniquely identified in different contexts.

In the steps below, we assume that you are the developer responsible for enabling translation in an existing product that currently supports only English. This scenario covers the end-to-end process of source code preparation, string externalization, and translation integration.

Enabling translation for shell scripts involves the following five major stages.

  1. Get shell scripts ready
  2. Extract translatable messages
  3. Generate translatable message files
  4. Perform translation on message files
  5. Package translated message files into build

More details are explained in the following sub-sections.

2.1 Get shell scripts ready

Before starting to extract your shell script messages, review your source code and ensure that your to-be-translated messages do not contain the following constructs, which the GNU gettext manual warns against (for more details, refer to the Resources section):

  1. Access to arguments (for example: $0, $1, …)
  2. Highly volatile shell variables (for example: $? )
  3. Command substitution (for example: "`...`" or "$(...)")
  4. Variable access with defaulting (for example: ${variable-default})

If your message contains such constructs, you must reorganize your code so that the to-be-translated message no longer contains them. Here is a simple example.

Modify

echo "Usage: $0 [OPTION] FILE..."

to

name=$0
echo "Usage: $name [OPTION] FILE..."

After reviewing your source code, the next step is to allow the shell scripts to use the gettext commands, which can be done by adding a declaration at the top of each shell script file.

First, source gettext.sh (with the command . gettext.sh) to make commands such as gettext and eval_gettext available. An example is shown in Figure 1.

Next, set TEXTDOMAIN and TEXTDOMAINDIR to identify the location of the translated machine object (.mo) file. Machine object files are binary files generated from the .po files, intended to be read by programs rather than by humans. TEXTDOMAIN identifies the translation file name and TEXTDOMAINDIR identifies the file path (as shown in Figure 2).

Figure 1. "gettext" declaration
Figure 2. Translation file location declaration
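A minimal sketch of such a declaration block might look like the following. The domain name messages is an assumption chosen to match the messages.po file name used later in this article, and /opt/ego/nls is the example directory used in the packaging section. The fallback functions in the else branch are also an assumption, added only so that the script keeps running (and prints message IDs) on a system without GNU gettext.

```shell
#!/bin/sh
# Load the gettext shell helpers (gettext, eval_gettext) when they are installed.
if command -v gettext.sh >/dev/null 2>&1; then
    . gettext.sh
else
    # Assumed fallback, not part of GNU gettext: print the message ID unchanged.
    gettext () { printf '%s' "$1"; }
    eval_gettext () { printf '%s' "$1"; }
fi

# TEXTDOMAIN is the translation file name without the .mo suffix.
TEXTDOMAIN=messages
# TEXTDOMAINDIR is the directory that contains the <locale>/LC_MESSAGES folders.
TEXTDOMAINDIR=/opt/ego/nls
export TEXTDOMAIN TEXTDOMAINDIR
```

With these lines at the top of a script, gettext looks for $TEXTDOMAINDIR/<locale>/LC_MESSAGES/messages.mo and returns the message ID itself when no matching translation is found.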

2.2 Extract translatable messages

After preparing the shell script file, you can now start to extract the translatable messages by using the gettext and eval_gettext commands.

For messages without variables, use the gettext command and replace the English message with a unique message ID. The command format is:

gettext "<msgid>"; echo

Here, the naming rule for <msgid> is to combine an abbreviation of the .sh file name with a brief description of the message itself. The advantages of this rule are:

  1. The file name information allows developers to quickly understand where the message originated.
  2. The names remain more readable than serial-number-style IDs.

Listing 1 shows an example of the gettext command.

Listing 1. Example of the gettext usage

<Before>

echo "This program will use following command to
install the EGO RPM packages on the system."

<After>

gettext "Platform_EGO_install_msg"; echo

While converting echo to gettext, we recommend maintaining a mapping list of message IDs and their original English messages for tracking purposes. You can review the mapping list to find duplicate message IDs, or to merge messages with the same content into a single message ID (but first ensure that they do not need different translations in any target language). The list also saves you from searching all files to find a specific message entry, and it is helpful when preparing the English translation resource in the Generate translatable message files section.
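The review of the mapping list can be partly automated. The following sketch assumes a hypothetical list format that the article does not prescribe: a tab-separated text file with the message ID in the first column and the English message in the second. The function names are illustrative only.

```shell
#!/bin/sh
# Assumption: the mapping list is tab-separated, "ID<TAB>English message", one entry per line.

# List message IDs that occur more than once.
find_duplicate_ids () {
    cut -f1 "$1" | sort | uniq -d
}

# List English messages that are shared by several IDs (candidates for merging).
find_shared_messages () {
    cut -f2- "$1" | sort | uniq -d
}
```

For example, find_duplicate_ids mapping.tsv prints every ID that was assigned twice, and find_shared_messages mapping.tsv prints texts that might be merged after their contexts have been checked.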

If a message contains one or more variables, use the eval_gettext command instead. Because the unique ID approach is used here, you can simply replace the original string with the unique message ID. Listing 2 shows an example.

Listing 2. Example of the eval_gettext usage

<Before>

echo "The host is $BINARY_TYPE"

<After>

eval_gettext "Platform_binary_type_msg"; echo

However, if you prefer the English message approach, that is, using the English message as your message ID, you need to add an escape character (\) before each $ (as shown in Listing 3).

Listing 3. Example of the eval_gettext usage (using the English message as the ID)

<Before>

echo "The host is $BINARY_TYPE"

<After>

eval_gettext "The host is \$BINARY_TYPE"; echo

As mentioned in the Get shell scripts ready section, in some cases the message needs to be reorganized. The argument variable needs to be moved out of the eval_gettext message and assigned to a named variable first (as shown in Listing 4).

Listing 4. Example to reorganize argument variables

<Before>

echo "Port numbers ($param_name=$1) must be integers."

<After>

port=$1
eval_gettext "Install_cmn_port_int_error"; echo

After externalizing all your translatable messages, you are now ready to generate the .po files with the xgettext command.

xgettext <filename.sh>

If you want to process multiple shell scripts in a batch, list all the file names in a text file and run the following command.

xgettext -f <filename-list.txt>

This command would be quite handy when you have many shell script files and want to combine messages from several shell script files into a single .po file. An example filename-list.txt is shown in Listing 5.

Listing 5. Filename-list example
./sym-wrapper-header.sh
./platform.sh
./instlib/install_common.sh
./instlib/post-lsf.sh
./instlib/post-ego.sh
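If the list of scripts changes often, the file list can be generated rather than maintained by hand. The following is a sketch under stated assumptions: the function names are hypothetical, and every file ending in .sh under the source directory is assumed to be a script that should be scanned. The options -L Shell (select the shell parser) and -o (name the output file) are standard xgettext options.

```shell
#!/bin/sh
# Write the paths of all *.sh files under a directory to a list file.
make_file_list () {
    # $1 = source directory, $2 = list file to write
    find "$1" -name '*.sh' | sort > "$2"
}

# Run xgettext on every file named in the list and write one combined .po file.
extract_messages () {
    # $1 = list file, $2 = output .po file
    xgettext -L Shell -o "$2" -f "$1"
}
```

A typical call sequence would be make_file_list . filename-list.txt followed by extract_messages filename-list.txt messages.po, both run from the directory that the listed paths are relative to.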

After you extract the messages using the xgettext command, an output .po file containing unique message IDs (msgid) and empty message strings (msgstr) is generated. Listing 6 shows an example output file, messages.po.

Listing 6. Example messages.po file generated by the xgettext command
#: instlib/post-lsf.sh:452
#, sh-format
msgid "Post_lsf_installed_msg"
msgstr ""

#: instlib/post-lsf.sh:458
msgid "Post_lsf_mgmt_host_msg"
msgstr ""

#: instlib/post-ego.sh:93
#, sh-format
msgid "Post_ego_remove_success_msg"
msgstr ""

#: instlib/post-ego.sh:96
#, sh-format
msgid "Post_ego_install_success_msg"
msgstr ""

At this point, the string extraction phase is complete. Next, the extracted message file must be prepared as translatable message files and then sent for translation.

2.3 Generate translatable message files

The steps to prepare translatable message files differ slightly between the unique ID approach and the English message approach. In the English message approach, there is no need to create an English message file because the message ID itself is the English text. This approach treats English as the default language. In the unique ID approach, however, every language is treated as a translation language. Therefore, you need to create an English message file so that the program can convert each unique ID to its corresponding message under the English locale.

The preparation step is straightforward. Open the messages.po file generated by xgettext, which at this point contains only message IDs (msgid). Then, fill in the empty message string (msgstr) entries with the corresponding English messages, using the mapping list maintained during the string externalization phase, and save the result as a separate copy. The result is the English message file (we call it messages_en.po to avoid confusion with messages.po), as shown in Listing 7.

Listing 7. English translation resource example
#: instlib/post-lsf.sh:452
#, sh-format
msgid "Post_lsf_installed_msg"
msgstr ""
"Platform LSF $prdversion is installed at $topdir.\n"
"To make LSF take effect, you must set your environment on this host: \n"
"source ${topdir}/cshrc.platform \n"
"or \n"
". ${topdir}/profile.platform "

#: instlib/post-lsf.sh:458
msgid "Post_lsf_mgmt_host_msg"
msgstr ""
"This is a management host. To complete the installation on this host, you "
"must run: \n"
"egoconfig mghost lsf"

#: instlib/post-ego.sh:93
#, sh-format
msgid "Post_ego_remove_success_msg"
msgstr ""
"Platform EGO version $_ego_version is successfully removed from RPM database."

#: instlib/post-ego.sh:96
#, sh-format
msgid "Post_ego_install_success_msg"
msgstr "Platform EGO version $egoversion is successfully installed."
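For a large message file, the fill-in step can be scripted. The sketch below is one possible approach, not the procedure the authors used. It assumes a hypothetical tab-separated mapping list (message ID, then English text) in which the text is already escaped for .po syntax (\" for quotes, \n for line feeds) and fits on one line. The function name fill_english is illustrative.

```shell
#!/bin/sh
# Fill the empty msgstr entries of a .po file from a mapping list.
fill_english () {
    # $1 = mapping list (ID<TAB>English text), $2 = messages.po from xgettext
    awk -F '\t' '
        # First file: remember the English text for each message ID.
        NR == FNR { text[$1] = $2; next }
        # Second file: track the ID of the entry being read.
        /^msgid "/ {
            id = $0
            sub(/^msgid "/, "", id)
            sub(/"$/, "", id)
        }
        # Replace an empty msgstr when the mapping list knows the ID.
        /^msgstr ""$/ && id != "" && (id in text) {
            print "msgstr \"" text[id] "\""
            next
        }
        { print }
    ' "$1" "$2"
}
```

A call such as fill_english mapping.tsv messages.po > messages_en.po writes the English message file. Entries whose ID is missing from the mapping list keep their empty msgstr and still need to be filled in by hand.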

2.4 Perform translation on message files

At this stage, the message ID file (messages.po) and the English message file (messages_en.po) are ready to be sent for translation. The actual translation process is outside the scope of this article and is not discussed here.

After the translation phase, you will have a set of translated message files (messages.po), one for each target language. You also have the English message file that you created yourself (when the unique ID approach is used).

2.5 Package translated message files into build

Now, all the translated message files are available. The last phase is to integrate the translated files back into the product. First, put all the translated .po files into the proper directory structure, which is:

$TEXTDOMAINDIR/<locale>/LC_MESSAGES/

The <locale> here should follow the naming rule of <"Language code"_"Country code">. For a full list of language codes and country codes, refer to the Resources section.

$TEXTDOMAINDIR is the value defined at the beginning of each .sh file. For example, if you set $TEXTDOMAINDIR to /opt/ego/nls and the translation file is for Traditional Chinese, put the file under /opt/ego/nls/zh_TW/LC_MESSAGES/. Also, ensure that every .po file is named $TEXTDOMAIN.po, so that the shell scripts can identify the translation files correctly. Figure 3 shows an example of well-organized message files.

Figure 3. Example of a translated message file structure
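The directory layout can be created with a few lines of shell. The following sketch uses a hypothetical helper named install_po. The default values messages and /opt/ego/nls are assumptions taken from the examples in this article and should be replaced by the values your scripts actually declare.

```shell
#!/bin/sh
# Defaults are illustrative; the real values come from the declaration in each script.
TEXTDOMAIN=${TEXTDOMAIN:-messages}
TEXTDOMAINDIR=${TEXTDOMAINDIR:-/opt/ego/nls}

# Copy one translated .po file into $TEXTDOMAINDIR/<locale>/LC_MESSAGES/.
install_po () {
    # $1 = locale such as zh_TW, $2 = translated .po file
    target="$TEXTDOMAINDIR/$1/LC_MESSAGES"
    mkdir -p "$target" && cp "$2" "$target/$TEXTDOMAIN.po"
}
```

For example, install_po zh_TW translated/zh_TW.po places the Traditional Chinese file at $TEXTDOMAINDIR/zh_TW/LC_MESSAGES/$TEXTDOMAIN.po, renaming it on the way so that every locale uses the same file name.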

Then, in order for the program to read the translation, you need to generate the .mo file from each .po file using the following command.

msgfmt messages.po

After the msgfmt command, the output file, messages.mo, is generated. Machine object files should be located in the same file structure as that of the .po files. You can refer to Figure 4 for an example.

Figure 4. Translated message file structure after the msgfmt command

As for when to convert the .po files to .mo files, you can either run the msgfmt command manually before code check-in or merge this step into your build script. We recommend the latter. Automating the msgfmt step not only eliminates manual effort but also ensures that the latest translation is always available in the build. Listing 8 shows an example build script.

Listing 8. Build script example to integrate the msgfmt step into build

# Find every .po file under the nls folder and compile it to a .mo file
# with the same base name in the same directory.
find ../nls -name '*.po' | while read -r file; do
    PODIR=$(dirname "$file")
    # Strip only the .po suffix so that names containing dots stay intact.
    PONAME=$(basename "$file" .po)
    msgfmt -o "$PODIR/$PONAME.mo" "$file"
done

Now, your shell scripts are properly translated. The translation that the shell scripts display is determined by the user's locale settings, typically the LANG environment variable (LC_ALL and LC_MESSAGES take precedence over LANG when they are set).


3. Common issues and solutions


This section contains some common issues and suggested solutions when enabling translation for shell scripts.

3.1 Line feed issue

Sometimes, you might see the following error messages when using the msgfmt command to generate the .mo files.

Listing 9. Example of error messages for msgfmt

./en_US/LC_MESSAGES/messages.po:160: 'msgid' and 'msgstr' entries do not both begin with '\n'
./en_US/LC_MESSAGES/messages.po:172: 'msgid' and 'msgstr' entries do not both end with '\n'

There are two possible causes of this issue.

  • Using the English message approach:
    If an English message is used as the message ID (msgid) and it has a line-feed character at the beginning or end, the translator might remove that line-feed character from the translation, either on purpose or by accident. In that case, an error (as shown in Listing 9) is displayed because the line-feed character exists at the beginning or end of the message ID but not in the message string.
  • Using the unique ID approach:
    If you use a unique ID as the message ID (msgid) and the translation in the message string (msgstr) has a line-feed character at the beginning or end, the same error occurs because the message ID does not contain one.

Therefore, we suggest avoiding line-feed characters at the beginning or end of messages in both the msgid and msgstr fields. If a line feed is needed, move it out of the .po file and insert an echo command in the shell script instead.
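The suggestion can be sketched in shell as follows. Install_done_msg is a hypothetical message ID whose msgstr neither starts nor ends with "\n", and the script is assumed to have already sourced gettext.sh and set TEXTDOMAIN and TEXTDOMAINDIR.

```shell
#!/bin/sh
# The blank lines are produced by echo, not by "\n" inside the .po file.
show_install_done () {
    echo                                # blank line that used to be a leading "\n"
    gettext "Install_done_msg"; echo    # the message itself, then its line end
    echo                                # blank line that used to be a trailing "\n"
}

show_install_done
```

Because the spacing now lives in the script, translators cannot break it by accident, and msgfmt no longer has a leading or trailing "\n" to compare between msgid and msgstr.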

Figure 5. Example of removing "\n" from a .po file

3.2 Locale fallback issue

Not all programs have translations for all languages. When an English message is used as the msgid, the English message is shown by default in place of a nonexistent translation. However, when a unique message ID is used, gettext does not fall back to the English message on its own under a locale that has no translation. In that case, the unique message ID is displayed instead of the English message.

Figure 6. Message ID displayed incorrectly when the translation is not found

To deal with this problem, set up a proper locale fallback mechanism so that unique message IDs are not displayed when no translation is found for the current locale. This is achieved through the LANGUAGE environment variable. After you add the code in Listing 10 at the beginning of each shell script, the translation falls back to English when no translation is available for the current locale setting.


Listing 10. Example of setting locale fallback

# Set the locale fallback order used when no translation is found:
# first $LC_ALL, then $LANG, and finally en_US.
LANGUAGE=$LC_ALL:$LANG:en_US
export LANGUAGE

GNU gettext treats LANGUAGE as a colon-separated priority list. It first tries the $LC_ALL value and looks for a locale directory that matches it. If no match is found, it tries the $LANG value. If the corresponding directory still cannot be found, it falls back to en_US. In this way, at least the English message is shown.

3.3 eval_gettext variable substitution issue

When a string is represented by a unique string ID, the original GNU eval_gettext function can no longer substitute the variables, because it looks for variable references in the message ID rather than in the translated text. To solve this issue, override eval_gettext in every shell script that needs to be internationalized, placing the override after the line that sources gettext.sh so that it replaces the original definition. Refer to the <Before> and <After> code change in Listing 11.

Listing 11. Example of variable substitution when using a unique ID

<Before>

eval_gettext () {
gettext "$1" | (export PATH `envsubst --variables "$1"`; envsubst "$1")
}

<After>

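eval_gettext () {
# Look up the translation first: the variable references are in the
# translated text, not in the unique message ID.
_tmp=`gettext "$1"`
gettext "$1" | (export PATH `envsubst --variables "$_tmp"`; envsubst "$_tmp")
unset _tmp
}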

3.4 Special case - installer component

The installer is usually a compressed file that packs all folders and files into an executable package (that is, the bin file). The first step of installation is to extract the package. However, no PII can be accessed until the extraction is completed, because all the files, including the PII files, are compressed in the package. As a result, the status messages shown during extraction cannot be translated. Here, we suggest two alternatives:

  • Create an extra folder that holds the translation files required for the extraction process.
  • Ask users to extract the package manually using UNIX or Linux commands. The extraction-related messages are then handled by the operating system.

Resources