Before you start
This tutorial is for anyone who wants to understand the basics of Unicode-based multilingual Web page development. It explains the concepts behind this process and lays the groundwork for future development.
Multilingual computing is key to the future worldwide growth of the Internet, and Unicode and XML are significant building blocks upon which it will be built.
This tutorial shows you some of the basic principles involved in constructing a multilingual Web page in Unicode.
The tutorial focuses on a sample survey question in three languages, which is merely an example of what will be possible in future multilingual computing.
This tutorial is introductory. No prerequisite knowledge or experience is needed to understand the concepts presented here.
No additional hardware is needed for this tutorial. However, to view some of the files online, a Unicode font must be installed. See the Resources section for information on Unicode fonts if you don't already have one installed. (Note: Many Unicode fonts are still incomplete or in development, so a particular font may not contain every character. For the purposes of this tutorial, the Unicode font you download should ideally contain all or most of the characters used in our examples; if yours does not, try downloading an additional Unicode font.) You may also need to change the encoding in the browser to Unicode (UTF-8). In Internet Explorer, do this by going to View > Encoding > Unicode (UTF-8).
To produce your own multilingual Web page in Unicode, you'll need some type of multilingual word processing software. One of the very best resources for assessing current Unicode-based software is Alan Wood's Unicode Resources (see Resources). To work most efficiently, use multilingual software that: (1) uses a Unicode font that includes all of the languages desired; (2) saves the multilingual Web page as an HTML file with UTF-8 (Unicode) encoding; and (3) allows for easy editing of the HTML file, in which each non-ASCII character appears as its Unicode hexadecimal equivalent (a numeric character reference such as &#x4E2D;). The hexadecimal equivalents can be entered by hand if you have the time to look up all the necessary Unicode values, but this is a very cumbersome procedure. The software used to produce the examples in this tutorial supports more than a hundred languages.
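To make the idea of "Unicode hexadecimal equivalents" concrete, here is a minimal Python sketch that does by program what the paragraph above describes doing by hand. It is only an illustration, not the software used for this tutorial: the function names to_hex_references and build_page and the sample text are assumptions made up for this example.

```python
def to_hex_references(text: str) -> str:
    """Replace each non-ASCII character with an HTML hexadecimal reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def build_page(title: str, body: str) -> str:
    """Wrap already-converted text in a minimal page that declares UTF-8."""
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"<p>{body}</p>\n"
        "</body>\n</html>\n"
    )


# Hypothetical sample text; substitute the languages you need.
sample = "Español / Ελληνικά"
page = build_page("Sample", to_hex_references(sample))

# These are the bytes you would save as the .html file.
page_bytes = page.encode("utf-8")
print(page)
```

Because every non-ASCII character has been replaced by a reference, the resulting file contains only ASCII bytes, which is why it can be edited safely in almost any text editor.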
ASCII - Acronym for the "American Standard Code for Information Interchange." ASCII is a 7-bit encoding standard that assigns numbers to 128 characters: the unaccented letters of the Latin alphabet (on which written English is based), the digits, common punctuation, and a set of control codes.
CGI - Acronym for "Common Gateway Interface." CGI defines how a Web server passes a browser's request to an external program and returns that program's output to the browser. Perl is a programming language; a "Perl CGI script" is a script written in the Perl language that adheres to and uses the CGI protocol.
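As a rough illustration of what a CGI script does, here is a minimal sketch. It is written in Python rather than Perl only to keep this tutorial's added examples in one language; the form field name "answer" and the function name respond are assumptions made up for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def respond(environ) -> str:
    """Build a CGI response from the QUERY_STRING the server supplies."""
    # parse_qs decodes percent-encoded bytes as UTF-8 by default.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    answer = query.get("answer", ["(none)"])[0]  # "answer" is a hypothetical field
    body = f"<html><body><p>You answered: {escape(answer)}</p></body></html>"
    # A CGI response is a header block, a blank line, then the document.
    return "Content-Type: text/html; charset=UTF-8\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets the environment and reads what the script writes.
    sys.stdout.buffer.write(respond(os.environ).encode("utf-8"))
```

The charset=UTF-8 parameter in the Content-Type header is what tells the browser to interpret the returned bytes as Unicode.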
Unicode - Universal encoding standard for (eventually) all the characters and symbols used in all the languages of the world, past and present. It has merged with and is compatible with international standard ISO/IEC 10646. Unicode is a registered trademark of Unicode, Inc., and is developed by the Unicode Consortium. The latest version is 3.0 (released in early 2000). The standard currently defines about 50,000 characters, but it will eventually be capable of encoding more than a million, enough to cover all the living and ancient languages in the world and all required scientific, mathematical, and other symbols. A Unicode font usually covers only a subset of this larger standard and contains "glyphs." Glyphs are the visual representations of characters within a particular font. For example, the letter "D" can be drawn upright in one font and slanted or ornamented in another. These are different glyphs for the same character.
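The difference between a character (a number in the standard) and its glyph (how a font draws it) is easier to see with a few code points in hand. The following Python sketch prints the code point, official name, and UTF-8 bytes of three sample characters; the helper name describe and the choice of characters are assumptions made for this illustration.

```python
import sys
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    name = unicodedata.name(ch, "(unnamed)")
    return f"{code_point}  {name}  UTF-8: {utf8_bytes}"


# One Latin letter, one accented letter, one CJK ideograph.
for ch in "D\u00f1\u4e2d":
    print(describe(ch))

# The code space runs from U+0000 to U+10FFFF: more than a million slots.
print(sys.maxunicode + 1)
```

Whatever font is used to display "D", its code point stays U+0044; only the glyph changes.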
XML - eXtensible Markup Language. Like HTML, XML is a markup language with a few specific syntax conventions, but it allows nearly limitless flexibility in defining self-describing tags. This ability provides an organizing principle that is largely lacking in HTML. XML allows for the creation of sub-languages and complex structures while keeping the data thus organized easy to retrieve and manipulate.

