Introduction to Unicode

Jim Melnick (info@PortableExpert.com), Web designer and consultant, Internet Interactive Services
Jim Melnick is president of Internet Interactive Services, which does multilingual Web design and consulting. Visit his Web site at www.PortableExpert.com.

Summary:  In this tutorial, learn the basics of Unicode-based multilingual Web page development. The tutorial shows how to combine multilingual characters, the Unicode encoding standard, XML, and Perl CGI scripts to produce a truly multilingual Web page.

Date:  16 Feb 2001
Level:  Introductory


Before you start

About this tutorial

This tutorial is for anyone who wants to understand the basics of Unicode-based multilingual Web page development. It explains the concepts behind this process and lays the groundwork for future development.

Multilingual computing is key to the future worldwide growth of the Internet, and Unicode and XML are significant building blocks upon which it will be built.

This tutorial shows you some of the basic principles involved in constructing a multilingual Web page in Unicode.

The tutorial focuses on a sample survey question in three languages, which is merely an example of what will be possible in future multilingual computing.

Prerequisite knowledge

This tutorial is introductory. No prerequisite knowledge or experience is needed to understand the concepts presented here.

System requirements

No additional hardware is needed for this tutorial. However, to view some of the files online, a Unicode font must be installed. See the Resources section for information on Unicode fonts if you don't already have one installed. (Note: Many Unicode fonts are still incomplete or in development, so one particular font may not contain every character. For the purposes of this tutorial, the Unicode font you download should ideally contain all or most of the language characters used in our examples; if yours does not, try downloading an additional Unicode font.) It may also be necessary to change the browser's encoding to Unicode (UTF-8). In Internet Explorer, do this by going to View > Encoding > Unicode (UTF-8).

To produce your own multilingual Web page in Unicode, you'll need some type of multilingual word processing software. One of the very best resources for assessing current Unicode-based software is Alan Wood's Unicode Resources (see Resources). For the most efficient workflow, choose multilingual software that: (1) uses a Unicode font that includes all of the languages you need; (2) saves the text for the multilingual Web page as an HTML file with UTF-8 (Unicode) encoding; and (3) allows for easy editing of the HTML file containing the Unicode hexadecimal equivalents of all of the non-English characters. The hexadecimal equivalents can be entered by hand if you have the time to look up each character's Unicode value, but this is a very cumbersome procedure. The software used to produce the examples in this tutorial supports more than a hundred languages.
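
The "Unicode hexadecimal equivalents" mentioned above are presumably what HTML calls numeric character references: the form &#xHHHH;, where HHHH is the character's Unicode code point written in hexadecimal. As a rough sketch of what multilingual software does when it writes these for you, the Python function below replaces every non-ASCII character with such a reference. The function name to_hex_references is hypothetical, and Python is used only for illustration; it is not the software used to produce this tutorial's examples.

```python
def to_hex_references(text):
    """Replace each non-ASCII character with an HTML hexadecimal character reference."""
    parts = []
    for ch in text:
        code_point = ord(ch)  # the character's Unicode code point as an integer
        if code_point < 128:
            # Plain ASCII characters can stay as they are.
            parts.append(ch)
        else:
            # Everything else becomes &#x...; with the code point in hexadecimal.
            parts.append("&#x{:X};".format(code_point))
    return "".join(parts)


if __name__ == "__main__":
    # The sample strings are illustrative and are not taken from the tutorial's survey.
    print(to_hex_references("café"))    # caf&#xE9;
    print(to_hex_references("Привет"))  # &#x41F;&#x440;&#x438;&#x432;&#x435;&#x442;
```

A page written this way contains only ASCII bytes, so it survives editors and servers that are not Unicode-aware. The trade-off is that the source is no longer readable as text, which is why entering references by hand is so cumbersome.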

Terms used in this tutorial

ASCII - Acronym for the "American Standard Code for Information Interchange." ASCII is a 7-bit standard that assigns numeric codes to 128 characters: the unaccented Latin letters used in English, the digits, common punctuation, and a set of control codes.

CGI - Stands for "Common Gateway Interface." CGI defines how a Web server passes a browser's request to an external program (a script) and returns that program's output to the browser. Perl is a programming language; a "Perl CGI script" is a script written in the Perl language which adheres to and uses the CGI protocol.

Unicode - Universal encoding standard for (eventually) all the characters and symbols used in all the languages of the world, past and present. Its character repertoire is kept synchronized with, and is compatible with, international standard ISO/IEC 10646. Unicode is a registered trademark of Unicode, Inc., and is developed by the Unicode Consortium. At the time of writing, the latest version is 3.0 (released in early 2000). While the standard currently defines about 50,000 characters and is still growing, it has room to encode more than a million characters, enough to cover all the living and ancient languages in the world and all required scientific, mathematical, and other types of symbols. A Unicode font usually covers only a subset of this larger standard and contains "glyphs." Glyphs are representations of characters within a particular font. For example, the English letter "D" can be printed in an upright style or in an italic style. These are different glyphs for the same character.

XML - eXtensible Markup Language (XML) is a markup language like HTML. It imposes a small set of syntax conventions but otherwise allows nearly limitless flexibility, because authors define their own tags to describe their data. This ability is a powerful organizing principle that is largely lacking in HTML, whose tags are fixed. XML allows for the creation of specialized sub-languages and for complex structures, while keeping the data thus organized easy to retrieve and manipulate.
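
To make the idea of self-defined tags concrete, here is a small sketch in Python of a survey question stored in XML and read back by language. Everything in it is an assumption made for illustration: the tag names survey and question, the lang attribute, the three languages, and the question text are hypothetical and are not the tutorial's actual survey files. Python is used here only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical survey document. The tag names, the "lang" attribute, and the
# three languages are assumptions for illustration, not the tutorial's data.
SURVEY_XML = """
<survey>
  <question lang="en">What is your favorite color?</question>
  <question lang="fr">Quelle est votre couleur préférée ?</question>
  <question lang="ru">Какой ваш любимый цвет?</question>
</survey>
"""


def questions_by_language(xml_text):
    """Return a dict mapping each language code to its question text."""
    root = ET.fromstring(xml_text.strip())
    return {q.get("lang"): q.text for q in root.findall("question")}


if __name__ == "__main__":
    for lang, text in questions_by_language(SURVEY_XML).items():
        # Show how many bytes each question occupies once encoded as UTF-8.
        print(lang, text, len(text.encode("utf-8")), "bytes in UTF-8")
```

Because the tags describe what the data is rather than how it looks, a program can retrieve the question for one language without knowing anything about page layout.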
