Pango is an open-source framework for the layout and rendering of
internationalized text, including right-to-left scripts and scripts
such as Tamil where glyphs are context-sensitive. Not surprisingly,
Pango uses Unicode characters internally (represented using
UTF-8), and Pango's interfaces also use UTF-8. Other encodings can be
supported by using a translation library such as GNU iconv to
convert the text to UTF-8 before processing.
Pango is designed as a modular, cross-platform, cross-toolkit, low-level library that can be used in multiple contexts. It is also intimately related to the GTK+ and GNOME projects; the Pango project started because of the need for high-quality internationalized text in GTK+ and GNOME. While Pango can be used separately, the current Pango (0.13) is being included in the development branch 1.3.x versions of GTK+ that are currently under heavy development; Pango will ultimately be incorporated into GTK+ 2.0.
The name "Pango" comes from the Greek "Pan" (Παν), meaning "All," and the Japanese "Go" (語), meaning "Language."
Strings in Pango's interfaces are UTF-8 for four reasons: its compatibility with existing 8-bit software, its pervasiveness on UNIX platforms, the fact that it requires no extra effort to handle characters outside Plane 0, and its independence from byte-order concerns.
Offsets into UTF-8 strings are counted in bytes, not characters. The Pango documentation acknowledges that UTF-8's variable length makes it harder to count characters in a string, but it also notes that, in Unicode, any non-spacing marks in the string break any correspondence between character positions and strings, even for fixed-width encodings.
The Pango documentation also acknowledges that UTF-8 has a 50% overhead for CJKV ideographs compared to UTF-16.
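The difference between byte offsets and character counts, and the 50% figure for ideographs, can be checked with a few lines of Python. This is only an illustration using Python's own codecs, not Pango code; the sample string and the helper name `byte_offset` are made up for the example.

```python
# Sample text mixing 1-byte, 2-byte, and 3-byte UTF-8 sequences.
text = "na\u00efve 日本語"
encoded = text.encode("utf-8")

def byte_offset(s, char_index):
    """Return the UTF-8 byte offset at which character char_index starts."""
    return len(s[:char_index].encode("utf-8"))

char_count = len(text)          # counted in code points
byte_count = len(encoded)       # counted in UTF-8 bytes
cjk_start = byte_offset(text, 6)  # where the first ideograph starts, in bytes

# Storage cost of three ideographs in UTF-8 versus UTF-16.
cjk = "日本語"
utf8_len = len(cjk.encode("utf-8"))       # 3 bytes per ideograph
utf16_len = len(cjk.encode("utf-16-le"))  # 2 bytes per ideograph
overhead = (utf8_len - utf16_len) / utf16_len

print(char_count, byte_count, cjk_start, overhead)
```

The first ideograph is character 6 of the string but starts at byte 7, because the "ï" before it occupies two bytes.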
Single characters are represented with 32 bits for planned upward compatibility with any characters to be defined in ISO/IEC 10646. While the ISO working group has recently committed to using only the same million or so code points covered by UTF-16, even that reduced range requires 21 bits, and 32 bits is still the next highest standardized word size.
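The arithmetic behind the 21-bit figure is easy to verify. The sketch below assumes the UTF-16-reachable range ends at U+10FFFF (17 planes of 65,536 code points) and uses Python's `struct` module to stand in for a 32-bit character type; it is not Pango code.

```python
import struct

MAX_CODE_POINT = 0x10FFFF                  # last code point reachable by UTF-16
total_code_points = MAX_CODE_POINT + 1     # 17 planes of 65,536 code points
bits_needed = MAX_CODE_POINT.bit_length()  # smallest width that holds the range

# A character outside Plane 0 stored as one unsigned 32-bit integer.
ch = "\U00010400"
packed = struct.pack("<I", ord(ch))

print(total_code_points, bits_needed, len(packed))
```

Because 21 bits does not correspond to a standard word size, the value is rounded up to 32 bits, which is what the 4-byte packed result shows.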
Pango uses Dov Grobgeld's FriBidi implementation of the Unicode
bidirectional algorithm (see Resources). When
Pango is compiled with the --with-fribidi option, it will
use a copy of FriBidi that you provide; otherwise the copy in the
Pango source is used. The minimal version included with Pango 0.13 is
an older version that supports Unicode 2.1.8, whereas the latest
FriBidi version as of this writing supports Unicode 3.0.1.
In addition to handling right-to-left text, Pango supports language
tagging, so, for example, it will attempt to use a Japanese font for
text marked as Japanese. Language tagging, like all Pango text
attribute tagging, is a Pango-specific scheme. Language tagging
does not use Unicode's Plane 14 language tags, nor does it relate to
the xml:lang and html:lang attributes defined by the
W3C, but those and other language markup schemes could easily be
translated into Pango language attributes.
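As a rough sketch of such a translation, the following Python function rewrites elements that carry xml:lang as spans with a lang attribute, in the style of the markup language described later in this article. The function name `to_pango_markup` and the sample document are hypothetical, all other elements and attributes are simply dropped, and nothing here calls Pango itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def to_pango_markup(xml_source):
    """Turn xml:lang attributes into <span lang="..."> wrappers."""
    def walk(el):
        inner = escape(el.text or "")
        for child in el:
            inner += walk(child) + escape(child.tail or "")
        lang = el.get(XML_LANG)
        if lang:
            return "<span lang=%s>%s</span>" % (quoteattr(lang), inner)
        return inner  # elements without xml:lang contribute only their text
    return walk(ET.fromstring(xml_source))

sample = '<p xml:lang="en">Hello <q xml:lang="fr">bonjour</q>!</p>'
converted = to_pango_markup(sample)
print(converted)
```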
The complete set of Pango text attributes is shown in the following list:
- Language
- Font family: name of a font family or a comma-separated list of families
- Style: normal, oblique, or italic
- Weight: six possible values from ultralight to heavy
- Variant: normal or small caps
- Stretch: nine possible values from ultracondensed to ultraexpanded
- Size: font size in thousandths of a point
- Font description: shorthand label for a particular font family, style, variant, weight, stretch, and size
- Foreground color
- Background color
- Underline: whether the text is underlined with a single, double, or low line
- Strikethrough: whether the text is struck through
- Rise: vertical displacement
- Shape: shape to impose on a glyph
- Scale
The following two figures show examples of Pango in action. Note the use of German, Greek, Hebrew, Japanese, and Arabic text in the first figure and the additional use of French, Korean, and Russian in the second. Labels and text boxes containing German and French are admittedly easy to achieve on most English or European computer systems, but it is much less common for a computer system to be able to handle those languages and the other languages shown in the figures in combination.
Styled, multilanguage, and bidirectional text

Multiple languages in widget labels

The different attributes for a sequence of characters, including the language, are maintained separately from the text as a list of structures, one structure for each span of each attribute type. Every structure indicates a single attribute class and the start and end of the character range to which the class applies. Particular attribute types extend this with additional information; for example, the color attributes also record the red, green, and blue components of the color to apply to the span.
You can create the separate attribute list for some text (for example, for a widget label), but it can be a painstaking task when there are a lot of attribute changes. Also, as the Pango documentation notes, the character ranges in each attribute structure will surely be invalid for any later translation of the original attributed text.
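The shape of such an attribute list, and the way a translation invalidates its ranges, can be sketched in Python. The class and field names below are invented for illustration and are not Pango's C structures; the ranges are UTF-8 byte offsets, and the English and German labels are made-up sample text.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    klass: str        # attribute class, for example "weight" or "underline"
    start_index: int  # start of the range, as a UTF-8 byte offset (inclusive)
    end_index: int    # end of the range, as a UTF-8 byte offset (exclusive)

@dataclass
class ColorAttribute(Attribute):
    # Color attributes extend the basic structure with RGB components.
    red: int = 0
    green: int = 0
    blue: int = 0

def span_text(text, attr):
    """Return the part of text that an attribute's byte range covers."""
    return text.encode("utf-8")[attr.start_index:attr.end_index].decode("utf-8")

english = "Open the file"
bold = Attribute("weight", 9, 13)                       # covers "file"
blue = ColorAttribute("foreground", 0, 4, blue=0xFFFF)  # covers "Open"

german = "Datei \u00f6ffnen"  # a translation of the same label

print(span_text(english, bold))  # the intended word
print(span_text(german, bold))   # the same range now covers the wrong bytes
```

Applied to the translated string, the unchanged range selects the tail of a different word, which is why hand-built attribute lists do not survive translation.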
As a convenience measure for translators in particular, Pango
supports a simple HTML-like markup language for embedding attribute
changes in the text, and it provides the pango_parse_markup()
function for converting marked-up text into a plain string and a
separate attribute list. The root element is <markup>,
but it can be omitted. (You can omit both the start tag
and the end tag, but omitting just one causes an error.)
The most versatile element, and the one that will have the most
common use, is <span>. Like the HTML element with
the same name, this marks a span of text, and its start tag may have
the following attributes whose values will be translated into Pango
text attribute values:
- font_desc: A shorthand font description, such as "Sans Italic 12" (any other span attributes override this description)
- font_family: A font family name
- face: Synonym for the font_family attribute
- size: Font size in thousandths of a point; a predefined absolute size keyword such as xx-small or xx-large; or one of the relative sizes smaller or larger
- style: One of normal, oblique, or italic, corresponding to the allowed values of the style text attribute
- weight: One of six keywords such as ultralight, normal, or heavy, or a numeric weight
- variant: normal or smallcaps
- stretch: One of nine keywords such as ultracondensed, normal, and ultraexpanded that correspond to the allowed values of the stretch text attribute
- foreground: An RGB color specification such as #00FF00 or a color name such as red
- background: An RGB color specification such as #00FF00 or a color name such as red
- underline: One of single, double, low, none
- rise: Vertical displacement, in ten thousandths of an em. Can be negative for subscript, positive for superscript
- strikethrough: true or false, whether to strike through the text
- lang: A language code (for example, fr)
The markup language also includes a handful of convenience elements that do not have attributes:
- <b>: bold
- <big>: equivalent to <span size="larger">
- <i>: italic
- <s>: strikethrough
- <sub>: subscript
- <sup>: superscript
- <small>: equivalent to <span size="smaller">
- <tt>: monospace font
- <u>: underline
Successive steps of the absolute and relative sizes of
the size attribute, and each size increase or decrease from the
<big> or <small> elements, are in the ratio
1:1.2 (or 1.2:1); this is the same as the CSS scale factor between
its text sizes.
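The 1.2 ratio can be made concrete with a small calculation. In this sketch only xx-small and xx-large come from the description above; the intermediate keyword names and the choice of medium as the 1.0 step are assumptions borrowed from CSS, and the function names are made up for the example.

```python
STEP = 1.2  # ratio between adjacent sizes

# Assumed keyword ladder, CSS style; "medium" is taken as the unscaled size.
KEYWORDS = ["xx-small", "x-small", "small", "medium",
            "large", "x-large", "xx-large"]

def scale_for(keyword):
    """Scale factor of an absolute size keyword relative to medium."""
    steps = KEYWORDS.index(keyword) - KEYWORDS.index("medium")
    return STEP ** steps

def scaled_size(base_size, keyword):
    """Apply a keyword to a base size given in thousandths of a point."""
    return round(base_size * scale_for(keyword))

def nested_big(base_size, depth):
    """Size after nesting <big> elements depth levels deep."""
    return round(base_size * STEP ** depth)

base = 12000  # 12 points, in thousandths of a point
for keyword in KEYWORDS:
    print(keyword, scaled_size(base, keyword))
```

Each nested <big> multiplies the size by 1.2 and each nested <small> divides it by 1.2, so two levels of <big> on 12-point text give 17.28 points.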
The markup language is case-sensitive, unlike HTML (but like
XML), and the only tags that can be omitted are the pair of the
<markup> start tag and end tag.
Pango implements formatting and rendering in a staged pipeline.
The following example adds markup to an example used in both Chapter 3 of The Unicode Standard, Version 3.0 and UAX #9 (see Resources). The uppercase text in the example stands for right-to-left text such as Arabic or Hebrew. The markup makes some of the text underlined, some of it blue, and some of it both underlined and blue.
<u>car </u><span foreground="blue"><u>is </u>THE CAR</span> in arabic
The effect of the markup is shown in the following table.
<table border="1"> <tr> <td>String</td> <td><code>car </code></td> <td><code>is </code></td> <td><code>THE CAR</code></td> <td><code> in arabic</code></td> </tr> <tr> <td>Foreground</td> <td> </td> <td colspan="2" align="center"><span style="color: blue">Blue</span></td> <td> </td> </tr> <tr> <td>Underline</td> <td colspan="2" align="center"><u>True</u></td> <td> </td> <td> </td> </tr> </table>
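The split into plain text and attribute ranges that the table summarizes can be imitated with a toy parser. The Python sketch below is not pango_parse_markup() and does not reproduce its behavior in detail: it uses the standard library's XML parser, supports only <span> and a few of the convenience elements, and records ranges as UTF-8 byte offsets. How it groups and counts the ranges is this sketch's own choice and may differ from what Pango returns.

```python
import xml.etree.ElementTree as ET

# A few convenience elements and the (attribute, value) pair each one implies.
SHORTHAND = {
    "b": ("weight", "bold"),
    "i": ("style", "italic"),
    "u": ("underline", "single"),
    "s": ("strikethrough", "true"),
    "tt": ("font_family", "monospace"),
}

def parse_markup(markup):
    """Return (plain_text, attrs); attrs holds (name, value, start, end)."""
    if not markup.lstrip().startswith("<markup>"):
        markup = "<markup>" + markup + "</markup>"  # the root may be omitted
    chunks, attrs = [], []
    pos = 0  # running UTF-8 byte offset into the plain text

    def add_text(s):
        nonlocal pos
        if s:
            data = s.encode("utf-8")
            chunks.append(data)
            pos += len(data)

    def walk(el):
        start = pos
        add_text(el.text)
        for child in el:
            walk(child)
            add_text(child.tail)
        if el.tag == "span":
            pairs = list(el.attrib.items())
        else:
            pairs = [SHORTHAND[el.tag]] if el.tag in SHORTHAND else []
        for name, value in pairs:
            attrs.append((name, value, start, pos))

    walk(ET.fromstring(markup))
    return b"".join(chunks).decode("utf-8"), sorted(attrs)

EXAMPLE = '<u>car </u><span foreground="blue"><u>is </u>THE CAR</span> in arabic'
plain, attrs = parse_markup(EXAMPLE)
print(plain)
for attr in attrs:
    print(attr)
```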
The first step when laying out the text is to break the string into portions with consistent attributes, including consistent language tag, bidirectional category, color, etc.
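This splitting step can be sketched in Python as cutting the text at every point where some property starts or stops. The function name `itemize` and the tuple layout are invented for illustration. The bidirectional levels are entered by hand here as an assumption, with the uppercase portion of "car is THE CAR in arabic" treated as right-to-left; a real implementation would compute them with the Unicode bidirectional algorithm.

```python
def itemize(text_length, spans):
    """Cut [0, text_length) into runs whose properties are all constant.

    spans is a list of (name, value, start, end) tuples with end exclusive.
    Returns a list of (start, end, properties) tuples in logical order.
    """
    cuts = {0, text_length}
    for _name, _value, start, end in spans:
        cuts.update((start, end))
    points = sorted(cuts)

    items = []
    for start, end in zip(points, points[1:]):
        props = {name: value for name, value, s, e in spans
                 if s <= start and end <= e}
        if items and items[-1][2] == props:
            # Same properties as the previous run: extend it instead.
            items[-1] = (items[-1][0], end, props)
        else:
            items.append((start, end, props))
    return items

text = "car is THE CAR in arabic"
spans = [
    ("underline", "single", 0, 4),
    ("underline", "single", 4, 7),
    ("foreground", "blue", 4, 14),
    # Hand-assigned levels (assumption): even is left-to-right, odd is right-to-left.
    ("bidi_level", 0, 0, 7),
    ("bidi_level", 1, 7, 14),
    ("bidi_level", 0, 14, 24),
]

items = itemize(len(text), spans)
for start, end, props in items:
    print(repr(text[start:end]), props)
```

For this input the result is four runs: "car ", "is ", "THE CAR", and " in arabic".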
Markup for the attributes is just a convenience feature, and the
pipeline really begins with text and a list of Pango attributes, so
step 0, as it were, is to call pango_parse_markup() with
the above example as input. This returns a single string containing
the text and a list of four Pango attributes -- one for each change in
the attributes. The table below shows the spans.
<table border="1"> <tr> <td>String</td> <td><code>car </code></td> <td><code>is </code></td> <td><code>THE CAR</code></td> <td><code> in arabic</code></td> </tr> <tr> <td>Foreground</td> <td> </td> <td align="center"><span style="color: blue">Blue</span></td> <td align="center"><span style="color: blue">Blue</span></td> <td> </td> </tr> <tr> <td>Underline</td> <td align="center"><u>True</u></td> <td align="center"><u>True</u></td> <td> </td> <td> </td> </tr> <tr> <td>Bidi Level</td> <td align="center">0</td> <td align="center">0</td> <td align="center">1</td> <td align="center">0</td> </tr> </table>
The "Bidi Level" in the table is the Unicode bidirectional embedding level of the spans, where even numbers (including 0) indicate left-to-right text and odd numbers indicate right-to-left text. Bidi level is not recorded in Pango attributes, but it is calculated by the FriBidi library.
The items are then reordered into visual order, as the following table shows. Remember that for the purposes of this example, uppercase text stands for right-to-left text such as Arabic or Hebrew.
<table border="1"> <tr> <td>String</td> <td><code>car </code></td> <td><code>is </code></td> <td><code>RAC EHT</code></td> <td><code> in arabic</code></td> </tr> <tr> <td>Foreground</td> <td> </td> <td align="center"><span style="color: blue">Blue</span></td> <td align="center"><span style="color: blue">Blue</span></td> <td> </td> </tr> <tr> <td>Underline</td> <td align="center"><u>True</u></td> <td align="center"><u>True</u></td> <td> </td> <td> </td> </tr> <tr> <td>Bidi Level</td> <td align="center">0</td> <td align="center">0</td> <td align="center">1</td> <td align="center">0</td> </tr> </table>
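The reordering itself follows the reversal rule of UAX #9: working from the highest embedding level down to the lowest odd level, every contiguous run of characters at that level or higher is reversed. The Python sketch below implements only that reversal for a single line, with the levels supplied by hand as an assumption; it omits level resolution, mirroring, and everything else FriBidi does, and the function name `reorder_visual` is made up.

```python
def reorder_visual(text, levels):
    """Reorder one line from logical to visual order given per-character levels."""
    chars, levels = list(text), list(levels)
    odd_levels = [lvl for lvl in levels if lvl % 2]
    if not odd_levels:
        return text  # purely left-to-right: nothing to reverse
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                # Reverse the run, keeping levels aligned with their characters.
                chars[i:j] = reversed(chars[i:j])
                levels[i:j] = reversed(levels[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)

logical = "car is THE CAR in arabic"
levels = [0] * 7 + [1] * 7 + [0] * 10  # hand-assigned: uppercase run is level 1
visual = reorder_visual(logical, levels)
print(visual)
```

Only the level 1 run is reversed, so "THE CAR" becomes "RAC EHT" while the surrounding left-to-right text stays in place.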
Pango then selects the appropriate glyphs for the characters in each item.
Pango supports script-specific layout engines so that, for example, Tamil glyph selection is done by the Tamil engine and Thai glyph selection is done by the Thai engine. There doesn't have to be one engine per script, however; in practice, characters from the Basic Latin, Latin-1 Supplement, Greek, Cyrillic, and several other blocks are all handled by the "basic" engine.
The glyph strings are justified, for example, to the right or to the left as shown by some of the labels in the previous figure.
The glyphs are rendered onto an output device. Pango is not a rendering system, but it does include a rendering routine for X fonts. Other output devices will require other, external rendering routines. The following table shows how the example might look when rendered.
<table border="1"> <tr> <td><u>car <span style="color: blue">is </span></u><span style="color: blue">RAC EHT</span> in arabic</td> </tr> </table>
In the second installment, I'll show the code for the example and discuss how Pango selects glyphs and renders text.
- Pango information and downloads are available at http://www.pango.org/.
- Dov Grobgeld's FriBidi library is available at http://fribidi.sourceforge.net/.
- GNOME is included in most Linux distributions and is
also being adopted by several commercial UNIX vendors.
Information on GNOME is available at http://www.gnome.org.
- The first of a three-part series on GNOME programming
by George Lebl is available at http://www.ibm.com/developerworks/library/gnome-programming/.
- The Unicode Bidirectional Algorithm is defined in "UAX
#9, The Bidirectional Algorithm," available at http://www.unicode.org/unicode/reports/tr9/.
Tony Graham is the author of Unicode: A Primer, the first and currently only book about the Unicode Standard, Version 3.0, and its uses. An Australian, Tony is a Specialist member of the Unicode Consortium. He can be reached at tkg@menteith.com.





