|
-A-
|
|
|
accent
|
A modifying mark on a character. For example, the accent marks in Latin script (acute, tilde, and ogonek) and the tone marks in Thai. Synonymous with diacritic.
|
|
alphabetic language
|
A written language in which symbols represent vowels and consonants, and in which syllables
and words are formed by a phonetic combination of symbols. Examples of alphabetic
languages are English, Greek, and Russian. Contrast with ideographic language.
|
|
Arabic numerals
|
The characters 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0. Contrast with
Chinese numerals, Indic numerals, and Roman numerals.
|
|
Arabic script
|
A cursive script used in Arabic countries. Other
writing systems such as Latin and Japanese also
have a cursive handwritten form, but usually are typeset or printed in discrete letter form.
Arabic script has only the cursive form. It is also used for Urdu (which is spoken in
Pakistan, Bangladesh, and India) and for Farsi, or Persian (which is spoken in Iran, Iraq, and Afghanistan).
|
|
ASCII
|
"American Standard Code for Information Interchange." A standard 7-bit character set
used for information interchange. ASCII encodes the basic Latin alphabet and
punctuation used in American English, but does not encode the
accented characters used in many European languages.
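
As a rough illustration of the 7-bit limit, the following Python sketch checks whether a string fits in ASCII. The helper name `is_ascii` is an assumption made for this example and is not part of the glossary.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character fits in 7 bits (code points 0-127)."""
    return all(ord(ch) < 128 for ch in text)


print(is_ascii("resume"))            # only basic Latin letters
print(is_ascii("r\u00e9sum\u00e9"))  # contains the accented letter U+00E9

# Python's built-in "ascii" codec refuses characters outside the 7-bit range.
try:
    "r\u00e9sum\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable as ASCII:", err.reason)
```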
|
|
-B-
|
|
|
baseline
|
A conceptual line with respect to which successive characters are aligned.
|
|
bidirectional
|
Describes languages such as Arabic, Hebrew, and Yiddish,
in which the general flow of text proceeds horizontally
from right to left, but numbers, English, and
other left-to-right text are written from left to right.
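
One way to see the directional properties behind this behavior is to query the Unicode character database. This is a minimal Python sketch using the standard `unicodedata` module; the sample characters and the `samples` dictionary are chosen only for illustration.

```python
import unicodedata

# Each character carries a bidirectional class in the Unicode database:
# "L" = left-to-right, "R" = right-to-left, "AL" = Arabic letter,
# "EN" = European number.
samples = {
    "Latin letter a": "a",
    "Hebrew letter alef": "\u05d0",
    "Arabic letter alef": "\u0627",
    "digit 1": "1",
}

for label, ch in samples.items():
    print(f"{label}: {unicodedata.bidirectional(ch)}")
```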
|
|
-C-
|
|
|
character set
|
A collection of characters in which a numeric code is
assigned to each character so that it can be
represented on a computer. Most traditional character sets contain characters from only one or two scripts.
|
|
Chinese numerals
|
Chinese characters that represent numbers. For
example, the Chinese characters for 1, 2, and 3
are written with one, two, and three horizontal brush strokes, respectively. Contrast with
Arabic numerals, Indic numerals, and Roman numerals.
|
|
code page
|
A synonym for character set.
|
|
collation
|
Text comparison using language-sensitive rules as
opposed to bitwise comparison of numeric character codes.
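
The difference can be sketched in Python. The function `naive_collation_key` below is a hypothetical, deliberately simplified stand-in for real language-sensitive rules: it only ignores case and accents, whereas a production system would use a proper collation library.

```python
import unicodedata


def naive_collation_key(word: str) -> str:
    """Hypothetical simplified key: ignore case and strip combining marks."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zebra", "\u00c4pfel", "apple"]  # "\u00c4" is A with diaeresis

# Comparison by numeric character code puts the accented capital last.
by_code_point = sorted(words)

# Comparison with the simplified key sorts it next to the other "a" words.
by_naive_key = sorted(words, key=naive_collation_key)

print(by_code_point)
print(by_naive_key)
```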
|
|
cursive script
|
A script whose adjacent characters touch or
are connected to each other. For example, Arabic script is cursive.
|
|
-D-
|
|
|
diacritic
|
A modifying mark on a character. For example,
the accent marks in Latin script (acute, tilde, and ogonek) and the tone marks in Thai.
Synonymous with accent.
|
|
double-byte character set (DBCS)
|
A set of characters in which each character is represented by 2 bytes.
Scripts such as Japanese, Chinese, and Korean contain more characters than
can be represented by 256 code points, thus requiring two bytes to uniquely
represent each character. The term DBCS is often used to mean MBCS (multibyte
character set). See multibyte character set.
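
The loose use of the term can be seen with Shift_JIS, a common Japanese encoding. In this short Python sketch (the sample text is an assumption chosen for illustration), the Latin letter takes one byte and the Hiragana letter takes two, so the encoding is strictly multibyte rather than uniformly double-byte.

```python
text = "A\u3042"  # Latin "A" followed by HIRAGANA LETTER A

for ch in text:
    encoded = ch.encode("shift_jis")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```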
|
|
-E-
|
|
|
EBCDIC
|
Extended Binary-Coded Decimal Interchange Code. A group of coded
character sets that consists of 8-bit coded characters. EBCDIC-coded character sets map specified graphic and control characters onto code points, each
consisting of 8 bits. EBCDIC is an extension of BCD (Binary-Coded Decimal), which uses only 6 bits for each character.
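
To show how the same characters map to different code points, this Python sketch encodes a short string with ASCII and with `cp037`, one of the EBCDIC code pages shipped with Python. The choice of `cp037` is an assumption for illustration; other EBCDIC code pages assign some characters differently.

```python
text = "A1a"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # one EBCDIC code page among several

print("ASCII :", ascii_bytes.hex(" "))
print("EBCDIC:", ebcdic_bytes.hex(" "))

# Decoding with the matching code page recovers the original text.
print(ebcdic_bytes.decode("cp037"))
```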
|
|
ECMA
|
European Computer Manufacturers Association. A nonprofit organization formed by European
computer vendors to announce standards applicable to the functional design and use of data processing equipment.
|
|
encoding scheme
|
A set of specific definitions that describe how
character data is represented.
Examples of specifications in such a
definition are the number of bits, the number of bytes, the
allowable ranges of bytes, the maximum number of
characters, and the meanings assigned to some
generic and specific bit patterns.
|
|
-F-
|
|
|
font
|
A set of graphic characters that have a
characteristic design, or a font designer's
concept of how the graphic characters should
appear. The characteristic design specifies the
characteristics of its graphic characters.
Examples of characteristics are shape, graphic
pattern, style, size, weight, and increment.
|
|
-G-
|
|
|
globalization
|
The process of developing, manufacturing, and
marketing software products that are intended
for worldwide distribution. This term combines
two aspects of the work: internationalization
(enabling the product to be used without
language or culture barriers) and localization
(translating and enabling the product for a
specific locale).
|
|
glyph
|
The actual shape (bit pattern, outline) of a
character image. For example, an italic "A" and
a roman "A" are two different glyphs
representing the same underlying character.
Strictly speaking, any two images that differ in
shape constitute different glyphs. In this usage,
glyph is a synonym for character image, or
simply image.
|
|
graphic character
|
A character, other than a control function, that
has a visual representation normally
handwritten, printed, or displayed.
|
|
GMT
|
Greenwich Mean Time. In the 1840s, the
standard time kept by the Royal Greenwich
Observatory at Greenwich, England, was
established for all of England, Scotland, and
Wales, replacing the many local times then in
use. Subsequently, GMT became the
official time reference for the world until 1972,
when it was superseded by the atomic-clock-based
Coordinated Universal Time (UTC). GMT is also
known as universal time.
|
|
-H-
|
|
|
Hangul
|
The Korean alphabet that consists of fourteen
consonants and ten vowels. Hangul was
created by a team of scholars in the 15th
century at the behest of King Sejong. See jamo.
|
|
Hanja
|
The Korean term for characters derived from Chinese.
|
|
Hiragana
|
A Japanese phonetic syllabary. The symbols are cursive or curvilinear in style. See Kanji and Katakana.
|
|
-I-
|
|
|
i18n
|
Synonym for internationalization. (There are 18 letters between the "i" and the "n" in
internationalization.)
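
The abbreviation rule can be written as a small Python function. The name `numeronym` is an assumption made for this sketch; the same rule produces l10n from localization.

```python
def numeronym(word: str) -> str:
    """Keep the first and last letters and count the letters in between."""
    if len(word) < 3:
        return word  # nothing to abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))
print(numeronym("localization"))
```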
|
|
ideographic language
|
A written language in which each character (ideogram) represents a thing or an idea (but
not necessarily a particular word or phrase). An
example of such a language is written Chinese
(Zhongwen). Contrast with alphabetic language.
|
|
Indic numerals
|
A set of numerals used in India and many Arabic countries instead
of, or in addition to, the Arabic numerals. The ten Indic numeral shapes
correspond to the Arabic numeral shapes of 0, 1, 2, 3,
4, 5, 6, 7, 8, and 9, respectively. Contrast with Arabic
numerals, Chinese numerals, and Roman numerals. See
numbers.
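
The correspondence between digit shapes can be checked programmatically. This Python sketch assumes that the Unicode Arabic-Indic digits (U+0660 to U+0669) and Devanagari digits (U+0966 to U+096F) are representative of the numerals meant here; the glossary itself does not name specific code points.

```python
import unicodedata

arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE, DIGIT FOUR
devanagari = "\u0969\u096a"    # DEVANAGARI DIGIT THREE, DIGIT FOUR

# Each shape carries the same numeric value as its Arabic-numeral counterpart.
for ch in arabic_indic + devanagari:
    print(unicodedata.name(ch), "->", unicodedata.digit(ch))

# Python's int() accepts any Unicode decimal digits.
print(int(arabic_indic), int(devanagari))
```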
|
|
internationalization
|
The process of producing an application that can be
localized for a particular country without any changes
to the program code. Internationalized applications
store their text in external resources, and use
locale-sensitive utilities for formatting and collation.
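
A minimal Python sketch of the idea follows. The `MESSAGES` table and the `message` function are hypothetical names; in a real application the catalogs would live in external resource files rather than in the program source.

```python
# Hypothetical message catalogs, standing in for external resource files.
MESSAGES = {
    "en": {"greeting": "Hello, {name}!"},
    "fr": {"greeting": "Bonjour, {name} !"},
}


def message(key: str, locale: str, **params: str) -> str:
    """Look up a message for the locale, falling back to English."""
    catalog = MESSAGES.get(locale, MESSAGES["en"])
    template = catalog.get(key, MESSAGES["en"][key])
    return template.format(**params)


print(message("greeting", "fr", name="Marie"))
print(message("greeting", "de", name="Hans"))  # no German catalog: falls back
```

Adding a language then means adding a catalog, with no change to the code that calls `message`.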
|
|
-J-
|
|
|
jamo
|
A set of consonants and vowels used in Korean
Hangul. The word jamo is derived from ja, which
means consonant, and mo, which means vowel.
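
Unicode normalization makes the relationship between a Hangul syllable and its jamo visible. In this Python sketch, the sample syllable U+D55C is an assumption chosen for illustration; decomposing it yields one leading consonant, one vowel, and one trailing consonant.

```python
import unicodedata

syllable = "\ud55c"  # the precomposed Hangul syllable "han"

# Canonical decomposition (NFD) splits the syllable into its jamo.
jamo = unicodedata.normalize("NFD", syllable)

for ch in jamo:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Canonical composition (NFC) reassembles the original syllable.
recomposed = unicodedata.normalize("NFC", jamo)
print(recomposed == syllable)
```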
|
|
-K-
|
|
|
Kanji
|
Chinese characters or ideograms used in
Japanese writing. The characters may have
different meanings from their Chinese counterparts. See Hiragana and Katakana.
|
|
Katakana
|
A Japanese phonetic syllabary used primarily for
foreign names and place names and words of
foreign origin. The symbols are angular, while
those of Hiragana are cursive. Katakana is written left to right, or top to bottom. See Kanji.
|
|
-L-
|
|
|
l10n
|
Synonym for localization. (There are 10 letters between the "l" and the "n" in
localization.)
|
|
language
|
A set of characters, phonemes, conventions, and rules used for conveying information.
The aspects of a language are pragmatics,
semantics, syntax, phonology, and morphology.
|
|
legacy
|
An inherited
obligation. For example, a legacy database
might contain strategic data that must be
maintained for a long time after the database
has become technologically obsolete.
|
|
localization
|
The process of converting a program to run in a
particular locale or country, so that all text is displayed
in the native language, and native conventions are
used for sorting, formatting, etc.
|
|
lowercase
|
The small alphabetic characters, whether
accented or not, as distinguished from the
capital alphabetic characters. The concept of
case applies to alphabets such as Latin, Cyrillic,
and Greek, but not to Arabic, Hebrew, Thai,
Japanese, Chinese, Korean, and many other
scripts. Examples of lowercase letters are a, b,
and c. Contrast with uppercase.
|
|
-M-
|
|
|
MBCS
|
Multibyte Character Set. A set of characters in which each character is represented by 1 or
more bytes. Contrast with DBCS and SBCS.
|
|
multilingual
|
An application that can simultaneously display and
manipulate text in multiple languages. For example, a
word processor that allows Japanese and English in
the same document is multilingual.
|
|
-N-
|
|
|
NLS
|
National Language Support. The features of a
product that accommodate a specific region, its
language, script, local conventions, and culture.
See internationalization and localization.
|
|
normalization
|
The process of converting Unicode text into one of
several standardized forms in which precomposed
and combining characters are used consistently. See
Unicode Technical Report #15 for details.
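
As a brief illustration, this Python sketch uses the standard `unicodedata` module to convert between the precomposed and combining representations of an accented letter. The variable names are chosen only for this example.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE (one code point)
combining = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings look the same when displayed but compare as different.
print(precomposed == combining)

# NFC prefers precomposed characters; NFD prefers combining sequences.
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)

print(nfc == precomposed, len(nfc))
print(nfd == combining, len(nfd))
```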
|
|
numbers
|
Numbers express either quantity (cardinal) or order (ordinal). Many
cultures have different forms for cardinal and ordinal numbers. For example,
in French the cardinal number five is cinq, but the ordinal fifth is
cinquième or 5eme or 5e. Numbers are written with symbols that are usually
referred to as numerals. See Arabic numerals,
Chinese numerals,
Indic numerals, and Roman numerals.
|
|
-P-
|
|
|
pinyin
|
A system to phonetically render Chinese ideograms in a Latin alphabet.
|
|
-R-
|
|
|
Roman numerals
|
A system of writing numbers in which the
characters I, V, X, L, C, D, and M have the value of 1, 5, 10, 50,
100, 500, and 1000, respectively. Lesser numbers in prefix
positions indicate subtraction. For example MCMLXIV is 1964 in
decimal, because M is 1000, CM is 900, LX is 60, and IV is 4. Contrast with
Arabic numerals,
Chinese numerals, and Indic numerals.
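
The subtraction rule can be sketched in Python as follows. The function `roman_to_int` is a hypothetical helper written for this example; it assumes a well-formed numeral and does not validate its input.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral to an integer (no validation)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Look ahead: a lesser value before a greater one is subtracted.
        next_value = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        if value < next_value:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("MCMLXIV"))  # M + CM + LX + IV
```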
|
|
-S-
|
|
|
SBCS (Single-byte character set)
|
A set of characters in which each character is represented by 1 byte.
|
|
script
|
A set of characters used to write a particular set of languages.
For example, the Latin (or Roman) script is used to write
English, French, Spanish, and most other European languages;
the Cyrillic script is used to write Russian and Serbian.
|
|
separator
|
The thousands separator (or digit grouping separator) is the
local symbol used to separate every third digit in large
numbers or lengthy decimal fractions. The decimal separator
is the local symbol used to indicate the decimal position in a number.
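
The following Python sketch shows how the two separators might be swapped for different conventions. The function `format_number` and its parameters are assumptions made for this example; it always groups by three digits and rounds to two decimal places, which does not suit every locale.

```python
def format_number(value: float, thousands: str = ",", decimal: str = ".") -> str:
    """Format with two decimals, using the given separator symbols."""
    text = f"{value:,.2f}"  # Python's default grouping, e.g. 1,234,567.89
    # Replace both symbols in one pass so that they cannot clobber each other.
    return text.translate(str.maketrans({",": thousands, ".": decimal}))


print(format_number(1234567.89))                               # comma and period
print(format_number(1234567.89, thousands=".", decimal=","))   # period and comma
print(format_number(1234567.89, thousands=" ", decimal=","))   # space and comma
```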
|
|
-T-
|
|
|
transcoding
|
Conversion of character data from one character set to another.
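
In Python, transcoding is typically a decode step followed by an encode step. This sketch converts a short byte string from Latin-1 to UTF-8; the sample bytes are an assumption chosen for illustration.

```python
latin1_bytes = b"caf\xe9"  # "cafe" with an accented final letter, in Latin-1

# Decode from the source character set, then encode into the target one.
text = latin1_bytes.decode("latin-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes.hex(" "))
print(utf8_bytes.hex(" "))
```

The accented letter occupies one byte in Latin-1 and two bytes in UTF-8, so the byte length changes even though the text does not.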
|
|
translation
|
The conversion of text from one human language to another.
When localizing an application, one of the largest tasks is the
translation of all text resources into the target language.
|
|
transliteration
|
Transformation of text from one script to another, usually based
on phonetic equivalencies. For example, Greek text might be
transliterated into the Latin script so that it can be pronounced
by English speakers.
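
A very small Python sketch of this idea uses a character mapping table. The `GREEK_TO_LATIN` table below is a hypothetical, partial mapping covering only a few letters; real transliteration schemes handle the full alphabet, accents, and context-dependent rules.

```python
# Hypothetical partial table: a few Greek letters and their Latin equivalents.
GREEK_TO_LATIN = str.maketrans({
    "\u03b1": "a",  # alpha
    "\u03b3": "g",  # gamma
    "\u03bb": "l",  # lambda
    "\u03bf": "o",  # omicron
    "\u03c3": "s",  # sigma
    "\u03c2": "s",  # final sigma
})


def transliterate(text: str) -> str:
    """Map each known Greek letter to Latin; leave other characters unchanged."""
    return text.translate(GREEK_TO_LATIN)


print(transliterate("\u03bb\u03bf\u03b3\u03bf\u03c2"))  # lambda omicron gamma omicron final-sigma
```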
|
|
-U-
|
|
|
Unicode
|
A character set that aims to encompass the characters of all of the world's living
languages. Unicode is the basis of most modern software internationalization.
|