Letter case

Locale-specific case mappings

Certain scripts, such as Latin, Greek, and Cyrillic, have letter cases. The mappings between lower case and upper case characters are locale-sensitive: the same character can map differently depending on the language.

👉 Note: When converting the case of characters in your application, use the String.toLowerCase(Locale) and String.toUpperCase(Locale) methods and pass the Locale argument explicitly. The no-argument overloads use the JVM's default locale, so their results can differ from one environment to another.

For example, English uses the Latin script. Its alphabet consists of 26 letters:

Upper case Lower case
A U+000041 LATIN CAPITAL LETTER A a U+000061 LATIN SMALL LETTER A
B U+000042 LATIN CAPITAL LETTER B b U+000062 LATIN SMALL LETTER B
C U+000043 LATIN CAPITAL LETTER C c U+000063 LATIN SMALL LETTER C
D U+000044 LATIN CAPITAL LETTER D d U+000064 LATIN SMALL LETTER D
E U+000045 LATIN CAPITAL LETTER E e U+000065 LATIN SMALL LETTER E
F U+000046 LATIN CAPITAL LETTER F f U+000066 LATIN SMALL LETTER F
G U+000047 LATIN CAPITAL LETTER G g U+000067 LATIN SMALL LETTER G
H U+000048 LATIN CAPITAL LETTER H h U+000068 LATIN SMALL LETTER H
I U+000049 LATIN CAPITAL LETTER I i U+000069 LATIN SMALL LETTER I
J U+00004A LATIN CAPITAL LETTER J j U+00006A LATIN SMALL LETTER J
K U+00004B LATIN CAPITAL LETTER K k U+00006B LATIN SMALL LETTER K
L U+00004C LATIN CAPITAL LETTER L l U+00006C LATIN SMALL LETTER L
M U+00004D LATIN CAPITAL LETTER M m U+00006D LATIN SMALL LETTER M
N U+00004E LATIN CAPITAL LETTER N n U+00006E LATIN SMALL LETTER N
O U+00004F LATIN CAPITAL LETTER O o U+00006F LATIN SMALL LETTER O
P U+000050 LATIN CAPITAL LETTER P p U+000070 LATIN SMALL LETTER P
Q U+000051 LATIN CAPITAL LETTER Q q U+000071 LATIN SMALL LETTER Q
R U+000052 LATIN CAPITAL LETTER R r U+000072 LATIN SMALL LETTER R
S U+000053 LATIN CAPITAL LETTER S s U+000073 LATIN SMALL LETTER S
T U+000054 LATIN CAPITAL LETTER T t U+000074 LATIN SMALL LETTER T
U U+000055 LATIN CAPITAL LETTER U u U+000075 LATIN SMALL LETTER U
V U+000056 LATIN CAPITAL LETTER V v U+000076 LATIN SMALL LETTER V
W U+000057 LATIN CAPITAL LETTER W w U+000077 LATIN SMALL LETTER W
X U+000058 LATIN CAPITAL LETTER X x U+000078 LATIN SMALL LETTER X
Y U+000059 LATIN CAPITAL LETTER Y y U+000079 LATIN SMALL LETTER Y
Z U+00005A LATIN CAPITAL LETTER Z z U+00007A LATIN SMALL LETTER Z

Turkish also uses the Latin script. Its alphabet consists of 29 letters:

Upper case Lower case
A U+000041 LATIN CAPITAL LETTER A a U+000061 LATIN SMALL LETTER A
B U+000042 LATIN CAPITAL LETTER B b U+000062 LATIN SMALL LETTER B
C U+000043 LATIN CAPITAL LETTER C c U+000063 LATIN SMALL LETTER C
Ç U+0000C7 LATIN CAPITAL LETTER C WITH CEDILLA ç U+0000E7 LATIN SMALL LETTER C WITH CEDILLA
D U+000044 LATIN CAPITAL LETTER D d U+000064 LATIN SMALL LETTER D
E U+000045 LATIN CAPITAL LETTER E e U+000065 LATIN SMALL LETTER E
F U+000046 LATIN CAPITAL LETTER F f U+000066 LATIN SMALL LETTER F
G U+000047 LATIN CAPITAL LETTER G g U+000067 LATIN SMALL LETTER G
Ğ U+00011E LATIN CAPITAL LETTER G WITH BREVE ğ U+00011F LATIN SMALL LETTER G WITH BREVE
H U+000048 LATIN CAPITAL LETTER H h U+000068 LATIN SMALL LETTER H
I U+000049 LATIN CAPITAL LETTER I ı U+000131 LATIN SMALL LETTER DOTLESS I
İ U+000130 LATIN CAPITAL LETTER I WITH DOT ABOVE i U+000069 LATIN SMALL LETTER I
J U+00004A LATIN CAPITAL LETTER J j U+00006A LATIN SMALL LETTER J
K U+00004B LATIN CAPITAL LETTER K k U+00006B LATIN SMALL LETTER K
L U+00004C LATIN CAPITAL LETTER L l U+00006C LATIN SMALL LETTER L
M U+00004D LATIN CAPITAL LETTER M m U+00006D LATIN SMALL LETTER M
N U+00004E LATIN CAPITAL LETTER N n U+00006E LATIN SMALL LETTER N
O U+00004F LATIN CAPITAL LETTER O o U+00006F LATIN SMALL LETTER O
Ö U+0000D6 LATIN CAPITAL LETTER O WITH DIAERESIS ö U+0000F6 LATIN SMALL LETTER O WITH DIAERESIS
P U+000050 LATIN CAPITAL LETTER P p U+000070 LATIN SMALL LETTER P
R U+000052 LATIN CAPITAL LETTER R r U+000072 LATIN SMALL LETTER R
S U+000053 LATIN CAPITAL LETTER S s U+000073 LATIN SMALL LETTER S
Ş U+00015E LATIN CAPITAL LETTER S WITH CEDILLA ş U+00015F LATIN SMALL LETTER S WITH CEDILLA
T U+000054 LATIN CAPITAL LETTER T t U+000074 LATIN SMALL LETTER T
U U+000055 LATIN CAPITAL LETTER U u U+000075 LATIN SMALL LETTER U
Ü U+0000DC LATIN CAPITAL LETTER U WITH DIAERESIS ü U+0000FC LATIN SMALL LETTER U WITH DIAERESIS
V U+000056 LATIN CAPITAL LETTER V v U+000076 LATIN SMALL LETTER V
Y U+000059 LATIN CAPITAL LETTER Y y U+000079 LATIN SMALL LETTER Y
Z U+00005A LATIN CAPITAL LETTER Z z U+00007A LATIN SMALL LETTER Z

As these tables show, English and Turkish share many characters, and most of their case mappings are the same. However, the mappings for the letter I differ:

Language Upper case Lower case
English I (U+000049 LATIN CAPITAL LETTER I) i (U+000069 LATIN SMALL LETTER I)
Turkish I (U+000049 LATIN CAPITAL LETTER I) ı (U+000131 LATIN SMALL LETTER DOTLESS I)
Turkish İ (U+000130 LATIN CAPITAL LETTER I WITH DOT ABOVE) i (U+000069 LATIN SMALL LETTER I)
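
The difference in the I mappings can be observed directly with the locale-taking String methods. The following is a minimal sketch; the class name TurkishCaseDemo and the helpers lower and upper are hypothetical names chosen for illustration, and only String.toLowerCase(Locale), String.toUpperCase(Locale), and Locale.forLanguageTag come from the JDK.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Turkish locale built from its BCP 47 language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Hypothetical helpers that simply forward to the JDK methods.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // English: I <-> i
        System.out.println(lower("TITLE", Locale.ENGLISH)); // value is "title"
        System.out.println(upper("title", Locale.ENGLISH)); // value is "TITLE"

        // Turkish: I -> dotless ı (U+0131), i -> dotted İ (U+0130)
        System.out.println(lower("TITLE", TURKISH)); // value is "t\u0131tle"
        System.out.println(upper("title", TURKISH)); // value is "T\u0130TLE"
    }
}
```

How the non-ASCII characters appear on the console depends on the terminal encoding, but the string values themselves are as noted in the comments.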

These special case mappings are defined by Unicode and published in SpecialCasing.txt. In some languages, case conversion can even change the length of a string.
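
As an illustration of a length change, the sketch below upper-cases the German word "straße" and lower-cases İ (U+0130) in a non-Turkish locale. The class name CaseLengthDemo and its two helper methods are hypothetical; the sample strings are chosen for illustration and are not taken from the text above.

```java
import java.util.Locale;

public class CaseLengthDemo {
    // Hypothetical helper: length of s after upper-casing in the given locale.
    static int lengthAfterUpper(String s, Locale locale) {
        return s.toUpperCase(locale).length();
    }

    // Hypothetical helper: length of s after lower-casing in the given locale.
    static int lengthAfterLower(String s, Locale locale) {
        return s.toLowerCase(locale).length();
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) upper-cases to the two letters "SS".
        String strasse = "stra\u00dfe";
        System.out.println(strasse.length() + " -> "
                + lengthAfterUpper(strasse, Locale.GERMAN)); // 6 -> 7

        // Outside Turkish, U+0130 lower-cases to "i" followed by
        // U+0307 COMBINING DOT ABOVE, which is two chars.
        String dottedCapitalI = "\u0130";
        System.out.println(dottedCapitalI.length() + " -> "
                + lengthAfterLower(dottedCapitalI, Locale.ENGLISH)); // 1 -> 2
    }
}
```

Code that assumes a converted string keeps its original length, for example when reusing character offsets or fixed-size buffers, may therefore break on such input.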