Character normalization is a process that can improve recall. With character normalization, a query retrieves more documents because documents can match even when their text is not written exactly the same way as the query.
Watson Explorer Content Analytics uses Unicode compatibility normalization, which includes the normalization of Asian half-width characters to full-width characters. Other forms of character normalization include case normalization, diacritic removal, and ligature expansion.
All normalizations work both ways. You can find documents that contain usa when you search for USA, documents that contain words with e when you search for é, and so on. These normalizations can also be combined. For example, you can find documents that contain météo when you search for METEO.
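The combined behavior can be illustrated with a minimal sketch. This is not the Watson Explorer Content Analytics implementation. It only assumes that both the query and the indexed term are reduced to the same normalized key before comparison, and it uses Python's standard unicodedata module. The function names normalize_key and matches are hypothetical.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Reduce text to a comparison key (illustrative only, not the product's algorithm)."""
    # Compatibility decomposition splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents and other diacritics).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case normalization.
    return stripped.casefold()


def matches(query: str, term: str) -> bool:
    """Hypothetical helper: two strings match when their normalized keys are equal."""
    return normalize_key(query) == normalize_key(term)


print(matches("USA", "usa"))        # True
print(matches("METEO", "m\u00e9t\u00e9o"))  # True ("météo")
print(matches("\u00e9", "e"))       # True ("é" and "e")
```

Because both sides are normalized with the same function, the match is symmetric: searching for météo finds METEO just as searching for METEO finds météo.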
The normalizations are based on Unicode character properties and are not language-dependent. For example, Watson Explorer Content Analytics supports diacritic removal for Hebrew and ligature expansion for Arabic.
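To see why normalization driven by Unicode character properties is not tied to a language, consider the following sketch. It relies only on Python's standard unicodedata module and is an assumption about how such normalization can be done in general, not a description of the product's internals. The helper names fold_compat and strip_marks are hypothetical.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def strip_marks(text: str) -> str:
    """Remove combining marks after compatibility decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Half-width katakana KA (U+FF76) becomes full-width KA (U+30AB).
assert fold_compat("\uff76") == "\u30ab"

# Arabic ligature LAM WITH ALEF (U+FEFB) expands to LAM (U+0644) + ALEF (U+0627).
assert fold_compat("\ufefb") == "\u0644\u0627"

# Hebrew "shalom" written with vowel points loses its diacritics.
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
assert strip_marks(pointed) == "\u05e9\u05dc\u05d5\u05dd"
```

The same two functions handle Japanese, Arabic, and Hebrew input without any language setting, because the decompositions and combining classes come from the Unicode character database rather than from language-specific rules.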