Code page conversion in ODWEK

ODWEK runs in UTF-8 internally, so it converts all index and annotation data from UTF-16 (the format in which the data is sent over TCP/IP) to UTF-8. Because ODWEK is a mid-tier system, you must consider the technology used to implement the presentation layer, whether that technology can display the data correctly, and whether you need to convert the code page of the data.

ODWEK converts code pages differently from the standard Content Manager OnDemand client. As a mid-tier system, ODWEK always has an additional presentation layer on top of it. In most cases, that layer is a browser; however, it can also be a stand-alone Java™ application that uses the ODWEK API.

When you implement a Java application that uses ODWEK, ensure that you correctly handle the information that you pass to and retrieve from the ODWEK API. Because Java works in UTF-16 Unicode internally, you do not need to convert the data that is returned from ODWEK API functions: Java handles the conversion from UTF-8. Likewise, when you pass strings to ODWEK methods, you do not need to perform any additional tasks, because Java string variables are always UTF-16. The conversion to UTF-8 is done by ODWEK subroutines.
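The relationship between Java's UTF-16 strings and UTF-8 byte data can be illustrated with a minimal sketch. It uses only the standard `java.nio.charset` classes and does not call the ODWEK API; the class name `Utf8RoundTrip`, its helper methods, and the sample index value are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    /** Encodes a Java (UTF-16) string as UTF-8 bytes. */
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a Java (UTF-16) string. */
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical index value containing a non-ASCII character.
        String indexValue = "M\u00fcller";
        byte[] utf8 = toUtf8(indexValue);

        // The same text occupies a different number of units in each encoding.
        System.out.println(indexValue.length() + " UTF-16 code units, "
                + utf8.length + " UTF-8 bytes");

        // Converting to UTF-8 and back yields the original string.
        System.out.println(fromUtf8(utf8).equals(indexValue));
    }
}
```

Because the round trip between UTF-16 and UTF-8 is lossless for valid Unicode text, application code that works only with `String` values does not need to know which of the two encodings is used on the other side of the API.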

Apart from the internal conversion to UTF-8, ODWEK does not perform any other conversion on indexes. Therefore, a client that displays index data that is received through the ODWEK API must correctly handle UTF-8 Unicode data. If you deliver index data to any external applications by using your ODWEK-based Java application, you must ensure that those applications can handle Unicode data, or you must convert the data manually. The same applies if you want to save any index or annotation data to a file: if you do not perform any explicit conversion, the data is written as a Unicode data stream. Because web browsers can usually display and send UTF-8 data, this is not a problem when you implement web applications.
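As an illustration of a manual conversion, the following sketch encodes an index value in an explicitly chosen code page before it is written to a file, and fails instead of silently substituting characters that the target code page cannot represent. It relies only on standard Java classes; the class `IndexExport`, its methods, and the choice of target code page are assumptions made for this example and are not part of the ODWEK API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexExport {
    /**
     * Encodes a value in the target code page. Throws
     * CharacterCodingException if a character has no mapping there.
     */
    static byte[] encodeStrict(String value, Charset target)
            throws CharacterCodingException {
        ByteBuffer buffer = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(value));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    /** Writes the value to a file in the code page that the consumer expects. */
    static void writeIndex(Path file, String value, Charset target)
            throws IOException {
        Files.write(file, encodeStrict(value, target));
    }
}
```

For example, `IndexExport.writeIndex(path, value, StandardCharsets.ISO_8859_1)` would produce a file for an application that reads only ISO-8859-1, while passing `StandardCharsets.UTF_8` keeps the data as Unicode. Reporting unmappable characters is a design choice for this sketch; an application might instead prefer a replacement character.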

For the document data, you can handle the conversion in different ways. If you request raw native document data, ODWEK returns the data in its unaltered form, in the same code page in which it was archived. When you request the line data to be displayed by using an applet, ODWEK sends the UTF-8 ASCII data to the applet, but returns only standard HTML code that contains the applet invocation code. When you request an ASCII conversion, ODWEK returns a UTF-8 ASCII converted representation of the original AFP or line data document. For most other document types, ODWEK works the same as the Content Manager OnDemand Windows client: it passes the data in the native format of the document type.
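When raw native document data is requested, the application that receives the bytes is responsible for interpreting them in the archived code page. A minimal sketch of that step follows, assuming that the application already knows the name of the archived code page and that the Java runtime supports it. The class `RawDocumentDecoder` and its methods are hypothetical, and how the raw bytes are obtained from ODWEK is outside the scope of the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RawDocumentDecoder {
    /**
     * Decodes raw document bytes by using the code page in which the
     * document was archived. Charset.forName throws
     * UnsupportedCharsetException if the runtime does not provide it.
     */
    static String decode(byte[] raw, String archivedCodePage) {
        Charset charset = Charset.forName(archivedCodePage);
        return new String(raw, charset);
    }

    /** Re-encodes the decoded text as UTF-8, for example for a browser. */
    static byte[] toUtf8(byte[] raw, String archivedCodePage) {
        return decode(raw, archivedCodePage).getBytes(StandardCharsets.UTF_8);
    }
}
```

A call such as `RawDocumentDecoder.decode(bytes, "ISO-8859-1")` works on every Java runtime. EBCDIC code pages such as `IBM500` are typically available only when the extended charsets are installed, so it is worth checking `Charset.isSupported` before relying on them.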