Qshell's Pax Command and File Conversion

Troubleshooting

Problem

This document describes how Qshell's pax command converts between CCSIDs (code pages) when it archives and extracts text and binary files.

Resolving The Problem

Because pax (portable archive interchange) is meant to exchange files across various platforms, the data stored in a pax archive must be in a common, compatible encoding. That encoding is CODEPAGE 819. On IBM® i5/OS®, each Integrated File System file carries an attribute that indicates which encoding (CCSID/CODEPAGE) the file uses. When an i5/OS Integrated File System file is written to a pax archive, its data is converted from the encoding indicated by the file to 819. When files are extracted from a pax archive into the i5/OS Integrated File System, an encoding must be chosen for them. By default, the QIBM_CCSID value is used, which means that the extracted data is converted from 819 to the QIBM_CCSID encoding.

For text files, this works out quite well. If you have a text file that is encoded in EBCDIC (for example, 37), when that file is written to a pax archive, it is converted to 819. This pax archive can be brought to another system or platform and the file can be extracted from it. If you extract from this pax archive on another i5/OS machine and the QIBM_CCSID is 278, for example, it converts the data in the archive from 819 to 278 and you get a text file in CODEPAGE 278 (even though the original file started out in CODEPAGE 37).
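The text-file round trip can be sketched in a few lines of Python. This is only a model of the conversion, not the Qshell pax implementation: the Python codecs cp037 and latin-1 stand in for CCSID 37 and 819, and cp500 is an arbitrary stand-in for the target system's QIBM_CCSID (the text above uses 278). The convert helper and the sample string are hypothetical.

```python
# Python codecs stand in for CCSIDs here (an assumption for illustration):
#   "cp037" ~ CCSID 37, "latin-1" ~ CCSID 819, "cp500" ~ a different EBCDIC CCSID.

def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text bytes from one code page to another (hypothetical helper)."""
    return data.decode(source).encode(target)

text = "Totals [2019]: done!"

original = text.encode("cp037")                     # file tagged 37 on the source system
archived = convert(original, "cp037", "latin-1")    # stored in the archive as 819
extracted = convert(archived, "latin-1", "cp500")   # target system encoding

same_characters = extracted.decode("cp500") == text
same_bytes = extracted == original

print("characters preserved:", same_characters)
print("bytes identical to original:", same_bytes)
```

The characters survive the trip even though the bytes on disk differ, which is exactly what is wanted for text.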

So, what does that mean for binary files? For a binary file, no translation should occur, either going into a pax archive or coming out of it. The pax documentation indicates that archive files must be in CCSID 819 for portability with other platforms. To avoid any conversion, set the CCSID/CODEPAGE of the binary files to 819. In the first step (writing to the pax archive), no conversion is done because the Qshell pax code sees that the file's encoding and the archive encoding are the same. Hence, the binary data is written as-is into the pax archive. In the second step (extracting from that archive), IBM recommends using the -C option on pax to indicate 819. When the file is extracted from the pax archive, no conversion is done because the target CCSID is 819 and Qshell pax knows that the data in the archive is already in 819.
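The effect on binary data can be modeled the same way. The following is a sketch under stated assumptions: pax_convert is a hypothetical function that mimics the behavior described above (skip conversion when source and target encodings match), and the Python codecs cp037, cp500, and latin-1 stand in for CCSIDs 37, 500, and 819.

```python
def pax_convert(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical model: no conversion when both encodings are the same."""
    if source == target:
        return data
    return data.decode(source).encode(target)

binary = bytes(range(256))  # arbitrary binary payload, every byte value once

# Case 1: binary file tagged 37, extracted on a system whose target encoding is 500.
archived = pax_convert(binary, "cp037", "latin-1")
damaged = pax_convert(archived, "latin-1", "cp500")

# Case 2: binary file tagged 819, extracted with a target of 819 (as with -C 819).
archived_819 = pax_convert(binary, "latin-1", "latin-1")
intact = pax_convert(archived_819, "latin-1", "latin-1")

changed = sum(a != b for a, b in zip(binary, damaged))
print("tagged 37, extracted as 500: bytes altered =", changed)
print("tagged 819, extracted as 819: intact =", intact == binary)
```

In the first case some byte values come back different from the original, which would corrupt a binary file; in the second case the data passes through untouched at both steps.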

So, the options for using pax with binary files are:

o	Ensure the CCSID is set to 819 when the binary file is created.
o	Change the CCSID to 819 right before the file is written to the pax archive.

In either case, when extracting, use -C 819 on pax to preserve the binary data (no conversion). Alternatively, set QIBM_CCSID to 819, although this is probably not the best choice because other utilities also attempt to use the QIBM_CCSID value.

If the pax archive contains a mix of binary files and text files, and you want the encoding of the text files to match the CCSID of the target system, then pax might not be a suitable means of transporting that set of files. Because the CCSID is not kept in the pax archive, there is no way of telling which files are binary and which files are text, and the -C option on pax applies to all files in the archive when they are extracted.
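The trade-off for a mixed archive can be illustrated with a small model. This is a sketch only: extract_all is a hypothetical function that applies one target encoding to every member, the member names are invented, and the Python codecs latin-1 and cp500 stand in for CCSID 819 and a target EBCDIC CCSID.

```python
def extract_all(members: dict, target: str) -> dict:
    """Hypothetical model: one target encoding is applied to every archive member."""
    extracted = {}
    for name, data in members.items():
        if target == "latin-1":
            extracted[name] = data  # archive is already 819: no conversion
        else:
            extracted[name] = data.decode("latin-1").encode(target)
    return extracted

# Archive contents are stored as 819; nothing records which member is binary.
archive = {
    "notes.txt": "Status [ok]!".encode("latin-1"),
    "image.bin": bytes(range(256)),
}

as_500 = extract_all(archive, "cp500")    # convert everything to the target encoding
as_819 = extract_all(archive, "latin-1")  # convert nothing (as with -C 819)

print("target 500: text readable =", as_500["notes.txt"].decode("cp500") == "Status [ok]!")
print("target 500: binary intact =", as_500["image.bin"] == archive["image.bin"])
print("target 819: binary intact =", as_819["image.bin"] == archive["image.bin"])
print("target 819: text converted =", as_819["notes.txt"] != archive["notes.txt"])
```

Whichever single target is chosen, one kind of file loses out: converting everything damages the binary member, and converting nothing leaves the text member in 819 rather than in the target system's encoding.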

Product: IBM i
Platform: IBM i
Version: 6.1.0
Line of Business: Power
Business Unit: IBM Infrastructure w/TPS

Historical Number

453045914

Document Information

Modified date:
18 December 2019

UID

nas8N1014323
