File name characters

The following information lists the characters that should not or cannot be used in file, directory, or extended attribute names in IBM Spectrum Archive Single Drive Edition. It also describes how name portability differs between Linux and other platforms, such as Windows.

For file and directory names, IBM Spectrum® Archive SDE supports the characters that are valid in the W3C Extensible Markup Language (XML) 1.0 standard, except for the / (slash). IBM Spectrum Archive SDE version 2.4.0 and later releases use percent encoding to support more special characters in file names and extended attributes than earlier versions. The additional special characters that are supported through percent encoding are:
  • U+0001 through U+001F, except U+0009 (TAB), U+000A (LF), and U+000D (CR). These three characters were supported in earlier releases and are still stored without percent encoding.
  • The : (colon).

Percent encoding is also used when a name contains special characters together with multi-byte characters. In that case, the encoding is applied to all of the special and multi-byte characters in the name.
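The following minimal sketch illustrates the percent-encoding idea described above. It is not the IBM Spectrum Archive SDE implementation. The function names are hypothetical, and the encoding rules are assumptions: when a name contains a special character, every special character, every non-ASCII (multi-byte) character, and the literal % are replaced by %XX escapes of their UTF-8 bytes, and all other characters are left unchanged. The sketch also assumes that the caller records separately whether a name was encoded.

```python
from urllib.parse import unquote

# Assumed set of special characters, taken from the list above:
# U+0001 through U+001F except TAB, LF, and CR, plus the colon.
SPECIAL = ({chr(c) for c in range(0x01, 0x20)} - {"\t", "\n", "\r"}) | {":"}


def needs_encoding(name: str) -> bool:
    """Return True if the name contains at least one special character."""
    return any(ch in SPECIAL for ch in name)


def percent_encode(name: str) -> str:
    """Hypothetical encoder: escape special, multi-byte, and '%' characters."""
    if not needs_encoding(name):
        return name  # Names without special characters are stored as-is.
    out = []
    for ch in name:
        if ch in SPECIAL or ch == "%" or ord(ch) > 0x7F:
            # Escape each UTF-8 byte of the character as %XX.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def percent_decode(encoded: str) -> str:
    """Reverse percent_encode for a name that is known to be encoded."""
    return unquote(encoded, encoding="utf-8", errors="strict")
```

For example, under these assumptions `percent_encode("a:b")` yields `a%3Ab`, and a name that mixes a colon with é has the é encoded as well, because the multi-byte character appears together with a special character.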

On Windows systems, file and directory names cannot contain a colon (:). However, if a file or directory whose name contains a colon is created on a Linux or Mac operating system and is then moved to a Windows system, percent encoding is used to keep the colon in the name in the index.

Note: On a Windows system, the : (colon) character appears as %3A in the percent-encoded index. In Windows Explorer or at the command prompt, the colon appears as an _ (underscore) in the file name.
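The two views of the same name that the note describes can be sketched as follows. The function name is hypothetical, and the sketch assumes that the colon is the only character substituted for display; it is not the actual Windows or IBM Spectrum Archive SDE logic.

```python
from urllib.parse import unquote


def windows_display_name(index_name: str) -> str:
    """Hypothetical: map a percent-encoded index name to its Windows display form."""
    # Decode %XX escapes (for example %3A) back to the original characters.
    decoded = unquote(index_name, encoding="utf-8", errors="strict")
    # Assumption: a colon, which Windows does not allow, is shown as an underscore.
    return decoded.replace(":", "_")
```

Under this assumption, an index entry `report%3A2024.txt` is displayed as `report_2024.txt`.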
The following characters are not supported in file or directory names on either Linux or Windows systems:
  • Characters in the surrogate blocks (U+D800 through U+DFFF).
  • The byte order mark (BOM, U+FEFF).
  • The NULL character (U+0000).
  • Characters that cannot be converted to UTF-16 or UTF-8, or that cannot be normalized to NFC.
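To screen names before they are written to an LTFS-formatted medium, a check such as the following can be used. This is a hypothetical helper, not part of IBM Spectrum Archive SDE. It detects the NULL character, the BOM, and lone surrogate code points (in Python, these are exactly the characters that make `str.encode("utf-8")` fail). It does not test NFC normalization.

```python
from typing import List


def unsupported_characters(name: str) -> List[str]:
    """Hypothetical helper: describe characters in `name` that the list above rules out."""
    problems = []
    if "\x00" in name:
        problems.append("NULL character (U+0000)")
    if "\ufeff" in name:
        problems.append("byte order mark (U+FEFF)")
    # Lone surrogates cannot be encoded as UTF-8 or UTF-16.
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in name):
        problems.append("surrogate code point (U+D800 to U+DFFF)")
    return problems
```

An empty list means that none of these checks found a problem; it does not guarantee that the name is valid in every other respect.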
Windows also does not support the following characters in file or directory names. To keep media portable across platforms, do not use these characters with IBM Spectrum Archive SDE on Linux or Mac systems either:
* ? < > " | \
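A portability check for these characters might look like the following sketch. The helper name is hypothetical. It also flags the colon, because Windows does not allow a colon in file or directory names either.

```python
from typing import List

# Characters listed above, plus the colon, which Windows also rejects.
WINDOWS_FORBIDDEN = set('*?<>"|\\:')


def windows_unsafe_characters(name: str) -> List[str]:
    """Hypothetical helper: return the distinct Windows-forbidden characters in `name`."""
    return sorted({ch for ch in name if ch in WINDOWS_FORBIDDEN})
```

A name that returns an empty list can be created on Windows as far as these characters are concerned; the reserved device names are a separate restriction.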
Reserved device names
The following device names are reserved by Windows and should not be used as file names:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
Note: On Linux and OS X operating systems, reserved device names can be used as file names. Therefore, an LTFS-formatted medium that is used on Linux or OS X can contain a file with a Windows-reserved name.
Such file names are translated when the medium is inserted on Windows. For example, if a user creates a file named CON on Linux or OS X and then inserts the medium on Windows, the file appears as CON~1.
Note: If the user copies CON~1 to another folder on Windows, the copy is named CON~1; the original name (CON) is not preserved. If the medium is later inserted on Linux or OS X, the copied file is still displayed as CON~1.
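The reserved-name rule and the translation in the example can be sketched as follows. Both functions are hypothetical. The sketch assumes a case-insensitive comparison against the whole file name and reproduces only the documented CON to CON~1 example; the exact translation rule that LTFS applies on Windows is not described here.

```python
# Reserved device names from the list above.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_windows_reserved(name: str) -> bool:
    """Hypothetical: True if the whole name matches a reserved device name."""
    # Assumption: compared case-insensitively, as Windows compares file names.
    return name.upper() in RESERVED


def windows_view(name: str) -> str:
    """Hypothetical: reproduce the CON -> CON~1 translation shown above."""
    return name + "~1" if is_windows_reserved(name) else name
```

For example, `windows_view("CON")` returns `CON~1`, while `COM10` is not in the reserved list and is left unchanged.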
File/directory name case sensitivity
LTFS on Windows currently supports only case-insensitive file and directory names. However, LTFS preserves case. For example, when a user creates AbC.txt, the file is stored as AbC.txt on the medium.
LTFS on Linux and OS X supports a case-sensitive file system. An LTFS-formatted medium that is used on Linux or OS X can contain files or directories whose names differ only in case. For example, ABC.txt and abc.txt can both be in the same directory. If the medium is then inserted on Windows, the file or directory names are translated. For example, if ABC.txt and abc.txt are created on Linux and the medium is inserted on Windows, the files appear as ABC.txt and abc~1.txt, respectively.
Note: If the user copies abc~1.txt to another folder on Windows, the copy is named abc~1.txt, not the original name (abc.txt). The copied file is still displayed as abc~1.txt if the medium is later reinserted on Linux or OS X.
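The following sketch shows how names in one directory that differ only in case could be detected and renamed for a case-insensitive view, in the style of the ABC.txt and abc~1.txt example. The function is hypothetical and rests on several assumptions: names are compared with `str.casefold()`, the first name in the input keeps its spelling, and each later colliding name gets ~N inserted before its extension, where N counts the earlier collisions. The sketch does not handle a translated name that collides with an existing name.

```python
from typing import Dict, List


def windows_view_of_directory(names: List[str]) -> List[str]:
    """Hypothetical: mimic how case-only duplicates might appear on Windows."""
    seen: Dict[str, int] = {}  # casefolded name -> how many names already use it
    result = []
    for name in names:
        key = name.casefold()
        count = seen.get(key, 0)
        if count == 0:
            result.append(name)  # First occurrence keeps its original spelling.
        else:
            stem, dot, ext = name.rpartition(".")
            if not stem:  # No extension, or a leading-dot name such as ".profile".
                stem, dot, ext = name, "", ""
            result.append(f"{stem}~{count}{dot}{ext}")
        seen[key] = count + 1
    return result
```

Running the sketch on `["ABC.txt", "abc.txt"]` returns `["ABC.txt", "abc~1.txt"]`, which matches the example above. Running it before a medium is moved from Linux or OS X to Windows shows which names would be translated.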