File List node: Settings tab

On this tab you define the directories, file extensions, and input for this node.

Note: Text mining extraction cannot process Microsoft Office and Adobe PDF files on platforms other than Microsoft Windows. However, XML, HTML, and text files can always be processed.

Directory names and file names that contain characters not included in the machine's local encoding are not supported. If a stream contains a File List node, any file or directory name that contains such characters causes stream execution to fail. This can happen when a name is written in a language other than that of the locale, such as a German file name on a French locale.
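The restriction above can be checked before a stream is run. The following is a minimal sketch, not part of the product: the helper names is_encodable and unencodable_names are hypothetical, and it assumes that Python's locale.getpreferredencoding(False) is a reasonable stand-in for the machine's local encoding.

```python
import locale
import os


def is_encodable(name, encoding):
    """Return True if every character in `name` exists in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def unencodable_names(root, encoding=None):
    """List paths under `root` whose own name cannot be encoded.

    Assumption: when `encoding` is None, the preferred locale encoding
    reported by Python approximates the machine's local encoding.
    """
    encoding = encoding or locale.getpreferredencoding(False)
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_encodable(name, encoding):
                problems.append(os.path.join(dirpath, name))
    return sorted(problems)
```

Calling unencodable_names on the folder you intend to enter in Directory returns the paths that would probably need to be renamed first. An empty list suggests, but does not guarantee, that the names are acceptable.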

Directory: Specifies the root folder containing the documents that you want to list.

File type(s) to include in list: You can select or deselect the file types and extensions that you want to use. When you deselect a file extension, files with that extension are ignored. You can filter by the following extensions:

Table 1. File type filters by file extension
  • .rtf, .doc, .docx, .docm
  • .xls, .xlsx, .xlsm
  • .ppt, .pptx, .pptm
  • .txt, .text
  • .htm, .html, .shtml
  • .xml
  • .pdf
  • .$
Note: For more information, see File List node.

If you have files with no extension or with only a trailing dot (for example, File01 or File01.), use the No extension option to select them.
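The filtering described above can be pictured with a short sketch. It is illustrative only and does not reproduce the node's implementation: the function names are hypothetical, and both the case-insensitive matching and the descent into subfolders are assumptions that this page does not confirm.

```python
import os


def extension_of(name):
    """Return the lowercased extension (with its dot), or "" if there is none.

    Names such as "File01" and "File01." both yield "". A name that only
    starts with a dot (for example ".profile") is also treated as having
    no extension, which is an assumption of this sketch.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem or not ext:
        return ""
    return "." + ext.lower()


def matches_filter(name, extensions, include_no_extension=False):
    """Decide whether a file name passes the selected extension filters."""
    ext = extension_of(name)
    if ext == "":
        # Mirrors the "No extension" option described above.
        return include_no_extension
    return ext in {e.lower() for e in extensions}


def list_files(root, extensions, include_no_extension=False):
    """Return the sorted paths under `root` that pass the filter."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if matches_filter(name, extensions, include_no_extension):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

For example, list_files(root, {".pdf", ".docx"}, include_no_extension=True) would keep PDF and DOCX files plus names such as File01 and File01., and ignore everything else.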

Input encoding: If the output field will contain exact text, choose the relevant value from the following list:
  • Automatic (European)
  • UTF-8
  • UTF-16
  • ISO-8859-1
  • ISO-8859-2
  • Windows-1250
  • US ascii

The output is shown as UTF-8 document text.
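As a rough illustration of what the encoding choice means, the sketch below decodes raw bytes with the selected input encoding and re-encodes the text as UTF-8. It is a sketch under stated assumptions, not the product's conversion routine: the mapping from the listed option names to Python codec names and the function to_utf8 are hypothetical, and Automatic (European) is left out because its detection logic is not described here.

```python
# Hypothetical mapping from the option labels above to Python codec names.
# "Automatic (European)" is omitted: how it detects the encoding is not
# documented on this page.
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",
    "ISO-8859-1": "iso-8859-1",
    "ISO-8859-2": "iso-8859-2",
    "Windows-1250": "cp1250",
    "US ascii": "ascii",
}


def to_utf8(raw, input_encoding):
    """Decode `raw` bytes with the chosen input encoding, return UTF-8 bytes.

    Raises KeyError for an unknown option label and UnicodeDecodeError
    when the bytes are not valid in the chosen encoding.
    """
    text = raw.decode(CODECS[input_encoding])
    return text.encode("utf-8")
```

Choosing an encoding that does not match the file either raises an error or silently produces wrong characters, which is why the option should match how the source documents were actually saved.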

Important: Since version 14, the List of directories option is no longer available and the only output is a list of files.