File List node: Settings tab

On this tab you define the directories, file extensions, and input for this node.

Note: Text mining extraction cannot process Microsoft Office or Adobe PDF files on platforms other than Microsoft Windows. XML, HTML, and text files can be processed on any platform.

Directory and file names that contain characters not included in the machine's local encoding are not supported. If a stream contains a File List node, any file or directory name with such characters causes stream execution to fail. This can happen with directory or file names in another language, for example a German file name on a machine with a French locale.
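One way to find such names before running a stream is to check whether each name can be encoded in the local encoding. The Python sketch below is an illustration only and is not part of the product: the function names are hypothetical, and the local encoding is passed in explicitly (for example "cp1250" or "cp1252") because which encoding a given machine uses depends on its locale settings.

```python
import os
from typing import List


def is_representable(name: str, local_encoding: str) -> bool:
    """True if `name` can be encoded in `local_encoding` (for example "cp1252")."""
    try:
        name.encode(local_encoding)
        return True
    except UnicodeEncodeError:
        return False


def unsupported_names(root: str, local_encoding: str) -> List[str]:
    """List every file or directory under `root` whose name is not representable."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check both subdirectory names and file names at this level.
        for name in dirnames + filenames:
            if not is_representable(name, local_encoding):
                bad.append(os.path.join(dirpath, name))
    return sorted(bad)
```

For instance, "Überblick.txt" is representable in cp1252, while a Cyrillic name such as "Обзор.txt" is not and would be reported.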

Directory. Specifies the root folder containing the documents that you want to list.

  • Include subdirectories. Specifies that subdirectories should also be scanned.

File type(s) to include in list. Select or deselect the file types and extensions you want to use. Files with a deselected extension are ignored. You can filter by the following extensions:

Table 1. File type filters by file extension
  • .rtf, .doc, .docx, .docm
  • .xls, .xlsx, .xlsm
  • .ppt, .pptx, .pptm
  • .txt, .text
  • .htm, .html, .shtml
  • .xml
  • .pdf
  • .$
Note: For more information, see File List node.

If you have files with no extension or with a trailing dot (for example, File01 or File01.), use the No extension option to select them.
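The directory, subdirectory, and extension settings above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the node's implementation: the function names and parameters are hypothetical, extensions are compared case-insensitively (an assumption), and a name counts as having no extension when it has none or ends in a trailing dot.

```python
import os
from typing import Iterable, List


def has_no_extension(filename: str) -> bool:
    """True for names with no extension ("File01") or a trailing dot ("File01.")."""
    _, ext = os.path.splitext(filename)
    return ext in ("", ".")


def list_documents(directory: str, extensions: Iterable[str],
                   include_subdirectories: bool = False,
                   include_no_extension: bool = False) -> List[str]:
    """Return paths of files under `directory` whose extension is selected.

    `extensions` holds extensions with a leading dot, e.g. {".txt", ".pdf"};
    matching ignores case (an assumption of this sketch).
    """
    wanted = {e.lower() for e in extensions}
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in wanted or (include_no_extension and has_no_extension(name)):
                found.append(os.path.join(dirpath, name))
        if not include_subdirectories:
            break  # only the root folder is scanned
    return sorted(found)
```

With only .txt selected and subdirectories included, a.txt and sub/d.txt are listed; adding the no-extension option also picks up File01 and File01.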

Only outputs document pathnames. Select this option if the output field should contain the pathnames of the documents, that is, the locations where they reside.

Input encoding. If the output field will contain the text of the documents rather than their pathnames, choose the encoding of the input files from the following list:
  • Automatic (European)
  • UTF-8
  • UTF-16
  • ISO-8859-1
  • ISO-8859-2
  • Windows-1250
  • US ascii

Whichever input encoding you choose, the document text is output as UTF-8.
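The conversion described above (read the files in the selected input encoding, produce UTF-8 text) can be sketched in Python. This is an illustration, not the product's code: the mapping from option labels to Python codec names is an assumption, and Automatic (European) is left out because its detection rules are not described here.

```python
# Hypothetical mapping from the Input encoding labels to Python codec names.
# "Automatic (European)" is omitted: its detection logic is not documented here.
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",
    "ISO-8859-1": "iso-8859-1",
    "ISO-8859-2": "iso-8859-2",
    "Windows-1250": "cp1250",
    "US ascii": "ascii",
}


def to_utf8(raw: bytes, input_encoding: str) -> bytes:
    """Decode raw document bytes with the selected input encoding, re-encode as UTF-8."""
    return raw.decode(CODECS[input_encoding]).encode("utf-8")
```

Choosing the wrong input encoding either fails or silently damages the text: ISO-8859-1 bytes for "Zürich" raise a decoding error when read as UTF-8, while UTF-8 bytes read as ISO-8859-1 decode without error but turn "ü" into "Ã¼".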