Using the File List node in text mining
The File List node is used when the text data resides in external unstructured documents, in formats such as Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Adobe PDF, XML, and HTML, among others.
As an example, suppose we connected a File List node to a Text Mining node in order to supply text that resides in external documents:
- File List node (Settings tab). First, we added this node to the stream to specify where the text documents are stored. We selected the directory containing all of the documents on which we want to perform text mining.
- Text Mining node (Fields tab). Next, we added a Text Mining node and connected it to the File List node. In this node, we defined our input format, resource template, and output format. We selected the field produced by the File List node as the text field and configured the other settings. See the topic Using the Text Mining node in a stream for more information.
For more information on using the Text Mining node, see Text Mining modeling node.
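To make the File List step more concrete, the following is a minimal Python sketch of the general idea: scan a directory and produce a list of document paths that a downstream step could read. It is an illustration only, not the product's implementation or scripting API. The function name list_documents, the recurse parameter, and the set of file extensions are assumptions chosen for this example.

```python
from pathlib import Path

# Assumed extensions for the formats named above; this is an
# illustrative set, not the list the File List node actually uses.
DOCUMENT_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".pdf", ".xml", ".htm", ".html",
}


def list_documents(directory, recurse=True, extensions=DOCUMENT_EXTENSIONS):
    """Return a sorted list of paths to documents found in `directory`.

    Hypothetical helper: each returned string plays the role of one
    record in a single path-name field passed to a later step.
    """
    root = Path(directory)
    # rglob descends into subdirectories; glob looks at the top level only.
    candidates = root.rglob("*") if recurse else root.glob("*")
    matches = [
        p for p in candidates
        if p.is_file() and p.suffix.lower() in extensions
    ]
    return [str(p) for p in sorted(matches)]
```

Calling list_documents on a directory of mixed files returns only the paths whose extensions match, which mirrors the role of the path field that the Text Mining node reads from the File List node in the example above.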