Using the Web Feed Node in Text Mining
The Web Feed node can be used to prepare text data from Internet Web feeds for the text mining process. This node accepts Web feeds in either HTML or RSS format. These feeds serve as input to the text mining process, that is, to a downstream Text Mining or Text Link Analysis node.
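To make "preparing text data from a Web feed" concrete, the following Java sketch turns the <item> entries of an RSS 2.0 document into one record per entry. It is not how the Web Feed node is implemented: the class name RssToRecords is hypothetical, the field names title, description, and link are assumptions based on common RSS item elements, and HTML feeds would need a different parser. It uses only the JDK's built-in DOM parser and refuses DOCTYPE declarations, a common precaution when parsing content fetched from the web.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Sketch: one record per RSS <item>. Class and field names are hypothetical. */
final class RssToRecords {

    // Assumed field names, taken from common RSS 2.0 item elements.
    static final String[] FIELDS = {"title", "description", "link"};

    static List<Map<String, String>> parse(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations: a common hardening step for XML fetched from the web.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(rssXml)));

            List<Map<String, String>> records = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                Map<String, String> record = new LinkedHashMap<>();
                for (String field : FIELDS) {
                    // Only elements inside this <item> are searched, not the channel's own title.
                    NodeList matches = item.getElementsByTagName(field);
                    // A missing element becomes "" so every record has the same fields.
                    record.put(field, matches.getLength() > 0 ? matches.item(0).getTextContent().trim() : "");
                }
                records.add(record);
            }
            return records;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><title>First post</title><description>Prices rose sharply.</description>"
                + "<link>https://example.com/1</link></item>"
                + "<item><title>Second post</title><description>Customers reported delays.</description>"
                + "<link>https://example.com/2</link></item>"
                + "</channel></rss>";
        for (Map<String, String> record : parse(rss)) {
            System.out.println(record);
        }
    }
}
```

Each printed record holds the text of one feed entry, which is the kind of per-entry text a downstream Text Mining node would analyze.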
If you use the Web Feed node, you must select the option Text field represents actual text in the Text Mining or Text Link Analysis node, to indicate that these feeds link directly to each article or blog entry.
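The difference between treating a text field as actual text and treating it as pathnames to documents can be sketched as follows. The enum TextFieldMode, the method documentText, and the class name are hypothetical and are not part of the SPSS Modeler API. With ACTUAL_TEXT the field value is the document itself; with PATHNAMES the value is read as a file location, so a feed description such as "Customers reported long delays at checkout." would not resolve to a document.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Sketch of two interpretations of a text field. All names are hypothetical. */
final class TextFieldInterpretation {

    enum TextFieldMode { ACTUAL_TEXT, PATHNAMES }

    static String documentText(String fieldValue, TextFieldMode mode) {
        switch (mode) {
            case ACTUAL_TEXT:
                // The field value itself is the document, as with text taken from a Web feed.
                return fieldValue;
            case PATHNAMES:
                // The field value is only a location; the document has to be read from it.
                try {
                    return Files.readString(Path.of(fieldValue));
                } catch (IOException | InvalidPathException e) {
                    throw new IllegalArgumentException("Not a readable document path: " + fieldValue, e);
                }
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        String description = "Customers reported long delays at checkout.";
        System.out.println(documentText(description, TextFieldMode.ACTUAL_TEXT));
        try {
            documentText(description, TextFieldMode.PATHNAMES);
        } catch (IllegalArgumentException e) {
            // Treating feed text as a pathname fails, because in practice no file has that name.
            System.out.println("PATHNAMES failed: " + e.getMessage());
        }
    }
}
```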
Important! If you are trying to retrieve information over the web through a proxy server, you must enable the proxy server in the net.properties file for both the IBM® SPSS® Modeler Text Analytics Client and Server. Follow the instructions detailed inside this file. This applies when you access the web through the Web Feed node or retrieve an SDL Software as a Service (SaaS) license, because these connections go through Java™.
By default, this file is located in C:\Program Files\IBM\SPSS\Modeler\18.5.0\jre\lib\net.properties.
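In a standard JRE, the proxy entries in net.properties use the Java networking property names, such as http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort, and http.nonProxyHosts. The following Java sketch reads entries in that format with java.util.Properties and shows which HTTPS proxy they describe. The host proxy.example.com, the port 8080, and the class name ProxyConfigSketch are placeholders; which lines to uncomment or edit should come from the instructions inside your own net.properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Properties;

/** Sketch: interpret net.properties-style proxy entries. Values are placeholders. */
final class ProxyConfigSketch {

    // Example entries after the proxy lines are uncommented and filled in (hypothetical values).
    static final String SAMPLE = String.join("\n",
            "java.net.useSystemProxies=false",
            "https.proxyHost=proxy.example.com",
            "https.proxyPort=8080",
            "http.nonProxyHosts=localhost|127.*|[::1]");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the HTTPS proxy described by the properties, or Proxy.NO_PROXY if none is set. */
    static Proxy httpsProxy(Properties props) {
        String host = props.getProperty("https.proxyHost");
        if (host == null || host.isBlank()) {
            return Proxy.NO_PROXY;
        }
        // Java's documented default for https.proxyPort is 443.
        int port = Integer.parseInt(props.getProperty("https.proxyPort", "443").trim());
        // HTTPS traffic is tunneled through an HTTP proxy, hence Proxy.Type.HTTP.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host.trim(), port));
    }

    public static void main(String[] args) {
        System.out.println(httpsProxy(load(SAMPLE)));
    }
}
```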
Example: Web Feed node (RSS Feed) with the Text Mining modeling node
As an example, suppose we connect a Web Feed node to a Text Mining node to supply text data from an RSS feed to the text mining process.
- Web Feed node (Input tab). First, we added this node to the stream to specify where the feed contents are located and to verify the content structure. On the Input tab, we provided the URL of an RSS feed. Because our example uses an RSS feed, the formatting is already defined, so we did not need to make any changes on the Records tab. An optional content filtering algorithm is available for RSS feeds; however, we did not apply it in this example.
- Text Mining node (Fields tab). Next, we added a Text Mining node and connected it to the Web Feed node. On the Fields tab, we selected the field output by the Web Feed node that contains the text to mine; in this case, the Description field. We also selected the option Text field represents actual text and configured the other settings.
- Text Mining node (Model tab). Next, on the Model tab, we chose the build mode and resources. In this example, we chose to build a concept model directly from this node using the default resource template.
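The Fields tab choice in this example, using the Description field as actual text, can be sketched in Java as below. The hard-coded records stand in for Web Feed node output; they, the field names Title, Description, and URL, and the class name SelectTextField are hypothetical. Skipping entries whose chosen field is empty is a choice made for this sketch, not documented node behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: use one named field of each feed record as document text. Names are hypothetical. */
final class SelectTextField {

    static List<String> documents(List<Map<String, String>> records, String textField) {
        List<String> docs = new ArrayList<>();
        for (Map<String, String> record : records) {
            String text = record.get(textField);
            // This sketch skips entries with no text in the chosen field.
            if (text != null && !text.isBlank()) {
                // "Actual text": the value is the document itself, not a path to one.
                docs.add(text.trim());
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("Title", "First post", "Description", "Prices rose sharply.", "URL", "https://example.com/1"),
                Map.of("Title", "Second post", "Description", "", "URL", "https://example.com/2"));
        System.out.println(documents(records, "Description"));
    }
}
```

Running main prints [Prices rose sharply.], because the second record has an empty Description.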
For more information on using the Text Mining node, see Text Mining modeling node.