Web Feed node
The Web Feed node can be used to prepare text data from Web feeds for the text mining process. This node accepts Web feeds in two formats:
- RSS Format. RSS is a simple, standardized XML-based format for Web content. The URL for this format points to a page that has a set of linked articles, such as syndicated news sources and blogs. Because RSS is a standardized format, each linked article is automatically identified and treated as a separate record in the resulting data stream. You do not need to provide any further input to identify the important text data and the records in the feed, unless you want to apply a filtering technique to the text.
- HTML Format. You can define one or more URLs to HTML pages on the Input tab. Then, on the Records tab, define the record start tag, identify the tags that delimit the target content, and assign those tags to the output fields of your choice (description, title, modified date, and so on). See the topic Web Feed Node: Records Tab for more information.
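To illustrate how a standardized RSS feed maps onto records, the following is a minimal Python sketch and not the node's actual implementation. It assumes an RSS 2.0 document in which each article is an item element. The sample feed, the rss_to_records function, and the URL field name are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be downloaded from the feed URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/1</link>
      <description>Text of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/2</link>
      <description>Text of the second article.</description>
    </item>
  </channel>
</rss>"""


def rss_to_records(xml_text):
    """Return one record (a dict of fields) per <item> element in the feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            # Missing elements become empty strings rather than None.
            "Title": (item.findtext("title") or "").strip(),
            "Description": (item.findtext("description") or "").strip(),
            "URL": (item.findtext("link") or "").strip(),
        })
    return records


records = rss_to_records(SAMPLE_RSS)
for record in records:
    print(record["Title"], "|", record["Description"])
```

Because the item boundaries are part of the RSS format, no record start tag has to be supplied. For HTML pages, the equivalent boundaries and field tags are what you define on the Records tab.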
Important! If you are trying to retrieve information over the web through a proxy server, you must enable the proxy server in the net.properties file for both the IBM® SPSS® Modeler Text Analytics Client and Server. Follow the instructions detailed inside this file. This applies when accessing the web through the Web Feed node or retrieving an SDL Software as a Service (SaaS) license since these connections go through Java™. This file is located in C:\Program Files\IBM\SPSS\Modeler\18.6.0\jre\lib\net.properties by default.
The output of this node is a set of fields used to describe the records. The Description field is most commonly used since it contains the bulk of the text content. However, you may also be interested in other fields, such as the short description of a record (Short Desc field) or the record's title (Title field). Any of the output fields can be selected as input for a subsequent Text Mining node.
You can find this node on the IBM SPSS Modeler Text Analytics tab of the nodes palette at the bottom of the IBM SPSS Modeler window. See the topic IBM SPSS Modeler Text Analytics nodes for more information.