Crawling RSS Feeds Using the Web Crawler of IBM Content Analytics with Enterprise Search

Question & Answer

Question

How can we crawl RSS Feeds using the Web Crawler of IBM Content Analytics with Enterprise Search?

Cause

The Web crawler performs link-based crawling: it follows the links found in the feed and crawls every page reachable from them.
Pages of type RSS, RDF, and Atom are parsed by the crawler, and their links are extracted using special XML parsing rules.
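
The idea of extracting links from a feed can be illustrated with a minimal Python sketch. It is not the crawler's actual parsing logic, which this document does not describe; the function names, the sample feeds, and the example.com URLs are hypothetical and exist only for illustration. The sketch assumes that entry URLs appear as the text of a link element in RSS and RDF items, and as the href attribute of a link element in Atom entries.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/2005/Atom}link'."""
    return tag.rsplit("}", 1)[-1]


def extract_feed_links(feed_xml):
    """Return the entry links of an RSS, RDF, or Atom document, in order."""
    root = ET.fromstring(feed_xml)
    links = []
    for element in root.iter():
        # Only look inside feed entries: <item> (RSS, RDF) or <entry> (Atom).
        if local_name(element.tag) not in ("item", "entry"):
            continue
        for child in element:
            if local_name(child.tag) != "link":
                continue
            # Atom keeps the URL in the href attribute; RSS and RDF use text.
            url = child.get("href") or (child.text or "").strip()
            if url and url not in links:
                links.append(url)
    return links


# Hypothetical sample feeds, used only to exercise the function.
RSS_SAMPLE = """<rss version="2.0"><channel>
<title>Example</title><link>http://example.com/</link>
<item><title>A</title><link>http://example.com/a.html</link></item>
<item><title>B</title><link>http://example.com/b.html</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example</title>
<entry><title>C</title><link href="http://example.com/c.html"/></entry>
</feed>"""

print(extract_feed_links(RSS_SAMPLE))
print(extract_feed_links(ATOM_SAMPLE))
```

In this sketch the channel-level link is skipped and only the links of individual entries are returned. A link-based crawler would then fetch each of these URLs and continue following the links it finds on those pages.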

Answer

Configure the Web crawler with a start URL of a page that links to the feed.

If you want to crawl only the pages that are listed in the feed, use the URL of the feed itself as the start URL.


Document Information

More support for:
Content Analytics with Enterprise Search

Software version:
3.0

Operating system(s):
AIX, Linux, Windows

Document number:
230901

Modified date:
17 June 2018

UID

swg21647117
