Question & Answer
Question
How can we crawl RSS feeds by using the web crawler of IBM Content Analytics with Enterprise Search?
Cause
The web crawler performs link-based crawling: it follows the links found in the feed to the pages that the feed lists, and then crawls every page reachable from those pages.
The crawler parses pages of type RSS, RDF, and Atom and extracts their links by using special XML parsing rules.
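To illustrate what extracting links from a feed involves, here is a minimal Python sketch. It is not the crawler's actual parsing rules, which this document does not describe. The function name extract_feed_links and the sample feed are hypothetical, and the sketch assumes that RSS and RDF (RSS 1.0) feeds carry the URL as the text of a link element, while Atom feeds carry it in the href attribute.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespaces used by Atom and RSS 1.0 (RDF) feeds.
ATOM_NS = "{http://www.w3.org/2005/Atom}"
RSS1_NS = "{http://purl.org/rss/1.0/}"


def extract_feed_links(feed_xml: str) -> List[str]:
    """Return the URLs found in <link> elements of an RSS, RDF, or Atom feed.

    Hypothetical helper for illustration only; it does not reproduce the
    product's own XML parsing rules.
    """
    root = ET.fromstring(feed_xml)
    links = []
    for elem in root.iter():
        if elem.tag in ("link", RSS1_NS + "link"):
            # RSS 2.0 and RSS 1.0 (RDF): the URL is the element text.
            if elem.text and elem.text.strip():
                links.append(elem.text.strip())
        elif elem.tag == ATOM_NS + "link":
            # Atom: the URL is in the href attribute.
            href = elem.get("href")
            if href:
                links.append(href.strip())
    # Remove duplicates while keeping document order.
    return list(dict.fromkeys(links))


# Hypothetical sample feed used only to exercise the function.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<link>http://example.com/</link>
<item><title>A</title><link>http://example.com/a</link></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in extract_feed_links(SAMPLE_RSS):
        print(url)
```

A link-based crawler would then fetch each extracted URL and continue following the links on those pages.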
Answer
Use the web crawler with a start URL whose page links to the feed.
If you do not want to crawl pages that are not listed in the feed, use the URL of the feed itself as the start URL.
Document Information
More support for: Content Analytics with Enterprise Search
Software version: 3.0
Operating system(s): AIX, Linux, Windows
Document number: 230901
Modified date: 17 June 2018
UID: swg21647117