BoardReader crawler - configuration properties

The BoardReader crawler crawls social media data that has been collected by the BoardReader web service. BoardReader is an application that aggregates data from multiple social media sources across the Internet.

Note: The BoardReader crawler is deprecated and is removed effective 26 July 2024.

To run the BoardReader crawler, you need a BoardReader API key. Contact BoardReader to obtain this key.

The Create crawler: BoardReader screen is where you enter the configuration parameters for this crawler.

Crawler Properties

Crawler name
The name of the crawler. Alphanumeric characters, hyphens, underscores, and spaces are allowed.
Crawler description
A description of the crawler.
Advanced options
Time to wait between retrieval requests
The time, in milliseconds, that the crawler waits between retrieval requests.
Maximum number of active crawler threads
The maximum number of crawler threads that can be active at the same time.
Maximum document size
The maximum size, in kilobytes, of a document that the crawler crawls. The maximum value is 131,071 kilobytes.
When the crawler session is started
Specifies which content the crawler crawls when a crawler session is started.
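The three numeric advanced options above interact: the wait time paces requests, the thread count bounds concurrency, and the document size caps what is fetched. The following Java sketch is a hypothetical model of those options, not the oneWEX implementation. Only the 131,071 KB upper bound comes from this page; the lower bounds it checks and the assumption that 1 KB equals 1,024 bytes are illustrative assumptions.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of three advanced crawler options (not the oneWEX API). */
final class AdvancedCrawlerOptions {
    /** Documented upper bound for "Maximum document size", in kilobytes. */
    static final long MAX_DOCUMENT_SIZE_LIMIT_KB = 131_071L;

    final long waitBetweenRequestsMs;
    final int maxActiveThreads;
    final long maxDocumentSizeKb;

    AdvancedCrawlerOptions(long waitBetweenRequestsMs, int maxActiveThreads, long maxDocumentSizeKb) {
        // Assumed lower bounds: non-negative wait, at least one thread, at least 1 KB.
        if (waitBetweenRequestsMs < 0) {
            throw new IllegalArgumentException("wait time must be >= 0 ms");
        }
        if (maxActiveThreads < 1) {
            throw new IllegalArgumentException("at least one crawler thread is required");
        }
        if (maxDocumentSizeKb < 1 || maxDocumentSizeKb > MAX_DOCUMENT_SIZE_LIMIT_KB) {
            throw new IllegalArgumentException(
                    "document size must be between 1 and " + MAX_DOCUMENT_SIZE_LIMIT_KB + " KB");
        }
        this.waitBetweenRequestsMs = waitBetweenRequestsMs;
        this.maxActiveThreads = maxActiveThreads;
        this.maxDocumentSizeKb = maxDocumentSizeKb;
    }

    /** Converts the kilobyte limit to bytes, assuming 1 KB = 1,024 bytes. */
    long maxDocumentSizeBytes() {
        return maxDocumentSizeKb * 1024L;
    }

    /** Returns true if a document of the given size in bytes is within the limit. */
    boolean accepts(long documentSizeBytes) {
        return documentSizeBytes <= maxDocumentSizeBytes();
    }

    /** Blocks the calling thread for the configured wait time. */
    void waitBeforeNextRequest() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(waitBetweenRequestsMs);
    }
}
```

With the documented maximum of 131,071 KB, this sketch allows documents of up to 134,216,704 bytes. Whether the real crawler applies the wait time per thread or across all threads is not stated on this page.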

Data Source Properties

BoardReader license key
The license key that the crawler uses to call the BoardReader API.
Crawl Duration
Select the time range of the content to crawl.
Start date
The start date of the time range to crawl.
End date
The end date of the time range to crawl.
Duration period type
Select the type of period that defines the crawl duration. This option is shown only when The current time for a specified duration is selected for Crawl Duration.
Duration period amount
The number of periods of the selected Duration period type that make up the crawl duration. This option is shown only when The current time for a specified duration is selected for Crawl Duration.
Domain Conditions
The list of social media domains to crawl.
Query Conditions
BoardReader queries that limit which content is crawled. If you specify multiple queries, the crawler combines them with Boolean OR logic, so content that matches any one of the queries is crawled.
BoardReader API parameters
Additional parameters to pass to the BoardReader API. For example, filter_language=ja&filter_country=jp limits the crawl to documents in Japanese that originate in Japan.
Default time zone
The time zone that is used by default when date string values are parsed and converted to epoch time.
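Time zone list
The time zones to use when parsing date string values that are crawled from specific domains. Each entry maps a domain to a time zone. For example, *fr.yahoo.com=WET uses the WET time zone for content crawled from domains that match *fr.yahoo.com.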
Proxy server host name
The host name of the proxy server.
Proxy server port
The port number of the proxy server.
User ID for the proxy server
The user name that is used to access the proxy server.
Password for the proxy server
The password of the user that accesses the proxy server.
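When The current time for a specified duration is selected for Crawl Duration, the crawl window presumably ends at the current time and reaches back by Duration period amount periods of the selected Duration period type. The following Java sketch computes such a window under that assumption. The PeriodType values are hypothetical stand-ins for whatever choices the screen actually offers.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch of a crawl window that ends "now" (not the oneWEX API). */
final class CrawlWindow {
    /** Assumed period types; the real option values may differ. */
    enum PeriodType { HOURS, DAYS, WEEKS, MONTHS, YEARS }

    final Instant start;
    final Instant end;

    private CrawlWindow(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    static CrawlWindow endingAt(ZonedDateTime now, PeriodType type, long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("Duration period amount must be positive");
        }
        // Each PeriodType name matches a ChronoUnit constant of the same name.
        ChronoUnit unit = ChronoUnit.valueOf(type.name());
        // Subtracting on a ZonedDateTime keeps month and year arithmetic calendar-aware.
        ZonedDateTime start = now.minus(amount, unit);
        return new CrawlWindow(start.toInstant(), now.toInstant());
    }
}
```

For example, a period type of DAYS with an amount of 7, evaluated at 2024-07-26T00:00Z, gives a window that starts at 2024-07-19T00:00Z.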
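The Boolean OR behavior of Query Conditions means that a document qualifies as soon as it matches one query. The following Java sketch only illustrates that combining rule. Real BoardReader query syntax and matching are handled by the BoardReader service, so the plain case-insensitive keyword matching used here is a stand-in assumption, and the QueryConditions class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical illustration of OR-combined query conditions. */
final class QueryConditions {
    private final List<Predicate<String>> queries = new ArrayList<>();

    QueryConditions(List<String> keywords) {
        for (String keyword : keywords) {
            // Stand-in matcher: case-insensitive substring search.
            String needle = keyword.toLowerCase(Locale.ROOT);
            queries.add(text -> text.toLowerCase(Locale.ROOT).contains(needle));
        }
    }

    boolean matches(String documentText) {
        for (Predicate<String> query : queries) {
            if (query.test(documentText)) {
                return true; // OR logic: one matching query is enough
            }
        }
        // No query matched. This sketch also returns false for an empty list;
        // how the crawler treats an empty Query Conditions list is not stated here.
        return false;
    }
}
```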
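The BoardReader API parameters example, filter_language=ja&filter_country=jp, uses URL query-string syntax: name=value pairs joined by &. Assuming the field follows that syntax, the following Java sketch splits such a string into its parameters, which can help you check a value before you enter it. Whether the crawler decodes the string or passes it to the API verbatim is not stated on this page, and the ApiParameters class is hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for query-string style API parameters. */
final class ApiParameters {
    /** Splits "name=value&name=value" into an ordered map, decoding %XX escapes. */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isBlank()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // A repeated name overwrites the earlier value in this sketch.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Re-encodes the parameters as a query string. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((name, value) -> joiner.add(
                URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return joiner.toString();
    }
}
```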
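The Default time zone and Time zone list properties work together: a date string from a domain that matches a Time zone list entry is presumably interpreted in that entry's zone, and every other date string is interpreted in the default zone. The following Java sketch models that lookup and the conversion to epoch milliseconds. Treating * as a wildcard that matches any characters, the yyyy-MM-dd HH:mm:ss date format, and the DomainTimeZones class are all assumptions for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical per-domain time zone lookup with a default fallback. */
final class DomainTimeZones {
    // Assumed format of crawled date strings that carry no zone information.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private final ZoneId defaultZone;
    private final Map<Pattern, ZoneId> rules = new LinkedHashMap<>();

    /** Entries look like "*fr.yahoo.com=WET"; '*' is assumed to match any characters. */
    DomainTimeZones(String defaultZoneId, Iterable<String> entries) {
        this.defaultZone = ZoneId.of(defaultZoneId);
        for (String entry : entries) {
            int eq = entry.lastIndexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected domain=zone: " + entry);
            }
            String glob = entry.substring(0, eq).trim();
            // Quote the whole pattern literally, then reopen the quote around each '*' as ".*".
            String regex = Pattern.quote(glob).replace("*", "\\E.*\\Q");
            rules.put(Pattern.compile(regex, Pattern.CASE_INSENSITIVE),
                      ZoneId.of(entry.substring(eq + 1).trim()));
        }
    }

    /** Returns the zone of the first matching entry, or the default zone. */
    ZoneId zoneFor(String domain) {
        for (Map.Entry<Pattern, ZoneId> rule : rules.entrySet()) {
            if (rule.getKey().matcher(domain).matches()) {
                return rule.getValue();
            }
        }
        return defaultZone;
    }

    /** Parses a zone-less date string in the domain's zone and returns epoch milliseconds. */
    long toEpochMillis(String domain, String dateText) {
        LocalDateTime local = LocalDateTime.parse(dateText, FORMAT);
        return local.atZone(zoneFor(domain)).toInstant().toEpochMilli();
    }
}
```

With a default zone of Asia/Tokyo and the entry *fr.yahoo.com=WET, a date of 2024-07-26 09:00:00 from a domain that matches no entry resolves to 2024-07-26T00:00:00Z.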
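The four proxy properties above correspond to the settings that any HTTP client needs in order to reach an external service through an authenticating proxy. As an illustration only, the following Java sketch configures the JDK's java.net.http.HttpClient with a proxy host, port, user ID, and password. It is not how the BoardReader crawler is implemented, and the ProxiedClientFactory class and the example host name are hypothetical.

```java
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.http.HttpClient;

/** Hypothetical factory that builds an HTTP client routed through a proxy. */
final class ProxiedClientFactory {
    static HttpClient create(String proxyHost, int proxyPort, String userId, char[] password) {
        HttpClient.Builder builder = HttpClient.newBuilder()
                // createUnresolved defers the DNS lookup of the proxy host until a request is sent.
                .proxy(ProxySelector.of(InetSocketAddress.createUnresolved(proxyHost, proxyPort)));
        if (userId != null && !userId.isEmpty()) {
            builder.authenticator(new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    // Answer only proxy challenges, never origin-server challenges.
                    if (getRequestorType() != RequestorType.PROXY) {
                        return null;
                    }
                    return new PasswordAuthentication(userId, password);
                }
            });
        }
        return builder.build();
    }
}
```

For example, ProxiedClientFactory.create("proxy.example.com", 8080, "crawler", secret) returns a client whose requests go through the hypothetical proxy. If the proxy uses Basic authentication for HTTPS targets, recent JDKs may also require changing the jdk.http.auth.tunneling.disabledSchemes networking property, which disables Basic for tunneling by default.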

Crawl space Properties

You can find and add multiple crawl spaces for a BoardReader crawler. For instructions, see Finding and adding crawl spaces in a BoardReader crawler.

Crawler plug-in

Data source crawler plug-ins are Java™ applications that can change the content or metadata of crawled documents. You can configure a data source crawler plug-in for all non-web crawler types. For more information, see Crawler plug-ins.

Enable the crawler plug-in
Select this option to use a crawler plug-in.
Plug-in class name
The fully qualified class name of the crawler plug-in.
Plug-in class path
The location of the JAR file that contains the crawler plug-in. The folder that contains the JAR file must be mounted so that it is accessible. For more information, see Providing access to the local filesystem from Watson Explorer oneWEX.
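The Plug-in class name and Plug-in class path properties together identify one class inside one JAR file. To illustrate that relationship, the following generic Java sketch loads a class by name from a JAR file with a standard URLClassLoader. It is not the oneWEX plug-in loader and does not show the oneWEX plug-in interface, which is described in Crawler plug-ins. The PluginClassLoading class and any JAR path that you pass to it are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

/** Generic illustration of loading a named class from a JAR file. */
final class PluginClassLoading {
    /**
     * Loads className from the JAR at jarPath. The loader first delegates to its
     * parent, so JDK classes resolve even if the JAR does not exist. The returned
     * class keeps a reference to its loader, so the loader is intentionally not closed.
     */
    static Class<?> load(String jarPath, String className) throws IOException, ClassNotFoundException {
        URL jarUrl = Path.of(jarPath).toUri().toURL();
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jarUrl }, PluginClassLoading.class.getClassLoader());
        // The name must be fully qualified, for example com.example.MyPlugin.
        return Class.forName(className, true, loader);
    }
}
```

If the JAR is not in a mounted folder, or the class name is misspelled, a loader like this one fails with ClassNotFoundException.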