Product and system architecture overview for Watson Explorer oneWEX on IBM Cloud Private

Product architecture overview

IBM Watson® Explorer oneWEX components collect data from your enterprise; parse, analyze, and extract meaning from information; and create a text index that users can query.

A collection represents the set of sources that users can search and mine with a single query. When you create a collection, you specify which sources you want to include and configure options for how users can query the indexed data.

Creating and administering a collection involves the following activities.

Data Ingestion: The crawler components collect documents from data sources, either on a continual basis or according to a schedule that you specify. Frequent crawling ensures that users always have access to the latest information. In addition to crawling data sources, you can add content to a collection by importing CSV files. Converters then transform the data that the crawlers retrieve into a form that is suitable for indexing.
Data Enrichment: The analytics pipeline extracts text from documents, does linguistic analysis, finds meaningful words and phrases, extracts entities, and performs custom analysis on each document. The detailed content analysis provides facets of data that can be used for exploring the content.
Machine Learning: Machine learning uses the analytics pipeline to learn features from text, which supports document classification and search relevancy tuning with a ranker. A range of out-of-the-box and custom analyses allows flexible feature engineering.
Exploration: The index components add data from new and changed documents to the index. The content analytics miner provides an interactive graphical interface for exploring analyzed content to discover relationships and anomalies.
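
The four activities above can be pictured as a small pipeline. The following Python sketch is purely illustrative and does not use any Watson Explorer API. The function names (ingest, enrich, build_index, search), the sample documents, and the simple whitespace tokenizer are all assumptions, chosen only to show how collected documents flow through enrichment into a text index that users can query.

```python
from collections import defaultdict

def ingest(raw_sources):
    """Stand-in for crawlers and converters: yield (doc_id, cleaned text) pairs."""
    for doc_id, raw in raw_sources.items():
        yield doc_id, raw.strip()

def enrich(doc_id, text):
    """Stand-in for the analytics pipeline: lowercase tokens act as annotations."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return {"id": doc_id, "text": text, "tokens": [t for t in tokens if t]}

def build_index(enriched_docs):
    """Stand-in for the index components: map each token to the ids containing it."""
    index = defaultdict(set)
    for doc in enriched_docs:
        for token in doc["tokens"]:
            index[token].add(doc["id"])
    return index

def search(index, term):
    """Return the sorted ids of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample data, not taken from any real collection.
sources = {"doc1": " Pump failure reported. ", "doc2": "Pump replaced; no failure."}
index = build_index(enrich(d, t) for d, t in ingest(sources))
```

In this sketch, search(index, "replaced") matches only doc2, while search(index, "pump") matches both documents. A real collection adds scheduling, format conversion, and much richer analysis at each stage.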

System architecture overview

Watson™ Explorer oneWEX uses cloud-based resource management based on container technologies. Activities are mapped to Kubernetes objects as shown here.

WKS ML Service (version 12.0.2.2 and later)

The WKS ML Service controller manages WKS ML models and provides a runtime to extract annotations with those models.

Kubernetes controller name: WKSML
Minimum number of pods: 1
Persistence volume name: Not applicable
Persistence volume content: Not applicable

Gateway (version 12.0.2 and later)

The Gateway controller manages the web application servers (Content Miner, Admin Console, and Application Builder) and the REST API server.

Kubernetes controller name: Gateway
Minimum number of pods: 1 (Default value is 2.)
Persistence volume name: wex-data (shared)

Orchestrator (version 12.0.2 and later)

The Orchestrator controller manages background process schedulers (for example, the Enrichment task manager and the Exporter process).

Kubernetes controller name: Orchestrator
Minimum number of pods: 1 (Maximum value is 1.)
Persistence volume name: wex-data (shared)

NLP (version 12.0.2 and later)

The NLP controller manages the Realtime NLP API server.

Kubernetes controller name: NLP
Minimum number of pods: 1
Persistence volume name: wex-data (shared)

Database (version 12.0.2 and later)

The Database controller stores user information, logs, and configuration.

Kubernetes controller name: Database
Minimum number of pods: 1 (Maximum value is 1.)
Persistence volume name: wex-data (shared)
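
The pod limits listed for the controllers in version 12.0.2 and later can be captured as data and checked before a scaling change is requested. The following Python sketch is an illustration only: the ControllerLimits class and the check_pod_count function are hypothetical helpers, not part of the product, and a maximum of None simply means that this overview states no maximum.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ControllerLimits:
    minimum: int
    maximum: Optional[int] = None  # None: no maximum is stated in this overview

# Values copied from the controller descriptions above.
# Gateway has a minimum of 1; its default value of 2 is not modeled here.
LIMITS = {
    "WKSML": ControllerLimits(1),
    "Gateway": ControllerLimits(1),
    "Orchestrator": ControllerLimits(1, 1),
    "NLP": ControllerLimits(1),
    "Database": ControllerLimits(1, 1),
}

def check_pod_count(controller: str, pods: int) -> List[str]:
    """Return a list of problems; an empty list means the count fits the limits."""
    limits = LIMITS[controller]
    problems = []
    if pods < limits.minimum:
        problems.append(f"{controller}: {pods} is below the minimum of {limits.minimum}")
    if limits.maximum is not None and pods > limits.maximum:
        problems.append(f"{controller}: {pods} exceeds the maximum of {limits.maximum}")
    return problems
```

For example, check_pod_count("Gateway", 2) reports no problems, while a request for two Orchestrator pods is flagged because that controller is limited to a single pod.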

Management and Application (version 12.0.1 and earlier)

Description: The Management controller works as an endpoint and a hub. When a user requests a task such as starting a crawler or starting machine learning, the Management controller dispatches the request to the appropriate controller and monitors the status of submitted tasks. In addition, the Management controller controls applications such as Content Miner, the administration applications, and the REST API endpoint.
Kubernetes controller name: Management
Minimum number of pods: 1 (This value cannot be increased.)
Persistence volume name: wex-data
Persistence volume content: User information, configuration data for pod communication, logs.

Ingestion

Description: The Ingestion controller controls data ingestion.
Kubernetes controller name: Ingestion
Minimum number of pods: 1
Persistence volume name: Not applicable.
Persistence volume content: Not applicable.

Enrichment and Machine Learning

Description: The HDP controller runs data processing applications using distributed computing infrastructure. Enrichment and machine learning are done on this controller. To scale up, increase the number of worker pods (the HDP-worker pods).
Kubernetes controller name: HDP
Minimum number of pods: 4 (2 worker, 1 name node, 1 resource node)
Persistence volume name: wex-hdp-pn, wex-hdp-nn, wex-hdp-log
Persistence volume content: Datasets, resources for enrichment and machine learning.
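
The minimum of 4 pods for the HDP controller is the sum of 2 worker pods, 1 name node pod, and 1 resource node pod. The following Python sketch works through that arithmetic for larger worker counts. It assumes that the name node and resource node stay at one pod each when workers are added, which this overview implies but does not state outright. The function name total_hdp_pods is hypothetical.

```python
NAME_NODE_PODS = 1      # from the minimum breakdown above
RESOURCE_NODE_PODS = 1  # from the minimum breakdown above
MIN_WORKER_PODS = 2     # from the minimum breakdown above

def total_hdp_pods(worker_pods: int) -> int:
    """Return the total HDP pod count for a given number of worker pods."""
    if worker_pods < MIN_WORKER_PODS:
        raise ValueError(
            f"at least {MIN_WORKER_PODS} worker pods are required, got {worker_pods}"
        )
    # Assumption: only the worker count changes when the controller is scaled.
    return worker_pods + NAME_NODE_PODS + RESOURCE_NODE_PODS
```

With the minimum of 2 workers the function returns 4, matching the figure above. Under the stated assumption, scaling to 5 workers gives 7 pods in total.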

Exploration

Description: The Discovery controller manages the text index for search and content mining. Increasing the number of pods affects both index and search performance.
Important: Once you increase the number of pods for the Discovery controller, you cannot scale down the number of pods for this controller.
Kubernetes controller name: Discovery
Minimum number of pods: 1
Persistence volume name: wex-index
Persistence volume content: Index data.
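
Because the Discovery controller cannot be scaled down after it has been scaled up, it can help to guard scaling requests with a simple check. The following Python sketch shows one way to express that rule. The function validate_discovery_scale is a hypothetical helper and does not call Kubernetes or any Watson Explorer API.

```python
MIN_DISCOVERY_PODS = 1  # minimum number of pods listed above

def validate_discovery_scale(current_pods: int, requested_pods: int) -> int:
    """Return the requested pod count if it is allowed, otherwise raise ValueError."""
    if requested_pods < MIN_DISCOVERY_PODS:
        raise ValueError(f"Discovery needs at least {MIN_DISCOVERY_PODS} pod")
    if requested_pods < current_pods:
        # The overview states that scaling down is not possible for this controller.
        raise ValueError(
            f"cannot scale Discovery down from {current_pods} to {requested_pods} pods"
        )
    return requested_pods
```

A request to go from 1 pod to 3 passes the check, whereas a later request to go from 3 back to 2 is rejected. Since the increase is one-way, the target size deserves careful planning before the first scale-up.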

Configuration

Description: The Config controller manages the configuration of system resources such as datasets, collections, and crawlers. To scale up, increase the number of pods for the Config controller. The number of pods should be 3 or 5.
Kubernetes controller name: Config
Minimum number of pods: 3
Persistence volume name: wex-config
Persistence volume content: Collection and file resource information.
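
The guidance that the Config controller should run 3 or 5 pods can be expressed as a small check. The following Python sketch is only an illustration of that rule. The names ALLOWED_CONFIG_POD_COUNTS and is_recommended_config_pod_count are hypothetical, and the check treats the guidance as a recommendation because the overview says "should".

```python
ALLOWED_CONFIG_POD_COUNTS = (3, 5)  # values recommended in the description above

def is_recommended_config_pod_count(pods: int) -> bool:
    """Return True if the pod count is one of the recommended values."""
    return pods in ALLOWED_CONFIG_POD_COUNTS
```

For instance, 3 and 5 pass the check, while 4 does not.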

The IBM Cloud Private management console provides a graphical user interface to manage and monitor your Kubernetes objects.