Product and system architecture overview for Watson Explorer oneWEX on IBM Cloud Private

Product architecture overview

IBM Watson® Explorer oneWEX components collect data from your enterprise; parse, analyze, and extract meaning from information; and create a text index that users can query.

A collection represents the set of sources that users can search and mine with a single query. When you create a collection, you specify which sources you want to include and configure options for how users can query the indexed data.

Creating and administering a collection involves the following activities.

Data Ingestion
The crawler components collect documents from data sources, either on a continual basis or according to a schedule that you specify. Frequent crawling ensures that users always have access to the latest information. In addition to crawling data sources, you can add content to a collection by importing CSV files. Converters then transform the data retrieved by crawlers into a form that is suitable for indexing.
Data Enrichment
The analytics pipeline extracts text from documents, performs linguistic analysis, finds meaningful words and phrases, extracts entities, and performs custom analysis on each document. The detailed content analysis provides facets of data that can be used for exploring the content.
Machine Learning
Machine learning uses the analytics pipeline to learn features from text, which supports document classification and search relevancy tuning with a ranker. Both out-of-the-box and custom analysis can be used for flexible feature engineering.
Exploration
The index components add data from new and changed documents to the index. The content analytics miner provides an interactive graphical interface for exploring analyzed content to discover relationships and anomalies.

System architecture overview

Watson™ Explorer oneWEX uses cloud-based resource management based on container technologies. Activities are mapped to Kubernetes objects as shown here.

WKS ML Service (version 12.0.2.2 and later)
The WKS ML Service controller manages WKS ML models and provides a runtime to extract annotations with the model.
Kubernetes controller name
WKSML
Minimum number of pods
1
Persistence volume name
Not applicable
Persistence volume content
Not applicable
Gateway (version 12.0.2 and later)
The Gateway controller manages the web application servers (Content Miner, Admin Console, and Application Builder) and the REST API server.
Kubernetes controller name
Gateway
Minimum number of pods
1 (Default value is 2.)
Persistence volume name
wex-data (shared)
Orchestrator (version 12.0.2 and later)
The Orchestrator controller manages background process schedulers (e.g. Enrichment task manager, Exporter process).
Kubernetes controller name
Orchestrator
Minimum number of pods
1 (Maximum value is 1.)
Persistence volume name
wex-data (shared)
NLP (version 12.0.2 and later)
The NLP controller manages the Realtime NLP API server.
Kubernetes controller name
NLP
Minimum number of pods
1
Persistence volume name
wex-data (shared)
Database (version 12.0.2 and later)
The Database controller stores user information, logs, and configuration.
Kubernetes controller name
Database
Minimum number of pods
1 (Maximum value is 1.)
Persistence volume name
wex-data (shared)
Management and Application (version 12.0.1 and earlier)
Description
The Management controller works as an endpoint and a hub. When a user requests a task such as starting a crawler or starting machine learning, the Management controller dispatches the request to the appropriate controller and monitors the status of submitted tasks. In addition, the Management controller controls applications such as Content Miner, the administration applications, and the REST API endpoint.
Kubernetes controller name
Management
Minimum number of pods
1 (This value cannot be increased.)
Persistence volume name
wex-data
Persistence volume content
User information, configuration data for pod communication, logs.
Ingestion
Description
The Ingestion controller controls the data ingestion.
Kubernetes controller name
Ingestion
Minimum number of pods
1
Persistence volume name
Not applicable.
Persistence volume content
Not applicable.
Enrichment and Machine Learning
Description
The HDP controller runs data processing applications on a distributed computing infrastructure. Enrichment and machine learning run on this controller. To scale up, increase the number of HDP-worker pods.
Kubernetes controller name
HDP
Minimum number of pods
4 (2 worker, 1 name node, 1 resource node)
Persistence volume name
wex-hdp-pn, wex-hdp-nn, wex-hdp-log
Persistence volume content
Datasets, resources for enrichment and machine learning.
Exploration
Description
The Discovery controller manages the text index for search and content mining. Increasing the number of pods affects both index and search performance.
Important: Once you increase the number of pods for the Discovery controller, you cannot scale down the number of pods for this controller.
Kubernetes controller name
Discovery
Minimum number of pods
1
Persistence volume name
wex-index
Persistence volume content
Index data.
Configuration
Description
The Config controller manages the configuration of system resources such as datasets, collections, and crawlers. To scale up, increase the number of pods for the Config controller. The number of pods should be 3 or 5.
Kubernetes controller name
Config
Minimum number of pods
3
Persistence volume name
wex-config
Persistence volume content
Collection and file resource information.

The IBM Cloud Private management console provides a graphical user interface to manage and monitor your Kubernetes objects.
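
The pod-count constraints listed for each controller can be collected into a small lookup table and checked before a scaling change is attempted. The following Python sketch is illustrative only and is not part of the product: the names ControllerRule, RULES, and check_scaling are hypothetical, the values are transcribed from the controller list above, and the "3 or 5" recommendation for the Config controller is treated here as a hard rule, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ControllerRule:
    """Pod-count constraints for one controller (hypothetical helper type)."""
    min_pods: int
    max_pods: Optional[int] = None             # None: no upper limit stated
    allowed: Optional[Tuple[int, ...]] = None  # explicit permitted counts, if any
    can_scale_down: bool = True


# Values transcribed from the controller list in this overview.
RULES: Dict[str, ControllerRule] = {
    "WKSML": ControllerRule(min_pods=1),
    "Gateway": ControllerRule(min_pods=1),
    "Orchestrator": ControllerRule(min_pods=1, max_pods=1),
    "NLP": ControllerRule(min_pods=1),
    "Database": ControllerRule(min_pods=1, max_pods=1),
    "Management": ControllerRule(min_pods=1, max_pods=1),  # 12.0.1 and earlier
    "Ingestion": ControllerRule(min_pods=1),
    "HDP": ControllerRule(min_pods=4),  # 2 worker, 1 name node, 1 resource node
    "Discovery": ControllerRule(min_pods=1, can_scale_down=False),
    # Assumption: the "should be 3 or 5" guidance is enforced as a strict rule.
    "Config": ControllerRule(min_pods=3, allowed=(3, 5)),
}


def check_scaling(controller: str, current: int, requested: int) -> List[str]:
    """Return a list of violated constraints; an empty list means no conflict."""
    rule = RULES[controller]
    problems: List[str] = []
    if requested < rule.min_pods:
        problems.append(f"{controller} needs at least {rule.min_pods} pod(s)")
    if rule.max_pods is not None and requested > rule.max_pods:
        problems.append(f"{controller} allows at most {rule.max_pods} pod(s)")
    if rule.allowed is not None and requested not in rule.allowed:
        problems.append(f"{controller} pod count should be one of {rule.allowed}")
    if not rule.can_scale_down and requested < current:
        problems.append(f"{controller} cannot be scaled down")
    return problems
```

For example, check_scaling("Config", 3, 5) returns an empty list, while check_scaling("Config", 3, 4) and check_scaling("Discovery", 3, 2) each return one message describing the violated constraint. A check like this only mirrors the documented limits; the actual scaling is still performed through Kubernetes or the IBM Cloud Private management console.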