Product and system architecture overview for Watson Explorer oneWEX on IBM Cloud Private
Product architecture overview
IBM Watson® Explorer oneWEX components collect data from your enterprise; parse, analyze, and extract meaning from information; and create a text index that users can query.
A collection represents the set of sources that users can search and mine with a single query. When you create a collection, you specify which sources you want to include and configure options for how users can query the indexed data.
Creating and administering a collection involves the following activities.
- Data Ingestion
- The crawler components collect documents from data sources, either continuously or according to a schedule that you specify. Frequent crawling helps ensure that users have access to the latest information. In addition to crawling data sources, you can add content to a collection by importing CSV files. Converters then transform the data that crawlers retrieve into a form that is suitable for indexing.
- Data Enrichment
- The analytics pipeline extracts text from documents, performs linguistic analysis, finds meaningful words and phrases, extracts entities, and runs custom analysis on each document. This detailed content analysis produces facets of data that can be used to explore the content, and the analyzed data is then indexed.
- Machine Learning
- Machine learning uses the analytics pipeline to learn features from text, which enables document classification and search relevance tuning with a ranker. A range of out-of-the-box and custom analyses allows flexible feature engineering.
- Exploration
- The index components add data from new and changed documents to the index. The content analytics miner provides an interactive graphical interface for exploring analyzed content to discover relationships and anomalies.
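The activities above form a pipeline from raw data to a queryable index. The following Python sketch illustrates that flow under loudly stated assumptions: a CSV import stands in for data ingestion, a capitalized-word check stands in for the analytics pipeline, and a simple inverted index stands in for the text index. The names `Document`, `import_csv`, `enrich`, and `Index` are hypothetical. This is not how oneWEX is implemented, and it omits crawlers, converters, and machine learning.

```python
# Conceptual sketch of the collection workflow: ingest -> enrich -> index -> query.
# All names are hypothetical illustrations, not oneWEX APIs.
import csv
import io
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    facets: dict = field(default_factory=dict)


def import_csv(raw_csv: str) -> list:
    """Data ingestion: turn CSV rows with 'id' and 'body' columns into documents."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [Document(row["id"], row["body"]) for row in reader]


def enrich(doc: Document) -> Document:
    """Data enrichment: a toy stand-in for linguistic analysis.

    Splits the text into word tokens and records capitalized words as a crude
    'entities' facet; real entity extraction is far more involved.
    """
    tokens = re.findall(r"[A-Za-z]+", doc.text)
    doc.facets["tokens"] = [t.lower() for t in tokens]
    doc.facets["entities"] = sorted({t for t in tokens if t[0].isupper()})
    return doc


class Index:
    """Indexing and exploration: an inverted index from token to document IDs."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, doc: Document) -> None:
        for token in doc.facets["tokens"]:
            self.postings[token].add(doc.doc_id)

    def query(self, *terms: str) -> set:
        """Return the IDs of documents that contain every query term."""
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t.lower(), set()) for t in terms))


# Usage: ingest two CSV rows, enrich each document, index it, then query.
raw = "id,body\n1,IBM Watson Explorer indexes text\n2,Crawlers collect text from sources\n"
index = Index()
for doc in import_csv(raw):
    index.add(enrich(doc))
print(sorted(index.query("text")))  # prints ['1', '2']
```

Keeping each stage as a separate function mirrors the separation of ingestion, enrichment, and indexing described above, so any one stage could be swapped out without touching the others.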
System architecture overview
Watson™ Explorer oneWEX uses cloud-based resource management built on container technologies. The activities are mapped to Kubernetes objects as follows.
- WKS ML Service (version 12.0.2.2 and later)
- The WKS ML Service controller manages WKS ML models and provides a runtime that extracts annotations with those models.
- Kubernetes controller name
- WKSML
- Minimum number of pods
- 1
- Persistent volume name
- Not applicable
- Persistent volume content
- Not applicable
- Gateway (version 12.0.2 and later)
- The Gateway controller manages the web application servers (Content Miner, Admin Console, and Application Builder) and the REST API server.
- Kubernetes controller name
- Gateway
- Minimum number of pods
- 1 (Default value is 2.)
- Persistent volume name
- wex-data (shared)
- Orchestrator (version 12.0.2 and later)
- The Orchestrator controller manages background process schedulers (for example, the enrichment task manager and the exporter process).
- Kubernetes controller name
- Orchestrator
- Minimum number of pods
- 1 (Maximum value is 1.)
- Persistent volume name
- wex-data (shared)
- NLP (version 12.0.2 and later)
- The NLP controller manages the real-time NLP API server.
- Kubernetes controller name
- NLP
- Minimum number of pods
- 1
- Persistent volume name
- wex-data (shared)
- Database (version 12.0.2 and later)
- The Database controller stores user information, logs, and configuration.
- Kubernetes controller name
- Database
- Minimum number of pods
- 1 (Maximum value is 1.)
- Persistent volume name
- wex-data (shared)
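The minimum and maximum pod counts listed for version 12.0.2 and later can be checked before you change replica counts. The following Python sketch is a hypothetical helper, not part of oneWEX or IBM Cloud Private. It transcribes the table above into data and reports plans that fall outside those limits. It assumes that you describe a plan as a mapping from the Kubernetes controller names listed above to replica counts. The names `ControllerSpec`, `validate_plan`, and `controllers_sharing` are illustrative.

```python
# Hypothetical pre-flight check for oneWEX 12.0.2+ controller replica counts.
# The limits are transcribed from the table above; the helper names are illustrative.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ControllerSpec:
    name: str                # Kubernetes controller name as listed in the table
    min_pods: int
    max_pods: Optional[int]  # None: the table states no upper limit
    volume: Optional[str]    # persistent volume name, or None if not applicable


SPECS_12_0_2: Dict[str, ControllerSpec] = {
    spec.name: spec
    for spec in [
        ControllerSpec("WKSML", 1, None, None),  # version 12.0.2.2 and later
        ControllerSpec("Gateway", 1, None, "wex-data"),  # default is 2 pods
        ControllerSpec("Orchestrator", 1, 1, "wex-data"),
        ControllerSpec("NLP", 1, None, "wex-data"),
        ControllerSpec("Database", 1, 1, "wex-data"),
    ]
}


def validate_plan(plan: Dict[str, int], specs: Dict[str, ControllerSpec]) -> List[str]:
    """Return one message per controller whose requested replica count breaks a limit."""
    problems = []
    for name, replicas in plan.items():
        spec = specs.get(name)
        if spec is None:
            problems.append(f"{name}: unknown controller")
        elif replicas < spec.min_pods:
            problems.append(f"{name}: {replicas} is below the minimum of {spec.min_pods}")
        elif spec.max_pods is not None and replicas > spec.max_pods:
            problems.append(f"{name}: {replicas} exceeds the maximum of {spec.max_pods}")
    return problems


def controllers_sharing(volume: str, specs: Dict[str, ControllerSpec]) -> List[str]:
    """List the controllers that use the given shared persistent volume."""
    return sorted(spec.name for spec in specs.values() if spec.volume == volume)


print(validate_plan({"Gateway": 2, "Orchestrator": 2, "Database": 0}, SPECS_12_0_2))
print(controllers_sharing("wex-data", SPECS_12_0_2))
```

In this sketch, Orchestrator at 2 pods and Database at 0 pods are both flagged, and `controllers_sharing` lists the four controllers that use the shared `wex-data` volume.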
- Management and Application (version 12.0.1 and earlier)
- Description
- The Management controller works as an endpoint and a hub. When a user requests a task, such as starting a crawler or starting machine learning, the Management controller dispatches the request to the appropriate controller and monitors the status of the submitted tasks. In addition, the Management controller manages applications such as Content Miner, the administration applications, and the REST API endpoint.
- Kubernetes controller name
- Management
- Minimum number of pods
- 1 (This value cannot be increased.)
- Persistent volume name
- wex-data
- Persistent volume content
- User information, configuration data for pod communication, and logs.
- Ingestion
- Description
- The Ingestion controller controls the data ingestion.
- Kubernetes controller name
- Ingestion
- Minimum number of pods
- 1
- Persistent volume name
- Not applicable.
- Persistent volume content
- Not applicable.
- Enrichment and Machine Learning
- Description
- The HDP controller runs data processing applications on a distributed computing infrastructure. Enrichment and machine learning run on this controller. To scale up, increase the number of worker pods (the HDP-worker pods).
- Kubernetes controller name
- HDP
- Minimum number of pods
- 4 (2 worker pods, 1 name node, and 1 resource node)
- Persistent volume name
- wex-hdp-pn, wex-hdp-nn, wex-hdp-log
- Persistent volume content
- Datasets, resources for enrichment and machine learning.
- Exploration
- Description
- The Discovery controller manages the text index for search and content mining. Increasing the number of pods affects both indexing and search performance. Important: After you increase the number of pods for the Discovery controller, you cannot scale down the number of pods for this controller.
- Kubernetes controller name
- Discovery
- Minimum number of pods
- 1
- Persistent volume name
- wex-index
- Persistent volume content
- Index data.
- Configuration
- Description
- The Config controller manages the configuration of system resources such as datasets, collections, and crawlers. To scale up, increase the number of pods for the Config controller. The number of pods should be 3 or 5.
- Kubernetes controller name
- Config
- Minimum number of pods
- 3
- Persistent volume name
- wex-config
- Persistent volume content
- Collection and file resource information.
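The scaling rules for version 12.0.1 and earlier differ by controller. The Management pod count cannot be increased, the Discovery controller cannot be scaled down after it is scaled up, the Config controller should run 3 or 5 pods, and HDP scales by adding worker pods on top of the name node and resource node. The following Python sketch is a hypothetical checker that encodes those rules as data and reports why a proposed change conflicts with the table. It is not a oneWEX or IBM Cloud Private tool. The names `ScaleRule`, `check_scale`, and `hdp_total_pods` are illustrative, and the sketch treats the Config recommendation as a hard check.

```python
# Hypothetical check of a scaling change against the oneWEX 12.0.1-and-earlier table.
# The rules are transcribed from the table above; names are illustrative only.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ScaleRule:
    min_pods: int
    cannot_increase: bool = False    # Management: "This value cannot be increased."
    cannot_decrease: bool = False    # Discovery: no scale-down once increased
    allowed: Tuple[int, ...] = ()    # Config: "should be 3 or 5"


RULES_12_0_1: Dict[str, ScaleRule] = {
    "Management": ScaleRule(min_pods=1, cannot_increase=True),
    "Ingestion": ScaleRule(min_pods=1),
    "HDP": ScaleRule(min_pods=4),  # 2 worker, 1 name node, 1 resource node
    "Discovery": ScaleRule(min_pods=1, cannot_decrease=True),
    "Config": ScaleRule(min_pods=3, allowed=(3, 5)),
}

HDP_NON_WORKER_PODS = 2  # the name node and the resource node


def hdp_total_pods(workers: int) -> int:
    """HDP scales by adding HDP-worker pods; the other two pods stay fixed."""
    if workers < 2:
        raise ValueError("the HDP controller needs at least 2 worker pods")
    return workers + HDP_NON_WORKER_PODS


def check_scale(name: str, current: int, target: int) -> List[str]:
    """Return the reasons a change from `current` to `target` pods conflicts with the table."""
    rule = RULES_12_0_1[name]
    problems = []
    if target < rule.min_pods:
        problems.append(f"{name}: below the minimum of {rule.min_pods}")
    if rule.cannot_increase and target > rule.min_pods:
        problems.append(f"{name}: the pod count cannot be increased")
    if rule.cannot_decrease and target < current:
        problems.append(f"{name}: cannot scale down after scaling up")
    if rule.allowed and target not in rule.allowed:
        problems.append(f"{name}: should be one of {rule.allowed}")
    return problems


print(check_scale("Config", current=3, target=4))
print(check_scale("Discovery", current=3, target=2))
print(hdp_total_pods(workers=3))
```

Keeping the rules as data rather than as scattered `if` statements makes the mapping back to the table easy to audit.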
The IBM Cloud Private management console provides a graphical user interface to manage and monitor your Kubernetes objects.