Analyzing the contextual content of hierarchical data by using IBM Watson Explorer


Dimensional models are conventionally based on parameters such as the data reality, grain, dimensions, and facts or measurements. However, in many business scenarios, valuable unstructured information is stored around the grain or at different levels of the hierarchy within the data model.

Using the linguistic algorithms of natural language processing (NLP), IBM Watson Explorer can understand all of the words in a question, not just the keywords. In fact, it can analyze grammar, context, jargon, idioms, and even mode-of-thought. It can then draw statistical inferences and make recommendations or provide advice within a given confidence level.

This tutorial describes how to analyze a hierarchical data structure by using IBM Watson Explorer within the context of content analytics while retaining the hierarchical structure or grains of the underlying data model. This strategy gives you the flexibility and capability of analyzing aspects of unstructured content and structured dimensions together. I demonstrate step-by-step how to overcome the issue of multiplicity while performing content analysis with IBM Watson Explorer and IBM Watson Content Analytics Miner.

In preparing this tutorial, I used the following software components:

  • IBM Watson Explorer Content Analytics Version 11.0.2
  • IBM Content Analytics Studio Version 11
  • Eclipse Java EE IDE for Web Developers
  • IBM dashDB
  • JDBC Driver for DB2
  • Windows Server 2012 R2

Problem description

I have constructed this tutorial around a simple example of an incident management system. Although the example is simple, the concepts it illustrates can easily be extended to more complex use case scenarios, whether in incident management or in other systems such as quality management, case management, workflow management, or content management.

Figure 1 through Figure 3 show the entity relation diagram, the fact table, and the hierarchical representation of the example. Consider the data structure used for incident reporting shown in Figure 1.

Figure 1. Entity relation diagram: Incident management system

The system illustrated here is an example of an Operational Source System. (For more information about Operational Source Systems, see The Data Warehouse Toolkit.) This system consists of a transaction header row that is associated with multiple transaction lines. With header/line schemas (also known as parent-child schemas), all of the header-level dimension foreign keys and "degenerate dimensions" should be included on the line-level fact table with differing granularity at each level. For this tutorial, I used INCIDENT_CODE, INCIDENT_LINE_ITEM, INVESTIGATION_CODE, RESOLUTION_CODE, and CLOSURE_CODE to create the fact table in Figure 2, which was generated from the data model shown in Figure 1. I added a few rows of example data to the fact table to build and demonstrate the solution.

Figure 2. Fact table structure & example data

Because of the denormalized structure of the derived fact table, only the leaf level is unique.

As shown in Figure 3, a symmetrical hierarchical data structure with 5 levels and 3 children at each level produces 81 rows for the root level. The repetition of each structured or unstructured field value therefore grows geometrically, from 3 just above the leaf level up to 81 at the root. (Note: Figure 3 represents a fragment of the hierarchical structure representation. For more details, see the sample data file in GitHub.)
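To make the arithmetic concrete, here is a minimal sketch (not part of the tutorial's code) that computes how often a value stored at each level is repeated once the hierarchy is flattened into the fact table. It assumes level 1 is the root and level 5 is the leaf.

    public class MultiplicityDemo {
        public static void main(String[] args) {
            final int children = 3;  // children per node
            final int levels = 5;    // depth of the symmetrical hierarchy
            for (int level = 1; level <= levels; level++) {
                // A value stored at this level is repeated once per leaf row beneath it.
                long repetitions = (long) Math.pow(children, levels - level);
                System.out.println("Level " + level + ": value repeated " + repetitions + " time(s)");
            }
        }
    }

Running this prints 81, 27, 9, 3, and 1 for levels 1 through 5, which matches the 81 root-level rows and the unique leaf level described above.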

Figure 3. Hierarchical view of example data

Everything that IBM Watson Explorer crawls is treated as a document, a single entity; that is, each row of the database table is considered a separate document, and that document is the only grain available to the analysis. Unfortunately, these multiple occurrences of unstructured content produce misleading statistics when analyzed with IBM Watson Explorer. For example, a given text in the INCIDENT_DESCRIPTION field at level 2, as shown in Figure 1 through Figure 3, is counted nine times (the 3 × 3 = 9 rows fanned out by the levels beneath it) for a single INVESTIGATION_CODE value.

This tutorial gives you a strategy for overcoming this problem of multiplicity. I demonstrate how to create a custom database crawler plug-in that crawls the derived fact table in a specific way, so that the dimensions of all levels of the hierarchy are retained but the multiplicity is eliminated.

What you'll need to build your application

To build your own application, you'll need the following skills and tools:

  • Working knowledge of IBM Watson Explorer (Advanced Edition)
  • Working knowledge of Content Analytics Studio and an understanding of Apache UIMA content analytics concepts
  • IBM Watson Explorer Content Analytics V.11.0.2
  • IBM Watson Explorer Content Analytics Studio V.11
  • IBM Cloud Account (optional)
  • IBM dashDB (or any standard DBMS)
  • Understanding of DBMS concepts
  • Java development skills

Set up the content analytics collection

To create the content analytics collection:

  1. Note the following settings and options for use by the collection.
    General options
      • Collection name: ATutorialCollection
      • Collection type: Content analytics collection
      • Solution package: Do not apply a solution package
      • Document cache: Enable the document cache
      • Thumbnail generation (required for showing thumbnail images in the search results): Do not enable thumbnail generation

    Advanced options
      • Description: Collection for demonstrating the concept of Contextual Content Analytics of Hierarchical Data - 02/04/2018
      • Collection security: Do not enable security for the collection
      • Overlay index: Do not enable overlay index
      • N-gram segmentation: Do not enable N-gram segmentation

    Note: For more details, see "Administering Watson Explorer Content Analytics" in the IBM Knowledge Center.

  2. Create the fact table.

    A sample fact table, shown in Figure 2, is created in IBM dashDB and populated with the sample data. Figure 4 shows the sample data in the dashDB console. Detailed steps for creating the table are beyond the scope of this tutorial; you can use any standard relational database.

    Figure 4. Sample data in dashDB
  3. Create the JDBC crawler.

    Follow the standard process for creating a JDBC crawler, as described under "Crawler administration" in the IBM Knowledge Center.

    The following JDBC connection details are listed for reference:

      • JDBC driver name: com.ibm.db2.jcc.DB2Driver
      • JDBC driver class path: F:\DevWTutorial\jars
      • Database URL: jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50000/BLUDB

    The JDBC driver used to connect to dashDB is the same as the DB2 driver and can be downloaded from the IBM Support site. (A minimal connection check appears after these setup steps.)

  4. Populate the index field and facet mapping.

    While creating the crawler, populate the index field mapping and the facet tree according to the following table. Index field names are important, because these fields are used within the custom crawler plug-in for the de-duplication logic.

    Field Name Returnable Faceted search Free text search In summary Fielded search Parametric search Analyzable Unique identifier Facet mapping Group
    ID Yes
    INCIDENT_CODE Yes Yes Yes Yes Yes No No Incident Code Incident Header
    INCIDENT_TYPE_CODE Yes Yes Yes Yes Yes No No Incident Type Code Incident Header
    REQUESTED_BY Yes Yes Yes Yes Yes No No Raised By Incident Header
    AFFECTED_ITEM Yes Yes Yes Yes Yes No No Affected Product Incident Header
    HELP_DESK_ID Yes Yes Yes Yes Yes No No Agent Code Incident Header
    INCIDENT_LINE_ITEM Yes Yes Yes Yes Yes No No Incident Line Item Incident Details
    ASSIGNED_DEPT Yes Yes Yes Yes Yes No No Assigned to Dept. Incident Details
    INCIDENT_PART_NO Yes Yes Yes Yes Yes No No Affected Parts(s) Incident Details
    INCIDENT_DESCRIPTION Yes No Yes Yes No No Yes
    INVESTIGATION_CODE Yes Yes Yes Yes Yes No No Investigation Code Investigation Details
    DIAGNOSIS_CODE Yes Yes Yes Yes Yes No No Diagnosis Code Investigation Details
    DEFECTIVE_PARTS Yes Yes Yes Yes Yes No No Defective Part(s) Investigation Details
    DIAGNOSIS_DESCRIPTION Yes No Yes Yes No No Yes
    RESOLUTION_CODE Yes Yes Yes Yes Yes No No Resolution Code Resolution Details
    R_DEPT_CODE Yes Yes Yes Yes Yes No No Resolution Dept. Code Resolution Details
    R_WORK_CENTER Yes Yes Yes Yes Yes No No Resolution Work Center Resolution Details
    RESOLUTION_DESC Yes No Yes Yes No No Yes
    CLOSURE_CODE Yes Yes Yes Yes Yes No No Closure Code Final Disposition Logs
    CLOSING_COMMENT Yes No Yes Yes No No Yes

    Figure 5 shows the facet tree hierarchy that is displayed in IBM Watson Explorer Content Analytics Miner. After the facet tree is created, the table is crawled, initially with the above configuration.

    Figure 5. Facet tree structure
  5. Create a defect dictionary and deploy it in the content analytics server.

    The following table lists the keywords of the defect dictionary along with their parts of speech and inflections.

    Defect keyword  Part of speech  Inflection(s)
    damage Noun damages
    damage Verb damaged damages damaging
    foul Verb fouled fouling fouls
    foul Noun fouls
    crack Noun cracks
    crack Verb cracked cracking cracks
    clip Verb clipped clipping clips
    clip Noun clips
    break Noun breaks
    break Verb breaking breaks broke broken
    rip Verb ripped ripping rips
    rip Noun rips
    abrasion Noun abrasions
    mark Noun marks
    mark Verb marked marking marks
    seal Verb sealed sealing seals
    seal Noun seals
    scrap Noun scraps
    scrap Verb scrapped scrapping scraps

    The defect dictionary is generated from these keywords and is used to analyze the unstructured text content in the fact table. For a more detailed understanding, see the sample data file in GitHub.

    Figure 6 shows how the dictionary is organized within IBM Watson Content Analytics Studio. For more information, see "Content Analytics Studio for advanced text analytics" in the IBM Knowledge Center.

    Figure 6. Defect dictionary in Content Analytics Studio

    After the defect dictionary is created, the configured UIMA pipeline is exported to the Content Analytics Server.

    (Detailed steps for creating the dictionary and exporting the UIMA pipeline are beyond the scope of this tutorial. This section is for reference only to maintain the continuity of the tutorial.)
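Before configuring the crawler in step 3, it can help to confirm that the JDBC connection details actually work outside of IBM Watson Explorer. The following is a minimal sketch, not part of the tutorial's deliverables; the user ID, password, and FACT_TABLE name are placeholders, and the DB2 JCC driver JAR (the same one referenced in the crawler's class path) must be on the Java class path.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConnectionCheck {
        public static void main(String[] args) throws Exception {
            // Same database URL as configured for the JDBC crawler.
            String url = "jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50000/BLUDB";
            try (Connection con = DriverManager.getConnection(url, "<user>", "<password>");
                 Statement stmt = con.createStatement();
                 // FACT_TABLE is a placeholder for the fact table created in step 2.
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM FACT_TABLE")) {
                rs.next();
                System.out.println("Rows visible to the crawler: " + rs.getInt(1));
            }
        }
    }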


Generate initial output without the custom crawler plug-in

  1. Get the initial statistics in IBM Watson Explorer Content Analytics Miner.

    After the collection has been crawled, indexed, and made ready for analysis, use IBM Watson Explorer Content Analytics Miner to verify the initial statistics. Note that in Figure 7, 57 occurrences of the "fouling" keyword and 50 occurrences of the "scrap" keyword are found in the table, followed by the other keywords.

    Figure 7. Initial statistics in IBM Watson Explorer Content Analytics Miner
  2. Now query the documents (each document is a single row of the table) in IBM Watson Explorer Content Analytics Miner by selecting the keyword fouling in the following query:
    [Query Text : (*:*) AND (keyword::/"Explore"/"Defect Type(s)"/"fouling")]

    If you look at the sample fact table, you'll see that the INCIDENT_DESCRIPTION field contains the same value repeated 20 times because it is joined with the INVESTIGATION_DETAILS table. This is misleading and is an example of the multiplicity problem inherent in relational joins. In the next section, I show you how to create a custom plug-in to overcome this issue.

    Columns from INCIDENT_HDR and INCIDENT_DETAILS:
    ID  INCIDENT_CODE  INCIDENT_LINE_ITEM  INCIDENT_DESCRIPTION
    1 D1 D1.1 PART-01 is fouling and cracked or clipped
    2 D1 D1.1 PART-01 is fouling and cracked or clipped
    3 D1 D1.1 PART-01 is fouling and cracked or clipped
    4 D1 D1.1 PART-01 is fouling and cracked or clipped
    5 D1 D1.1 PART-01 is fouling and cracked or clipped
    6 D1 D1.1 PART-01 is fouling and cracked or clipped
    7 D1 D1.1 PART-01 is fouling and cracked or clipped
    8 D1 D1.1 PART-01 is fouling and cracked or clipped
    9 D1 D1.1 PART-01 is fouling and cracked or clipped
    10 D1 D1.1 PART-01 is fouling and cracked or clipped
    11 D1 D1.1 PART-01 is fouling and cracked or clipped
    12 D1 D1.1 PART-01 is fouling and cracked or clipped
    13 D1 D1.1 PART-01 is fouling and cracked or clipped
    14 D1 D1.1 PART-01 is fouling and cracked or clipped
    15 D1 D1.1 PART-01 is fouling and cracked or clipped
    16 D1 D1.1 PART-01 is fouling and cracked or clipped
    17 D1 D1.1 PART-01 is fouling and cracked or clipped
    18 D1 D1.1 PART-01 is fouling and cracked or clipped
    19 D1 D1.1 PART-01 is fouling and cracked or clipped
    20 D1 D1.1 PART-01 is fouling and cracked or clipped
    21 D1 D1.2 PART-02 is broken or damaged

Create a custom Java crawler plug-in

This section describes the custom crawler plug-in and shows how to use the contextual views of IBM Watson Explorer Content Analytics Miner to obtain statistically correct insights.

Crawler plug-ins for IBM Watson Explorer Content Analytics are divided into two categories based on data sources:

Type A data sources:
  • Content Integrator
  • Content Manager
  • DB2
  • JDBC database
  • Notes
  • Quickr for Domino
  • Seed list
  • File system

Type B data sources:
  • Agent for Windows file systems crawler
  • BoardReader crawler
  • Case Manager crawler
  • Exchange Server crawler
  • FileNet P8 crawler
  • SharePoint crawler

In this procedure, I use a JDBC database, which is a Type A data source. For Type A data sources, you can create a Java class that programmatically updates the values of security tokens, metadata, and document content.

The following code is for the Java plug-in. It is followed by a description of the deployment process and an analysis of the code's logic. PluginLogger logs the issues related to the crawler.

Custom Java crawler plug-in

    package com.ibm.es.crawler.dbplugins;

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import com.ibm.es.crawler.plugin.AbstractCrawlerPlugin;
    import com.ibm.es.crawler.plugin.CrawledData;
    import com.ibm.es.crawler.plugin.CrawlerPluginException;
    import com.ibm.es.crawler.plugin.FieldMetadata;
    import com.ibm.es.crawler.plugin.logging.PluginLogger;

    public class TutorialDBCrawlerPlugin extends AbstractCrawlerPlugin {

        /** Logger */
        private static final PluginLogger logger;
        static {
            PluginLogger.init(PluginLogger.LOGTYPE_OSS, PluginLogger.LOGLEVEL_INFO);
            logger = PluginLogger.getInstance();
        }

        /** Index field names used by the de-duplication logic */
        private static final String INCIDENT_DESCRIPTION = "INCIDENT_DESCRIPTION";
        private static final String DIAGNOSIS_DESCRIPTION = "DIAGNOSIS_DESCRIPTION";
        private static final String RESOLUTION_DESC = "RESOLUTION_DESC";
        private static final String INVESTIGATION_CODE = "INVESTIGATION_CODE";
        private static final String RESOLUTION_CODE = "RESOLUTION_CODE";
        private static final String CLOSURE_CODE = "CLOSURE_CODE";

        public TutorialDBCrawlerPlugin() {
            super();
        }

        public void init() throws CrawlerPluginException {
        }

        public boolean isMetadataUsed() {
            return true;
        }

        public boolean isContentUsed() {
            return true;
        }

        public void term() throws CrawlerPluginException {
            return;
        }

        public CrawledData updateDocument(CrawledData crawledData)
                throws CrawlerPluginException {
            // Each crawled row arrives as a list of field metadata.
            List<FieldMetadata> metadataList = crawledData.getMetadataList();
            if (metadataList == null) {
                logger.error("metadata list is empty");
                return crawledData;
            }
            // De-duplicate the description fields before the row is indexed.
            crawledData.setMetadataList(setIndexContext(metadataList));
            return crawledData;
        }

        public List<FieldMetadata> setIndexContext(List<FieldMetadata> metaList) {
            // Map the row into field-name/value pairs for easy lookup.
            Map<String, String> oneRow = metaList.stream().collect(
                    Collectors.toMap(FieldMetadata::getFieldName,
                            FieldMetadata::getValue));

            // Leaf values of the hierarchy codes: for example, "D1.1.1.1.1" -> "1".
            String closureCode = getLeafValue(oneRow.get(CLOSURE_CODE));
            String resolutionCode = getLeafValue(oneRow.get(RESOLUTION_CODE));
            String investigationCode = getLeafValue(oneRow.get(INVESTIGATION_CODE));
            logger.warn("Modified map details: " + oneRow.toString());

            List<FieldMetadata> returnMetaList = metaList;
            for (int i = 0; i < returnMetaList.size(); i++) {
                String fieldName = returnMetaList.get(i).getFieldName();

                // Keep INCIDENT_DESCRIPTION only on the first branch of its sub-tree.
                if (fieldName.contentEquals(INCIDENT_DESCRIPTION)
                        && !(closureCode.equals("1")
                                && resolutionCode.equals("1")
                                && investigationCode.equals("1"))) {
                    returnMetaList.get(i).setValue("");
                }
                // Keep DIAGNOSIS_DESCRIPTION only on the first resolution/closure branch.
                if (fieldName.contentEquals(DIAGNOSIS_DESCRIPTION)
                        && !(closureCode.equals("1") && resolutionCode.equals("1"))) {
                    returnMetaList.get(i).setValue("");
                }
                // Keep RESOLUTION_DESC only on the first closure branch.
                if (fieldName.contentEquals(RESOLUTION_DESC)
                        && !closureCode.equals("1")) {
                    returnMetaList.get(i).setValue("");
                }
            }
            return returnMetaList;
        }

        public String getLeafValue(String keyField) {
            String[] hierarchy = keyField.split("\\.");
            return hierarchy[hierarchy.length - 1];
        }
    }

Custom crawler plug-in methods description

To create a Java class for use as a crawler plug-in with content-related functions for Type A data sources, extend com.ibm.es.crawler.plugin.AbstractCrawlerPlugin and implement the following methods:

  • init()
  • isMetadataUsed()
  • isContentUsed()
  • activate()
  • deactivate()
  • term()
  • updateDocument()

The AbstractCrawlerPlugin class is an abstract class. The init, activate, deactivate, and term methods are implemented to do nothing. The isMetadataUsed method and isContentUsed method are implemented to return false by default. The updateDocument method is an abstract method, so you must implement it.

For name resolution, use the ES_INSTALL_ROOT/lib/dscrawler.jar file.

When the crawler session starts, the plug-in process is forked. An AbstractCrawlerPlugin object is instantiated with the default constructor and the init, isMetadataUsed, and isContentUsed methods are called one time. During the crawler session, the activate method is called when the crawler starts its crawling and the deactivate method is called when the crawler finishes its crawling. When the crawler session ends, the term method is called and the object is destroyed. If the crawler scheduler is enabled, the activate method is called when the crawling is scheduled to start and the deactivate method is called when the crawling is scheduled to end. Because a single crawler session runs continuously when the crawler scheduler is enabled, the term method is not called to destroy the object.
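To illustrate this lifecycle, here is a minimal, do-nothing skeleton that only logs when each method is called. It is a sketch for orientation, not part of the tutorial's plug-in; verify the exact method signatures against the AbstractCrawlerPlugin class in ES_INSTALL_ROOT/lib/dscrawler.jar before relying on it.

    package com.ibm.es.crawler.dbplugins;

    import com.ibm.es.crawler.plugin.AbstractCrawlerPlugin;
    import com.ibm.es.crawler.plugin.CrawledData;
    import com.ibm.es.crawler.plugin.CrawlerPluginException;
    import com.ibm.es.crawler.plugin.logging.PluginLogger;

    public class LifecycleLoggingPlugin extends AbstractCrawlerPlugin {

        private static final PluginLogger logger;
        static {
            PluginLogger.init(PluginLogger.LOGTYPE_OSS, PluginLogger.LOGLEVEL_INFO);
            logger = PluginLogger.getInstance();
        }

        // Called once when the plug-in process is forked.
        public void init() throws CrawlerPluginException {
            logger.warn("init: plug-in process started");
        }

        // Called when the crawler starts crawling.
        public void activate() {
            logger.warn("activate: crawling started");
        }

        // Called when the crawler finishes crawling.
        public void deactivate() {
            logger.warn("deactivate: crawling finished");
        }

        // Called when the crawler session ends (not called while the scheduler keeps the session alive).
        public void term() throws CrawlerPluginException {
            logger.warn("term: crawler session ended");
        }

        public boolean isMetadataUsed() {
            return true;
        }

        public boolean isContentUsed() {
            return false;
        }

        // Pass every document through unchanged; real de-duplication logic would go here.
        public CrawledData updateDocument(CrawledData crawledData) throws CrawlerPluginException {
            return crawledData;
        }
    }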

Custom crawler plug-in compile and deployment

  1. Compile the implemented code and make a JAR file for it. Add the ES_INSTALL_ROOT/lib/dscrawler.jar file to the class path when you compile.
  2. In the administration console, follow these steps:
    1. Edit the appropriate collection.
    2. Select the Crawl page and edit the crawler properties for the crawler that will use the custom Java class.
    3. Specify the following items:
      • The fully qualified class name of the implemented Java class: for example, com.ibm.es.crawler.dbplugins.TutorialDBCrawlerPlugin.
      • Include the name of the JAR file in your path declaration: for example, F:\DevWTutorial\Plugins\DBCrawlerPlugin\TutorialDBCrawlerPlugin.jar.
    4. On the Crawl page, click Monitor. Then click Stop and Start to restart the session for the crawler that you edited.
    5. Click Details and start a full crawl.

In addition to the methods inherited from the AbstractCrawlerPlugin abstract class, two key methods are implemented: getLeafValue() and setIndexContext(). These methods eliminate the duplicates from the collection index.

  • The getLeafValue() method is called from setIndexContext() with a key field value and returns the value of the leaf, assuming "." is the separator. For example, if D1.1.1.1.1 is passed as an argument, 1 is returned (see the quick check after this list).
  • The setIndexContext() method is called from the updateDocument() method with a list of FieldMetadata (com.ibm.es.crawler.plugin.FieldMetadata) as its argument. It processes the FieldMetadata list according to the given logic and returns the de-duplicated FieldMetadata list, which is eventually set back into the crawled data and used by the crawler process.
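As a quick sanity check of the leaf-extraction logic, the following throwaway snippet (not part of the deployed plug-in) calls getLeafValue() directly:

    public class GetLeafValueCheck {
        public static void main(String[] args) {
            TutorialDBCrawlerPlugin plugin = new TutorialDBCrawlerPlugin();
            System.out.println(plugin.getLeafValue("D1.1.1.1.1")); // prints "1"
            System.out.println(plugin.getLeafValue("D2.3"));       // prints "3"
            System.out.println(plugin.getLeafValue("D2"));         // prints "D2" (no "." separator)
        }
    }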

After the code is ready, it can be deployed as explained in the procedure of this section (substeps 1 through 3). For more information, see "Creating a crawler plug-in for Type A data sources" in the IBM Knowledge Center.

Contextual views

A unique feature of IBM Watson Explorer is the contextual view. You can explore the data's contents in a specific context by grouping analyzable fields. For example, if the data is configured to analyze "Incident Description and Resolution Description" grouped together or with individual contextual views, users can limit the results to the parts of documents that match those contexts.

Contextual view creation

  1. You create a contextual view in the administrative console by first selecting Configure parse and index setting.
    Figure 8. Configure parse and index setting, Contextual view option
  2. Select Contextual views and then create the contextual views for textual content based on the business requirements. The contextual views created for this example are listed in the following table.
    Context field(s): Contextual view name
      • INCIDENT_DESCRIPTION: Incident Details
      • DIAGNOSIS_DESCRIPTION: Investigation
      • INCIDENT_DESCRIPTION & DIAGNOSIS_DESCRIPTION: Incident & Investigation
      • RESOLUTION_DESC: Resolution
      • CLOSING_COMMENT: Final Disposition
      • RESOLUTION_DESC & CLOSING_COMMENT: Resolution & Final Disposition

    Figure 9 shows the list of contextual views created for this tutorial as given in the previous table.

    Figure 9. Contextual view details

Business users can now derive insights based on the context of the hierarchical data and the facet hierarchy, as described in the following section.

Contextual analysis using IBM Watson Explorer Content Analytics Miner

The following outputs in IBM Watson Explorer Content Analytics Miner are indicative samples of occurrences of defect types based on the context that the user selects. They show the exact frequency of the occurrences.

Incident Details

Figure 10 shows the IBM Content Analytics Miner output when the Incident Details contextual view is selected. The figure shows two occurrences of the keyword fouling in the unstructured field INCIDENT_DESCRIPTION, and likewise for the other keywords. The output can also be viewed for the other configured contextual views.

Figure 10. Incident details contextual view in IBM Content Analytics Miner

Resolution & Final Disposition

In Figure 11, the fouling keyword appears 24 times across the RESOLUTION_DESC and CLOSING_COMMENT fields. Because CLOSING_COMMENT is the leaf level in the hierarchy, it is unique and therefore accounts for the higher occurrence count.

Figure 11. Resolution & Final Disposition contextual view in IBM Content Analytics Miner

Use case analysis

Now let's consider a use case scenario where business users want to analyze the occurrence of a keyword and find its relationship to the facets derived from the dimensions at different levels of the hierarchy.

Assume that you see that the fouling keyword has unusual occurrences compared with other defect types in the context of the Resolution & Final Disposition contextual view. You want to investigate the issue in more detail and to analyze the relationship of the "reworking" and "fouling" components referred to in those incidents. Here are the steps you would follow.

  1. Add the fouling keyword to the query, which refines the analysis to 24 results out of 150 within the context of Resolution & Final Disposition:
    ((*:*) AND (keyword::/"Explore"/"Defect Type(s)"/"fouling")) IN resolution_final_disposition
  2. Out of all the incidents, incidents D2 and D1 have the most occurrences of the fouling keyword in Resolution and Closing Comment. The user adds these two incident codes to the query, which narrows the results further to 16.
    Figure 12. Frequency of incident based on the criteria
    (((*:*) AND (keyword::/"Explore"/"Defect Type(s)"/"fouling")) IN resolution_final_disposition) AND (keyword::/"Facet Hierarchy"/"Incident Header"/"Incident Code"/"D2" OR keyword::/"Facet Hierarchy"/"Incident Header"/"Incident Code"/"D1")
  3. Now the free-form text "rework and use" is added to the query, which further refines the scope and retrieves five specific records. (Instead of free-form text, you could use a phrase annotation to capture different writing patterns: for example, "rework and reinspect", "rwrk & use.", and "rework & re-use".) Here is the final query syntax:
    ((("rework and use") AND (keyword::/"Explore"/"Defect Type(s)"/"fouling")) IN resolution_final_disposition) AND (keyword::/"Facet Hierarchy"/"Incident Header"/"Incident Code"/"D2" OR keyword::/"Facet Hierarchy"/"Incident Header"/"Incident Code"/"D1")

Now you can find trends by choosing different facets from the facet tree. Figure 13 shows the trend charts in the IBM Watson Explorer Content Analytics Miner dashboard for Affected Product vs. Affected Part(s), based on the text analytics query derived above.

Figure 13. Final output in IBM Watson Content Analytics Miner dashboard

You can see that out of six different parts affected in incidents D1 and D2, part-04, part-13, and part-01 are affected in this use case scenario.

Conclusion

This tutorial has described a strategy for analyzing a hierarchical data structure in IBM Watson Explorer within the context of content analytics while retaining the hierarchical structure, or grains, of the dimensional model. The dataset used in this tutorial is representative of real-world scenarios, but it is not very large or complex. In an actual business scenario, this kind of analysis gives you the combined power of content analytics and dimensional analysis, and supports machine-assisted decision making, automation, and business optimization.

Such scenarios are common in the following domains:

  • Incident management
  • Case management
  • Content management
  • Quality management system
  • Enterprise information management

The concepts described here can be implemented for analyzing data inconsistency in a source system, automated data profiling, big data analytics, data governance, and data ontology creation.

Using these concepts, you can overcome the problem of multiplicity and analyze the unstructured text content at each hierarchical level, separately or together. You can thereby derive insights that can be made accessible to business users.

