Jaql scripts for custom global analysis

The custom global analysis logic is implemented by creating a Jaql (Query Language for JSON) script.

The script takes as input the fields, facets, and text that are extracted from the content during the document processing stage. Use the readGAInput(GAOptions) function to get the document fields, facets, and text content in JSON format. To store the output of the script as document fields or facets in the Watson Explorer Content Analytics index, use the writeGAOutput(GAOptions) function. GAOptions is a JSON record that contains the necessary parameters. You obtain it by calling the getGAOptions($MetaTrackerJaqlVars) function, and the $MetaTrackerJaqlVars argument is always required. To call these functions, you must import the module with the namespace ica::ga. The following example shows a sample custom global analysis Jaql script:

import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// someOperation() and anotherOperation() are placeholders for your own analysis steps.
readGAInput(options)
  -> someOperation()
  -> anotherOperation()
  -> writeGAOutput(options);
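To make the placeholders concrete, the following sketch replaces them with a single transform step. It is an illustration, not a recommended analysis. It assumes two things: a hypothetical index field named product_name that was created beforehand in the administration console, and a document facet whose path starts with product, like the one in the sample input later in this topic.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// Hypothetical example: for each document, copy the keyword of its
// "product" document facet into an index field named product_name.
// product_name is an assumed field name; it must already exist in the
// administration console, or the value is not added to the index.
readGAInput(options)
  -> transform {
       uri: $.uri,
       // Inside the nested pipe, $ refers to each docfacets record.
       product_name: ($.metadata.docfacets
                        -> filter $.path[0] == "product"
                        -> transform $.keyword)[0]
     }
  -> writeGAOutput(options);
```

If a document has no product facet, the lookup has nothing to return. Depending on your data, you might add a filter step before writeGAOutput() to skip such documents.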

Input JSON format

The function readGAInput() returns an array of JSON records. Each record represents a separate document that was processed by Watson Explorer Content Analytics. Each record can contain field values, facet values, and textual content, as configured in the administration console. The following table lists the fields that can be included in the JSON records.

Field name                  Required or optional  Remarks
uri                         Required              The document ID, such as the URL.
content                     Required              The text that the parser extracted from the document content.
metadata                    Required              A record that contains information about the document metadata.
metadata.fields             Optional              An array of records that contain information about the metadata fields. Each record contains the name and value fields.
metadata.fields.name        Optional              The name of a document field.
metadata.fields.value       Optional              The value of a document field.
metadata.docfacets          Optional              An array of records that contain information about the metadata facets. Each record contains the path and keyword fields.
metadata.docfacets.path     Optional              The facet path, as an array of path components.
metadata.docfacets.keyword  Optional              A value associated with this facet.
metadata.textfacets         Optional              An array of records that contain information about the facets that come from annotations. Each record contains the begin, end, path, and keyword fields; path and keyword have the same meaning as in docfacets.
metadata.textfacets.begin   Optional              The character position in content that marks the beginning of the annotation.
metadata.textfacets.end     Optional              The character position in content that marks the end of the annotation.

In the sample input that follows, positions are counted from 0, and end is the position immediately after the last character of the annotation. For example, "Pack" in "[Pack] The straw..." has begin 1 and end 5.
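The exact records depend on the fields, facets, and text that are configured for the collection, so it can help to inspect the real input before you write the analysis logic. The following sketch writes the unmodified output of readGAInput() to a local JSON file. The path /tmp/ga_input.json is an arbitrary example, and the write(file(...)) call follows the file-output example at the end of this topic.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// Write the raw input records, unchanged, to a local file so that their
// structure can be inspected. The file path is an arbitrary example.
readGAInput(options)
  -> write(file('/tmp/ga_input.json'));
```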
The following code is an example of input data for two documents.
[
{
  "uri" : "jdbc://ICA/APP.CLAIM/ID/0",
  "content" : "[Pack] The straw was peeled off from the juice pack.",
  "metadata" : {
    "fields" : [ {
      "name" : "date",
      "value" : "1199113200000"
    }, {
      "name" : "title",
      "value" : "lemon tea - Package / container"
    } ],
    "docfacets" : [ {
      "path" : [ "date", "2008", "1", "1", "0" ],
      "keyword" : ""
    }, {
      "path" : [ "product" ],
      "keyword" : "lemon tea"
    } ],
    "textfacets" : [ {
      "begin" : 1,
      "end" : 5,
      "path" : [ "_word", "noun", "general" ],
      "keyword" : "pack"
    }, {
      "begin" : 11,
      "end" : 16,
      "path" : [ "_word", "adj" ],
      "keyword" : "straw"
    }]
  }
}, {
  "uri" : "jdbc://ICA/APP.CLAIM/ID/1",
  "content" : "I got some ice cream for my children, but there was something like a piece of thread inside the cup.",
  "metadata" : {
    "fields" : [ {
      "name" : "date",
      "value" : "1199199600000"
    },{
      "name" : "title",
      "value" : "vanilla ice cream - Contamination / tampering"
    } ],
    "docfacets" : [ {
      "path" : [ "date", "2008", "1", "2", "0" ],
      "keyword" : ""
    }, {
      "path" : [ "product" ],
      "keyword" : "vanilla ice cream"
    } ],
    "textfacets" : [ {
      "begin" : 2,
      "end" : 5,
      "path" : [ "_word", "verb" ],
      "keyword" : "get"
    }, {
      "begin" : 11,
      "end" : 14,
      "path" : [ "_word", "noun", "general" ],
      "keyword" : "ice"
    }]
  }
}
]
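Annotations are nested inside metadata.textfacets, and their type is encoded in the path array. The following sketch assumes the structure shown above and produces one record per document: the URI plus the keywords of its noun annotations, meaning textfacets records whose path starts with "_word", "noun". The output file path is an arbitrary example.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// For each document, collect the keywords of the noun annotations.
// Inside the nested pipe, $ refers to each textfacets record.
readGAInput(options)
  -> transform {
       uri: $.uri,
       nouns: ($.metadata.textfacets
                 -> filter $.path[0] == "_word" and $.path[1] == "noun"
                 -> transform $.keyword)
     }
  -> write(file('/tmp/ga_nouns.json'));
```

For the two sample documents above, the nouns arrays would contain "pack" and "ice", respectively.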

Output JSON format

To store the output in the Watson Explorer Content Analytics index, pass an array of JSON records as the first argument of the writeGAOutput() function. Each record must include a field named uri, and its values are stored in the index for the document whose URI matches the value of that field. Every field in the record other than uri is stored as an index field or a document-level facet of that document, and the name of the JSON field determines which index field or facet receives the value. For example, the following array of JSON records adds values for the rank field and the ranking facet to the documents with the URIs jdbc://ICA/APP.CLAIM/ID/0 and jdbc://ICA/APP.CLAIM/ID/1.
[{"uri":"jdbc://ICA/APP.CLAIM/ID/0","rank":"1","$.ranking":"1"},
{"uri":"jdbc://ICA/APP.CLAIM/ID/1","rank":"2","$.ranking":"2"}
]
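In Jaql, the -> operator passes the value on its left as the first argument of the function on its right. The pipeline form used in the sample script is therefore equivalent to calling writeGAOutput() with the records as its first argument. The following sketch writes the two example records directly. It assumes that the rank index field and the ranking facet already exist in the collection.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// Equivalent to: [ ... ] -> writeGAOutput(options);
// "$.ranking" is quoted because it is not a plain identifier.
writeGAOutput([
  { uri: "jdbc://ICA/APP.CLAIM/ID/0", rank: "1", "$.ranking": "1" },
  { uri: "jdbc://ICA/APP.CLAIM/ID/1", rank: "2", "$.ranking": "2" }
], options);
```

The records use the original field name rank. The custom_ prefix is added when the value is stored in the index.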
Requirement: To store data in fields and facets in the index, you must first create the fields and facets in the administration console. If a field or facet does not exist, the value is not added to the index.

For index fields, the value of the JSON record field is stored in a new index field whose name is the original index field name with the prefix custom_. In the previous example, if an index field named rank is configured for the collection, a new index field named custom_rank with the value 1 is added to the document in the index. Some attributes of a custom_ index field are inherited from the original index field, as described in the following table. For example, if the parametric search attribute is selected for the rank index field, the custom_rank index field also supports parametric search.

Attribute of custom_ index fields   How the value is determined
Returnable                          Inherited from the original index field.
Faceted search                      To create a facet, assign a value directly to the facet by using notation that starts with $. to indicate the facet path.
Free text search                    Not free text searchable.
In summary                          Not in summary.
Fielded search                      Inherited from the original index field.
Exact match                         Inherited from the original index field.
Case sensitive                      Inherited from the original index field.
Parametric search                   Inherited from the original index field.
Text sortable                       Inherited from the original index field.
Analyzable                          Not analyzable.

For facets, if the collection includes a facet whose facet path matches the JSON record field name, the value of the JSON record field is stored in that facet. In the previous example, if a facet with the facet path $.ranking exists, the value 1 is stored in that facet. When you specify a facet path, start it with $ and separate the facet path components with periods. For example, the facet path $.ranking corresponds to the root facet named ranking.
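To target a facet below the root, join the path components with periods. The following sketch assumes a hypothetical facet hierarchy in which a root facet named review has a child facet named status. Both must already exist in the administration console. The sketch assigns the same value to every document, which is enough to show the notation.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// "$.review.status" addresses the facet status under the root facet review.
// Both facet names are hypothetical.
readGAInput(options)
  -> transform { uri: $.uri, "$.review.status": "pending" }
  -> writeGAOutput(options);
```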

You can also write the output of the Jaql script to a file or in another format so that another application can use the data. For example, the following script writes the data to a JSON file on the local computer:

import ica::ga(*);
options:=getGAOptions($MetaTrackerJaqlVars);
readGAInput(options)
  -> someOperation()
  -> write(file('/home/biadmin/ica_out.json'));
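The same results can also be sent to more than one destination. The following sketch computes the output once, binds it to a variable with :=, and then writes it both to the index and to a local file. It assumes a hypothetical index field named title_text that was created beforehand in the administration console.

```jaql
import ica::ga(*);

options:=getGAOptions($MetaTrackerJaqlVars);

// Hypothetical analysis step: keep each document's URI and the value of
// its title field. title_text is an assumed index field name.
results:=readGAInput(options)
  -> transform {
       uri: $.uri,
       title_text: ($.metadata.fields
                      -> filter $.name == "title"
                      -> transform $.value)[0]
     };

// Store the results in the index...
results -> writeGAOutput(options);

// ...and also write them to a local JSON file for another application.
results -> write(file('/home/biadmin/ica_out.json'));
```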