Building dictionaries and facet trees from RDF files

You can create dictionaries of terms that are extracted from Resource Description Framework (RDF) files. For example, you might create a dictionary of chemical substances by extracting the terms from DBpedia RDF files. When you deploy the analytics facet dictionary to a collection on your Watson Explorer Content Analytics server, the Watson Explorer Content Analytics pipeline annotates instances of these terms that are found in your documents and creates facets for these terms.
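
To make the idea of extracting terms from RDF data concrete, the following Python sketch collects label literals from a few N-Triples lines. It is an illustration only and is not how Watson Explorer Content Analytics processes RDF files. The sample triples, the example.org URIs, and the extract_terms function are assumptions made up for this example, and the pattern handles only simple literals without escaped quotation marks.

```python
import re

# Simplified N-Triples pattern: <subject> <predicate> "literal"@lang .
# Assumption: only plain or language-tagged literals without escaped quotes.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@([A-Za-z-]+))?\s*\.\s*$'
)

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical sample data, loosely in the style of a DBpedia export.
SAMPLE = """\
<http://example.org/resource/Ethanol> <http://www.w3.org/2000/01/rdf-schema#label> "Ethanol"@en .
<http://example.org/resource/Benzene> <http://www.w3.org/2000/01/rdf-schema#label> "Benzene"@en .
<http://example.org/resource/Benzene> <http://www.w3.org/2000/01/rdf-schema#label> "Benzol"@de .
<http://example.org/resource/Benzene> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ontology/ChemicalSubstance> .
"""


def extract_terms(ntriples, predicate=RDFS_LABEL, lang="en"):
    """Return the literal values of `predicate` that carry the given language tag."""
    terms = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip triples whose object is not a simple literal
        _subject, pred, literal, tag = match.groups()
        if pred == predicate and tag == lang:
            terms.append(literal)
    return terms


terms = extract_terms(SAMPLE)
print(terms)
```

In this sketch, each English label becomes a candidate dictionary term. The triple whose object is a URI is skipped because it does not carry a literal value.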

Before you begin

In the Watson Explorer Content Analytics administration console, enable the RDF store configuration for a collection. Use the embedded triplestore database.

Restriction: You cannot upload the contents of an RDF file from Content Analytics Studio if you use DB2® as the RDF store.

You must also create the root facet under which to display the facets that are created based on the analytics facet dictionary.

About this task

If you extract dictionary entries from an RDF store that is configured for a content analytics collection, you must import the dictionary entries into an analytics facet dictionary in Content Analytics Studio, build the dictionary, and then export the dictionary to Watson Explorer Content Analytics. If you use an RDF store that is configured for an enterprise search collection, the dictionary and facet tree are automatically created in Watson Explorer Content Analytics after you run the ontology queries.

Procedure

To create and deploy an analytics facet dictionary:

  1. Upload the contents of an RDF file to the RDF store that you configured on the Watson Explorer Content Analytics server. In your Watson Explorer Content Analytics server connection file, select a content analytics collection and click Upload RDF file. Watson Explorer Content Analytics supports standard formats of RDF data sets (Turtle, N-Triples, TriG, N-Quads, N3, and RDF/XML) and standard data model definitions (OWL, SKOS, and RDF schema).
  2. Optional: If you want to view and tune the contents of the RDF store before you extract dictionary entries, you can run a SPARQL query to view, add, modify, or remove data from the RDF store. Watson Explorer Content Analytics supports only SPARQL queries that match against the default graph of RDF data sets.
    Tip: If you want to use the RDF data in another application or add the data to the RDF store of another collection, you can download the RDF content.
  3. Extract dictionary entries from the RDF store by specifying SPARQL queries. You can build the queries by selecting predicates from an RDF graph or by manually specifying the queries.
    1. In your Watson Explorer Content Analytics server connection file, click Execute Ontology Query.
    2. Search for a literal value to preview the content of the RDF store. For example, to build a dictionary of animals, you might search for animal or cat. Watson Explorer Content Analytics searches for exact matches of the keyword and then displays an RDF graph of the resources and predicates that are related to the keyword.
    3. Right-click predicates to select which predicates to use for generating the ontology query that extracts the dictionary entries from the RDF store. Predicates are represented as lines between resource nodes in the RDF graph. You can select the following types of predicates to generate the ontology query:
      Name predicate
      Used for generating the resource query. The object of a name predicate is used as the surface form of entries in the analytics facet dictionary.
      Type predicate
      Used for generating the resource query. The type is used to create the facet paths in the analytics facet dictionary.
      Super class predicate
      Used for generating the ontology query. It indicates the hierarchy relationship of the class and is used to create the facet tree.
      Class label predicate
      Used for generating the ontology query. This predicate provides the facet name.
      Synonym predicate
      Used for generating the synonym query.
    4. After you select the predicates, click Generate Query to populate the query fields. You can edit the values in the fields or provide values for fields that were not automatically populated. For example, you can add a FILTER condition to narrow down the set of resources to extract.
    5. Specify the facet root path that you created in the Watson Explorer Content Analytics administration console. For example, if you set the root path to $.myRDF, the ontology query creates facet paths such as $.myRDF.animal.
    6. Click Finish to run the ontology query.
  4. Create an empty analytics facet dictionary. In the Studio Explorer view, right-click the Resources/Dictionaries directory in your project and click New > Dictionary Database. For the Dictionary type setting, select Analytics facet dictionary.
  5. Import the entries into the analytics facet dictionary. In your Watson Explorer Content Analytics server connection file, click Manage RDF Tasks, select the completed ontology query task, and click Import ontology into dictionary.
  6. Build the analytics facet dictionary. In the Studio Explorer view, right-click the database file for your analytics facet dictionary in the Resources/Dictionaries directory and click Build Studio Resource.
  7. Optional: If you want to test the dictionary in Content Analytics Studio, include the dictionary file in the lexical analysis stage of a UIMA pipeline configuration file. You can analyze sample text and edit the dictionary as needed.
  8. Export the dictionary to a Watson Explorer Content Analytics collection. In the Manage RDF Tasks window, select the completed ontology query task and click Export RDF dictionary into Watson Explorer Content Analytics collection.
  9. In the Watson Explorer Content Analytics administration console, redeploy the analytic resources and rebuild the index for the collection. If a document cache is not enabled for the collection, you must recrawl or reimport documents after you deploy the analytic resources.
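
The roles of the predicate types in step 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the query that Generate Query produces: the build_resource_query and build_facet_path functions, the ex: class names, and the choice of rdfs:label and rdf:type as the name and type predicates are all hypothetical. The sketch shows how a name predicate and a type predicate might be combined into a SPARQL resource query with an optional FILTER condition, and how super class and class label information might be combined with the facet root path.

```python
# Hypothetical illustration; none of these names are part of the product.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_resource_query(name_predicate, type_predicate, filter_expr=None):
    """Return a SPARQL SELECT that pairs each resource name with its type."""
    lines = [
        "SELECT ?resource ?name ?type",
        "WHERE {",
        f"  ?resource <{name_predicate}> ?name .",
        f"  ?resource <{type_predicate}> ?type .",
    ]
    if filter_expr:
        # An optional FILTER narrows the set of extracted resources.
        lines.append(f"  FILTER ({filter_expr})")
    lines.append("}")
    return "\n".join(lines)


def build_facet_path(root_path, class_name, super_class_of, class_label):
    """Follow super class links upward and join the class labels under root_path."""
    labels = []
    seen = set()
    current = class_name
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in the hierarchy
        labels.append(class_label.get(current, current))
        current = super_class_of.get(current)
    # Labels were collected from child to parent, so reverse them.
    return ".".join([root_path] + labels[::-1])


query = build_resource_query(RDFS_LABEL, RDF_TYPE, 'lang(?name) = "en"')
path = build_facet_path(
    "$.myRDF",
    "ex:Cat",
    super_class_of={"ex:Cat": "ex:Animal"},
    class_label={"ex:Cat": "cat", "ex:Animal": "animal"},
)
print(query)
print(path)
```

In this sketch, the query has no GRAPH clause, so it matches against the default graph only, which is in line with the restriction in step 2. The facet path nests the class label under the labels of its super classes, starting from the root path.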

What to do next

You can view the annotation results in the text miner application. In the Facets view, select the facets that are generated by the analytics facet dictionary.

Tip: If you want to delete the analytics facet dictionary from Watson Explorer Content Analytics, delete the associated ontology query task in the Manage RDF Tasks window. To remove the generated facets, manually delete the corresponding facets from the facet tree of the content analytics collection or the corresponding facet definitions of the enterprise search collection in the Watson Explorer Content Analytics administration console.