You can create dictionaries of terms that are extracted
from Resource Description Framework (RDF) files. For
example, you might create a dictionary of chemical
substances by extracting the terms from DBpedia RDF files. When
you deploy the analytics facet dictionary to a collection
on your Watson Explorer Content Analytics server,
the Watson Explorer Content Analytics pipeline
annotates instances of these terms that are found
in your documents and creates facets for these terms.
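To make the idea concrete, the following Python sketch shows how label triples in an RDF file could yield dictionary terms and how those terms might then be spotted in a document. It is a conceptual illustration only and does not reproduce the Watson Explorer Content Analytics pipeline. The sample data, the example.org URIs, and the helper names extract_terms and find_terms are all hypothetical, and the parser handles only the simplest N-Triples lines.

```python
import re

# Hypothetical sample data in N-Triples form; the example.org URIs are made up.
SAMPLE_NT = """\
<http://example.org/Ethanol> <http://www.w3.org/2000/01/rdf-schema#label> "ethanol" .
<http://example.org/Ethanol> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ChemicalSubstance> .
<http://example.org/Benzene> <http://www.w3.org/2000/01/rdf-schema#label> "benzene" .
"""

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Matches only the simplest triples: <subject> <predicate> "plain literal" .
LITERAL_TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.\s*$')


def extract_terms(ntriples: str, name_predicate: str = RDFS_LABEL) -> list[str]:
    """Collect the literal objects of the chosen name predicate as terms."""
    terms = []
    for line in ntriples.splitlines():
        match = LITERAL_TRIPLE.match(line.strip())
        if match and match.group(2) == name_predicate:
            terms.append(match.group(3))
    return terms


def find_terms(text: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in the text as whole words (case-insensitive)."""
    lowered = text.lower()
    return [
        term
        for term in terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    ]


terms = extract_terms(SAMPLE_NT)
found = find_terms("The sample contained traces of benzene.", terms)
print(terms)
print(found)
```

In the product, the extraction is driven by SPARQL queries against the RDF store rather than by line-based parsing; the sketch only illustrates the relationship between RDF labels and dictionary terms.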
Before you begin
In the Watson Explorer Content Analytics administration
console, enable the RDF store configuration
for a collection. Use the embedded
triplestore database.
Restriction: You cannot upload the contents of an RDF file from
Content Analytics Studio if you use DB2® as the RDF
store.
You must also create the
root facet under which to display the facets
that are created based on the analytics facet dictionary.
About this task
If you extract dictionary entries
from an RDF store that is configured
for a content analytics collection, you must import the dictionary
entries into an analytics facet dictionary
in Content Analytics Studio, build the dictionary,
and then export the dictionary to Watson Explorer Content Analytics. If you use an RDF store
that is configured for an enterprise
search collection, the dictionary
and facet tree are automatically created in Watson Explorer Content Analytics after you run the
ontology queries.
Procedure
To create and deploy an analytics facet dictionary:
- Upload the contents of an RDF file to the RDF store that
you configured on the Watson Explorer Content Analytics server. In
your Watson Explorer Content Analytics server
connection file, select a
content analytics collection and
click Upload RDF file. Watson Explorer Content Analytics supports
standard formats of RDF data sets (Turtle,
N-Triples, TriG, N-Quads,
N3, and RDF/XML) and standard data model
definitions (OWL, SKOS, and RDF Schema).
- Optional: If you want to view and tune the
contents of the RDF store
before you extract dictionary entries, you can run a SPARQL
query to view, add, modify, or remove
data from the RDF store. Watson Explorer Content Analytics supports
only SPARQL queries that match
against the default graph of RDF
data sets.
Tip: If you want to use the RDF data
in another application
or add the data to the RDF
store of another collection, you can download the
RDF content.
- Extract dictionary entries from the RDF store by specifying
SPARQL queries. You
can build the queries by selecting predicates from an RDF
graph or by manually specifying the
queries.
- In your Watson Explorer Content Analytics server
connection file,
click Execute Ontology
Query.
- Search for a literal value to preview the
content of the RDF store. For example, to build a dictionary of
animals, you might search for animal
or cat. Watson Explorer Content Analytics
searches for exact matches
of the keyword and then
displays an RDF graph of the resources and
predicates that are related to
the keyword.
- Right-click predicates to select which
predicates to use for generating
the ontology query
that extracts the dictionary entries from
the RDF store. Predicates are
represented as lines between
resource nodes in the RDF graph. You can select
the following types of
predicates to generate the
ontology query:
  - Name predicate: Used for generating the resource query.
    The object of a name predicate is used as the surface
    form of entries in the analytics facet dictionary.
  - Type predicate: Used for generating the resource query.
    The type is used to create the facet paths in the
    analytics facet dictionary.
  - Super class predicate: Used for generating the ontology
    query. It indicates the hierarchy relationship of the
    class and is used to create the facet tree.
  - Class label predicate: Used for generating the ontology
    query. This predicate provides the facet name.
  - Synonym predicate: Used for generating the synonym query.
- After you select the predicates, click
Generate Query to
populate the query
fields. You can edit the values in the fields or
provide values for fields
that were not automatically
populated. For example, you can add a FILTER condition
to narrow down the
set of resources to
extract.
- Specify the facet root path that you created in
the Watson Explorer Content Analytics
administration console. For
example, if you set the root path to
$.myRDF, the ontology
query creates facet
paths such as $.myRDF.animal.
- Click Finish to run the
ontology query.
- Create an empty analytics facet dictionary. In
the Studio Explorer view, right-click
the Resources/Dictionaries directory
in your project and click . For the Dictionary
type setting, select
Analytics facet
dictionary.
- Import the entries into the analytics facet dictionary. In your Watson Explorer Content Analytics server
connection file, click Manage
RDF Tasks,
select the completed ontology
query task, and click Import ontology into
dictionary.
- Build the analytics facet dictionary. In the Studio
Explorer view, right-click
the database file for your analytics facet dictionary
in the Resources/Dictionaries
directory and click Build
Studio Resource.
- Optional: If you want to test the dictionary
in Content Analytics Studio, include the
dictionary file in the lexical
analysis stage of a UIMA pipeline
configuration file. You can analyze sample text and edit
the dictionary as needed.
- Export the dictionary to a Watson Explorer Content Analytics
collection. In the Manage
RDF Tasks window, select
the completed ontology query task and click
Export RDF dictionary
into Watson Explorer Content Analytics
collection.
- In the Watson Explorer Content Analytics
administration console, redeploy
the analytic resources and
rebuild the index for the collection. If a document cache
is not enabled for the collection, you
must recrawl or reimport documents after you deploy the
analytic resources.
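The query fields in the extraction step accept SPARQL, so it can help to see what a resource query with a FILTER condition might look like. The following Python sketch composes one from a name predicate and a type predicate. The function name build_resource_query, the variable names ?resource, ?name, and ?type, and the overall query shape are assumptions made for illustration; the query that Generate Query actually produces may differ.

```python
from typing import Optional


def build_resource_query(
    name_predicate: str,
    type_predicate: str,
    filter_expr: Optional[str] = None,
) -> str:
    """Compose a SPARQL SELECT query from two predicate URIs.

    The query shape is an assumption for illustration, not the form
    that the product generates.
    """
    lines = [
        "SELECT ?resource ?name ?type",
        "WHERE {",
        f"  ?resource <{name_predicate}> ?name .",
        f"  ?resource <{type_predicate}> ?type .",
    ]
    if filter_expr:
        # An optional FILTER narrows the set of resources to extract.
        lines.append(f"  FILTER ({filter_expr})")
    lines.append("}")
    return "\n".join(lines)


# Standard RDFS and RDF predicates; the language filter is an example only.
query = build_resource_query(
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    filter_expr='lang(?name) = "en"',
)
print(query)
```

Here the FILTER keeps only resources whose name literal is tagged as English, which is one way to narrow the extracted entries before you click Finish.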
What to do next
You can view the annotation results
in the text miner application. In the
Facets view, select the facets that are generated by the analytics
facet dictionary.
Tip: If you want to delete the analytics facet dictionary
from Watson Explorer Content Analytics, delete
the associated ontology query task
in the Manage RDF Tasks
window. To remove the generated facets, manually delete
the corresponding facets from the
facet tree of the content analytics
collection or the corresponding facet definitions of the enterprise
search collection in the Watson Explorer Content Analytics administration console.