Semantic Networks
In this release, the semantic networks technique is only available for English language text.
This technique builds categories using a built-in network of word relationships. For this reason, this technique can produce very good results when the terms are concrete and are not too ambiguous. However, you should not expect the technique to find many links between highly technical/specialized concepts. When dealing with such concepts, you may find the concept inclusion and concept root derivation techniques to be more useful.
How Semantic Networks Work
The idea behind the semantic network technique is to leverage known word
relationships to create categories of synonyms or hyponyms. A hyponym is a concept that is
a sort of another concept, so that the two are in a hierarchical relationship, also known as an ISA
relationship. For example, if animal is a concept, then cat and
kangaroo are hyponyms of animal since they are sorts of
animals.
In addition to synonym and hyponym relationships, the semantic network
technique also examines part and whole links between any concepts from the
<Location> type. For example, the technique will group the concepts
normandy, provence, and france into one category
because Normandy and Provence are parts of France.
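The part and whole grouping can be pictured with a minimal sketch. The PART_OF table and the group_locations function below are hypothetical illustrations, not the product's built-in network or API; they assume each location has at most one known whole.

```python
# Hypothetical part-of links between <Location> concepts.
# The product's real network is built in and is not exposed like this.
PART_OF = {"normandy": "france", "provence": "france"}

def group_locations(concepts):
    """Group each location under the whole it belongs to, if one is known."""
    groups = {}
    for concept in concepts:
        # A location with no known whole forms a group of its own.
        whole = PART_OF.get(concept, concept)
        groups.setdefault(whole, []).append(concept)
    return groups

location_groups = group_locations(["normandy", "provence", "france", "japan"])
print(location_groups)
```

In this sketch, normandy, provence, and france end up in one group, while japan, which has no part or whole link to the others, stays on its own.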
The technique begins by identifying the possible senses of each concept in
the semantic network. When concepts are identified as synonyms or hyponyms, they are grouped into a
single category. For example, the technique would create a single category containing these three
concepts: eating apple, dessert apple, and granny
smith since the semantic network contains the information that: 1) dessert
apple is a synonym of an eating apple, and 2) granny
smith is a sort of eating apple (meaning it is a hyponym of eating
apple).
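The apple example can be sketched in a few lines. The SYNONYMS and HYPERNYMS tables and the build_categories function are assumptions made for illustration only; the actual network and grouping logic are internal to the product.

```python
# Toy relationship data (hypothetical; the real network is built in).
SYNONYMS = {"dessert apple": "eating apple"}   # term -> equivalent term
HYPERNYMS = {"granny smith": "eating apple"}   # term -> direct parent (ISA link)

def build_categories(concepts):
    """Group concepts that are linked by a synonym or an ISA relationship."""
    categories = {}
    for concept in concepts:
        # The head of a concept is its synonym target or its parent, else itself.
        head = SYNONYMS.get(concept) or HYPERNYMS.get(concept) or concept
        categories.setdefault(head, set()).add(concept)
    # Keep only the groups that actually link two or more concepts.
    return {head: members for head, members in categories.items() if len(members) > 1}

categories = build_categories(
    ["eating apple", "dessert apple", "granny smith", "kangaroo"]
)
print(categories)
```

Here the three apple concepts fall into one category headed by eating apple, and kangaroo is left ungrouped because nothing in the toy data links it to another concept.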
Taken individually, many concepts, especially uniterms, are ambiguous. For
example, the concept buffet can denote a sort of meal or a piece of furniture. If
the set of concepts includes meal, furniture and
buffet, then the algorithm is forced to choose between grouping
buffet with meal or with furniture. Be aware that
in some cases the choices made by the algorithm may not be appropriate in the context of a
particular set of records or documents.
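The following sketch shows why an ambiguous concept forces a choice. The SENSES table and both functions are hypothetical, and picking the first candidate is only a stand-in: the criterion the product actually uses to choose a sense is not described here.

```python
# Hypothetical sense inventory: each sense of a term points to a parent concept.
SENSES = {"buffet": ["meal", "furniture"]}

def candidate_parents(term, concepts):
    """Return every parent of `term` that is also present in the concept set."""
    return [parent for parent in SENSES.get(term, []) if parent in concepts]

def assign(term, concepts):
    """Pick a single parent. Taking the first candidate is an arbitrary stand-in."""
    candidates = candidate_parents(term, concepts)
    return candidates[0] if candidates else None

concepts = {"meal", "furniture", "buffet"}
print(candidate_parents("buffet", concepts))  # both senses compete
print(assign("buffet", concepts))             # only one of them can win
```

Whichever rule is used, only one of the two groupings survives, which is why the result may not suit a particular set of records or documents.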
The semantic network technique can outperform concept inclusion with certain
types of data. While both the semantic network and concept inclusion recognize that apple
pie is a sort of pie, only the semantic network recognizes that
tart is also a sort of pie.
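The difference between the two techniques can be illustrated with a simplified comparison. Both functions below are assumptions: linked_by_inclusion reduces concept inclusion to a plain word-containment test, and linked_by_network looks up a hypothetical HYPERNYMS table standing in for the built-in network.

```python
# Hypothetical ISA links standing in for the built-in semantic network.
HYPERNYMS = {"apple pie": "pie", "tart": "pie"}

def linked_by_inclusion(concept, parent):
    """Simplified inclusion test: every word of `parent` appears in a longer `concept`."""
    concept_words, parent_words = concept.split(), parent.split()
    return (len(concept_words) > len(parent_words)
            and all(word in concept_words for word in parent_words))

def linked_by_network(concept, parent):
    """Network test: the toy network records `parent` as the parent of `concept`."""
    return HYPERNYMS.get(concept) == parent

for concept in ("apple pie", "tart"):
    print(concept, linked_by_inclusion(concept, "pie"), linked_by_network(concept, "pie"))
```

The word-based test links apple pie to pie but has nothing to work with for tart, whereas the lookup in the toy network links both.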
The semantic network technique works in conjunction with the other techniques. For
example, suppose that you have selected both the semantic network and concept inclusion techniques and that
the semantic network has grouped the concept teacher with the concept
tutor (because a tutor is a kind of teacher). The inclusion algorithm can group the
concept graduate tutor with tutor and, as a result, the two
algorithms collaborate to produce an output category containing all three concepts:
tutor, graduate tutor, and teacher.
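One way to picture this collaboration is to treat the output of each technique as a list of pairwise links and merge all links into connected groups. This is only a sketch under that assumption; merge_links and the two link lists are hypothetical and do not describe how the product combines techniques internally.

```python
def merge_links(links):
    """Merge pairwise links into categories (connected components, union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for concept in parent:
        groups.setdefault(find(concept), set()).add(concept)
    return list(groups.values())

# Hypothetical links produced by each technique.
network_links = [("tutor", "teacher")]            # a tutor is a kind of teacher
inclusion_links = [("graduate tutor", "tutor")]   # "graduate tutor" includes "tutor"

merged = merge_links(network_links + inclusion_links)
print(merged)
```

Because tutor appears in a link from each technique, the two links chain together and all three concepts land in a single category.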
Options for Semantic Network
There are a number of additional settings that might be of interest with this technique.
- Change the Maximum search distance. Select how far you want the technique to search before producing categories. The lower the value, the fewer results are produced; however, these results will be less noisy and more likely to be significantly linked or associated with each other. The higher the value, the more results you will get; however, these results may be less reliable or relevant.
For example, depending on the distance, the algorithm searches from
Danish pastry up to coffee roll (its parent), then
bun (grandparent) and on upwards to bread.
By reducing the search distance, this technique produces smaller categories that might be easier to work with if you feel that the categories being produced are too large or group too many things together.
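The effect of the search distance can be sketched as a bounded walk up the hierarchy. The HYPERNYMS chain comes from the example above, but treating the setting as a count of ISA steps is an assumption; the way the product maps its Maximum search distance value to the network is not specified here, and ancestors_within is a hypothetical helper.

```python
# ISA chain from the example: each term points to its direct parent.
HYPERNYMS = {"danish pastry": "coffee roll", "coffee roll": "bun", "bun": "bread"}

def ancestors_within(term, max_distance):
    """Collect the ancestors of `term` reachable in at most `max_distance` steps."""
    found = []
    current = term
    for _ in range(max_distance):
        current = HYPERNYMS.get(current)
        if current is None:
            break  # reached the top of the known chain
        found.append(current)
    return found

near = ancestors_within("danish pastry", 1)
far = ancestors_within("danish pastry", 3)
print(near)
print(far)
```

With a small distance only the closest relative is a grouping candidate, so categories stay small; with a larger distance the walk reaches bread, which produces broader and potentially noisier categories.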
Important! We recommend that you do not apply the fuzzy grouping option Accommodate spelling errors for a minimum root character limit of (defined on the Expert tab of the node or in the Extract dialog box) when using this technique, since some false groupings can have a strongly negative impact on the results.