Co-occurrence Rules

Co-occurrence rules enable you to discover and group concepts that are strongly related within the set of documents or records. The idea is that when concepts are often found together in documents and records, that co-occurrence reflects an underlying relationship that is probably of value in your category definitions. This technique creates co-occurrence rules that can be used to create a new category, extend a category, or as input to another category technique. Two concepts strongly co-occur if they frequently appear together in a set of records and rarely separately in any of the other records. This technique can produce good results with larger datasets with at least several hundred documents or records.

For example, if many records contain the words price and availability, these concepts could be grouped into a co-occurrence rule, (price & availability). In another example, if the concepts peanut butter, jelly, and sandwich appear more often together than apart, they would be grouped into a concept co-occurrence rule (peanut butter & jelly & sandwich).

Important! In earlier releases, co-occurrence and synonym rules were surrounded by square brackets. In this release, square brackets now indicate a text link analysis pattern result. Instead, co-occurrence and synonym rules will be encapsulated by parentheses such as (speaker systems|speakers).

How Co-occurrence Rules Work

This technique scans the documents or records looking for two or more concepts that tend to appear together. Two or more concepts strongly co-occur if they frequently appear together in a set of documents or records and if they seldom appear separately in any of the other documents or records.
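
The notion of "often together, seldom apart" can be illustrated with a small sketch. The similarity measure the product actually uses is not stated in this topic, so the example below substitutes a simple Jaccard overlap (records containing both concepts divided by records containing either) purely as an assumption. The function name cooccurrence_strength and the sample records are hypothetical.

```python
def cooccurrence_strength(records, concept_a, concept_b):
    """Jaccard overlap, used here as an ASSUMED stand-in for the
    product's similarity value: records with both concepts divided
    by records with at least one of them."""
    both = sum(1 for r in records if concept_a in r and concept_b in r)
    either = sum(1 for r in records if concept_a in r or concept_b in r)
    return both / either if either else 0.0


# Each record is represented by the set of concepts extracted from it
# (illustrative data only).
records = [
    {"price", "availability"},
    {"price", "availability", "delivery"},
    {"price", "availability"},
    {"delivery"},
    {"price"},
]

# "price" and "availability" appear together in 3 of the 4 records
# that mention either one, so they co-occur strongly.
strong = cooccurrence_strength(records, "price", "availability")

# "price" and "delivery" appear together in only 1 of the 5 records
# that mention either one, so they co-occur weakly.
weak = cooccurrence_strength(records, "price", "delivery")
```

Under this assumed measure, a pair scores high only when the concepts appear together and seldom apart, which mirrors the description above.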

When co-occurring concepts are found, a category rule is formed. These rules consist of two or more concepts connected using the & Boolean operator. These rules are logical statements that will automatically classify a document or record into a category if the set of concepts in the rule all co-occur in that document or record.
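
The way such a rule classifies a record can be sketched as a simple set test. This is an illustration only, not the product's implementation: it assumes each record has already been reduced to its set of extracted concepts, and the helper name matches_rule is hypothetical.

```python
def matches_rule(rule_concepts, record_concepts):
    """Return True when every concept in the rule is present in the
    record, which is the meaning of joining concepts with &."""
    return set(rule_concepts) <= set(record_concepts)


# The rule (peanut butter & jelly & sandwich) from the example above.
rule = ("peanut butter", "jelly", "sandwich")

# Hypothetical records, each represented by its extracted concepts.
record_1 = {"peanut butter", "jelly", "sandwich", "lunch"}
record_2 = {"peanut butter", "sandwich"}

in_category_1 = matches_rule(rule, record_1)  # all three concepts present
in_category_2 = matches_rule(rule, record_2)  # "jelly" is missing
```

A record is placed in the category only when all concepts in the rule are present; extra concepts in the record do not prevent a match.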

Options for Co-occurrence Rules

If you are using the co-occurrence rule technique, you can fine-tune several settings that influence the resulting rules:

  • Maximum search distance. Select how far you want the technique to search for co-occurrences. As you increase the search distance, the minimum similarity value required for each co-occurrence is lowered; as a result, many co-occurrence rules may be produced, but those with a low similarity value will often be of little significance. As you reduce the search distance, the minimum required similarity value increases; as a result, fewer co-occurrence rules are produced, but they will tend to be more significant (stronger).
  • Minimum number of documents. The minimum number of records or documents that must contain a given pair of concepts for it to be considered as a co-occurrence; the lower you set this option, the easier it is to find co-occurrences. Increasing the value results in fewer, but more significant, co-occurrences. As an example, suppose that the concepts "apple" and "pear" are found together in 2 records (and that neither of the two concepts occurs in any other records). With Minimum number of documents set to 2 (the default), the co-occurrence technique will create a category rule (apple & pear). If the value is raised to 3, the rule will no longer be created.
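
The effect of the Minimum number of documents option on the apple and pear example can be sketched as follows. This is a minimal illustration of the document-count threshold only; it does not model the search distance or the similarity value, and the function name candidate_pairs and its min_docs parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(records, min_docs=2):
    """Count the records in which each pair of concepts appears together
    and keep only pairs found in at least min_docs records."""
    counts = Counter()
    for concepts in records:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_docs}


# "apple" and "pear" appear together in 2 records and nowhere else.
records = [{"apple", "pear"}, {"apple", "pear"}, {"banana"}]

with_default = candidate_pairs(records, min_docs=2)  # pair is kept
with_three = candidate_pairs(records, min_docs=3)    # pair is dropped
```

With the threshold at 2, the pair qualifies as a candidate for the rule (apple & pear); raising it to 3 leaves no qualifying pairs.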

Note: With small datasets (< 1000 responses) you may not find any co-occurrences with the default settings. If so, try increasing the search distance value.

Note: You can prevent concepts from being grouped together by specifying them explicitly. See the topic Managing Link Exception Pairs for more information.