Tips for Creating Categories

The following tips can help you decide how to approach creating categories.

Tips on Category-to-Response Ratio

When codes are created for a closed-ended question, such as "When was the last time you visited our outlet store?", the categories into which the responses fall should be mutually exclusive and exhaustive. That principle does not necessarily apply to qualitative text analysis, for at least two reasons:

• First, as a general rule of thumb, the longer a text record is, the more distinct ideas and opinions it expresses. A longer record is therefore much more likely to be assigned to multiple categories.

• Second, there are often several valid ways to group and interpret text records, and these groupings are not logically separate. For a survey with an open-ended question about the respondent's political beliefs, we could create categories such as Liberal and Conservative, or Republican and Democrat, as well as more specific categories such as Socially Liberal, Fiscally Conservative, and so forth. These categories overlap, and they do not have to be mutually exclusive or exhaustive.
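To make the contrast with closed-ended coding concrete, here is a minimal Python sketch of non-exclusive categorization. It is not how this software assigns categories; it assumes a hypothetical rule in which a record belongs to a category when it contains every keyword listed for that category, and the keywords are illustrative only. One response lands in four overlapping categories and another in none, so the categories are neither mutually exclusive nor exhaustive.

```python
import re

# Hypothetical category rules: a record falls into a category when it
# contains every keyword listed for that category.
CATEGORY_RULES: dict[str, set[str]] = {
    "Liberal": {"liberal"},
    "Conservative": {"conservative"},
    "Socially Liberal": {"socially", "liberal"},
    "Fiscally Conservative": {"fiscally", "conservative"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(record: str) -> list[str]:
    """Return every category whose keywords all appear in the record.

    The result may be empty (the categories are not exhaustive) or hold
    several names (the categories are not mutually exclusive).
    """
    words = tokenize(record)
    return [name for name, keywords in CATEGORY_RULES.items() if keywords <= words]


if __name__ == "__main__":
    responses = [
        "I am socially liberal but fiscally conservative.",
        "Mostly conservative.",
        "I prefer not to say.",
    ]
    for response in responses:
        print(response, "->", categorize(response))
```

Real category definitions are usually richer than keyword sets, but the shape of the result is the same: each record maps to a list of categories rather than to exactly one.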

Tips on Number of Categories to Create

Except for an extremely simple open-ended question, it is never intuitively obvious how many categories to create. The exact number of categories is not, in itself, a major concern. Instead, categories should emerge directly from the data: whenever you notice something interesting with respect to the objectives of the survey, you can create a category to represent those attitudes and ideas. Two considerations, however, should guide you:

• Category frequency. For a category to be useful, it has to contain a minimum number of records. One or two records may include something quite intriguing, but if they are one or two out of 1,000 records, the ideas they express may not be common enough in the population to be practically useful.

• Complexity. The more categories you create, the more information you have to review and summarize after completing the analysis. Beyond a certain point, additional categories add complexity without adding useful detail.
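The category-frequency check can be sketched in a few lines of Python. The category names, the record counts, and the 1% cutoff passed as `min_share` are all hypothetical; the cutoff is a placeholder you would choose for your own project, not a recommended value. The sketch counts each category at most once per record and flags categories whose share of all records falls below the cutoff. With 2 records out of 1,000, for example, a category covers only 0.2% of the data.

```python
from collections import Counter


def category_counts(assignments: list[list[str]]) -> Counter:
    """Count how many records were assigned to each category.

    A category is counted at most once per record, even if it is listed twice.
    """
    counts: Counter = Counter()
    for categories in assignments:
        counts.update(set(categories))
    return counts


def sparse_categories(assignments: list[list[str]], min_share: float) -> list[str]:
    """Return, sorted by name, the categories whose share of records is below min_share."""
    total = len(assignments)
    if total == 0:
        return []
    counts = category_counts(assignments)
    return sorted(name for name, n in counts.items() if n / total < min_share)


if __name__ == "__main__":
    # Hypothetical data: 1,000 records, many of them in no category at all.
    assignments = (
        [["Prices"]] * 300
        + [["Staff"]] * 120
        + [["Parking"]] * 2
        + [[]] * 578
    )
    print(category_counts(assignments))
    # Placeholder cutoff: flag any category found in fewer than 1% of records.
    print(sparse_categories(assignments, min_share=0.01))
```

A flagged category is a candidate for review, not automatic deletion; whether "Parking" is worth keeping still depends on the objectives of the survey.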

Unfortunately, there are no fixed rules for determining how many categories are too many or what the minimum number of records per category should be. You will have to make these decisions based on the demands of your particular situation.

We can, however, offer advice about where to start. Although the number of categories should not be excessive, in the early stages of the analysis it is better to have too many categories than too few. It is easier to group relatively similar categories than to split cases off into new categories, so working from more categories toward fewer is usually the best practice. Because text mining is iterative and this software makes each iteration easy to carry out, creating more categories at the start is acceptable.
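Why grouping is easier than splitting can be illustrated with a small Python sketch. Merging needs only a mapping from narrow category names to broader ones, which can be applied to records that are already categorized. The function `merge_categories` and all category names are hypothetical and do not reflect this software's own interface.

```python
def merge_categories(
    assignments: list[list[str]], mapping: dict[str, str]
) -> list[list[str]]:
    """Relabel each record's categories through mapping.

    Names missing from mapping are kept unchanged, and duplicates created by
    the merge are collapsed, so each record keeps one entry per category.
    """
    return [sorted({mapping.get(c, c) for c in categories}) for categories in assignments]


if __name__ == "__main__":
    # Hypothetical fine-grained categories from an early pass of the analysis.
    assignments = [
        ["Prices too high", "Rude staff"],
        ["Few discounts", "Prices too high"],
        ["Slow checkout"],
        ["Parking"],
    ]
    # Hypothetical grouping of similar categories into broader ones.
    mapping = {
        "Prices too high": "Pricing",
        "Few discounts": "Pricing",
        "Rude staff": "Service",
        "Slow checkout": "Service",
    }
    print(merge_categories(assignments, mapping))
```

The reverse operation has no such shortcut: nothing in a broad "Service" label says which records concern staff and which concern checkout, so splitting it means going back to the text of every record in that category.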