Tips for Creating Categories
The following tips can help you decide how to approach category creation.
Tips on Category-to-Document Ratio
The categories to which documents and records are assigned are often not mutually exclusive in qualitative text analysis, for at least two reasons:
- First, a general rule of thumb says that the longer the text document or record, the more distinct the ideas and opinions expressed. Thus, the chances that a document or record can be assigned to multiple categories are greatly increased.
- Second, there are often various ways to group and interpret text documents or records, and these groupings are not logically separate. In the case of a survey with an open-ended question about the respondent's political beliefs, we could create categories, such as Liberal and Conservative, or Republican and Democrat, as well as more specific categories, such as Socially Liberal, Fiscally Conservative, and so forth. These categories do not have to be mutually exclusive and exhaustive.
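The two points above can be illustrated with a small sketch. It is not how the software stores categories internally. It is a minimal Python illustration in which the record IDs, the `assignments` mapping, and both helper functions are hypothetical names chosen for this example. Each record maps to a set of categories, so one record can belong to several categories or to none.

```python
from collections import defaultdict

# Hypothetical data: each record ID maps to the set of categories assigned to it.
assignments = {
    "r1": {"Liberal", "Socially Liberal"},
    "r2": {"Conservative", "Fiscally Conservative"},
    "r3": {"Socially Liberal", "Fiscally Conservative"},
    "r4": set(),  # uncategorized: the categories need not be exhaustive
}


def records_by_category(assignments):
    """Invert the mapping: category name -> set of record IDs."""
    index = defaultdict(set)
    for record_id, categories in assignments.items():
        for category in categories:
            index[category].add(record_id)
    return dict(index)


def multi_category_records(assignments):
    """Return the sorted IDs of records assigned to more than one category."""
    return sorted(r for r, cats in assignments.items() if len(cats) > 1)


if __name__ == "__main__":
    print(records_by_category(assignments))
    print(multi_category_records(assignments))
```

Because the values are sets rather than single labels, overlapping categories such as Socially Liberal and Fiscally Conservative can coexist on the same record without conflict.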
Tips on Number of Categories to Create
Category creation should flow directly from the data: as you see something interesting in your data, you can create a category to represent that information. In general, there is no recommended upper limit on the number of categories that you create. However, it is certainly possible to create more categories than you can manage. Two principles apply:
- Category frequency. For a category to be useful, it has to contain a minimum number of documents or records. One or two documents may include something quite intriguing, but if they are one or two out of 1,000 documents, the information they contain may not be frequent enough in the population to be practically useful.
- Complexity. The more categories you create, the more information you have to review and summarize after completing the analysis. Too many categories add complexity without necessarily adding useful detail.
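The category frequency principle can be checked with a short script on exported results. The sketch below is only an illustration: the function names, the sample data, and the `min_share` threshold of 1% are assumptions made for this example, not values recommended by the software.

```python
from collections import Counter


def category_frequencies(assignments):
    """Count how many records fall into each category."""
    counts = Counter()
    for categories in assignments.values():
        counts.update(categories)
    return counts


def low_frequency_categories(assignments, min_share=0.01):
    """Return categories covering less than min_share of all records.

    min_share is a hypothetical threshold; choose one that fits your data.
    """
    total = len(assignments)
    if total == 0:
        return []
    counts = category_frequencies(assignments)
    return sorted(c for c, n in counts.items() if n / total < min_share)


# Hypothetical sample: 1,000 records, 400 about pricing, 2 with a rare theme.
sample = {f"doc{i}": ({"Pricing"} if i < 400 else set()) for i in range(1000)}
sample["doc0"].add("Unusual Complaint")
sample["doc1"].add("Unusual Complaint")

if __name__ == "__main__":
    print(category_frequencies(sample))
    print(low_frequency_categories(sample))
```

A category flagged this way is not necessarily wrong. The check only points out where a category may be too small to be practically useful.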
Unfortunately, there are no rules for determining how many categories are too many or for determining the minimum number of records per category. You will have to make such determinations based on the demands of your particular situation.
We can, however, offer advice about where to start. Although the number of categories should not be excessive, in the early stages of the analysis it is better to have too many rather than too few categories. It is easier to group categories that are relatively similar than to split off cases into new categories, so a strategy of working from more to fewer categories is usually the best practice. Given the iterative nature of text mining and the ease with which it can be accomplished with this software program, building more categories is acceptable at the start.
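Working from more to fewer categories amounts to relabeling several narrow categories under one broader name. The sketch below shows this under stated assumptions: `merge_categories`, the `merge_map` argument, and the category names are hypothetical and do not correspond to any feature or API of the software.

```python
def merge_categories(assignments, merge_map):
    """Return new assignments with narrow categories folded into broader ones.

    merge_map maps an old category name to the broader name that replaces it.
    Categories not listed in merge_map are kept unchanged.
    """
    merged = {}
    for record_id, categories in assignments.items():
        # Using a set means two merged categories on one record collapse to one.
        merged[record_id] = {merge_map.get(c, c) for c in categories}
    return merged


# Hypothetical starting point with deliberately fine-grained categories.
fine_grained = {
    "r1": {"Slow Delivery", "Late Shipment"},
    "r2": {"Late Shipment"},
    "r3": {"Pricing"},
}

# Hypothetical decision: both delivery categories describe the same issue.
merge_map = {
    "Slow Delivery": "Delivery Problems",
    "Late Shipment": "Delivery Problems",
}

if __name__ == "__main__":
    print(merge_categories(fine_grained, merge_map))
```

Merging only needs a name mapping, whereas splitting a category would require rereading its records to decide where each one belongs. This is one way to see why starting with more categories is the easier direction.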