Business scenario: Filtering and organizing email

ExampleCo. Enterprises, a fictitious company, must classify and archive a backlog of unclassified email and set up a system to regularly classify all new email.

ExampleCo. Enterprises is a European company that recently acquired a company based in the United States. Bob, the IT administrator for the newly acquired company, is tasked with creating and maintaining the corporate email archive.

To control ongoing storage costs and potential legal discovery costs, Bob must manage the email archive to avoid adding irrelevant data. Additionally, because the two companies previously maintained separate email systems, Bob must archive content from Lotus® Domino® and Microsoft Exchange. Bob needs to ensure that the entire message content is archived, including attachments.

To satisfy all of these requirements, Bob selects IBM® Content Collector and IBM Content Classification to automatically and intelligently create and maintain the email archive. Bob needs to filter out irrelevant data, such as company bulletins, newsletters, and personal email that has no business value (for example, notes that discuss the outcome of a local team's sporting event).

Bob creates an IBM Content Classification decision plan to define a set of rules for automatically classifying email and assigning each item the correct category value in IBM Content Manager, such as Contracts, Claims, or Human Resources. The rules need to filter out irrelevant email before it is archived and automatically detect sensitive information in an email, such as Social Security numbers and credit card information.
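To make the idea of detecting sensitive information more concrete, the following is a minimal Python sketch of the kind of check such a rule might perform. It is not IBM Content Classification rule syntax and does not reflect how the product implements detection. The names `SSN_PATTERN`, `CARD_PATTERN`, `luhn_valid`, and `find_sensitive` are hypothetical, and the patterns assume United States Social Security numbers written as `NNN-NN-NNNN` and card numbers of 13 to 19 digits that pass the Luhn checksum.

```python
import re

# Hypothetical patterns for illustration only; not product rule syntax.
# Assumes SSNs are written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumes card numbers have 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        # Double every second digit, counting from the right.
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return candidate SSNs and Luhn-valid card numbers found in text."""
    ssns = SSN_PATTERN.findall(text)
    cards = []
    for match in CARD_PATTERN.finditer(text):
        # Strip separators before running the checksum.
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn": ssns, "credit_card": cards}
```

A pattern match alone tends to produce false positives (for example, order numbers that happen to have 16 digits), which is why the sketch adds the checksum step for card numbers. A production rule would likely need additional context checks.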

Bob creates an IBM Content Classification knowledge base and then trains the classification system by using a small set of user mailboxes to serve as a set of representative documents. After the system is trained, Bob builds the archive by classifying all email that was transmitted over the past year. The initial archive contains approximately 150 million emails.

After the initial archive is built, Bob configures a task route in IBM Content Collector that regularly archives email and uses IBM Content Classification to classify each email first. Currently, an additional 250,000 to 500,000 emails are expected to be processed per day. Individual messages might range in size from 20 KB to 100 MB, depending on the length of the email and the number of attached documents.
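The message counts and sizes in this scenario imply a very wide range of possible daily data volume. The following Python sketch computes only the theoretical lower and upper bounds. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), and the function name `daily_volume_bounds` is hypothetical. The scenario does not state an average message size, so neither bound is a realistic sizing estimate.

```python
KB = 1_000        # assumption: decimal kilobyte
MB = 1_000_000    # assumption: decimal megabyte


def daily_volume_bounds(min_messages, max_messages, min_size_bytes, max_size_bytes):
    """Return (lowest, highest) possible bytes processed per day."""
    lowest = min_messages * min_size_bytes
    highest = max_messages * max_size_bytes
    return lowest, highest


# Figures taken from the scenario: 250,000 to 500,000 emails per day,
# each between 20 KB and 100 MB.
low, high = daily_volume_bounds(250_000, 500_000, 20 * KB, 100 * MB)
print(f"{low / 1e9:.0f} GB to {high / 1e12:.0f} TB per day")
# prints "5 GB to 50 TB per day"
```

The gap between the two bounds suggests that capacity planning would need the actual distribution of message sizes, not just the minimum and maximum.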

Anne is a business analyst who is highly familiar with the company's information repositories, the enterprise taxonomy, and the data flow. Anne wants to use IBM Content Classification to review some documents to ensure that the correct classification decisions are applied. It is critical that Bob and Anne understand how IBM Content Classification filters email so that they can ensure that the right email is archived and that the email is assigned to the correct categories.