After taking the first steps, we have fine-tuned some of the annotators and added synonyms to our CRM data before importing it into a companies dictionary.
Case 1: Are my competitors doing business with my clients or prospects?
Case 2: Are my clients or prospects mentioned as using "our" products?
These two cases require knowledge of:
Who my competitors are (case 1)
Who my clients or prospects are (case 1 and 2)
What "our" products are (case 2)
We don't want just any match; we are looking for strategic business terms, so we require some jargon.
In Watson Explorer this means we need dictionaries. The jargon is captured in the DWSVocabulair dictionary, and the products are expanded in the screenshot below:
As you can see, these are not actually our products; they are products of IBM and Microsoft (and others)!
These 6k+ companies (see the screenshot above) come from our CRM system and are categorized by type (client, partner, competitor).
Before importing the CSV data into Studio, we needed to add synonyms.
The reason to add synonyms is that companies are seldom referred to by their legal or formal name. I got some help from the community by discussing this issue in this forum: https://www.ibm.com/developerworks/community/forums/html/topic?id=9e74425e-2cf6-40b9-9f6a-f79c7a8595e4&ps=25
The synonyms are not added manually; we created a script that would:
1. Remove any term that refers to a legal entity, e.g. Ltd. or B.V. or N.V.
2. Concatenate separate words in different casing
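The two steps above could be sketched in Python roughly as follows. This is not the script we actually used: the list of legal-entity terms, the function names, and the reading of step 2 (joining the words of a name and emitting the joined form in a few casings) are all assumptions made for illustration.

```python
# Hypothetical list of legal-entity terms; the real script's list is not shown here.
LEGAL_TERMS = {"ltd.", "ltd", "b.v.", "bv", "n.v.", "nv", "inc.", "inc"}


def strip_legal_terms(name):
    """Step 1: drop tokens such as 'Ltd.' or 'B.V.' from a company name."""
    tokens = [t.strip(",") for t in name.split()]
    kept = [t for t in tokens if t and t.lower() not in LEGAL_TERMS]
    return " ".join(kept)


def synonyms(name):
    """Step 2 (assumed reading): join the remaining words and vary the casing."""
    base = strip_legal_terms(name)
    joined = "".join(base.split())
    candidates = [base, joined, joined.lower(), joined.upper()]
    result = []
    for candidate in candidates:
        # Keep the original order, skip duplicates and the formal name itself.
        if candidate and candidate != name and candidate not in result:
            result.append(candidate)
    return result


# 'Acme Data Systems B.V.' is a made-up company name.
print(synonyms("Acme Data Systems B.V."))
```

The formal name stays the dictionary entry itself; the generated variants would be added as its synonyms in the CSV before import.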
Now that we have the dictionaries, we can create parsing rules that use those dictionaries.
Below is the annotator that determines whether a competitor is mentioned along with a client and our strategic "jargon". Next to it is the annotator that matches a client and a product.
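For readers without access to the screenshot, the logic of the first annotator can be approximated in a few lines of Python. This only illustrates the co-occurrence idea; it is not how the parsing rules are defined or executed in Watson Explorer. The function names, the sentence-level scope, and the sample dictionary entries are assumptions.

```python
import re


def mentions(text, terms):
    """Return the dictionary terms that occur in text as whole words (case-insensitive)."""
    found = set()
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def competitor_at_client(sentence, competitors, clients, jargon):
    """Match only when a competitor, a client and a jargon term all occur together."""
    hit = {
        "competitors": mentions(sentence, competitors),
        "clients": mentions(sentence, clients),
        "jargon": mentions(sentence, jargon),
    }
    return hit if all(hit.values()) else None


# Made-up dictionary entries, standing in for the CRM and jargon dictionaries.
competitors = {"Globex"}
clients = {"Initech"}
jargon = {"outsourcing"}

print(competitor_at_client(
    "Globex signed an outsourcing contract with Initech.",
    competitors, clients, jargon))
```

The second annotator (client and product) follows the same pattern with two dictionaries instead of three.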
You can learn a little Dutch from the screenshot (Klant==client, Concurrent==Competitor) ;-)
All of this combined results in a so-called PEAR (a UIMA Processing Engine ARchive), which we deployed on our Analytics Server.
There we have a collection that crawls several websites (both Dutch and English).
First we take a look at how many documents have been found for each type of company (BedrijfsType):
That is quite a lot. Next comes the power of annotators: below are the results for Customer and Product, but similar results can be found for "competitor and customer" (Concurrent bij Klant).
- Include the BoardReader from Socialgist (http://www.socialgist.com/), so that we can crawl more than the websites we have today.
- Differentiate between "our" products and products from the competition. As with companies, we need to introduce a "producttype".