3 replies Latest Post - ‏2012-10-17T04:16:04Z by SystemAdmin
erik.kurosawa
1 Post

Pinned topic Handling synonyms in Content Analytics Miner Analysis

‏2012-10-12T16:43:48Z |
We are currently devising a solution for a customer that centers on sentiment analysis for specific entities (e.g. a specific company).
The customer wants to run time-series analyses that compare the sentiment toward two different companies over specific time frames.

One important aspect here is the handling of synonyms. If one of the companies is IBM, mentions of IBM could occur in the form "IBM", "I.B.M", or "International Business Machines".
How can we make sure that, during aggregation in the analytics miner, all occurrences are mapped to one label (e.g. IBM)?

Where would this best be addressed: at the NER stage or directly during analysis?

Thanks a lot
Updated on 2012-10-17T04:16:04Z at 2012-10-17T04:16:04Z by SystemAdmin
  • bfoyle
    60 Posts

    Re: Handling synonyms in Content Analytics Miner Analysis

    ‏2012-10-13T00:07:00Z  in response to erik.kurosawa
    NER does not have built-in normalization, so you will end up with separate sentiment entries for IBM, ibm, I.B.M., International Business Machines, International Business Machines, Corp., etc. I would recommend handling this with custom dictionaries in ICA Studio for the companies in question, building them out with all the variations of the company names and spellings. You can then convert to a rule for lower/upper case normalization. Finally, map that to a facet in ICA for your collection, using the normalized company name as the value.

    bf
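
To illustrate the case and spelling normalization described above, here is a minimal Python sketch. It is not ICA Studio code and does not use any ICA API; the function name `normalize_key` and the suffix list are assumptions made up for this example. It only shows how variants such as "IBM", "ibm" and "I.B.M." can collapse to one lookup key before they are mapped to a facet value.

```python
import re

# Hypothetical list of trailing corporate suffixes to ignore (an assumption,
# not something defined by ICA).
CORPORATE_SUFFIXES = {"corp", "corporation", "inc", "ltd"}


def normalize_key(mention: str) -> str:
    """Reduce a raw mention to a case- and punctuation-insensitive key."""
    # Drop periods so that "I.B.M." collapses to "IBM".
    text = mention.replace(".", "")
    # Turn any remaining punctuation (commas etc.) into spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Case-fold and split on whitespace.
    tokens = text.casefold().split()
    # Strip trailing suffixes such as "Corp".
    while tokens and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = ["IBM", "ibm", "I.B.M.", "I.B.M"]
keys = {normalize_key(v) for v in variants}
long_form = normalize_key("International Business Machines, Corp.")
```

Note that the abbreviation and the long form still produce two different keys, so a dictionary that lists both under the same company remains necessary.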
    • SystemAdmin
      197 Posts

      Re: Handling synonyms in Content Analytics Miner Analysis

      ‏2012-10-15T10:40:44Z  in response to bfoyle
      I agree with Bob: create a custom dictionary in which each entry has the specific lemma you want returned (example: Lemma = IBM, Surface forms = I.B.M., Big Blue, etc.). For the text miner, however, when you create a rule to find the company names, you create a feature from the lemma of the company. Then, when you export the PEAR file (search engine), you can specify Index Field or Facet and select the lemma as "The value in the type to be assigned".
      Please see attached screenshot.
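
The lemma / surface form idea, combined with the time-series comparison from the original question, can be sketched roughly as follows. This is plain Python, not the ICA dictionary format or the text miner itself; the dictionary contents, the company "Acme", the sentiment scores and the function `monthly_sentiment` are all hypothetical and only serve to show the aggregation by lemma.

```python
from collections import defaultdict
from datetime import date

# Hypothetical dictionary: lemma -> surface forms ("Acme" is a made-up company).
COMPANY_DICTIONARY = {
    "IBM": ["IBM", "I.B.M.", "International Business Machines", "Big Blue"],
    "Acme": ["Acme", "Acme Corp."],
}

# Invert it into a case-insensitive lookup: surface form -> lemma.
SURFACE_TO_LEMMA = {
    surface.casefold(): lemma
    for lemma, surfaces in COMPANY_DICTIONARY.items()
    for surface in surfaces
}


def monthly_sentiment(mentions):
    """Average sentiment per (lemma, 'YYYY-MM'); unknown surface forms are skipped."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for surface, day, score in mentions:
        lemma = SURFACE_TO_LEMMA.get(surface.casefold())
        if lemma is None:
            continue
        bucket = totals[(lemma, day.strftime("%Y-%m"))]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up mentions: (surface form, document date, sentiment score).
mentions = [
    ("I.B.M.", date(2012, 10, 1), 1.0),
    ("Big Blue", date(2012, 10, 15), -1.0),
    ("International Business Machines", date(2012, 11, 2), 1.0),
    ("Acme Corp.", date(2012, 10, 3), 0.5),
    ("Unknown Co", date(2012, 10, 3), 1.0),
]
result = monthly_sentiment(mentions)
```

Because every surface form resolves to its lemma before counting, the three IBM variants end up under the single label "IBM", which is what makes the per-month comparison between two companies meaningful.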
  • SystemAdmin
    197 Posts

    Re: Handling synonyms in Content Analytics Miner Analysis

    ‏2012-10-17T04:16:04Z  in response to erik.kurosawa
    An ICA user dictionary can be a simpler option if all the words in question are nouns. It offers an "equivalent terms" capability that merges various words into one facet value. Note that all words in user dictionaries are recognized as nouns.

    Reference in Information Center:
    http://pic.dhe.ibm.com/infocenter/analytic/v3r0m0/topic/com.ibm.discovery.es.ad.doc/iiysatauserdict.htm