Topic
  • 3 replies
  • Latest Post - ‏2012-01-17T13:21:32Z by reddz
reddz
23 Posts

Pinned topic Unescaped HTML Characters in the Description of Search Results

2011-12-22T15:08:10Z
Hi all,
While crawling RSS feeds with the Web crawler, the descriptions in the search results show unescaped HTML
(<script>) characters, although the actual RSS feeds don't contain these HTML characters.

I tried to unescape the HTML characters from the plug-in, but in vain.
I also tried adding rules to the HTML parser, since the parser used is the HTML parser:
https://www-304.ibm.com/support/docview.wss?q1=htmlparser&rs=63&uid=swg27011251&context=SS5SQ7&cs=utf-8&lang=en&loc=en_US

Neither attempt yielded any results.

Can you please guide me on how to proceed further with this issue?

The server is OmniFind Enterprise Edition 9.1.
Updated on 2012-01-17T13:21:32Z by reddz
  • bfoyle
    29 Posts

    Re: Unescaped HTML Characters in the Description of Search Results

    ‏2012-01-09T22:59:09Z  
    You might be able to use field filters to change or remove these character sequences. I have used this feature in the past to change things like &amp; to &, etc.
  • bfoyle
    29 Posts

    Re: Unescaped HTML Characters in the Description of Search Results

    ‏2012-01-10T16:14:02Z  
    I have since been informed that OEE (OmniFind Enterprise Edition) doesn't have the field filtering that was introduced in ICA (IBM Content Analytics). So you may have to move to an ICA search collection if you want to use field filtering. If that's not possible, I think you will need to write a crawler plugin to find and handle those escaped characters.
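    A crawler plugin of this kind needs some routine that cleans the description text. Below is a minimal Java sketch of one possible cleaning routine. It is not the OmniFind plug-in API: the class `DescriptionCleaner` and its methods are hypothetical names, and reading the description from the crawled document and writing it back is left out because that depends on the plug-in interface. The sketch assumes the description arrives as a string in which markup may be entity-escaped (for example `&lt;script&gt;`).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any IBM API.
final class DescriptionCleaner {
    // <script>...</script> and <style>...</style> blocks, including their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Any remaining tag that starts with a letter, e.g. <b> or </p>.
    // A bare "<" followed by a space (as in "a < b") is left alone.
    private static final Pattern TAG = Pattern.compile("(?s)</?[a-zA-Z][^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Decodes only the five most common entities; extend as needed.
    public static String unescapeBasicEntities(String text) {
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
    }

    // Unescapes entities, drops script/style blocks and other tags,
    // then collapses runs of whitespace.
    public static String clean(String description) {
        if (description == null) {
            return "";
        }
        String unescaped = unescapeBasicEntities(description);
        String noScripts = SCRIPT_OR_STYLE.matcher(unescaped).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "&lt;script&gt;alert(1)&lt;/script&gt;Hello &amp; &lt;b&gt;bye&lt;/b&gt;";
        System.out.println(clean(raw)); // prints: Hello & bye
    }
}
```

    Regular expressions are only a rough way to strip markup (HTML comments and malformed tags are not handled here), so treat this as a starting point for cleaning short feed descriptions rather than a general HTML sanitizer.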
  • reddz
    23 Posts

    Re: Unescaped HTML Characters in the Description of Search Results

    ‏2012-01-17T13:21:32Z  
    Hi Bfoyle,

    Thanks for the suggestion. We have handled it through a custom plug-in.

    Regards
    Reddz