IBM Many Aspects Document Summarization Tool forum (alphaWorks, developerWorks)

Summarizer algorithms
Posted by dovid-halevi on Aug 29, 2008 03:13:29 AM
(Posts: 1. Registered: Aug 29, 2008 03:00:02 AM)
The instructions a top research database firm gives its abstractors: read the first paragraph and skim the first lines of the following paragraphs. They consider this an effective abstracting technique. If the article is properly written in newspaper style, this may be correct: the first paragraph serves as the abstract or summary, and what follows are details. For other styles, not so.
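A minimal Python sketch of the abstractors' rule, useful for comparing it against a summarizer's output. It assumes paragraphs are separated by blank lines and treats "the first line" of each later paragraph as its first sentence, found with a naive punctuation split; the function names are hypothetical.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines (assumed paragraph separator); drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def first_sentence(paragraph: str) -> str:
    """Naive first sentence: text up to the first ., ! or ? followed by whitespace."""
    flat = " ".join(paragraph.split())  # collapse internal line breaks
    return re.split(r"(?<=[.!?])\s+", flat, maxsplit=1)[0]


def lead_abstract(text: str) -> str:
    """Whole first paragraph, then the first sentence of each later paragraph."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return ""
    lead = " ".join(paragraphs[0].split())
    skimmed = [first_sentence(p) for p in paragraphs[1:]]
    return "\n".join([lead] + skimmed)


if __name__ == "__main__":
    article = (
        "Lead sentence one. Lead two.\n\n"
        "Detail A. More A.\n\n"
        "Detail B! More B."
    )
    print(lead_abstract(article))
```

On the sample above it prints the whole lead paragraph followed by "Detail A." and "Detail B!". As noted, this only works well for newspaper-style text where the lead really is the summary.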

Neither ots (Open Text Summarizer; I do not know which algorithm it uses) nor any of the algorithms supported by manyfacets works like the technique above, for better or for worse.

Additionally, the last of the algorithms in manyfacets gives different output on every run. In any case, one has to tweak the settings to get the first paragraph selected at all.
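One generic way to get the first paragraph picked up without manual tweaking is to add a position bonus to an ordinary word-frequency sentence score. The sketch below is an illustration of that idea only, not how manyfacets or ots actually score sentences; `summarize`, `lead_bonus`, and the tiny stopword list are hypothetical choices.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (an assumption; real tools use larger ones).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "are", "was", "be"}


def content_words(sentence: str) -> List[str]:
    """Lowercased words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, n: int = 3, lead_bonus: float = 1.0) -> List[str]:
    """Pick n sentences by average word frequency, boosting first-paragraph sentences.

    lead_bonus=0 gives a plain frequency score; larger values favour the lead.
    Selected sentences are returned in document order.
    """
    items: List[Tuple[int, str]] = []  # (paragraph index, sentence)
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_index, para in enumerate(paragraphs):
        flat = " ".join(para.split())
        for sentence in re.split(r"(?<=[.!?])\s+", flat):
            if sentence:
                items.append((p_index, sentence))

    # Document-wide frequency of each content word.
    freq = Counter(w for _, s in items for w in content_words(s))

    def score(item: Tuple[int, str]) -> float:
        p_index, sentence = item
        words = content_words(sentence)
        base = sum(freq[w] for w in words) / len(words) if words else 0.0
        return base * (1.0 + lead_bonus) if p_index == 0 else base

    # sorted() is stable, so ties keep document order even with reverse=True.
    ranked = sorted(range(len(items)), key=lambda i: score(items[i]), reverse=True)[:n]
    return [items[i][1] for i in sorted(ranked)]
```

For example, with the text "Quiet intro here.\n\nCats chase mice. Cats chase birds. Cats sleep.", `summarize(text, n=1, lead_bonus=0.0)` picks "Cats chase mice.", while `lead_bonus=2.0` makes the lead sentence win. A fixed, explicit bonus also keeps the result deterministic from run to run.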

Finally, a wishlist:
1. A stdin-to-stdout CLI tool (like ots) is needed. Such programs tend to be used in CGI(-like) pipelines.
2. An API for use as a plugin for Word, AbiWord, OpenOffice?
3. Input filtering (PDF, RTF, HTML, ...). Simply formatted page images could then be handled by a chain such as ocr -> ots.
4. For the GUI, I would use drag-and-drop, the clipboard, and perhaps OCR input more often than opening files. These should not be hard to implement.
5. Open source: let other tinkerers participate!
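Items 1 and 3 fit together: a small stdin-to-stdout filter that turns HTML into plain paragraphs can sit in front of a summarizer in a pipeline. Below is a minimal sketch using only Python's standard `html.parser`; the script name `html2text_filter.py` and the set of tags treated as paragraph breaks are assumptions, not part of any existing tool.

```python
import re
import sys
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}
# Tags treated as paragraph boundaries (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6",
              "blockquote", "pre", "title"}


class TextExtractor(HTMLParser):
    """Collects visible text, inserting blank lines at block-level tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Source line breaks are not paragraph breaks in HTML.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    """Return plain text with one blank line between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    chunks = re.split(r"\n{2,}", "".join(parser.parts))
    paragraphs = [" ".join(c.split()) for c in chunks]
    return "\n\n".join(p for p in paragraphs if p)


def main() -> None:
    sys.stdout.write(html_to_text(sys.stdin.read()) + "\n")


if __name__ == "__main__":
    main()
```

Assuming the summarizer reads standard input as ots does, usage would look like `cat page.html | python html2text_filter.py | ots`. Keeping each stage a plain filter is what makes the CGI-style chaining in item 1 possible.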
