Summarizer algorithms
Posted: Aug 29, 2008 03:13:29 AM
A top research database firm gives its abstractors these instructions: read the first paragraph, then skim the first lines of the following paragraphs. They consider this an effective technique for abstracting. If the article is properly written in newspaper-article style, this may be correct: the first paragraph models the abstract or summary, and what follows is detail. For other styles, not so.
Neither ots (Open Text Summarizer; I do not know which algorithm it uses) nor any of the algorithms manyfacets supports looks like the above, for better or for worse.
Additionally, the last of the algorithms in manyfacets changes with every play. In any case, one has to tweak things to get the first paragraph picked at all.
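For comparison, the abstractors' rule quoted at the top of this post is simple enough to sketch in a few lines of Python. This is only my reading of it, not what ots or manyfacets does: the function name lead_summary is made up, paragraphs are assumed to be separated by blank lines, and "first lines" is taken to mean the first sentence, found with a naive punctuation split.

```python
import re

def lead_summary(text):
    """Keep the first paragraph whole, then the first sentence of each later one."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return ""
    # The first paragraph is kept in full, with line wraps collapsed.
    parts = [" ".join(paragraphs[0].split())]
    for para in paragraphs[1:]:
        flat = " ".join(para.split())
        # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
        first_sentence = re.split(r"(?<=[.!?])\s+", flat, maxsplit=1)[0]
        parts.append(first_sentence)
    return "\n".join(parts)
```

On a newspaper-style article this returns the lead paragraph plus one line per following paragraph. Abbreviations such as "e.g." will trip the sentence split, so treat it as a baseline to compare other algorithms against.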
Some wishlist items:
1. A stdin->stdout CLI tool (like ots). Such programs tend to be used in a CGI(-like) chain.
2. An API, for use as a plugin for Word, AbiWord, OO?
3. Input filtering (pdf, rtf, html ...). Simply formatted page images can go through an ocr->ots chain.
4. For the GUI, I would use drag & drop, the clipboard, OCR(?) more often than reading files. Should not be hard to implement.
5. Open source: let other tinkerers participate!
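To make item 1 concrete, here is a rough sketch of what a stdin->stdout filter could look like in Python. Everything in it is hypothetical: run and first_paragraph are names I made up, and the stand-in summarizer just keeps the first paragraph, so a real algorithm would be passed in its place.

```python
import sys

def first_paragraph(text):
    """Stand-in summarizer (an assumption): keep only the first paragraph."""
    for block in text.split("\n\n"):
        if block.strip():
            # Collapse line wraps inside the paragraph into single spaces.
            return " ".join(block.split())
    return ""

def run(instream, outstream, summarize=first_paragraph):
    """Read all of instream, write the summary to outstream, return an exit code."""
    outstream.write(summarize(instream.read()) + "\n")
    return 0

# To use this as a command, end the script with:
#     sys.exit(run(sys.stdin, sys.stdout))
```

Because it only reads and writes streams, such a filter can sit behind any text extractor in a pipe or a CGI script, which also covers most of item 3: the pdf, rtf, html or OCR step runs first and hands plain text to the summarizer.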