3 replies Latest Post - ‏2013-01-15T08:32:18Z by SystemAdmin
OurGuest

Pinned topic Detecting List

‏2012-12-26T12:51:29Z |
Different users use different methods to make a list. Some use DOORS bullets, some use a tab with ASCII bullets, etc.

This inconsistency makes reliable detection of a list a challenge. Has anyone developed a DXL script that has a high probability of detecting the various forms of list, and would be willing to post it?
Updated on 2013-01-15T08:32:18Z at 2013-01-15T08:32:18Z by SystemAdmin
  • llandale

    Re: Detecting List

    ‏2013-01-07T15:01:19Z  in response to OurGuest
    No, but I recently had a similar issue trying to find headings in Object Text. I wonder if the following is useful. Looking at the raw text...
    • are there more than 2 EOLs
    • is there a TAB in the first few characters
    • are the first few characters dominated by white-space
    • does a RichTextParagraph have the "bullet" characteristic
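
    The first three checks above could be sketched roughly as follows. This is only an illustration in Python, not DXL; the function name looks_like_list and the prefix_len default of 4 are assumptions made up for the example, and the RichTextParagraph check is left out because it needs the DOORS rich-text API.

```python
def looks_like_list(raw_text: str, prefix_len: int = 4) -> dict:
    """Evaluate rough list indicators on raw object text (hypothetical helper)."""
    prefix = raw_text[:prefix_len]  # "the first few characters"; length is a guess
    whitespace = sum(1 for ch in prefix if ch.isspace())
    return {
        # more than 2 end-of-line characters in the whole text
        "more_than_two_eols": raw_text.count("\n") > 2,
        # a TAB somewhere in the first few characters
        "tab_in_prefix": "\t" in prefix,
        # more than half of the first few characters are white-space
        "prefix_mostly_whitespace": bool(prefix) and whitespace * 2 > len(prefix),
    }


if __name__ == "__main__":
    print(looks_like_list("\t- a\n\t- b\n\t- c\n\t- d"))
```

    Each indicator is returned separately so that they can be weighted or combined later, since none of them is conclusive on its own.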

    -Louie
  • Mathias Mamsch

    Re: Detecting List

    ‏2013-01-11T10:59:55Z  in response to OurGuest
    I once wrote an RTF parser to handle a complex bullet-point manipulation task. At the moment it handles the fi, li and pard tags to track the indentation of the paragraph, which is probably an important point for detecting lists. You can use the code with a loop that handles certain states, e.g. "I am at the beginning of a paragraph", etc. I always wanted to put a high-level interface (flow document) over it, but I never got around to doing so.
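
    To illustrate the idea of tracking fi, li and pard, here is a minimal sketch in Python (not the DXL parser described above, whose code is not shown in this thread). The function name paragraph_indents is hypothetical, and the sketch assumes simplified RTF: it ignores group scoping, escaped backslashes and binary data.

```python
import re

# A control word: backslash, letters, optional signed number, optional delimiter space.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")


def paragraph_indents(rtf: str) -> list:
    """Return one (left_indent, first_line_indent) pair per \\par (hypothetical helper)."""
    indents = []
    left, first = 0, 0
    for match in CONTROL_WORD.finditer(rtf):
        word, arg = match.group(1), match.group(2)
        if word == "pard":      # reset paragraph formatting to defaults
            left, first = 0, 0
        elif word == "li":      # left indent, in twips
            left = int(arg or 0)
        elif word == "fi":      # first-line indent, in twips (often negative for bullets)
            first = int(arg or 0)
        elif word == "par":     # end of paragraph: record the current indentation
            indents.append((left, first))
    return indents


if __name__ == "__main__":
    sample = r"{\rtf1\pard\li720\fi-360 one\par\pard two\par}"
    print(paragraph_indents(sample))
```

    A positive left indent combined with a negative first-line indent (a hanging indent) is a typical layout for bullet paragraphs, so pairs like these could feed into a list heuristic.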

    Would you be interested in having that RTF parser as open source? I am still looking for people who would be willing to commit to an open source DXL library. Sorry that the library is not well commented, but I guess you will grasp the functionality pretty fast. You can use the nextToken() function in a loop to parse the RTF and then roam around in the tokens by using getToken(RTFParser, int) to look forwards and backwards. The RTFParser also tracks the group name, which you can use to determine the location in the RTF, e.g. "rtf:fnttble:fnt" or something like this tells you that you are in the font table of the RTF.
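
    The token loop with group tracking might look roughly like the following Python sketch. It is not the DXL library's interface: the names tokenize and TOKEN are made up for the example, each group is simply named after the first control word inside it, and escapes such as \'e9 are not handled properly.

```python
import re

# Alternatives: group open, group close, control word (+ optional number), plain text.
TOKEN = re.compile(r"(\{)|(\})|\\([a-z]+)(-?\d+)? ?|([^{}\\]+)")


def tokenize(rtf: str) -> list:
    """Return (kind, value, group_path) tuples for simplified RTF (hypothetical helper)."""
    stack = []   # one entry per open group; None until the group has a name
    tokens = []
    for match in TOKEN.finditer(rtf):
        brace_open, brace_close, word, arg, text = match.groups()
        if brace_open:
            stack.append(None)
            kind, value = "open", "{"
        elif brace_close:
            if stack:
                stack.pop()
            kind, value = "close", "}"
        elif word:
            if stack and stack[-1] is None:
                stack[-1] = word  # first control word names the group
            kind, value = "control", word + (arg or "")
        else:
            kind, value = "text", text
        path = ":".join(name for name in stack if name)
        tokens.append((kind, value, path))
    return tokens


if __name__ == "__main__":
    for token in tokenize(r"{\rtf1{\fonttbl{\f0 Arial;}}\pard Hello\par}"):
        print(token)
```

    Because all tokens end up in a list, looking forwards or backwards is just indexing, e.g. tokens[i + 1] or tokens[i - 1], and the path lets you skip text that belongs to the font table.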

    I am not sure this will help, but I am pretty sure that to reliably detect lists you will need to parse RTF. Additionally, you probably need to define the criteria that you would use for detecting lists, e.g. more than one space, or a tab, or a line indent, followed by a bullet symbol or a number, and all of that at least twice?
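
    As a starting point for such criteria, here is a small plain-text sketch in Python. The pattern, the set of bullet symbols and the name detect_list are assumptions for illustration only; the RTF line-indent criterion is not covered because it needs parsed paragraph formatting.

```python
import re

# Indent (a tab or at least two spaces), then a bullet symbol or a number
# such as "1." or "2)", then white-space and some item text.
LIST_ITEM = re.compile(r"^(?:\t| {2,})\s*(?:[-*\u2022]|\d+[.)])\s+\S")


def detect_list(text: str, min_items: int = 2) -> bool:
    """Return True if at least min_items lines look like list items (hypothetical helper)."""
    hits = sum(1 for line in text.splitlines() if LIST_ITEM.match(line))
    return hits >= min_items


if __name__ == "__main__":
    print(detect_list("Intro\n\t- first\n\t- second"))
    print(detect_list("Intro\n\t- only one"))
```

    Requiring at least two matching lines corresponds to the "at least twice" condition; a stricter variant could additionally require the matching lines to be adjacent.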

    Regards, Mathias

    Mathias Mamsch, IT-QBase GmbH, Consultant for Requirement Engineering and D00RS
    • SystemAdmin

      Re: Detecting List

      ‏2013-01-15T08:32:18Z  in response to Mathias Mamsch
      Hello Mathias

      > I wrote an RTF parser once to handle a complex bullet point manipulation task.

      Maybe the project http://nrtftree.sourceforge.net/ is a good template for this.

      Best regards
      Wolfgang