2 replies Latest Post - ‏2012-12-19T13:00:30Z by a_totade
a_totade

Pinned topic Reading RTF file and converting it into XML

‏2012-12-19T08:12:38Z |
I have a requirement to read an RTF file and convert its data into a specific XML format.

Has anybody tried reading files in RTF format before, and if so, what would you suggest?

The only information I have been able to find is here: http://www.ogf.org/pipermail/dfdl-wg/2012-February/001725.html
Thanks in advance.
I have attached a sample input file, which looks like this when opened in a text editor:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fswiss\fcharset0 Arial;}}{\colortbl ;\red0\green0\blue0;}{\*\generator Msftedit 5.41.21.2506;}\viewkind4\uc1\pard\sb288\tqc\tx10124\fs24\tab\cf1\b INFO\fs29\par\pard\sb463\tx900\tx8160\tx9105\cf0\b0\fs24\tab\cf1\fs16 Some, Inc.\cf0\fs24\tab\cf1\b\fs16 PAGE\cf0\b0\fs24\tab\cf1\fs16 1\fs21\par\pard\tx900\cf0\fs24\tab\cf1\fs16 2510 N. some Ave.\fs18\par\cf0\fs24\tab\cf1\fs16 Indianapolis, IN 111\fs18\par\pard\tx900\tx7410\tx9105\cf0\fs24\tab\cf1\fs16 UNITED STATES\cf0\fs24\tab\cf1\b\fs16 INVOICE DATE\cf0\b0\fs24\tab\fs16 5\cf1 /18/2012\fs24\par\cf0\tab\cf1\fs16 (111)-111-1111\cf0\fs24\tab\cf1\b\fs16 INVOICE NO\cf0\b0\fs24\tab\cf1\fs16 3121114\fs21\par\pard\sb899\tx420\tx708\tx6036\tx6333\cf0\fs24\tab\cf1\b\fs16 S\cf0\b0\fs24\tab\cf1\fs16 CUST # 11111\cf0\fs24\tab\cf1\b\fs16 S\cf0\b0\fs24\tab\cf1\fs16 XYZ SYS\fs21\par\pard\tx420\tx708\tx6036\tx6333\cf0\fs24\tab\cf1\b\fs16 O\cf0\b0\fs24\tab\cf1\fs16 ABC SYS\cf0\fs24\tab\cf1\b\fs16 H\cf0\b0\fs24\tab\cf1\fs16 1111 W abc ST\fs18\par\cf0\fs24\tab\cf1\b\fs16 L\cf0\b0\fs24\tab\cf1\fs16 PO BOX 2222\cf0\fs24\tab\cf1\b\fs16 I\cf0\b0\fs24\tab\cf1\fs16 CHANDLER, AZ \fs18\par\cf0\fs24\tab\cf1\b\fs16 D\cf0\b0\fs24\tab\cf1\fs16 PHOENIX, AZ \cf0\fs24\tab\cf1\b\fs16 P\cf0\b0\fs24\tab\cf1\fs16 32222-2222\fs18\par\pard\tx420\tx708\tx6036\cf0\fs24\tab\cf1\b\fs16 TO\cf0\b0\fs24\tab\cf1\fs16 2222-2222\cf0\fs24\tab\cf1\b\fs16 TO\fs18\par\pard\sb526\tx7845\tqr\tx10287\cf0\b0\fs24\tab\cf1\b\fs16 TOTAL DUE\cf0\b0\fs24\tab\cf1\fs16 1,11.00\fs21\par\pard\tx7680\tx9000\cf0\fs24\tab\cf1\b\fs16 CUSTOMER PO:\cf0\b0\fs24\tab\cf1\fs16 22222-10-1101\fs21\par\pard\sb180\tx90\tx1216\tx2956\tx4576\tx6106\tx7576\b\fs16 DATE1\cf0\b0\fs24\tab\cf1\b\fs16 DATE2\cf0\b0\fs24\tab\cf1\b\fs16 NO\cf0\b0\fs24\tab\cf1\b\fs16 SOMEDATE\cf0\b0\fs24\tab\cf1\b\fs16 SEND DATE\cf0\b0\fs24\tab\cf1\b\fs16 SEND NO\fs21\par\pard\sb120\tx90\tx2956\tx4576\tx6106\tqr\tx9318\b0\fs16 
10/18/2012\cf0\fs24\tab\cf1\fs16 11111\cf0\fs24\tab\cf1\fs16 9/23/2012\cf0\fs24\tab\cf1\fs16 9/23/2012\cf0\fs24\tab\cf1\fs16 11111\fs21\par\pard\sb134\tx90\tx2400\tx8760\tx10080\b\fs16 INFO\cf0\b0\fs24\tab\cf1\b\fs16 VIA\cf0\b0\fs24\tab\cf1\b\fs16 CRY\cf0\b0\fs24\tab\cf1\b\fs16 TITLE \fs21\par\pard\sb106\tx90\tx2400\tx8760\tx10080\b0\fs16 SOME INFO\cf0\fs24\tab\cf1\fs16 XZ\cf0\fs24\tab\cf1\fs16 jht\cf0\fs24\tab\cf1\fs16 plane\fs21\par\pard\sb74\tx90\tqr\tx3675\tqr\tx3765\tqr\tx6162\tqr\tx7761\tqr\tx9728\tqr\tx11445\b\fs16 ITEM ID\cf0\b0\fs24\tab\cf1\b\fs16 CCCC\cf0\b0\fs24\tab\cf1\b\fs16 UNITS\cf0\b0\fs24\tab\cf1\b\fs16 REQUEST\cf0\b0\fs24\tab\cf1\b\fs16 SENT\fs21\par\pard\sb119\tx90\tx3225\tx3722\tqr\tx6162\tqr\tx7761\tqr\tx9728\tqr\tx11445\b0\fs16 INFO\cf0\fs24\tab\tab\cf1\fs16 0\cf0\fs24\tab\cf1\fs16 CASE\cf0\fs24\tab\cf1\fs16 5.00\cf0\fs24\tab\cf1\fs16 5.00\fs21\par\pard\tx90\tx5505\fs16 INFO AGAIN\cf0\fs24\tab\cf1\fs21\par\pard\tx90\fs16 AINFO AGAIN\fs18\par\pard\fs16 INFO AGAIN\cf0\f1\fs20\par}

Let me know if there are any suggestions.
Thanks for reading my post.
Updated on 2012-12-19T13:00:30Z by a_totade
  • kimbert@uk.ibm.com

    Re: Reading RTF file and converting it into XML

    ‏2012-12-19T11:11:40Z  in response to a_totade
    Message Broker does not have a native parser for RTF.
    The MRM parser cannot handle RTF because RTF is a recursive format: groups in braces can nest inside one another to any depth.

    I suggest that you use Java to parse the RTF, and then create your output XML by building up a message tree under OutputRoot.XMLNSC. There are loads of Java class libraries for RTF parsing/conversion, and writing your own would not be too hard if licensing is a problem.
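To make the "write your own" option concrete, here is a minimal sketch in plain Java of a recursive-descent reader for the kind of RTF shown in the sample. It is not a full RTF parser and not a Message Broker API example. It assumes that `\tab` separates fields and `\par` ends a line, it skips a partial, assumed list of metadata groups (`\fonttbl`, `\colortbl`, `\*` and a few others), and it drops `\'hh` escapes without decoding them and does not interpret `\u` escapes. The class name `RtfRows` and the element names `document`, `line` and `field` are hypothetical placeholders, not part of any required output format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Minimal RTF text extractor: a sketch under the assumptions stated above. */
final class RtfRows {
    // Groups whose first control word is one of these hold metadata, not text
    // (partial list, chosen for illustration).
    private static final Set<String> DESTINATIONS =
            Set.of("*", "fonttbl", "colortbl", "stylesheet", "info", "pict");

    private final String rtf;
    private int pos = 0;
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> row = new ArrayList<>();
    private final StringBuilder cell = new StringBuilder();

    private RtfRows(String rtf) { this.rtf = rtf; }

    /** Returns one list of \tab-separated cells per \par-terminated line. */
    static List<List<String>> parse(String rtf) {
        RtfRows p = new RtfRows(rtf);
        p.group(false);
        p.endRow(); // flush any text after the last \par
        return p.rows;
    }

    // Reads until the '}' that closes the current group, recursing on '{'.
    private void group(boolean skip) {
        boolean first = true; // true until the group's first token is seen
        while (pos < rtf.length()) {
            char c = rtf.charAt(pos++);
            if (c == '{') {
                group(skip);
            } else if (c == '}') {
                return;
            } else if (c == '\\') {
                String word = controlWord();
                if (first && DESTINATIONS.contains(word)) skip = true;
                else if (!skip) apply(word);
            } else if (c == '\r' || c == '\n') {
                continue; // raw line breaks carry no meaning in RTF
            } else if (!skip) {
                cell.append(c);
            }
            first = false;
        }
    }

    // Called just after a backslash. Returns the control word's name with its
    // numeric parameter dropped, or the single control-symbol character.
    private String controlWord() {
        if (pos >= rtf.length()) return "";
        int start = pos;
        if (!isAsciiLetter(rtf.charAt(pos))) { // control symbol such as \* or \{
            pos++;
            return rtf.substring(start, pos);
        }
        while (pos < rtf.length() && isAsciiLetter(rtf.charAt(pos))) pos++;
        String name = rtf.substring(start, pos);
        if (pos + 1 < rtf.length() && rtf.charAt(pos) == '-'
                && Character.isDigit(rtf.charAt(pos + 1))) pos++;
        while (pos < rtf.length() && Character.isDigit(rtf.charAt(pos))) pos++;
        if (pos < rtf.length() && rtf.charAt(pos) == ' ') pos++; // delimiter space
        return name;
    }

    private void apply(String word) {
        switch (word) {
            case "tab": endCell(); break;
            case "par": endRow(); break;
            case "\\": case "{": case "}": cell.append(word); break; // escaped literals
            case "'": pos = Math.min(pos + 2, rtf.length()); break;  // hex escape: dropped
            default: break; // formatting controls (\fs16, \cf1, \b, ...) are ignored
        }
    }

    private void endCell() {
        row.add(cell.toString().trim());
        cell.setLength(0);
    }

    private void endRow() {
        endCell();
        if (row.stream().anyMatch(s -> !s.isEmpty())) rows.add(row);
        row = new ArrayList<>();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Renders the rows as an XML string; element names are placeholders. */
    static String toXml(List<List<String>> rows) {
        StringBuilder xml = new StringBuilder("<document>");
        for (List<String> r : rows) {
            xml.append("<line>");
            for (String f : r) xml.append("<field>").append(escape(f)).append("</field>");
            xml.append("</line>");
        }
        return xml.append("</document>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling `RtfRows.toXml(RtfRows.parse(rtfText))` yields one `line` element per paragraph. Inside a JavaCompute node, the same rows would presumably be written as elements under OutputRoot.XMLNSC rather than concatenated into a string; the string form is used here only so the sketch runs without the broker libraries.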
    • a_totade

      Re: Reading RTF file and converting it into XML

      ‏2012-12-19T13:00:30Z  in response to kimbert@uk.ibm.com
      I am expecting the file to always arrive in the same format as shown in sample.rtf.
      In that case, I was planning to assume the following structure for the RTF file and see whether I can parse it with the MRM parser.

      {\rtf1

      { }
      { }
      \pard\ garbage {garbage data {}}

      \pard\ garbage {garbage data garbage data garbage data {}}
      \pard\ garbage {garbage data garbage data garbage data {}}

      }
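One way to test the fixed-structure assumption before committing to it is to measure how deeply the groups in real samples actually nest and how many `\pard` paragraphs sit directly inside the outer `{\rtf1 ...}` group. The following is a rough Java sketch of such a check; the class name `RtfShape` and its fields are hypothetical, and it only looks at braces and `\pard`, nothing else in the format.

```java
/** Summarises the brace structure of an RTF string (illustrative sketch). */
final class RtfShape {
    final int maxDepth;            // deepest level of {...} nesting seen
    final int topLevelParagraphs;  // \pard words directly inside the outer group
    final boolean balanced;        // true if every '{' has a matching '}'

    private RtfShape(int maxDepth, int topLevelParagraphs, boolean balanced) {
        this.maxDepth = maxDepth;
        this.topLevelParagraphs = topLevelParagraphs;
        this.balanced = balanced;
    }

    static RtfShape of(String rtf) {
        int depth = 0, max = 0, pards = 0;
        boolean ok = true;
        for (int i = 0; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                // Count \pard (but not a longer word that merely starts with it).
                if (depth == 1 && rtf.startsWith("pard", i + 1)
                        && (i + 5 >= rtf.length() || !isAsciiLetter(rtf.charAt(i + 5)))) {
                    pards++;
                }
                i++; // skip the next character so that \{ and \} are not counted as braces
            } else if (c == '{') {
                depth++;
                max = Math.max(max, depth);
            } else if (c == '}') {
                if (--depth < 0) { ok = false; break; } // stray closing brace
            }
        }
        return new RtfShape(max, pards, ok && depth == 0);
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

If `maxDepth` stays constant across every sample the sending system produces, a fixed model may be workable for those files; if it varies, that is the recursion that a fixed-depth model cannot describe.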