umeshaKrishnegowda
3 Posts

want to extract particular values from a text file using dxl regExp

2013-01-21T09:48:37Z
filename.txt
some content xyz
<PASSEDATTRIBUTES>
Name
umesh
manju
ramesh
</PASSEDATTRIBUTES>
some content xyz
I want to extract the values enclosed between <PASSEDATTRIBUTES> and </PASSEDATTRIBUTES> (umesh, manju, ramesh) in DXL.
Can anyone help me extract them using a regular expression? I have already done it without a Regexp.

Thanks in advance
  • umeshaKrishnegowda
    3 Posts

    Re: want to extract particular values from a text file using dxl regExp

    2013-01-21T11:00:01Z
    /*********************************************************/
    Regexp regexpInitialMatch = regexp2 "<PASSEDATTRIBUTES>"
    Regexp regexpEndMatch = regexp2 "</PASSEDATTRIBUTES>"
    Regexp regxpName = regexp2 "Name"
    string sEachline = null
    string sCollectAttri = null
    string sdsfFile = "C:\\Documents and Settings\\user\\Desktop\\Example.txt"
    Stream streamIn = read sdsfFile
    Buffer bufTempStoreAttri = create

    // Collects every line between the start and end tags, skipping the "Name" header line.
    string ExtractPassbackAttr(string sdsfFile)
    {
        while (true)
        {
            streamIn >> sEachline
            if (end streamIn)
            {
                break
            }
            else if (regexpInitialMatch sEachline)
            {
                streamIn >> sEachline
                // Also stop at end of file, so a missing end tag cannot loop forever.
                while (!end(streamIn) && !(regexpEndMatch sEachline))
                {
                    if (!(regxpName sEachline))
                    {
                        // Append to the Buffer directly instead of concatenating strings.
                        bufTempStoreAttri += sEachline
                        bufTempStoreAttri += "\n"
                    }
                    streamIn >> sEachline
                }
            }
        }
        // Close the stream before returning; a statement after "return" never runs.
        close(streamIn)
        return stringOf(bufTempStoreAttri)
    }

    sCollectAttri = ExtractPassbackAttr(sdsfFile)
    if (null sCollectAttri)
    {
        print "No Attributes Passed back from Supplier"
    }
    else
    {
        print sCollectAttri
    }
    delete bufTempStoreAttri

    /*********************************************************/

    Is there any way to optimize this program?
  • llandale
    2972 Posts

    Re: want to extract particular values from a text file using dxl regExp

    2013-01-22T17:54:32Z
    A few small comments:
    • It seems you should put the Buffer create/delete inside your main function.
    • When you append to the Buffer by string concatenation, you defeat the purpose of the Buffer. You append to a Buffer like this:
      • buf += sEachline
      • buf += "\n"
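
    The two comments above could be combined roughly as follows. This is only an untested sketch: the function name extractPassedAttributes and the variable names are made up for illustration, it assumes the same one-value-per-line file layout as in the original post, and it uses only the calls already present in the posted code (regexp2, read, >>, end, close, create, stringOf, delete).

    ```dxl
    // Sketch only (hypothetical names): same logic as the posted function, but
    // the Stream and the Buffer are created and released inside the function.
    string extractPassedAttributes(string sFileName)
    {
        Regexp reStart = regexp2 "<PASSEDATTRIBUTES>"
        Regexp reEnd   = regexp2 "</PASSEDATTRIBUTES>"
        Regexp reName  = regexp2 "Name"
        Stream strmIn  = read sFileName
        Buffer bufOut  = create
        string sLine   = ""
        bool   bInside = false   // true while we are between the two tags

        while (true)
        {
            strmIn >> sLine
            if (end strmIn)
            {
                break
            }

            if (!bInside)
            {
                if (reStart sLine)
                {
                    bInside = true
                }
            }
            else if (reEnd sLine)
            {
                bInside = false
            }
            else if (!(reName sLine))
            {
                // Append to the Buffer directly; no intermediate strings.
                bufOut += sLine
                bufOut += "\n"
            }
        }

        close strmIn
        string sResult = stringOf(bufOut)
        delete bufOut
        return sResult
    }

    print extractPassedAttributes("C:\\Documents and Settings\\user\\Desktop\\Example.txt")
    ```

    Using a single flag instead of a nested read loop means a missing end tag simply runs to the end of the file rather than looping forever.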

    I'm guessing that you will NOT do any better performance-wise with a single grand Regexp, which would look something like this:
    • Regexp re = regexp2("<PASSEDATTRIBUTES>\\nName\\n(.*)\\n(.*)\\n(.*)\\n</PASSEDATTRIBUTES>")
    • if (re sFileContents){
      • sFirstName = sFileContents[match 1] // the first "(.*)"
      • sMiddleName = sFileContents[match 2] // the second "(.*)"
      • sLastName = sFileContents[match 3] // the third "(.*)"
    • }

    I'm sure the above re is incorrect, and I am NOT sure you can realistically program a single one, due to the EOLs embedded in the string. In any case, the 3 "(.*)" will probably need to be replaced with some code that means "any ASCII character except EOL".

    -Louie