Topic
  • 6 replies
  • Latest post: 2014-04-24T13:53:01Z by llandale
Estebell
61 Posts

Pinned topic: Regexp with new lines

2014-04-22T12:38:44Z

Hi,

I want to parse an object's text and put each match into a new attribute.

But when the text contains new lines, my function doesn't work. Any help?

 

/* My current object text is literally:
"Version 2 :
Add some links
with "Satisfy" link module"

I want to have:
"Add some links
with "Satisfy" link module"
in match[3]
*/

Object curro = current Object
string Text_Version = curro."Object Text"
string Text_Description

Regexp MyText = regexp2 ".*(Version [0-9]*:) [^\\n*](\\\\~| *)*(.*)"
if (MyText Text_Version){
Text_Description = Text_Version [match 3]
}
else Text_Description = "Nothing"

print Text_Version [match 0] "\n" 
print Text_Version [match 1] "\n"
print Text_Version [match 2] "\n"
print Text_Version [match 3] "\n"

/*
match[0] = "ersion 2 :
Add some links
with "Satisfy" link module"

match[1] = "ersion 2 :
Add some links
with "Satisfy" link module"

match [2] = ""

match[3] = ""
*/

What is wrong?

Why is the V of "Version" not parsed?

Updated on 2014-04-22T12:40:09Z by Estebell
  • Mathias Mamsch
    2194 Posts
    ACCEPTED ANSWER

    Re: Regexp with new lines

    2014-04-23T21:14:02Z
    • Estebell
    • 2014-04-23T07:26:18Z

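    Your line 7 (the Regexp MyText line) is wrong: because str_anychar sits inside the quotes, the pattern searches for the literal text "str_anychar" rather than the character class stored in the constant. Concatenate the constant into the pattern instead. It should read: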

    Regexp MyText = regexp2 "([A-Za-z0-9 ]*:)(" str_anychar ")*"
    

    Regards, Mathias
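    A minimal sketch to make the difference visible (not part of the original post; it only uses the string concatenation and charOf calls already shown in this thread). It builds both pattern strings and prints them, so you can compare what regexp2 actually receives. The names wrongPattern and rightPattern are hypothetical:

```dxl
// Sketch: compare the pattern string from the question with the corrected one.
const string str_anychar = "[" charOf(1) "-" charOf(255) "]"   // any character except null

// Inside the quotes, str_anychar is just literal text; the constant is never used,
// so group 2 can only match the word "str_anychar" (zero or more times).
string wrongPattern = "([A-Za-z0-9 ]*:)(str_anychar)*"

// Placing the constant outside the quotes concatenates the character class itself.
string rightPattern = "([A-Za-z0-9 ]*:)(" str_anychar ")*"

print "wrong: " wrongPattern "\n"
print "right: " rightPattern "\n"   // the class contains control characters, so this line may look odd
```

    This also accounts for the output in the question: the wrong pattern still matches "Version 2 :" through group 1, while group 2 matches zero repetitions of the word "str_anychar", so match 2 is empty.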

     

  • Mathias Mamsch
    2194 Posts

    Re: Regexp with new lines

    2014-04-22T14:29:29Z

    I don't quite understand your regular expression, but in DXL you can match newlines in several ways. However, the "." wildcard does not match a newline:

    string sText = "\n"; 
    
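    Regexp re1 = regexp "\\\n"  // pattern is a backslash followed by a newline character: match
    Regexp re2 = regexp "\\n"   // pattern is the regexp escape \n: match
    Regexp re3 = regexp "\n"    // pattern is a literal newline character: match
    Regexp re4 = regexp "."     // "." does not match a newline: no match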
    
    if (re1 sText) print "Match re1\n"; 
    if (re2 sText) print "Match re2\n"; 
    if (re3 sText) print "Match re3\n"; 
    if (re4 sText) print "Match re4\n";
    

    I don't know what the (\\\\~| *) part of your regexp is supposed to match, but the following test at least matches in groups 1 and 3 (note that your text has a space between 'Version 2' and the ':', which your pattern does not allow for):

    string Text_Version = "Version 2 :
    Add some links
    with \"Satisfy\" link module"
    
    Regexp MyText = regexp2 "(Version [0-9]*[ ]*:)[ \n]*(\\\\~| *)*(.*)"
    
    print "Match 0: " Text_Version [match 0] "\n" 
    print "Match 1: " Text_Version [match 1] "\n"
    print "Match 2: " Text_Version [match 2] "\n"
    print "Match 3: " Text_Version [match 3] "\n"
    

    Regards, Mathias

  • llandale
    3035 Posts

    Re: Regexp with new lines

    2014-04-22T17:04:16Z

    Not really following; but I will say:

    • (.*)   will match any number of characters not including a newline (EOL)
    • Other RegExp references to "end of string" generally mean "end of string or next EOL".

    Thus, parsing text with EOLs causes problems.  I resolve that with this:

    • const string cl_re_strAnyChar = "[" charOf(1) "-" charOf(255) "]"    // any character except null
    • Regexp re = regexp2("whatever(" cl_re_strAnyChar ")*whatever")    // "whatever" = the rest of your pattern; the constant goes outside the quotes

    That seems to handle EOLs in the text.

    -Louie
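    A self-contained sketch of this technique on multi-line text, assuming DXL's Regexp behaves as described in this thread ("." stops at an EOL, the charOf(1)-charOf(255) class does not). The sample text and the names anyChar, sText, reDot and reAny are hypothetical:

```dxl
// Sketch: capture everything after "Version N :" even when the text spans several lines.
const string anyChar = "[" charOf(1) "-" charOf(255) "]"   // any character except null

// Hypothetical object text containing EOLs, similar to the original question.
string sText = "Version 2 :\nAdd some links\nwith link module"

// Same prefix in both patterns; only the "rest of the text" group differs.
Regexp reDot = regexp2("^(Version [0-9]*[ ]*:)[ \n]*(.*)")              // "." stops at the first EOL
Regexp reAny = regexp2("^(Version [0-9]*[ ]*:)[ \n]*(" anyChar "*)")   // the class also spans EOLs

if (reDot sText) print "dot: [" sText[match 2] "]\n"
if (reAny sText) print "any: [" sText[match 2] "]\n"
```

    If the behavior described in the thread holds, the "dot" line should end at "Add some links", while the "any" line should also contain "with link module".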

  • Estebell
    61 Posts

    Re: Regexp with new lines

    2014-04-23T07:26:18Z
    • llandale
    • 2014-04-22T17:04:16Z

    Well, I tried your const string cl_re_strAnyChar, but it doesn't work...

    I've simplified my object text.

    // object text: "Version 2 : Add some links with link module" (no EOLs or special characters)
    Object curro = current Object
    string Text_Version = curro."Object Text"
    const string str_anychar = "["charOf(1)"-"charOf(255)"]"
    
    Regexp MyText = regexp2 "([A-Z a-z 0-9]*:)(str_anychar)*"
    
    if (MyText Text_Version)
    {
        print Text_Version [match(0)] "\n"
        print Text_Version [match(1)] "\n"
        print Text_Version [match(2)] "\n"
    }
    
    // match [0] = "Version 2 :"
    // match [1] = "Version 2 :"
    // match [2] = ""
    

    Why are match(0) and match(2) wrong??

    I expected match(0) = "Version 2 : Add some links with link module" and match(2) = "Add some links with link module". Even without EOLs and with the const string, the regexp does not match!

  • Estebell
    61 Posts

    Re: Regexp with new lines

    2014-04-24T07:10:53Z
    • Mathias Mamsch
    • 2014-04-23T21:14:02Z

    Thanks so much!

    It works fine!

     

  • llandale
    3035 Posts

    Re: Regexp with new lines

    2014-04-24T13:53:01Z
    • Mathias Mamsch
    • 2014-04-23T21:14:02Z

    Beat my head against the wall yesterday and missed that.  Doh!

    However, I think you should move that last asterisk * inside the parentheses, so that match 2 gets the entire rest of the string. The way you have it, the group itself is repeated and match 2 only keeps the last repetition, which is just the last character ("e" in this case).

    const string str_anychar = "["charOf(1)"-"charOf(255)"]"

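    // ^ anchors at the start; with the * inside group 2, match 2 captures everything after the first colon, EOLs included
    Regexp MyText = regexp2 "^([A-Za-z0-9 ]*:)(" str_anychar "*)"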

    void Test(string in_String)
    {
     print "[" in_String "]\n"
     if (MyText in_String)
     {
         print "\t0  [" in_String [match(0)] "]\n"
         print "\t1  [" in_String [match(1)] "]\n"
         print "\t2  [" in_String [match(2)] "]\n"
         print "\t3  [" in_String [match(3)] "]\n"
     }
     else print "\tNo Match\n"
    }
    Test("Version 2 : Add some links with link module")
    Test("Version 2 : Add some: links with link module")
    Test("Version 2 : Add some: links \nwith link module")

    -Louie
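    To see the point about the asterisk in isolation, a minimal sketch (not from the thread; the names reOutside, reInside and s are hypothetical) runs both placements on the same one-line text:

```dxl
// Sketch: a repeated group keeps only its last repetition; a group around a repeat keeps all of it.
const string str_anychar = "[" charOf(1) "-" charOf(255) "]"   // any character except null
string s = "Version 2 : Add some links"

Regexp reOutside = regexp2("^([A-Za-z0-9 ]*:)(" str_anychar ")*")   // * outside the group
Regexp reInside  = regexp2("^([A-Za-z0-9 ]*:)(" str_anychar "*)")   // * inside the group

if (reOutside s) print "outside: [" s[match 2] "]\n"
if (reInside s)  print "inside:  [" s[match 2] "]\n"
```

    Following llandale's description, the first line should show only the final character ("s" here), and the second the whole remainder, including the space after the colon.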

    Updated on 2014-04-24T13:53:36Z by llandale