Topic
  • 9 replies
  • Latest Post - ‏2017-04-03T08:28:41Z by Estebell
Estebell
Estebell
84 Posts

Pinned topic Regexp with new lines

‏2014-04-22T12:38:44Z |

Hi,

I want to parse an object's text and put each match in a new attribute.

But when the text contains new lines, my function doesn't work. Any help?

 

/* My current object text is literally:
"Version 2 :
Add some links
with "Satisfy" link module" 

I want to have :
"Add some links
with "Satisfy" link module"
in match [3] 
*/

Object curro = current Object
string Text_Version = curro."Object Text"
string Text_Description

Regexp MyText = regexp2 ".*(Version [0-9]*:) [^\\n*](\\\\~| *)*(.*)"
if (MyText Text_Version){
Text_Description = Text_Version [match 3]
}
else Text_Description = "Nothing"

print Text_Version [match 0] "\n" 
print Text_Version [match 1] "\n"
print Text_Version [match 2] "\n"
print Text_Version [match 3] "\n"

/*
match[0] = "ersion 2 :
Add some links
with "Satisfy" link module"

match[1] = "ersion 2 :
Add some links
with "Satisfy" link module"

match [2] = ""

match[3] = ""
*/

What is wrong?

Why is the V of "Version" not parsed?

Updated on 2014-04-22T12:40:09Z by Estebell
  • Mathias Mamsch
    Mathias Mamsch
    2542 Posts
    ACCEPTED ANSWER

    Re: Regexp with new lines

    ‏2014-04-23T21:14:02Z  
    • Estebell
    • ‏2014-04-23T07:26:18Z

    Well, I tried your const string cl_re_strAnyChar, but it doesn't work...

    I've simplified my object text.

    // object text : "Version 2 : Add some links with link module" (without any EOL nor special characters)
    
    Object curro = current Object
    string Text_Version = curro."Object Text"
    const string str_anychar = "["charOf(1)"-"charOf(255)"]"
    
    Regexp MyText = regexp2 "([A-Z a-z 0-9]*:)(str_anychar)*"
    
    if (MyText Text_Version)
    {
        print Text_Version [match(0)] "\n"
        print Text_Version [match(1)] "\n"
        print Text_Version [match(2)] "\n"
    }
    
    // match [0] = "Version 2 :"
    // match [1] = "Version 2 :"
    // match [2] = ""
    

    Why are match(0) and match(2) wrong?

    I expected match(0) = "Version 2 : Add some links with link module" and match(2) = "Add some links with link module". Even without EOL and with the const string, the regexp does not match!

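    Your line 7 is wrong: inside the quotes, str_anychar is just literal text, not your variable, so the pattern never uses the character class. The variable has to be concatenated into the pattern string. It should read: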

    Regexp MyText = regexp2 "([A-Za-z0-9 ]*:)(" str_anychar ")*"
    

    Regards, Mathias

     

  • Mathias Mamsch
    Mathias Mamsch
    2542 Posts
    ACCEPTED ANSWER

    Re: Regexp with new lines

    ‏2017-03-31T15:30:37Z  
    • Estebell
    • ‏2017-03-31T13:35:55Z

    Hello Louie,

    I continue in this topic because I have a new problem.

    When a string has special characters, my regexp is not working. Here is my code and I don't know why it does not work.

    It prints "Objectif :\nVérifier que la sortie est capable d'effectuer 5000 man"  although intOf "œ"  = 156  that is between 1 and 255 ...

    void Test (string s)
    {
        string anychar = "["charOf(1) "-" charOf(255)"]"
        Regexp Text = regexp2 "(Objectif[ ]*:[ ]*\\n)("anychar"*)"
        if(Text s)
        {
            s = s[match 2] ""
        }
        print s""
    }

    Test ("Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres.")
    

     

    DOORS uses UTF-8 internally, which has a much wider set of characters. The char type can hold far more than 255 different characters, but IBM never bothered to correct the "charOf" and "intOf" functions.

     

    So while your test suggests that intOf "œ" = 156, it is really 339 (see http://www.fileformat.info/info/unicode/char/0153/index.htm) and therefore falls outside the range 1-255 (which regexp fully respects).

    char c = addr_ 339
    print c
    
    c = charOf 339
    print c
    

    So the easiest way for your specific regexp is to remove the "anychar" part from the regex and simply take the substring after the "end 0" match. If you really need anychar, you need to resort to something like

    "[^" charOf(1) "]"
    

    (which assumes that you do not consider charOf(1) a valid character in your text). Hope this helps, Regards, Mathias
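
    The first suggestion above could be sketched roughly as follows. This is only a sketch and has not been tested against a particular DOORS version: headerRe and textAfterHeader are made-up names, and it assumes that "end 0" gives the offset of the last character of the whole match, so the remainder starts one position later.

    ```dxl
    // Sketch only: match just the header, then keep everything after it.
    // headerRe and textAfterHeader are hypothetical names, not from the thread.
    Regexp headerRe = regexp2 "Objectif[ ]*:[ ]*\\n"

    string textAfterHeader(string s)
    {
        if (headerRe s)
        {
            // end 0: offset of the last character of the whole match (assumption)
            return s[end 0 + 1:]
        }
        return s    // no header found: hand back the input unchanged
    }

    string rest = textAfterHeader("Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres.")
    print rest "\n"
    ```

    Because the remainder is taken by position rather than by a character class, it should not matter which characters follow the header.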

     

  • Mathias Mamsch
    Mathias Mamsch
    2542 Posts

    Re: Regexp with new lines

    ‏2014-04-22T14:29:29Z  

    I don't quite follow your regular expression. In DXL you can match newlines in several ways; however, the "." placeholder does not match newlines:

    string sText = "\n"; 
    
    Regexp re1 = regexp "\\\n"  // match
    Regexp re2 = regexp "\\n"   // match
    Regexp re3 = regexp "\n"    // match
    Regexp re4 = regexp "."     // no match
    
    if (re1 sText) print "Match re1\n"; 
    if (re2 sText) print "Match re2\n"; 
    if (re3 sText) print "Match re3\n"; 
    if (re4 sText) print "Match re4\n";
    

    I don't know what the (\\\\~| *) part of your regexp is supposed to match, but the following test at least matches in groups 1 and 3. Note that your text has a space between 'Version 2' and the ':', which your pattern "Version [0-9]*:" does not allow for:

    string Text_Version = "Version 2 :
    Add some links
    with \"Satisfy\" link module"
    
    Regexp MyText = regexp2 "(Version [0-9]*[ ]*:)[ \n]*(\\\\~| *)*(.*)"
    
    print "Match 0: " Text_Version [match 0] "\n" 
    print "Match 1: " Text_Version [match 1] "\n"
    print "Match 2: " Text_Version [match 2] "\n"
    print "Match 3: " Text_Version [match 3] "\n"
    

    Regards, Mathias

  • llandale
    llandale
    3035 Posts

    Re: Regexp with new lines

    ‏2014-04-22T17:04:16Z  

    Not really following; but I will say:

    • (.*)   will match any number of characters, not including a new-line (EOL)
    • Other RegExp references to "end of string" generally mean "end of string or next EOL".

    Thus, parsing text with EOLs causes problems.  I resolve that with this

    • const string cl_re_strAnyChar = "[" charOf(1) "-" charOf(255)"]"    // any character except null
    • Regexp re = regexp2(whatever (cl_re_strAnyChar)* whatever)

    That seems to handle EOLs in the text

    -Louie
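
    As a rough, untested sketch of that idea applied to the text from the first post: the header part of the pattern is borrowed from Mathias's reply, reVersion and sText are made-up names, and the whole thing assumes the text only contains characters with codes 1 to 255.

    ```dxl
    // Sketch only; reVersion and sText are hypothetical names.
    const string cl_re_strAnyChar = "[" charOf(1) "-" charOf(255) "]"   // any character except null

    // Group 1: the "Version n :" header. Group 2: everything after it.
    // The * sits inside the parentheses so that group 2 spans the whole remainder.
    Regexp reVersion = regexp2 "(Version [0-9]*[ ]*:)[ \n]*(" cl_re_strAnyChar "*)"

    string sText = "Version 2 :\nAdd some links\nwith \"Satisfy\" link module"

    if (reVersion sText)
    {
        print "Header: [" sText[match 1] "]\n"
        print "Body:   [" sText[match 2] "]\n"
    }
    else print "No match\n"
    ```

    Note that the constant is concatenated into the pattern string. Writing its name inside the quotes would make it literal text.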

  • Estebell
    Estebell
    84 Posts

    Re: Regexp with new lines

    ‏2014-04-23T07:26:18Z  
    • llandale
    • ‏2014-04-22T17:04:16Z

    Well, I tried your const string cl_re_strAnyChar, but it doesn't work...

    I've simplified my object text.

    // object text : "Version 2 : Add some links with link module" (without any EOL nor special characters)
    Object curro = current Object
    string Text_Version = curro."Object Text"
    const string str_anychar = "["charOf(1)"-"charOf(255)"]"
    
    Regexp MyText = regexp2 "([A-Z a-z 0-9]*:)(str_anychar)*"
    
    if (MyText Text_Version)
    {
        print Text_Version [match(0)] "\n"
        print Text_Version [match(1)] "\n"
        print Text_Version [match(2)] "\n"
    }
    
    // match [0] = "Version 2 :"
    // match [1] = "Version 2 :"
    // match [2] = ""
    

    Why are match(0) and match(2) wrong?

    I expected match(0) = "Version 2 : Add some links with link module" and match(2) = "Add some links with link module". Even without EOL and with the const string, the regexp does not match!

  • Estebell
    Estebell
    84 Posts

    Re: Regexp with new lines

    ‏2014-04-24T07:10:53Z  

    Your line 7 is wrong. It should read:

    Regexp MyText = regexp2 "([A-Za-z0-9 ]*:)(" str_anychar ")*"

    Regards, Mathias

     

    Thanks so much!

    It works fine!

     

  • llandale
    llandale
    3035 Posts

    Re: Regexp with new lines

    ‏2014-04-24T13:53:01Z  

    Your line 7 is wrong. It should read:

    Regexp MyText = regexp2 "([A-Za-z0-9 ]*:)(" str_anychar ")*"

    Regards, Mathias

     

    Beat my head against the wall yesterday and missed that.  Doh!

    However, I think you should move that last asterisk * inside the parens; this gives "match 2" the entire rest of the string.  The way you have it, match 2 is just the last character, in this case "e".

    const string str_anychar = "["charOf(1)"-"charOf(255)"]"

    Regexp MyText = regexp2 "^([A-Za-z0-9 ]*:)(" str_anychar "*)"

    void Test(string in_String)
    {
     print "[" in_String "]\n"
     if (MyText in_String)
     {
         print "\t0  [" in_String [match(0)] "]\n"
         print "\t1  [" in_String [match(1)] "]\n"
         print "\t2  [" in_String [match(2)] "]\n"
         print "\t3  [" in_String [match(3)] "]\n"
     }
     else print "\tNo Match\n"
    }
    Test("Version 2 : Add some links with link module")
    Test("Version 2 : Add some: links with link module")
    Test("Version 2 : Add some: links \nwith link module")

    -Louie

    Updated on 2014-04-24T13:53:36Z by llandale
  • Estebell
    Estebell
    84 Posts

    Re: Regexp with new lines

    ‏2017-03-31T13:35:55Z  

    Hello Louie,

    I continue in this topic because I have a new problem.

    When a string has special characters, my regexp is not working. Here is my code and I don't know why it does not work.

    It prints "Objectif :\nVérifier que la sortie est capable d'effectuer 5000 man"  although intOf "œ"  = 156  that is between 1 and 255 ...

    void Test (string s){
        string anychar = "["charOf(1) "-" charOf(255)"]"
        Regexp Text = regexp2 "(Objectif[ ]*:[ ]*\\n)("anychar"*)"
        if(Text s)
        {
            s = s[match 2] ""
        }
        print s""
    }

    Test ("Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres.")
    

     

    Updated on 2017-03-31T13:39:01Z by Estebell
  • Estebell
    Estebell
    84 Posts

    Re: Regexp with new lines

    ‏2017-04-03T08:28:41Z  

    Well, I do need anychar, so I solved it by excluding the ñ character.

     

    Thanks a lot !
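
    For later readers, the workaround described above presumably looks something like the following. This is a guess at what was meant, not the actual code: it assumes that "anychar" became a negated class [^ñ] in place of the 1-255 range, and that ñ never occurs in the text being parsed.

    ```dxl
    // Sketch only: "any character" expressed as "anything except ñ".
    // Assumes ñ never appears in the parsed text.
    void Test(string s)
    {
        string anychar = "[^ñ]"
        Regexp Text = regexp2 "(Objectif[ ]*:[ ]*\\n)(" anychar "*)"
        if (Text s)
        {
            s = s[match 2] ""
        }
        print s "\n"
    }

    Test("Objectif :\nVérifier que la sortie est capable d'effectuer 50000 manœuvres.")
    ```

    A negated class has no upper bound on the character code, which is why it should not stop at "œ" the way the 1-255 range does.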