Topic: 4 replies. Latest post 2013-04-01T14:14:59Z by SystemAdmin
SystemAdmin

Whitespace Delimiter

2013-04-01T09:27:54Z
I'm having trouble using the tokenize function with whitespace as the delimiter. As far as I can tell, tokenize works fine as long as the delimiter is a single space, but my input uses eight spaces as the delimiter. These are the calls I tried for the eight-space delimiter; I based all the parameters on Java and C++ conventions.

tokens = tokenize (line, " ", true);
tokens = tokenize (line, "/s/s/s/s/s/s/s/s", true);
tokens = tokenize (line, "/ /", true);
tokens = tokenize (line, "\\s+\\s+\\s+\\s+\\s+\\s+\\s+\\s+", true);

Sample Input
MWTRF03030000 60.0009208438695 411.4609184139049 61.0000GSM 020130423 20130324193350 003232599589
  • hnasgaard

    Re: Whitespace Delimiter

    2013-04-01T10:36:29Z, in response to SystemAdmin
    The delimiter string passed to tokenize is a set of separators: each character in it is a possible separator on its own, rather than the whole string acting as one multi-character separator. tokenize also doesn't support regular expressions. You could try regexMatch or regexMatchPerl instead.
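    To make the "set of separators" point concrete, here is a small Python model of that behaviour. It is not SPL and not the toolkit's implementation: tokenize_like is a hypothetical helper, and its keep_empty flag assumes that the third tokenize argument (passed as true in the question) controls whether empty tokens are kept. Exact edge cases in SPL, such as leading or trailing delimiters, may differ.

```python
from typing import List


def tokenize_like(text: str, delims: str, keep_empty: bool) -> List[str]:
    """Hypothetical model: split `text` wherever ANY single character of `delims` occurs."""
    tokens: List[str] = []
    current = ""
    for ch in text:
        if ch in delims:
            tokens.append(current)  # each separator character closes a token, even an empty one
            current = ""
        else:
            current += ch
    tokens.append(current)  # the text after the last separator
    if not keep_empty:
        tokens = [t for t in tokens if t]  # drop empties between adjacent separators
    return tokens


# Hypothetical record: three fields separated by runs of eight spaces.
line = "A" + " " * 8 + "B" + " " * 8 + "C"

# Eight spaces are eight separate separators, so empty tokens appear between fields.
print(tokenize_like(line, " ", True))
# Dropping empty tokens collapses each run of spaces.
print(tokenize_like(line, " ", False))  # ['A', 'B', 'C']
# "/s/s/s/s" is only the character set {'/', 's'}; spaces are not separators here.
print(tokenize_like("as b", "/s/s/s/s", True))  # ['a', ' b']
```

    Under that assumption, keeping empty tokens leaves seven empty strings after each field, while dropping them collapses the runs; a multi-character delimiter string never means "this exact sequence".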
    • SystemAdmin

      Re: Whitespace Delimiter

      2013-04-01T11:20:32Z, in response to hnasgaard
      Hi, thank you so much for your suggestion. I tried the regexMatchPerl and regexMatch functions as you suggested, but I can't get the program to compile with this expression for the eight spaces; the compiler rejects the "\s" in the regex:

      tokens = regexMatchPerl(line, "\s{8}");

      I also tried other regexes, such as:

      tokens = regexMatchPerl(line, "\\s+{8}");
      tokens = regexMatchPerl(line, "\s{8}");
      tokens = regexMatchPerl(line, "[ ]|[ ]|[ ]|[ ]");

      But none of them work.
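      Apart from the escaping problem, there is a second reason patterns like these cannot produce the fields: a match function returns the text the pattern matches, so a pattern that describes only the eight-space separator can only ever return spaces. The sketch below uses Python's re module, not SPL, to contrast matching the separator with splitting on it; the sample line is hypothetical, and whether SPL offers a regex-based split is not covered in this thread.

```python
import re

# Hypothetical record: three fields separated by runs of eight spaces.
line = "A" + " " * 8 + "B" + " " * 8 + "C"

# A pattern that describes the SEPARATOR only ever returns the separator text.
print(re.findall(r"\s{8}", line))  # two strings of eight spaces
# An alternation of identical one-space classes is still a one-space pattern.
print(len(re.findall(r"[ ]|[ ]|[ ]|[ ]", line)))  # 16
# Splitting on the separator pattern is what yields the fields.
print(re.split(r"\s{8}", line))  # ['A', 'B', 'C']
```

      To get the fields out of a match-style function, the pattern has to describe the fields themselves, for example with one capture group per field.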
      • hnasgaard

        Re: Whitespace Delimiter

        2013-04-01T12:58:12Z, in response to SystemAdmin
        Here's a sample program I tried out that seems to work:
        
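        composite Main {
            param
                // Regex pattern supplied at submission time (Main.patt=...)
                expression<rstring> $patt : getSubmissionTimeValue("patt");
            graph
                // Read in.dat, one line per tuple
                stream<rstring s> In = FileSource() {
                    param
                        file   : "in.dat";
                        format : line;
                }
                // Match each line against the pattern and forward the list of matched strings
                stream<list<rstring> l> C = Custom(In) {
                    logic
                        onTuple In : {
                            mutable list<rstring> li = regexMatchPerl(s, $patt);
                            submit({l = li}, C);
                        }
                }
                // Print every element of the list on its own line
                () as O = Custom(C) {
                    logic
                        onTuple C : {
                            for (rstring r in l) println(r);
                        }
                }
        }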
        

        Compile it as a standalone program:
        
        sc -M Main -T
        

        and run it, passing the pattern through the submission-time parameter:
        
        output/bin/standalone Main.patt="(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
        

        If you hard-code the pattern in the SPL source instead, you have to escape each '\' as '\\'.
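        The capture-group idea, and the escaping rule just mentioned, can be checked outside Streams. The Python sketch below uses re, not SPL, and rebuilds the sample record with eight spaces between fields (an assumption: the question says eight, while the sample as displayed shows one). It mirrors the program's output on the further assumption that regexMatchPerl returns the whole match followed by the capture groups.

```python
import re

# The seven-field pattern from the reply, as a raw string: backslashes pass through unchanged.
PATTERN = r"(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"

# The same pattern hard-coded in an ordinary string literal: every '\' doubled.
ESCAPED = "(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)"
assert PATTERN == ESCAPED

# Sample record rebuilt with eight spaces between fields (assumed separator width).
fields = ["MWTRF03030000", "60.0009208438695", "411.4609184139049",
          "61.0000GSM", "020130423", "20130324193350", "003232599589"]
line = (" " * 8).join(fields)

m = re.match(PATTERN, line)
# Element 0 is the whole match, followed by one entry per capture group.
result = [m.group(0)] + list(m.groups())
for r in result[1:]:
    print(r)  # one field per line
```

        Because \s+ accepts one or more whitespace characters, the same pattern works whether the fields are separated by one space or by eight.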
        • SystemAdmin

          Re: Whitespace Delimiter

          2013-04-01T14:14:59Z, in response to hnasgaard
          Thank you very much, Mr. hnasgaard. This solves the problem. :)