Topic
  • 2 replies
  • Latest Post - 2013-08-29T20:57:40Z by Omer Tasli
Omer Tasli
2 Posts

Unwanted punctuation in FileSink output file

2013-08-29T12:21:19Z

Hi all,

First of all, please consider me a beginner with Streams.

I'm working on a project that extracts certain information (tweet creation time, ID, etc.) from tweets I searched for. I can get the results onto a test web site like this. My aim is to write these results to a local file and then process them with MapReduce in BigInsights. I used the InetSource operator from the Inet toolkit to stream data from my test file on the web.

My problem is that I need the data on separate lines, without quotation marks. For example:

    12312312,Tue Aug 27 12:27:25 +0000 2013
    24234232,Tue Aug 27 12:27:25 +0000 2013

But my current output, with the file format set to csv, is a single line in which every string is in double quotes and the whole list is wrapped in square brackets:

["372334535192506368","Tue Aug 27 12:27:25 +0000 2013","372334534227816449","Tue Aug 27 12:27:25 +0000 2013","372334520210841600","Tue Aug 27 12:27:21 +0000 2013","372334511830208512"]

How do I get rid of these brackets? What should I change? Here is my .spl file (also attached). I tried to modify the sample project in the toolkit.

    use com.ibm.streams.inet::InetSource;

    composite InetSourceTest {
        graph
            stream<list<rstring> result5> Result = InetSource() {
                param
                    URIList             : [ "http://omertasli.p.ht/result.txt" ];
                    inputLinesPerRecord : 1u;
                    fetchInterval       : 60.0;
                    emitTuplePerURI     : true;
                    punctPerFetch       : false;
            }

            () as Sink = FileSink(Result) {
                param
                    file          : "re{id}.txt";
                    format        : csv;
                    closeMode     : count;
                    tuplesPerFile : 1u;
                    quoteStrings  : false;
            }
    }

Thanks for your help.

  • hnasgaard
    200 Posts
    ACCEPTED ANSWER

    Re: Unwanted punctuation in Filesink output file

    2013-08-29T18:07:16Z

    The square brackets are there because you are writing out a list, and the quotes are there because the list elements are strings and the format is csv. Try the following:

    Add the following operator between the InetSource and the sink:

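          // Emit one output tuple per element of the incoming list
          stream<rstring r> P = Custom(Result) {
            logic onTuple Result : {
              for (rstring t in result5) {
                submit({r = t}, P);
              }
            }
          }

          // The sink now reads the flattened stream P instead of Result
          () as Sink = FileSink(P)

    ....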

    and then change the sink's format to line (and remove the quoteStrings parameter).

  • Omer Tasli
    2 Posts

    Re: Unwanted punctuation in Filesink output file

    2013-08-29T20:57:40Z

    Thanks very much for your help and explanation. I had actually tried changing list<rstring> to rstring in the first place, but that raised an operator generation error. Now it's working correctly.
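The accepted answer writes every list element on a line of its own. If you would rather have each tweet ID and its creation time together on one line, as in the example in the question, a variant of the same Custom operator could walk the list two elements at a time. This is a minimal, untested sketch that assumes the list always alternates ID, creation time (as the sample output suggests); the composite name InetSourcePairs, the stream name Pairs, the attribute names tweetId and createdAt, and the file name pairs.txt are all hypothetical.

```spl
use com.ibm.streams.inet::InetSource;

composite InetSourcePairs {
    graph
        // Same source as in the question: one list<rstring> tuple per fetch
        stream<list<rstring> result5> Result = InetSource() {
            param
                URIList             : [ "http://omertasli.p.ht/result.txt" ];
                inputLinesPerRecord : 1u;
                fetchInterval       : 60.0;
                emitTuplePerURI     : true;
                punctPerFetch       : false;
        }

        // Hypothetical pairing step: ASSUMES the list alternates ID, creation time.
        // A trailing element without a partner (e.g. an odd-length list) is skipped.
        stream<rstring tweetId, rstring createdAt> Pairs = Custom(Result) {
            logic
                onTuple Result : {
                    mutable int32 i = 0;
                    while (i + 1 < size(result5)) {
                        submit({ tweetId = result5[i], createdAt = result5[i + 1] }, Pairs);
                        i += 2;
                    }
                }
        }

        // Both attributes are top-level rstrings, so quoteStrings : false should leave
        // them unquoted; each tuple should become one comma-separated line.
        () as PairSink = FileSink(Pairs) {
            param
                file         : "pairs.txt";
                format       : csv;
                quoteStrings : false;
        }
}
```

Because the pairs are top-level rstring attributes rather than list elements, the csv format has no list to wrap in brackets. Note that the sample list in the question has an odd number of elements, so its last ID would be dropped by this sketch.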