
Question concerning expected behavior on HDFSFileSink buffer flush

2013-09-17T21:33:35Z

Is it expected behavior that, on an HDFSFileSink buffer flush, the new buffer overwrites the existing file contents (from prior buffer writes) unless the filename uses "%FILENUM" or some other filename parameter?

For example, if I have a Beacon --> Punctor --> HDFSFileSink pipeline writing to the HDFS file "foo.txt" (no %FILENUM), would you expect "foo.txt" to contain only one row (the most recent, since the Punctor's punctuation causes the HDFSFileSink to flush), or one row for each Beacon pulse?

In my testing, the file ends up with only one row, and previous rows are overwritten. Is this the expected behavior?

I'd like to confirm whether this is expected, or whether I'm misunderstanding what I'm seeing.

Thanks in advance,



  • Kevin_Foster

    Re: Question concerning expected behavior on HDFSFileSink buffer flush

    2013-09-17T23:10:35Z in response to david.cyr

    Your results look correct to me based on the operator documentation. Streams will not append to the existing file after punctuation.

    This is analogous to the FileSink operator, whose append parameter defaults to false.
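
    To make the append analogy concrete, here is a small Python model (not Streams/SPL code) of a sink that closes and reopens the same file on every punctuation. It assumes the Punctor emits a punctuation after every Beacon tuple, as in the question. The names `write_with_punctuation` and `read_lines` and the `append` flag are hypothetical stand-ins: opening with mode "w" truncates the file (like append defaulting to false), while mode "a" keeps the earlier rows.

```python
import os
import tempfile


def write_with_punctuation(path, tuples, append=False):
    """Hypothetical model of a sink that reopens `path` after every punctuation.

    Assumption: each tuple is followed by a punctuation (Beacon -> Punctor),
    so the file is opened, written, and closed once per tuple.
    append=False mimics the default: every reopen truncates the file.
    """
    mode = "a" if append else "w"
    for t in tuples:
        # Opening in "w" mode discards everything written before.
        with open(path, mode) as f:
            f.write(t + "\n")
        # Leaving the `with` block flushes and closes the file,
        # roughly like the sink closing the file on punctuation.


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    rows = ["row1", "row2", "row3"]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo.txt")
        write_with_punctuation(path, rows, append=False)
        print(read_lines(path))  # ['row3']: only the last row survives
        os.remove(path)
        write_with_punctuation(path, rows, append=True)
        print(read_lines(path))  # ['row1', 'row2', 'row3']
```

    With truncation on each reopen, only the final row is left, which matches the single-row file described in the question.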

    As you mentioned, if a file is going to be closed during processing, then you need a filename parameter (or parameters), such as %FILENUM, to avoid overwriting the data.
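
    A rough Python sketch of why a filename parameter avoids the overwrite: each punctuated window is written to a new name, so truncating on open never touches earlier data. `expand_filename` and `write_one_file_per_punctuation` are hypothetical helpers, only the %FILENUM token is modeled, and starting the counter at 0 is an assumption of this sketch rather than something taken from the operator documentation.

```python
import os
import tempfile


def expand_filename(pattern, file_num):
    """Hypothetical stand-in for the sink's filename substitution.

    Only %FILENUM is handled; the 0-based counter is an assumption.
    """
    return pattern.replace("%FILENUM", str(file_num))


def write_one_file_per_punctuation(directory, pattern, tuples):
    """Write each punctuated window (here: one tuple) to its own file."""
    written = []
    for file_num, t in enumerate(tuples):
        path = os.path.join(directory, expand_filename(pattern, file_num))
        # "w" still truncates, but every window gets a fresh filename,
        # so no earlier data is overwritten.
        with open(path, "w") as f:
            f.write(t + "\n")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = write_one_file_per_punctuation(
            d, "foo_%FILENUM.txt", ["row1", "row2", "row3"]
        )
        print([os.path.basename(p) for p in paths])
        # ['foo_0.txt', 'foo_1.txt', 'foo_2.txt']
```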

    It's interesting to me that you are getting even one row in your file. Apparently Streams flushes the buffer on final punctuation but does not reopen the file afterward (unlike with window punctuation), which makes a lot of sense when I think about it.