I have a use case where I need to use the HDFSFileSink operator to write data for BigInsights, and I need to ensure that no data sits in the buffer for too long, even during periods of slow data arrival (when the buffer isn't constantly filling up and flushing on its own). This is easy enough to accomplish with a Beacon and punctuations.
We have noticed, however, that when using this approach we often end up with an initial 0-byte file, and then the files of data. The HDFSFileSink operator isn't continuously generating 0-byte files (for example, in periods of no data, the heartbeats aren't causing files to be created with no data). The pattern I've seen so far seems limited to an initial 0-byte file, and then files with the data.
The 0-byte file seems to be causing some problems with processing the contents of the directory on the BigInsights side (I'm not sure of the details), so I'm trying to find a way to avoid producing it.
I'm including a quick sample program that I can use to duplicate this problem in my environment. The behavior of this program is that it will produce one 0-byte file (e.g. test 4815.0.0) and then 5 or 6 files containing the rows of data (labeled 0-29, per the IterationCount). It does not continue to produce 0-byte files, even though the heartbeat beacon keeps chirping for a few minutes after the data has stopped.
Thanks in advance for any help you can provide in eliminating this pesky 0-byte file. I'll continue looking on my side as well, now that I have a simple test case.
namespace application ;
use com.ibm.streams.bigdata.hdfs::HDFSFileSink ;

// Data source: 30 tuples, one every 5 seconds, starting after 60 seconds.
(stream<uint64 iterCount> DataChirp) as DataOriginatingBeacon = Beacon()
{
    param
        initDelay : 60.0 ;
        iterations : 30 ;
        period : 5.0 ;
    output
        DataChirp : iterCount = IterationCount() ;
}

// Heartbeat source: 20 tuples, one every 30 seconds, starting after 3 seconds.
(stream<uint64 iterCount> BufferFlushChirp) as BufferFlushHeartbeatsBeacon = Beacon()
{
    param
        initDelay : 3.0 ;
        iterations : 20 ;
        period : 30.0 ;
    output
        BufferFlushChirp : iterCount = IterationCount() ;
}

// Turn each heartbeat tuple into a window punctuation.
(stream<uint64 iterCount> BufferFlushHeartbeat) as Custom_3 =
    Custom(BufferFlushChirp as inPort0Alias)
{
    logic
        onTuple BufferFlushChirp : submit(Sys.WindowMarker, BufferFlushHeartbeat) ;
}

() as HDFSFileSink_4 = HDFSFileSink(BufferFlushHeartbeat, DataChirp)
{
    param
        format : txt ;
        hdfsConfigFile : 'hdfsconfig.txt' ;
}