Topic
  • 3 replies
  • Latest Post - ‏2013-09-09T18:35:35Z by danlopezv
danlopezv
22 Posts

Pinned topic FileSink not flushing as expected

‏2013-09-05T14:45:46Z |

I have a FileSink operator that should write to the file system every time a tuple arrives. Even though the flush parameter is set to 1, the file is updated only every 20-25 tuples. When I watch the file with tail -f, the data arrives in chunks every few seconds. When I watch the file size with ls -l, the size does not change until the next chunk of tuples arrives. I ran a simple test using a Beacon with period 1.0 that outputs the current timestamp, and the file does not update every second. I also wrote a simple Java program that writes to the file system every second, and it works as expected. Here is the test code I'm using:

stream<rstring tstamp> tsGen = Beacon() {
   param
      period : 1.0;
   output
      tsGen : tstamp = ctime(getTimestamp());
}

() as toFileSystem = FileSink(tsGen) {
   param
      file : "output/test.txt";
      format : csv;
      separator : "|";
      flush : 1u;
      quoteStrings : true;
}

What is strange is that we have another Streams environment with the same OS and Streams version and the same code, and it works perfectly there. It also works perfectly when we run it in standalone mode, but when we run it in distributed mode on one of our instances we see this behavior.

Our Streams environment:

Streams version: 3.1

OS: RHEL 6.3 64-bit version

Thanks,

Daniel Lopez
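
For comparison with the FileSink test above, here is a minimal sketch of the kind of Java writer described in the post: one timestamp line per second, flushed after every write. It is not the original test program. The class name FlushEveryLine, the method writeLines, and the default file name test.txt are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Date;

// Hypothetical comparison writer; not the program used in the original post.
final class FlushEveryLine {

    // Appends `count` quoted timestamp lines to `file`, flushing after each
    // line and sleeping `periodMillis` between writes.
    static void writeLines(Path file, int count, long periodMillis)
            throws IOException, InterruptedException {
        try (BufferedWriter out = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < count; i++) {
                out.write("\"" + new Date() + "\"");
                out.newLine();
                // flush() hands the buffered bytes to the operating system;
                // it does not force them to disk or to a remote file server.
                out.flush();
                Thread.sleep(periodMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: write to test.txt in the current directory.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        writeLines(file, 60, 1000L);
    }
}
```

Running this next to the Streams job and watching both files with tail -f from the same host gives a like-for-like comparison, since both writers then go through the same file system and the same observer.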

  • KanatT.
    13 Posts
    ACCEPTED ANSWER

    Re: FileSink not flushing as expected

    ‏2013-09-05T21:02:21Z  

    We will look into this further, but I could not reproduce the problem on our cluster with the environment you described. Using the code provided, I observed the file being updated roughly every second (tail -f prints a new line and ls -l reports a size increase).

    One thing I did notice: when I was observing from a different node than the one the FileSink runs on, the network file system (NFS in our case) introduced enough lag that tail -f showed a couple of new lines in bulk every few seconds.

    To help us continue the investigation, could you tell us which compiler options you used to compile the code and what type of file system the program is writing to?
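
    The ls -l observation can be made repeatable with a small poller. The following is a sketch only: it samples the file size at a fixed interval and prints the growth per interval, so the same check can be run on the FileSink host and on a second node and the two outputs compared. The class name SizeWatcher and its methods are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a scripted version of watching the file with ls -l.
final class SizeWatcher {

    // Reads the file size `samples` times, `intervalMillis` apart.
    static List<Long> sampleSizes(Path file, int samples, long intervalMillis)
            throws IOException, InterruptedException {
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            sizes.add(Files.size(file));
            if (i < samples - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return sizes;
    }

    // Converts absolute sizes into growth per interval. Steady writes show
    // similar non-zero values; chunked visibility shows runs of zeros
    // followed by one large value.
    static List<Long> deltas(List<Long> sizes) {
        List<Long> growth = new ArrayList<>();
        for (int i = 1; i < sizes.size(); i++) {
            growth.add(sizes.get(i) - sizes.get(i - 1));
        }
        return growth;
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        System.out.println(deltas(sampleSizes(file, 30, 1000L)));
    }
}
```

    If the deltas are steady on the writing host but bursty on another node, the chunking is happening between the nodes rather than in the operator.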

  • danlopezv
    22 Posts

    Re: FileSink not flushing as expected

    ‏2013-09-06T15:08:05Z  

    The compiler options I used are the following:

    sc -M namespace::applicationName --output-directory=output/namespace.appName/Distributed --data-directory=data

    The file system the program is writing to is "ext4". This is coming from another SLES NFS server that uses "ext3".
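
    To check what a given host actually sees for the output directory, the file store type can be queried from Java. This is a sketch under assumptions: the class name FsType and the helper looksLikeNfs are made up for illustration, and the string returned by FileStore.type() is implementation specific (on Linux it is typically the mount type, such as ext4 or nfs4).

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for reporting the file system behind a path.
final class FsType {

    // Returns the type of the file store that holds `path`.
    // The path must exist; the format of the string is implementation specific.
    static String fileSystemType(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.type();
    }

    // Heuristic only: treats any type starting with "nfs" as a network mount.
    static boolean looksLikeNfs(String type) {
        return type != null && type.toLowerCase(Locale.ROOT).startsWith("nfs");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String type = fileSystemType(dir);
        System.out.println(dir.toAbsolutePath() + " -> " + type
                + (looksLikeNfs(type) ? " (network file system)" : ""));
    }
}
```

    Running it against the data directory on each node shows whether that node reaches the directory locally or over NFS.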

  • danlopezv
    22 Posts

    Re: FileSink not flushing as expected

    ‏2013-09-09T18:35:35Z  

    I ran tail -f on the host/node where the job was running and there was no chunking (the file was updated every time a tuple came in). I will mark this question as answered, but do you have an explanation for this behavior?

    - Daniel Lopez