3 replies. Latest post: 2013-09-16T14:44:13Z by david.cyr
david.cyr

HDFSFileSink - ignoreWindowMarkers not supported?

2013-09-13T15:52:32Z

The InfoCenter documentation for the HDFSFileSink operator in the bigdata toolkit (I am using version 1.0.2 of the toolkit) describes a parameter called "ignoreWindowMarkers". This parameter doesn't show up in the 'param' tab of the operator's 'edit' dialog, and if I add it to the SPL manually, I get a compile error saying there is no "ignoreWindowMarkers" parameter.

Does anybody know whether this was supported in a previous version, or whether it's expected in a future version?

Since I can't use ignoreWindowMarkers in my code, I've added an extra step: a Custom operator that strips out window punctuation:

(stream<FinTransResult> ProcessedStreamWithoutWindowPunctuations) as
   RemoveWindowPunctuations = Custom(FinancialTransactionResults)
{
   logic
      // Pass every data tuple through unchanged.
      onTuple FinancialTransactionResults :
         submit(FinancialTransactionResults,
            ProcessedStreamWithoutWindowPunctuations);
      // Forward all punctuation except window markers,
      // so final markers still reach the downstream sink.
      onPunct FinancialTransactionResults : {
         if (currentPunct() != Sys.WindowMarker) {
            submit(currentPunct(), ProcessedStreamWithoutWindowPunctuations);
         }
      }
}

If possible, I'd rather not need custom code for this. Without it, though, the HDFSFileSink writes a file (actually overwrites it, since it can't append) on every individual transaction, because earlier join logic, among other things, leaves me with a window size of one transaction.

Any suggestions or information on how to get "ignoreWindowMarkers" working?

 

thanks in advance,

david

  • Kevin_Foster

    Re: HDFSFileSink - ignoreWindowMarkers not supported?

    2013-09-14T19:06:20Z in response to david.cyr

    I see the same discrepancy in my 3.1 installation. The documented parameter ignoreWindowMarkers is not available in the software, while the parameters HDFSUser, NamenodeHost, NamenodePort, and testMode are all available but not documented. (The tooling does document most of those additional parameters, though I would like to understand testMode better.)

    BTW, just to confirm: when you say it is "overwriting", do you mean that it creates too many small one-tuple files in HDFS, based on your file parameter specification, but still successfully writes all tuples? It was not actually losing data, was it? You should always be able to chunk your data into multiple files continuously with zero data loss, regardless of how many tuples end up in each file.

    -Kevin

     

    • SXVK_Roopa_v

      Re: HDFSFileSink - ignoreWindowMarkers not supported?

      2013-09-16T05:04:31Z in response to Kevin_Foster

      David,

      The ignoreWindowMarkers parameter isn't available in the 3.1 release or earlier. It is supported in the 3.2 release.

    • david.cyr

      Re: HDFSFileSink - ignoreWindowMarkers not supported?

      2013-09-16T14:44:13Z in response to Kevin_Foster

      Hi, it does seem to be overwriting, but it could also be user error on my part. I won't rule that out: I'm not sure whether the browser I'm using for HDFS shows only the last block by default. Another 'issue' (on my side, not the tool's) is that for this initial test I gave it a fixed, constant file name, without any %FILENUM or %TIME salting.

       

      Thanks for confirming that this is handled in 3.2. Until then, I'll leave my Custom operator in place to nibble off the window markers.