I am using the BigData toolkit (HDFSFileSink) to write files in CSV format to HDFS, so that BigInsights can perform further processing and analysis of them using BigSheets.
The folks on the BigInsights side have requested that the files I produce include a CSV 'header' row. This is awkward, because the HDFSFileSink operator (which is responsible for buffer flushes and the creation of new files) isn't natively aware of the CSV format; I'm using a Format operator to render my output from tuples into CSV-formatted lines. Even the native FileSink operator doesn't really support creating CSV files with headers (though FileSource does have a 'hasHeaderLine' option for reading them).
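For context, a minimal sketch of the topology I have today is below. The operator parameters and attribute names here are illustrative assumptions from memory, not copied from a working application:

```spl
composite WriteCsvToHdfs {
  graph
    // Stand-in for my real source: emits a small stream of tuples
    stream<rstring name, int32 count> Records = Beacon() {
      param iterations : 10u;
    }

    // Render each tuple as a CSV-formatted line
    stream<blob csvLine> CsvLines = Format(Records) {
      param format : csv;
    }

    // HDFSFileSink handles buffer flushes and file rollover, but has no
    // notion of a CSV header row -- which is the crux of the problem
    () as Sink = HDFSFileSink(CsvLines) {
      param file : "records%FILENUM.csv"; // hypothetical file-naming parameter
    }
}
```

Because HDFSFileSink decides when a new file starts, my Format operator has no obvious hook for injecting a header line at the top of each rolled file.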
I imagine that the use case of writing from Streams to BigInsights for further processing and analysis (including with BigSheets) isn't atypical, so I wanted to check what the recommended approach is for a clean transition from the world of Streams into BigInsights. Is there an option, other than generating the CSV files with headers, that produces a meaningful BigSheets experience? Without header rows, the BigSheets 'headers' are actual data values (taken, I think, from the first row), which doesn't look quite right to the data analysis people.
Thanks in advance for any assistance or insight. I'm happy to answer more questions if additional clarification or context is needed.