Hadoop FS close file every hour?

I have a pipeline with a more or less continuous stream of data flowing through it. The max file size is currently set to 500 MB and the idle time to 5 minutes, but the idle time only triggers if the pipeline stops; there is never a 5-minute idle period while it is running.

What I would like is to simply say "close the file after an hour of writing." I realize I could handle this with the directory template, but that is currently configured to create a new directory every month, which matches my partitioning scheme (I have an external table with a partition for each month). I don't actually want a directory for every hour of the day. I just want StreamSets to close out the file every hour so I can query the external table for that data.
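To make the behaviour I'm after concrete, here is a rough sketch in plain Python. This is not StreamSets code or configuration; `HourlyRollingWriter`, `open_file`, and `max_age_seconds` are hypothetical names I made up purely for illustration. The point is that the file is rolled based on how long it has been open, regardless of whether records are still arriving.

```python
import time


class HourlyRollingWriter:
    """Hypothetical illustration only; not part of StreamSets."""

    def __init__(self, open_file, max_age_seconds=3600, clock=time.monotonic):
        self._open_file = open_file      # callable returning a new writable file object
        self._max_age = max_age_seconds  # roll after this many seconds of being open
        self._clock = clock              # injectable clock, so the logic can be tested
        self._file = None
        self._opened_at = None
        self.files_closed = 0

    def write(self, record):
        now = self._clock()
        # Roll on the age of the open file, not on idle time.
        if self._file is not None and now - self._opened_at >= self._max_age:
            self.close()
        if self._file is None:
            self._file = self._open_file()
            self._opened_at = now
        self._file.write(record)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
            self.files_closed += 1
```

In this sketch the directory a file lands in is decided entirely by whatever `open_file` returns, so the monthly directory layout would stay untouched while files are still closed hourly.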

Or am I approaching this problem completely incorrectly?