Cluster Batch Mode - HDFS Origin - Roll multiple files into a single file

2018-07-10

In my cluster batch mode pipeline I am reading multiple files (CSVs for now) from an HDFS origin. When I write them to the destination, I want them written as a single file instead of multiple files. Any suggestions on how I can do that? I tried the Max Records in File option, with and without Max File Size, but my files are not rolling into one.


2018-07-10

IMO, writing all records to a single file is not something many production use cases warrant, but you do have the option to do so by setting Max Records in File and/or Max File Size (MB) to a very large number.

Cheers, Dash
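
To make the suggestion above concrete, here is a rough Python sketch of how a record-count limit and a size limit decide when a writer rolls over to a new file. It is a toy model, not the destination's actual implementation: the function name `count_output_files` and its parameters are made up for illustration, and it models a single writer only, so it says nothing about how many writers a cluster pipeline runs.

```python
def count_output_files(record_sizes, max_records, max_bytes):
    """Count the files a simple rollover policy would produce.

    record_sizes: size in bytes of each record, in write order.
    max_records:  hypothetical stand-in for "Max Records in File".
    max_bytes:    hypothetical stand-in for "Max File Size", in bytes.
    """
    files = 0
    records_in_file = 0
    bytes_in_file = 0
    for size in record_sizes:
        # Open the first file, or roll over once either limit would be exceeded.
        if (files == 0
                or records_in_file >= max_records
                or bytes_in_file + size > max_bytes):
            files += 1
            records_in_file = 0
            bytes_in_file = 0
        records_in_file += 1
        bytes_in_file += size
    return files


if __name__ == "__main__":
    sizes = [100] * 1000  # 1000 records of 100 bytes each

    # A small record limit splits the output into several files.
    print(count_output_files(sizes, max_records=250, max_bytes=10**12))

    # Very large limits mean neither threshold is reached: one file.
    print(count_output_files(sizes, max_records=10**9, max_bytes=10**12))
```

In this model the output only collapses to one file when both limits are larger than the whole data set, which is why the advice is to raise them well beyond the expected volume.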

Thanks @iamontheinet. Is there any option other than Max Records in File or Max File Size that I can use to roll the data into a single file? For Avro files, these settings might not be a great option.

Not that I know of.

Asked: 2018-07-10

Seen: 217 times

Last updated: Jul 10 '18