
Cluster Batch Mode, HDFS Origin: Roll multiple files into a single file

asked 2018-07-10 11:19:39 -0600


In my cluster batch mode pipeline I am reading multiple files from the HDFS origin (CSVs right now). When I write them to the destination, I want them written as a single file instead of multiple files. Any suggestions on how I can do that? I tried using the Max Records in File option with and without Max File Size, but my files are not rolling into one.

Thanks.


1 Answer


answered 2018-07-10 14:15:54 -0600

iamontheinet

Hi!

IMO, writing all records to a single file is not something many production use cases warrant, but you do have the option to do so by setting Max Records in File and/or Max File Size (MB) to a very large number.

Cheers, Dash


Comments

Thanks @iamontheinet. Is there any option other than Max Records in File or Max File Size that I can use to roll the data into a single file? For Avro files these settings might not be a great option.

newbee2018 (2018-07-10 15:13:30 -0600)

Not that I know of.

iamontheinet (2018-07-10 15:16:34 -0600)
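
If the file-size settings above do not produce one file, one possible workaround is to merge the output files after the pipeline finishes. This is a minimal sketch, not a StreamSets feature. It assumes the CSV part files have been copied somewhere reachable as ordinary file paths, that every part starts with the same header row, and that the files share a naming pattern such as `part-*.csv`. The function name `merge_csv_parts` and that pattern are hypothetical. It handles CSV only; Avro files would need a format-aware tool instead of plain concatenation.

```python
import csv
from pathlib import Path
from typing import Iterable


def merge_csv_parts(part_paths: Iterable[Path], merged_path: Path) -> int:
    """Concatenate CSV part files into one file, keeping a single header row.

    Hypothetical helper: assumes every non-empty part begins with the same header.
    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    header = None
    with merged_path.open("w", newline="") as out:
        writer = csv.writer(out)
        # Sort so the merged row order is deterministic across runs.
        for part in sorted(part_paths):
            with part.open(newline="") as src:
                reader = csv.reader(src)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty part files
                if header is None:
                    # First non-empty part supplies the header for the merged file.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csv_parts(Path("output").glob("part-*.csv"), Path("merged.csv"))` would write every data row from the parts into `merged.csv` under one header. Hadoop's `hdfs dfs -getmerge` can also concatenate the files of an HDFS directory into one local file, but it copies each file as-is, so repeated header rows would remain.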
