I tried to write more than 5000 XML files from SFTP to HadoopFS. The total size of the files is 2.89 GB. When I started StreamSets, it read the records very slowly. Is there any way to make StreamSets run faster?
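For context, here is a rough back-of-the-envelope sketch of what those figures imply. It only uses the two numbers quoted above (treating "more than 5000" as exactly 5000, so the average is an upper bound). The function names and the 10 MB/s throughput are hypothetical placeholders, not measured values or StreamSets settings.

```python
# Figures quoted in the question. "More than 5000" is treated as exactly 5000,
# so the computed average is an upper bound on the real average file size.
TOTAL_SIZE_GB = 2.89
FILE_COUNT = 5000


def average_file_size_mb(total_gb: float, file_count: int) -> float:
    """Return the mean file size in MB, assuming 1 GB = 1024 MB."""
    return total_gb * 1024 / file_count


def transfer_minutes(total_gb: float, throughput_mb_per_s: float) -> float:
    """Return the minutes needed to move total_gb at a sustained throughput."""
    return total_gb * 1024 / throughput_mb_per_s / 60


if __name__ == "__main__":
    avg = average_file_size_mb(TOTAL_SIZE_GB, FILE_COUNT)
    print(f"Average file size: at most {avg:.2f} MB")

    # 10 MB/s is an assumed placeholder, not a measurement from this pipeline.
    minutes = transfer_minutes(TOTAL_SIZE_GB, 10.0)
    print(f"Time at an assumed 10 MB/s: {minutes:.1f} minutes")
```

If the observed run time is far above what the raw data volume would suggest, the cost may lie in per-file or per-record handling rather than in bandwidth, though that is only a guess without the pipeline's own metrics.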

Are you seeing this slow behavior only in StreamSets? Try increasing the batch size and see if that improves the speed.

Roh ( 2017-12-06 14:31:38 -0600 )

Also, please add some more detail here. What do the pipeline's performance breakdown metrics show while it's running (i.e., which stages are taking the most time)?

jeff ( 2017-12-12 16:15:04 -0600 )