StreamSets was very slow when writing multiple XML files (more than 5k) from SFTP to HadoopFS

asked 2017-12-05 04:26:53 -0600

this post is marked as community wiki


I am trying to write more than 5,000 XML files from SFTP to HadoopFS. The total size of the files is 2.89 GB. When I start the pipeline, StreamSets reads the records very slowly. Is there any way to make StreamSets run faster?
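A quick back-of-envelope check with the numbers from the question suggests this is a many-small-files workload: on average each file is well under 1 MB, so any fixed per-file cost (opening the remote file, listing, closing) is paid roughly 5,000 times. This is plain Python, not StreamSets code. It assumes "GB" means GiB, and the 0.5 s per-file overhead is a hypothetical figure chosen only to show how such a cost scales with file count.

```python
# Figures taken from the question: ~5,000 files totalling 2.89 GB.
TOTAL_BYTES = 2.89 * 1024**3  # assumption: "GB" here means GiB (2^30 bytes)
FILE_COUNT = 5000

# Average size of a single file, in MiB.
avg_mib = TOTAL_BYTES / FILE_COUNT / 1024**2


def overhead_seconds(file_count: int, per_file_overhead_s: float) -> float:
    """Time spent on fixed per-file work alone, ignoring data transfer."""
    return file_count * per_file_overhead_s


# Hypothetical per-file overhead of 0.5 s, for illustration only.
fixed_cost = overhead_seconds(FILE_COUNT, 0.5)

print(f"average file size: {avg_mib:.2f} MiB")
print(f"fixed per-file cost: {fixed_cost / 60:.1f} min")
```

With these inputs the average file is about 0.59 MiB, and a 0.5 s fixed cost per file would add about 42 minutes before any bytes are counted, which is why per-file overhead is worth measuring separately from raw transfer speed.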



Are you seeing this slow behavior only in StreamSets? Try increasing the batch size and see whether that improves the speed.

Roh (2017-12-06 14:31:38 -0600)
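Why a larger batch size can help: if each batch carries a roughly fixed cost (scheduling, commit, flush) on top of a per-record cost, larger batches spread that fixed cost over more records. The toy model below is an assumption for illustration, not how StreamSets actually accounts for time, and all the numbers are hypothetical.

```python
import math


def estimated_run_seconds(n_records: int, batch_size: int,
                          per_batch_overhead_s: float,
                          per_record_s: float) -> float:
    """Toy cost model: fixed cost per batch plus a linear cost per record."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = math.ceil(n_records / batch_size)
    return n_batches * per_batch_overhead_s + n_records * per_record_s


# Hypothetical workload: 50,000 records, 0.2 s per batch, 1 ms per record.
for bs in (100, 1000, 5000):
    t = estimated_run_seconds(50_000, bs, per_batch_overhead_s=0.2,
                              per_record_s=0.001)
    print(f"batch size {bs}: ~{t:.0f} s")
```

In this model the gain shrinks quickly once the per-record term dominates (150 s, 60 s, 52 s for the three sizes), and larger batches also hold more records in memory at once, so raising the batch size is worth testing rather than maximizing blindly.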

Also, please add more detail. What do the pipeline's performance breakdown metrics show while it is running (i.e., which stages are taking the most time)?

jeff (2017-12-12 16:15:04 -0600)
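Once per-stage times are known, the bottleneck is simply the stage with the largest share of the total. The sketch below ranks stages by share; the stage names and timings are hypothetical placeholders, typed in by hand, and it does not call any StreamSets API.

```python
def rank_stages(stage_seconds: dict[str, float]) -> list[tuple[str, float]]:
    """Return (stage, share of total time) pairs, slowest stage first."""
    total = sum(stage_seconds.values())
    if total <= 0:
        raise ValueError("need at least one positive timing")
    ranked = sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs / total) for name, secs in ranked]


# Hypothetical per-batch timings in seconds, for illustration only.
timings = {
    "sftp_origin": 9.0,
    "xml_parser": 1.5,
    "hdfs_destination": 1.5,
}

for name, share in rank_stages(timings):
    print(f"{name}: {share:.1%}")
```

If the origin dominates, as in these made-up numbers, tuning the destination or the batch size alone is unlikely to help much.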