StreamSets is very slow when writing multiple XML files (more than 5k) from SFTP to HadoopFS

asked 2017-12-05 04:26:53 -0500


I tried to write more than 5000 XML files from SFTP to HadoopFS. The total size of the files is 2.89 GB. When I started StreamSets, it read the records very slowly. Is there any way to make StreamSets run faster?



Also, please add some more detail here. What do the pipeline's performance breakdown metrics show while it is running (i.e. which stages are taking the most time)?

jeff (2017-12-12 16:15:04 -0500)