File ingestion in cluster batch mode: is there any way to increase the speed of data transfer? Will increasing the batch size help?

asked 2018-02-27 03:37:09 -0500

nimmy bosco

updated 2018-02-28 06:28:23 -0500

I have created a pipeline in cluster batch mode to ingest a file from one HDFS location to another. The worker memory is set to 4096 MB and the input file is 1.74 GB. With the batch size set to 1000, the data transfer took 9.26 minutes in total; with the batch size set to 10000, it took 10.43 minutes. Is there any way to improve the performance of the pipeline? Is 9.26 minutes a normal time for transferring 1.74 GB of data with StreamSets? Is it possible to increase the speed by increasing the batch size?
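
For reference, the effective throughput of the two runs can be worked out from the figures above. This is only a rough sketch: it assumes the reported times are decimal minutes (not minutes and seconds) and that 1 GB = 1024 MB, and the function name is just illustrative.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Effective throughput in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (minutes * 60)


# Batch size -> total run time in minutes, as reported above
# (assumed to be decimal minutes).
runs = {1000: 9.26, 10000: 10.43}

# Input file size of 1.74 GB, as reported above.
rates = {batch: throughput_mb_per_s(1.74, minutes) for batch, minutes in runs.items()}

for batch, rate in rates.items():
    print(f"batch size {batch}: {rate:.2f} MB/s")
```

Under those assumptions the two runs come out at about 3.2 MB/s and 2.8 MB/s, so the larger batch size was slightly slower in this test.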
