File ingestion in cluster batch mode: increasing the batch size does not improve data transfer speed

I have created a pipeline in cluster batch mode to ingest a file from one HDFS location to another. The worker memory is set to 4096 MB and the input file is 1.74 GB. With the batch size set to 1000, the data transfer took 9.26 minutes in total; with the batch size set to 10000, it took 10.43 minutes. Is there any way to improve the performance of the pipeline? Is 9.26 minutes a normal time for transferring 1.74 GB of data with StreamSets? Is it possible to increase the speed by increasing the batch size?
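
For reference, the two runs can be converted into an approximate effective throughput. The sketch below is only back-of-the-envelope arithmetic on the figures quoted above. It assumes that "9.26 minutes" and "10.43 minutes" are decimal minutes (not minutes:seconds) and that 1 GB = 1024 MB; the function name `throughput_mb_per_s` is a hypothetical helper, not part of StreamSets.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Approximate throughput in MB/s.

    Assumptions (not stated in the question): 1 GB = 1024 MB,
    and `minutes` is a decimal number of minutes.
    """
    size_mb = size_gb * 1024
    seconds = minutes * 60
    return size_mb / seconds


# Figures taken from the two runs described in the question.
run_batch_1000 = throughput_mb_per_s(1.74, 9.26)
run_batch_10000 = throughput_mb_per_s(1.74, 10.43)

print(f"batch size 1000:  {run_batch_1000:.2f} MB/s")
print(f"batch size 10000: {run_batch_10000:.2f} MB/s")
```

Under these assumptions both runs come out at roughly 3 MB/s, and the larger batch size is slightly slower rather than faster.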