Reduce "Starting" duration for cluster batch mode

asked 2018-05-30 03:07:03 -0500

davidha

Hi, I have been using StreamSets for half a year. I have found that the "Starting" phase takes very long for cluster batch mode jobs, usually 2-3 minutes or more. That may not be noticeable for huge data sets, but I now have plenty of small data sets to transfer. The actual processing time is acceptable, but the 2-3 minute "Starting" time is annoying. Is there any way to reduce this long wait?


Which version are you on? And which stages are involved in your pipeline?

Mufy ( 2018-05-30 03:16:23 -0500 )

I am using streamsets-datacollector-. There are no other stages involved, just one MapRFS origin and one MapRFS destination. Sometimes the wait goes up to 10 minutes, which is not justifiable. Is there any way to check where the run is getting stuck?

davidha ( 2018-05-31 04:11:30 -0500 )

I saw a significant improvement after restarting SDC. Could it be that running SDC continuously keeps something useless cached in memory or in the JVM?

davidha ( 2018-05-31 21:18:02 -0500 )

I'd recommend moving to the latest version of SDC, as several enhancements have gone into improving JVM memory management and the like.

Mufy ( 2018-06-01 01:13:23 -0500 )

Thank you for the suggestion. We are running StreamSets in production, so we may not be able to upgrade easily. Is JVM memory management one of the major focuses of recent versions? I am thinking of a workaround: scheduling SDC to restart periodically to solve the problem. Any thoughts on that?

davidha ( 2018-06-01 03:21:21 -0500 )