How does StreamSets handle throttling large numbers of pipelines?

asked 2018-08-08 11:00:14 -0500

dcwatson84

updated 2018-08-08 13:18:04 -0500

metadaddy

Suppose I had 100k pipelines that each needed to run once an hour, and I triggered them all at XX:00:00. How would StreamSets throttle the system when that load became too much? The CPU obviously limits how many threads can run in parallel, but that level of throttling isn't enough to keep a system from crashing. If StreamSets actually attempted to execute all 100k at once then, regardless of CPU capabilities, memory could quickly be used up, causing heavy swapping and inefficient use of resources.

So the only options I can come up with are...

  1. It queues some.
  2. It fails some.
  3. It runs them all.

If it's #1 or #2, what logic determines when a pipeline is queued or failed? If it's #3, is the implication that, even with relatively lightweight pipelines, StreamSets can bog down simply because too many are kicked off at once?


1 Answer


answered 2018-08-08 13:17:52 -0500

metadaddy

At present, the answer is #3 - Data Collector does exactly what you tell it to do. It's up to you to schedule your pipelines appropriately.
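
Since scheduling is left to the user, one way to avoid starting everything at XX:00:00 is to spread the starts across the hour in small batches. The following is only a minimal sketch of that idea, not a StreamSets feature: `stagger_schedule` is a hypothetical helper, and the batch size of 100 is an arbitrary illustrative value that would need tuning against a real Data Collector.

```python
import math


def stagger_schedule(pipeline_ids, window_seconds, batch_size):
    """Map each pipeline ID to a start offset (in seconds) within the window.

    Pipelines are grouped into batches of `batch_size`, and the batches are
    spaced evenly across `window_seconds`. Hypothetical helper, not part of
    any StreamSets API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ids = list(pipeline_ids)
    if not ids:
        return {}
    n_batches = math.ceil(len(ids) / batch_size)
    interval = window_seconds / n_batches  # seconds between batch starts
    return {pid: int((i // batch_size) * interval) for i, pid in enumerate(ids)}


# 100k pipelines over one hour, 100 per batch: a new batch every 3.6 seconds.
schedule = stagger_schedule(range(100_000), window_seconds=3600, batch_size=100)
```

An external scheduler would then start each pipeline at XX:00:00 plus its offset, so every pipeline still runs once an hour but no more than one batch is started at a time.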



There is an open Jira to at least enable throttling of the pipeline starts. Please watch if interested:

jeff (2018-08-08 14:06:40 -0500)

That's unfortunate. Given that Data Collector is easy to bog down, is there an API for monitoring it so that we can prevent that? Even if I queued the work externally, I would still need to detect when Data Collector can handle more work.

dcwatson84 (2018-08-08 18:16:10 -0500)

I think you can get pretty much every stat via a JMX request - http://hostname:18630/rest/v1/system/jmx

metadaddy (2018-08-08 18:28:12 -0500)
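
As a rough illustration of how an external queue could use that JMX endpoint to decide whether Data Collector can take more work, here is a minimal sketch. Several things in it are assumptions rather than facts from this thread: that the response is JSON with a top-level `beans` list, that it includes the standard JVM `java.lang:type=Memory` bean with a `HeapMemoryUsage` attribute, that HTTP basic authentication is enabled, and that 80% heap usage is a sensible cutoff. The function names, credentials, and threshold are all hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional

# URL as given in the comment above; replace "hostname" with your own host.
JMX_URL = "http://hostname:18630/rest/v1/system/jmx"


def fetch_jmx(url: str = JMX_URL, user: str = "admin", password: str = "admin") -> dict:
    """Fetch the JMX document. Assumes basic auth; credentials are placeholders."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def heap_usage_fraction(jmx_doc: dict) -> Optional[float]:
    """Return used/max heap, or None if the Memory bean is missing (assumed layout)."""
    for bean in jmx_doc.get("beans", []):
        if bean.get("name") == "java.lang:type=Memory":
            heap = bean.get("HeapMemoryUsage", {})
            used, limit = heap.get("used"), heap.get("max")
            if used is not None and limit and limit > 0:
                return used / limit
    return None


def can_accept_more_work(jmx_doc: dict, max_heap_fraction: float = 0.8) -> bool:
    """True only when heap usage is known and below the (arbitrary) threshold."""
    fraction = heap_usage_fraction(jmx_doc)
    return fraction is not None and fraction < max_heap_fraction
```

An external scheduler could call `can_accept_more_work(fetch_jmx())` before releasing the next batch of pipeline starts. Heap usage is only one signal; the same document may expose other metrics worth checking.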