StreamSets Architecture

asked 2018-01-08 19:10:24 -0500

krishnaM

I am not very clear about the architecture of StreamSets. How do we scale StreamSets in a distributed environment? Say the input data velocity from an origin (for example, a database table) increases: how do we ensure that SDC doesn't run into performance issues? How many daemons will be running? Is it a master-worker architecture or a peer-to-peer architecture?

If multiple daemons are running on multiple machines (e.g. one SDC along with one NodeManager in YARN), how will it show a centralized view of the data, i.e. total record count etc.?

1 Answer

answered 2020-06-11 17:41:18 -0500

iamontheinet

updated 2020-06-11 17:45:32 -0500


What you're looking for is StreamSets Control Hub. And here's the link to its detailed documentation.

Other helpful resources:

Cheers, Dash

