
Where can I find the StreamSets architecture and how SDC works?

asked 2017-12-03 08:50:39 -0500

Ravi

updated 2017-12-07 11:47:25 -0500

metadaddy

Where can I find a description of the StreamSets architecture and of how SDC (StreamSets Data Collector) works?


2 Answers


answered 2017-12-04 12:49:35 -0500

Roh

updated 2018-05-29 15:20:34 -0500

metadaddy

StreamSets has really good documentation; please refer to the links below.

Documentation -

Some tutorials -

Data Collector for continuous integration from RDBMS -

Some use cases :

Even some YouTube videos:

You will find a lot more examples and online resources as you go. Good luck!


answered 2017-12-07 10:35:58 -0500

aman

I am still not clear about the architecture, even after going through the tutorials. How do we scale StreamSets in a distributed environment? Say the velocity of input data from the origin increases: how do we ensure that SDC does not run into performance issues? How many daemons will be running? Is it a master-worker architecture or a peer-to-peer architecture?

If there are multiple daemons running on multiple machines (e.g. one SDC along with one NodeManager in YARN), how does it show a centralized view of the data, i.e. the total record count etc.?

Also, please let me know the architecture of Dataflow Performance Manager. Which daemons does this product run?



Please ask this as a new question, or leave a comment on the existing question - it does not answer the original question.

metadaddy (2017-12-07 11:47:07 -0500)


Asked: 2017-12-03 08:50:39 -0500

Seen: 353 times

Last updated: May 29