Where can I find the architecture design document of StreamSets Data Collector?

asked 2017-11-08 23:41:32 -0600

casel.chen

Where can I find the architecture design document of StreamSets Data Collector? Thanks!

2 Answers


answered 2017-12-07 10:37:13 -0600

aman

I am still not clear on the architecture, even after going through the tutorials. How do we scale StreamSets Data Collector (SDC) in a distributed environment? For example, if the input data velocity from the origin increases, how do we ensure that SDC doesn't run into performance issues? How many daemons will be running? Is it a master-worker architecture or a peer-to-peer architecture?

If multiple daemons are running on multiple machines (e.g. one SDC alongside one NodeManager in YARN), how does it show a centralized view of the data, i.e. total record count etc.?

Also, please let me know the architecture of Dataflow Performance Manager. Which daemons does this product include?


answered 2017-11-20 16:17:18 -0600

Jerry

Please look through the StreamSets Data Collector online help.

Stats

Asked: 2017-11-08 23:41:32 -0600

Seen: 2,171 times

Last updated: Nov 20 '17