What is the architecture design of StreamSets Data Collector?

asked 2017-12-07 10:32:47 -0600

aman

updated 2017-12-08 13:29:40 -0600

metadaddy

I am still not clear about the architecture, even after going through the tutorials. How do we scale StreamSets in a distributed environment? For example, if the velocity of input data from the origin increases, how do we ensure that SDC does not run into performance problems? How many daemons will be running? Is it a master-worker architecture or a peer-to-peer architecture?

If there are multiple daemons running on multiple machines (e.g. one SDC alongside each NodeManager in YARN), how does it show a centralized view of the data, i.e. total record count etc.?

Also, please let me know the architecture of Dataflow Performance Manager. Which daemons does this product include?


1 Answer

answered 2017-12-08 13:29:31 -0600

metadaddy

StreamSets Data Collector (SDC) scales by partitioning the input data. In some cases this happens automatically. For example, Cluster Batch mode runs SDC as a MapReduce job on the Hadoop / MapR cluster to read Hadoop FS / MapR FS data, while Cluster Streaming mode leverages Kafka partitions and executes SDC as a Spark Streaming application, running as many pipeline instances as there are Kafka partitions.
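To make the partitioning idea concrete, here is a minimal Python sketch. It is not SDC code and does not use any StreamSets or Kafka API; `partition_for`, `split_into_partitions` and `run_pipeline_instance` are hypothetical names invented for illustration. It only shows the general pattern: records are split by key into a fixed number of partitions, and one independent pipeline instance handles each partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (CRC32 modulo partition count)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def split_into_partitions(records, num_partitions):
    """Group (key, value) records into num_partitions lists."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def run_pipeline_instance(partition):
    """Hypothetical stand-in for a pipeline: upper-case every value."""
    return [(key, value.upper()) for key, value in partition]


# Assumed sample data: ten keyed events spread over three partitions.
records = [(f"user-{i}", f"event-{i}") for i in range(10)]
partitions = split_into_partitions(records, 3)

# One pipeline instance per partition; each sees only its own slice of the data.
outputs = [run_pipeline_instance(p) for p in partitions]
total_processed = sum(len(out) for out in outputs)
```

Because every instance works on a disjoint slice, adding partitions (and therefore instances) raises throughput without the instances having to coordinate with each other.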

In other cases, StreamSets can scale by multithreading. For example, the HTTP Server and JDBC Multitable Consumer origins run multiple pipeline instances in separate threads.
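As a rough illustration of the multithreaded case, the following Python sketch runs one pipeline instance per table on a thread pool. This is an assumption-laden analogy, not how SDC implements its origins: the table names, the `run_pipeline` function and the `num_threads` parameter are all made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(table_name, rows):
    """Hypothetical pipeline instance: process every row of one table."""
    processed = [row * 2 for row in rows]  # placeholder transformation
    return table_name, len(processed)


def run_all(tables, num_threads=2):
    """Run one pipeline instance per table, at most num_threads at a time."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(run_pipeline, name, rows) for name, rows in tables.items()
        ]
        # Collect (table_name, record_count) pairs once every instance finishes.
        return dict(future.result() for future in futures)


# Assumed sample data standing in for three database tables.
tables = {"orders": [1, 2, 3], "customers": [4, 5], "items": [6, 7, 8, 9]}
counts = run_all(tables)
```

Here the unit of parallelism is a table rather than a Kafka partition, but the principle is the same: each thread owns its own share of the input.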

In all cases, Dataflow Performance Manager (DPM) can give you a centralized view of the data, including total record count.
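Conceptually, a centralized record count is just the sum of the counts reported by each instance. The sketch below shows that aggregation in Python; it makes no claim about how DPM works internally, and the instance names, metric names and `aggregate_metrics` function are hypothetical.

```python
from collections import Counter


def aggregate_metrics(per_instance_metrics):
    """Sum each metric across all instances into one combined view."""
    total = Counter()
    for metrics in per_instance_metrics.values():
        total.update(metrics)  # adds counts key by key
    return dict(total)


# Assumed metrics as two separate SDC instances might report them.
per_instance = {
    "sdc-node-1": {"input_records": 1200, "output_records": 1180, "error_records": 20},
    "sdc-node-2": {"input_records": 800, "output_records": 800, "error_records": 0},
}
totals = aggregate_metrics(per_instance)
```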


Comments

Does this mean that StreamSets is not distributed? I deployed StreamSets to two nodes on my Cloudera cluster, and the two nodes are acting as separate instances. I couldn't find documentation on how to make them a cluster. Could you shed some light on this or point me to the right documentation? i f

krishnaM (2017-12-20 19:53:37 -0600)