
StreamSets Cluster Environment

asked 2017-12-21 13:26:02 -0600

krishnaM

How do I make StreamSets run in a clustered environment? We added StreamSets as a parcel to our Cloudera cluster and deployed it to two nodes, but each StreamSets node acts like a separate instance.

I found the answer below in the FAQ, which says it can run in a clustered environment:

"Does the Data Collector run in a clustered environment? Yes, Data Collector utilizes your existing YARN and Spark Streaming implementation to spawn additional workers as needed for scalability."

To my understanding, though, that doesn't address high availability of StreamSets. If I start a pipeline from instance-a, I can request more workers, but what happens if instance-a goes down for some reason? I cannot see the pipeline on instance-b, and I'm not sure whether this is a configuration setting somewhere. What is the use of deploying StreamSets onto different hosts (other than as a standby)? I'm not sure if I missed anything. I followed the steps mentioned in the link.



Which origin are you using? Cluster batch mode works only for the Hadoop FS / MapR FS origins; cluster streaming mode works only for Kafka / MapR Streams. See

metadaddy (2017-12-21 17:13:32 -0600)

1 Answer


answered 2017-12-21 17:37:57 -0600

krishnaM

updated 2018-01-08 19:03:40 -0600

I tried the JDBC origin, and it came up with the warning. What I would like to achieve, though, is to see the same pipelines when I log in from different StreamSets nodes, so that if one of my nodes goes down I still have one instance of StreamSets (high availability).

