Does the REST API handle distribution of work across a cluster?

asked 2018-08-09 07:35:18 -0600

dcwatson84

updated 2018-08-09 07:47:35 -0600

Does the REST API distribute work across a cluster? For example, if I'm kicking off a large number of pipelines and need them spread across a set of nodes, do I have to call the REST API on each node manually? Or does calling the API on one node spread them across the cluster?

2 Answers

answered 2018-08-14 10:23:04 -0600

Each Data Collector exposes a REST API that can be used to control it. All the functionality available in the Data Collector UI is also available through the REST interface, because the UI itself uses the REST interface under the hood.

Multiple Data Collectors are managed through Control Hub, which has a separate REST interface. That includes the ability to define a "cluster of Data Collectors" and automatically distribute a workload (a large number of pipelines) across those nodes with a simple REST call. Control Hub can also rebalance the workload if some of the pipelines finish sooner than others.

answered 2018-08-14 10:19:10 -0600

metadaddy

With StreamSets Data Collector, the scope of the REST API is limited to the Data Collector you are communicating with, so if you tell a Data Collector to start a pipeline, only that Data Collector instance will start the pipeline.
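If you stay with plain Data Collector, the distribution logic has to live in your own client. Here is a minimal sketch of that manual approach, which assigns pipelines to nodes round-robin and builds one start request per pipeline. The helper names are hypothetical, and the endpoint path `/rest/v1/pipeline/{id}/start` and the `X-Requested-By` header are assumptions to check against your Data Collector version. Authentication is left out.

```python
import itertools
import urllib.parse
import urllib.request


def plan_round_robin(pipeline_ids, node_urls):
    """Pair each pipeline with a node, cycling through the nodes in order."""
    if not node_urls:
        raise ValueError("at least one Data Collector URL is required")
    # zip stops when pipeline_ids is exhausted; cycle repeats the nodes forever.
    return list(zip(pipeline_ids, itertools.cycle(node_urls)))


def build_start_request(node_url, pipeline_id):
    """Build (but do not send) a 'start pipeline' request for one node."""
    # Assumed endpoint path and header; not taken from the thread above.
    path = "/rest/v1/pipeline/{}/start".format(
        urllib.parse.quote(pipeline_id, safe="")
    )
    return urllib.request.Request(
        node_url.rstrip("/") + path,
        data=b"",
        method="POST",
        headers={"X-Requested-By": "sdc"},
    )


def build_all_requests(pipeline_ids, node_urls):
    """One request per pipeline, each aimed at the node chosen for it."""
    return [
        build_start_request(node_url, pipeline_id)
        for pipeline_id, node_url in plan_round_robin(pipeline_ids, node_urls)
    ]
```

Each request would then be sent with `urllib.request.urlopen(request)` after adding whatever credentials your installation requires. Note that this spreads pipelines only at start time; nothing here rebalances them later.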

With StreamSets Control Hub, on the other hand, a job maps a pipeline to a Data Collector label and number of instances, so you can send a single 'Start Job' REST API call and Control Hub will run the job's pipeline on the relevant Data Collector instances.
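For comparison, the Control Hub side reduces to a single request per job. The sketch below only builds that request. The function name is hypothetical, and the path `/jobrunner/rest/v1/job/{id}/start` and the three header names are assumptions that should be confirmed against your Control Hub's REST API documentation.

```python
import urllib.parse
import urllib.request


def build_start_job_request(control_hub_url, job_id, auth_token):
    """Build (but do not send) a single 'Start Job' request for Control Hub."""
    # Assumed endpoint path and header names; not taken from the thread above.
    path = "/jobrunner/rest/v1/job/{}/start".format(
        urllib.parse.quote(job_id, safe="")
    )
    return urllib.request.Request(
        control_hub_url.rstrip("/") + path,
        data=b"",
        method="POST",
        headers={
            "X-Requested-By": "SCH",
            "X-SS-REST-CALL": "true",
            "X-SS-User-Auth-Token": auth_token,
        },
    )
```

The client never names a Data Collector here: which instances run the pipeline is decided by the job's label and instance count, as described above.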


Asked: 2018-08-09 07:35:18 -0600

Seen: 239 times

Last updated: Aug 14 '18