Does the REST API handle distribution of work across a cluster?

Does the REST API distribute work across a cluster? For example, if I'm kicking off a large number of pipelines that need to be spread across a set of nodes, do I have to call the REST API on each node manually? Or does calling the API on one node spread them across the cluster?

2 Answers

Each Data Collector exposes a REST API that can be used to control it. All the functionality available in the Data Collector UI is also available through the REST interface, because the UI itself uses the REST interface under the hood.

Multiple Data Collectors are managed through Control Hub, which has a separate REST interface. This includes the ability to define a "cluster of Data Collectors" and automatically distribute a workload (a large number of pipelines) across those nodes with a simple REST call. Beyond that, Control Hub can rebalance the workload if some of the pipelines finish sooner than others.

With StreamSets Data Collector, the scope of the REST API is limited to the Data Collector you are communicating with, so if you tell a Data Collector to start a pipeline, only that Data Collector instance will start the pipeline.
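Without Control Hub, spreading pipelines across nodes therefore has to happen on the client side. The following is a minimal sketch of that idea, not an official client: it assigns pipelines to Data Collectors round-robin and builds one "start" request per pipeline. The host names, pipeline IDs, and credentials are hypothetical, and the endpoint path `/rest/v1/pipeline/{pipelineId}/start` and the `X-Requested-By` header are assumptions that should be checked against the REST API documentation served by your own Data Collector.

```python
import base64
import urllib.parse
import urllib.request
from itertools import cycle


def plan_round_robin(node_urls, pipeline_ids):
    """Pair each pipeline ID with a Data Collector URL in round-robin order."""
    if not node_urls:
        raise ValueError("at least one Data Collector URL is required")
    nodes = cycle(node_urls)
    return [(next(nodes), pipeline_id) for pipeline_id in pipeline_ids]


def build_start_request(node_url, pipeline_id, user, password):
    """Build (but do not send) a POST request that starts one pipeline.

    Assumed endpoint: /rest/v1/pipeline/{pipelineId}/start
    Assumed auth: HTTP basic authentication
    """
    quoted_id = urllib.parse.quote(pipeline_id, safe="")
    url = f"{node_url.rstrip('/')}/rest/v1/pipeline/{quoted_id}/start"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "X-Requested-By": "sdc",  # assumed CSRF-protection header
    }
    return urllib.request.Request(url, data=b"", method="POST", headers=headers)


def start_all(node_urls, pipeline_ids, user, password):
    """Send one start request per pipeline; each call only affects its own node."""
    for node_url, pipeline_id in plan_round_robin(node_urls, pipeline_ids):
        request = build_start_request(node_url, pipeline_id, user, password)
        with urllib.request.urlopen(request) as response:
            print(node_url, pipeline_id, response.status)
```

Calling `start_all(["http://sdc1.example.com:18630", "http://sdc2.example.com:18630"], ["p1", "p2", "p3"], user, password)` would start `p1` and `p3` on the first node and `p2` on the second, provided each pipeline already exists on the node it is assigned to. Nothing here rebalances work after a pipeline finishes.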

With StreamSets Control Hub, on the other hand, a job maps a pipeline to a Data Collector label and number of instances, so you can send a single 'Start Job' REST API call and Control Hub will run the job's pipeline on the relevant Data Collector instances.
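As a rough illustration of the single 'Start Job' call, here is a sketch that builds such a request. The base URL, job ID, and token are hypothetical, and the endpoint path `/jobrunner/rest/v1/job/{jobId}/start` and the header names are assumptions about the Control Hub REST API that should be verified against your Control Hub's API documentation before use.

```python
import urllib.parse
import urllib.request


def build_start_job_request(sch_url, job_id, auth_token):
    """Build (but do not send) a POST request that starts one Control Hub job.

    Assumed endpoint: /jobrunner/rest/v1/job/{jobId}/start
    Assumed auth: a session token passed in the X-SS-User-Auth-Token header
    """
    quoted_id = urllib.parse.quote(job_id, safe=":")
    url = f"{sch_url.rstrip('/')}/jobrunner/rest/v1/job/{quoted_id}/start"
    headers = {
        "X-SS-User-Auth-Token": auth_token,  # assumed header name
        "X-Requested-By": "SCH",             # assumed CSRF-protection header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=b"", method="POST", headers=headers)


def start_job(sch_url, job_id, auth_token):
    """Send the request; Control Hub decides which Data Collectors run the pipeline."""
    request = build_start_job_request(sch_url, job_id, auth_token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

The client makes one call regardless of how many Data Collectors carry the job's label; the label and instance count configured on the job determine where the pipeline runs.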

Asked: 2018-08-09 07:35:18 -0600

