
What are the best hosting options for StreamSets?

asked 2018-10-29 16:32:34 -0600

I am a complete newbie to StreamSets, so I may be approaching this all wrong. I am looking to provide a service that runs at regular intervals to move data from system 1 to system 2. I don't want to run this on dedicated hardware. My first preference would be to subscribe to a service if one is available; failing that, my second preference would be to set up a container to run this.

What are the typical steps for doing this? Are there pre-built Docker images that can be spun up locally and configured before promoting them to a service like AWS? Or do I create the AWS instance first and then configure it remotely once it is running?

Can a single instance (assuming the right amount of processing power and memory for the task) run multiple scheduled integration flows, or would you need a separate Docker instance per integration task?


1 Answer


answered 2018-10-31 15:50:13 -0600

jeff

updated 2018-10-31 15:52:45 -0600

The Control Hub application is a managed service that can help with this. Its provisioning capabilities make it possible to have a fully managed solution for your workflow.

Alternatively, if you want to run on your own hardware (either physical or virtual), you can use our Docker images to run Data Collector instances there as needed.
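As a rough illustration of the Docker route, here is a minimal Python sketch that builds a `docker run` command for a local Data Collector container. It assumes the image is published as `streamsets/datacollector` and that the web UI listens on port 18630 inside the container; the helper name `sdc_docker_command`, the container name, and the `latest` tag are hypothetical choices for this example, so check the image documentation before relying on them.

```python
import shlex


def sdc_docker_command(name="sdc", host_port=18630,
                       image="streamsets/datacollector", tag="latest"):
    """Build the argv for starting one Data Collector container.

    Assumptions (not confirmed in this thread): the image name, the tag,
    and the container-side UI port 18630.
    """
    return [
        "docker", "run",
        "--detach",                           # run in the background
        "--name", name,                       # hypothetical container name
        "--publish", f"{host_port}:18630",    # map host port to the UI port
        f"{image}:{tag}",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(sdc_docker_command()))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker; the same argument list could be passed to `subprocess.run` once it looks right.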

Depending on the performance requirements of the individual pipelines (e.g., record size, number of records per unit time, processing steps), it is definitely possible to run multiple pipelines in a single Data Collector instance. Control Hub can also help balance the overall job load across multiple instances more seamlessly.
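To make the idea of spreading pipelines across instances concrete, here is a small Python sketch of a greedy least-loaded assignment. This is not how Control Hub balances jobs; it only illustrates the general idea. The function name `assign_pipelines` and the per-pipeline load numbers are hypothetical, and you would need to estimate the loads yourself.

```python
import heapq


def assign_pipelines(pipeline_loads, instance_count):
    """Place each pipeline on the instance with the least load so far.

    pipeline_loads: dict of pipeline name -> estimated load (hypothetical units).
    Returns a dict of instance index -> list of pipeline names.
    """
    # Min-heap of (current total load, instance index).
    heap = [(0.0, i) for i in range(instance_count)]
    heapq.heapify(heap)
    assignment = {i: [] for i in range(instance_count)}

    # Heaviest pipelines first; ties broken by name for a stable result.
    ordered = sorted(pipeline_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, load in ordered:
        total, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (total + load, idx))
    return assignment


if __name__ == "__main__":
    loads = {"orders": 5, "invoices": 3, "audit": 2}
    print(assign_pipelines(loads, 2))
```

With a single instance, every pipeline lands on instance 0, which matches the simple case of running several pipelines in one Data Collector.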

