Thanks a lot for the answers. So my understanding is that if there are 20-30 JMS queues, Sqoop jobs, and files to ingest, there will be tens of ingestion pipelines, each of which needs to run individually. For Kafka topics, each with several partitions, the same approach should be followed. Will all these StreamSets instances then run on separate servers? We already have Spark and Hadoop nodes, and I assume we can install StreamSets instances on those same Spark nodes to run all of these ingestion pipelines. The same Spark nodes could then also be used for processing...

The advantages of using StreamSets seem to be:

* The plumbing is easy to build, and pipelines are easy to change because they are visual
* Pipeline performance and SLAs can be monitored through performance metrics
* Pipelines can be developed in less time because of the existing tools
* ???