How can I maintain connection node lists centrally?

asked 2017-05-13

updated 2017-05-19

Many stages in StreamSets require connection hosts and ports (Kafka brokers, ZooKeeper nodes, Elasticsearch nodes...). How can I store each connection list centrally, so that a change can be made in one place when needed?

Answer

answered 2017-05-13

Use the loadResource function, like this:

${runtime:loadResource("kafka-prod.nodes", true)}

Specifying 'true' marks the file as restricted: it must be owned by the same user that is running SDC, and be readable and writable only by that owner.
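If you want to check a resource file before pointing a pipeline at it, a small Python sketch along these lines may help. The function name check_restricted_resource is my own invention, not part of SDC, and the owner-only permission check reflects my reading of the StreamSets docs for the restricted flag, so treat it as an assumption and adjust it to your version.

```python
import os
import stat


def check_restricted_resource(path, expected_uid=None):
    """Return a list of likely problems for a restricted resource file.

    An empty list means the file looks fine under the assumptions above.
    expected_uid defaults to the current user, so run this as the SDC user
    or pass that user's uid explicitly.
    """
    if expected_uid is None:
        expected_uid = os.getuid()

    info = os.stat(path)
    problems = []

    # The file should belong to the user that runs SDC.
    if info.st_uid != expected_uid:
        problems.append("owned by uid %d, expected %d" % (info.st_uid, expected_uid))

    # Assumption: no permission bits may be set for group or other.
    if stat.S_IMODE(info.st_mode) & 0o077:
        problems.append("group/other permission bits are set")

    return problems
```

Run it as the SDC user against the file in your resources directory; anything it returns is worth fixing with chown or chmod before starting the pipeline.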

The file extension doesn't matter, but I find it helpful to use ".nodes" for all the node list resource files. kafka-prod.nodes goes in your $SDC_RESOURCES directory (normally /opt/sdc/resources). The file itself would be a single line of comma-separated host:port entries, since that's what the Kafka producer/consumer stages in SDC require.


Make sure to strip any newlines or whitespace from files. I find it easiest to write the files from a Python script to ensure there are no stray characters. Plus, then you can commit the script to version control should you need to roll back to a previous version of the node list.
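For what it's worth, here is a minimal sketch of the kind of Python script I mean. The host names are placeholders and the function names are hypothetical, so swap in your own. The chmod to owner-only assumes you load the file with the restricted flag set to true.

```python
import os

# Placeholder broker addresses; replace with your real node list.
KAFKA_PROD_NODES = [
    "kafka1.example.com:9092",
    "kafka2.example.com:9092",
    "kafka3.example.com:9092",
]


def render_node_list(nodes):
    """Join nodes into one comma-separated line with no stray whitespace."""
    cleaned = []
    for node in nodes:
        entry = node.strip()
        if not entry:
            continue  # skip blank entries
        # Reject anything that would corrupt the list once written.
        if "," in entry or any(ch.isspace() for ch in entry):
            raise ValueError("invalid node entry: %r" % node)
        cleaned.append(entry)
    return ",".join(cleaned)


def write_node_file(path, nodes):
    """Write the node list to path with no trailing newline."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(render_node_list(nodes))
    # Owner-only permissions (assumption: needed for restricted loading).
    os.chmod(path, 0o600)
    return path
```

To use it, call write_node_file(os.path.join(resources_dir, "kafka-prod.nodes"), KAFKA_PROD_NODES) as the SDC user, where resources_dir is your $SDC_RESOURCES directory.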


