
What's best practice for deploying a Data Collector pipeline into different environments?

asked 2018-05-31 02:32:27 -0500 by casel.chen

updated 2018-05-31 09:24:35 -0500 by metadaddy

I developed some data pipelines with sdc in a development environment, and now I want to deploy them into a QA test environment, changing only some environment parameters such as the HDFS URI, the MySQL JDBC connection string, and the UUID sdc uses, and then import them into sdc.

My practice is to export the pipelines to JSON files, extract the environment parameters, and replace them with the values for the target environment (a sketch of this step follows). Unfortunately, we share the same sdc instance across different environments, and I have seen some unexpected exits; the log excerpt after the sketch shows one of them.
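A minimal Python sketch of the replacement step, assuming the SDC 3.x export layout where runtime parameters are stored as key/value pairs in the pipelineConfig.configuration entry named "constants" (verify against your own export); the file names and parameter values here are hypothetical:

    import json

    # Hypothetical values for the target (QA) environment.
    QA_PARAMS = {
        "HDFS_URI": "hdfs://qa-namenode:8020",
        "JDBC_URI": "jdbc:mysql://qa-db:3306/risk",
    }

    def set_runtime_parameters(export_path, out_path, params):
        # Load the exported pipeline definition.
        with open(export_path) as f:
            doc = json.load(f)
        # Assumption: parameters live in the configuration entry named "constants".
        for entry in doc["pipelineConfig"]["configuration"]:
            if entry["name"] == "constants":
                for pair in entry["value"]:
                    if pair["key"] in params:
                        pair["value"] = params[pair["key"]]
        # Write the rewritten definition, ready to import into the target sdc.
        with open(out_path, "w") as f:
            json.dump(doc, f, indent=2)

    set_runtime_parameters("dev_pipeline.json", "qa_pipeline.json", QA_PARAMS)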

2018-05-31 15:14:05,751 [user:admin] [pipeline:dev_RISK_INVOCATION_HISTORY/devRISKINVOCATIONHISTORY2b3586ce-4229-4d65-a410-eb07528a80e9] [runner:0] [thread:ProductionPipelineRunnable-devRISKINVOCATIONHISTORY2b3586ce-4229-4d65-a410-eb07528a80e9-dev_RISK_INVOCATION_HISTORY] WARN AppInfoParser - Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=producer-1
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:58)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:328)
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
    at com.streamsets.pipeline.kafka.impl.KafkaProducer09.createKafkaProducer(KafkaProducer09.java:69)
    at com.streamsets.pipeline.kafka.impl.BaseKafkaProducer09.init(BaseKafkaProducer09.java:43)

Full sdc log is here: https://gist.github.com/ChenShuai1981...

My questions are:

  1. What's the best practice for deploying an sdc pipeline into different environments?
  2. Why does sdc exit unexpectedly? How can I fix it?

Comments

I've answered your first question below. Please split the second one into a separate question.

metadaddy (2018-05-31 09:25:12 -0500)

1 Answer


answered 2018-05-31 09:24:05 -0500 by metadaddy

The best way to handle configuration parameters across different environments is to use Runtime Parameters.

You can define a set of parameters for the pipeline, say HDFS_URI, JDBC_URI, etc.:

[Screenshot: the pipeline's Parameters tab, with parameters such as HDFS_URI and JDBC_URI defined]

Reference them in the different stage configurations via the syntax ${JDBC_URI}:

[Screenshot: a stage configuration field referencing ${JDBC_URI}]
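For example (hypothetical values): if JDBC_URI is defined as jdbc:mysql://qa-db:3306/risk, a JDBC stage's connection string configured as ${JDBC_URI} resolves to that value at runtime. Parameters can also appear inside larger expressions, such as ${HDFS_URI}/output.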

Now you can easily change the parameters when you want to run the pipeline in different environments:

  • You can change the default parameter values on the Parameters tab of the pipeline configuration
  • You can select 'Start with Parameters' from the 'More' menu to override the defaults
  • You can supply parameters when you start a pipeline via the REST API or CLI (see the sketch after this list)
  • If you're using Control Hub, you can specify runtime parameters in the job definition
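For the REST route, here's a minimal Python sketch: the /rest/v1/pipeline/{pipelineId}/start endpoint accepts the runtime parameters as the JSON request body. The host, pipeline ID, credentials, and parameter values below are hypothetical, and your sdc's authentication setup may differ:

    import base64
    import json
    import urllib.request

    SDC_URL = "http://localhost:18630"   # hypothetical sdc host
    PIPELINE_ID = "myPipelineId"         # the ID shown in the pipeline's URL
    PARAMS = {
        "HDFS_URI": "hdfs://qa-namenode:8020",
        "JDBC_URI": "jdbc:mysql://qa-db:3306/risk",
    }

    # Default admin credentials; replace these for any real deployment.
    auth = base64.b64encode(b"admin:admin").decode()

    # POST the runtime parameters as the request body to the start endpoint.
    req = urllib.request.Request(
        SDC_URL + "/rest/v1/pipeline/" + PIPELINE_ID + "/start",
        data=json.dumps(PARAMS).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + auth,
            "X-Requested-By": "sdc",  # sdc rejects POST requests without this header
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())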

Comments

Re Control Hub: here's a quick YouTube video that shows how this works: https://youtu.be/UQwKOS9VNyE

Alex Woolford (2018-06-01 08:36:53 -0500)