I have pipeline1, which loads data from an FTP site to HDFS, and pipeline2, which loads data from HDFS to Hive. Pipeline2 depends on pipeline1. How do I kick off pipeline2 after pipeline1 finishes? What are the steps to configure this? Can you give me a sample script?

With StreamSets, you might consider whether or not you really need to stop and start in these use cases. From this description, I'd assume you can run both continuously and keep things simple and less brittle, but I'll add an answer below as well.

todd (2018-08-02 07:44:34 -0500)

Check out this article, which shows examples of using curl in scripts to start and stop pipelines. Although the article uses cron as a control mechanism, you should be able to modify it for your case.

Specifically, you could configure pipeline1 to call a shell script containing the curl command via the pipeline Stop Event, or configure your SFTP origin to Produce Events and call the shell script from a Shell Executor.
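As a rough sketch of what such a shell script might look like, the snippet below starts a pipeline through the Data Collector REST API with curl. The host and port (`http://localhost:18630`), the `admin`/`admin` credentials, and the function names are assumptions for illustration, not values taken from your setup; replace them with your own. The argument is the pipeline ID (as shown in the pipeline's URL in the UI), not its title.

```shell
#!/bin/sh
# Assumed defaults for illustration; override via environment variables.
SDC_URL="${SDC_URL:-http://localhost:18630}"
SDC_USER="${SDC_USER:-admin}"
SDC_PASSWORD="${SDC_PASSWORD:-admin}"

# Build the REST endpoint that starts the pipeline with the given ID.
pipeline_start_url() {
    printf '%s/rest/v1/pipeline/%s/start' "$SDC_URL" "$1"
}

# POST to the start endpoint. The X-Requested-By header is included
# because Data Collector expects it on requests that change state.
# --fail makes curl return a non-zero exit code on an HTTP error.
start_pipeline() {
    curl --silent --show-error --fail \
        -u "$SDC_USER:$SDC_PASSWORD" \
        -X POST \
        -H "X-Requested-By: sdc" \
        "$(pipeline_start_url "$1")"
}
```

To use it, add a line such as `start_pipeline <pipeline2-id>` at the end of the script, with the real ID of pipeline2 substituted, and have the Stop Event or Shell Executor in pipeline1 run it.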

But again, as mentioned in the comments above, you may want to consider running both pipelines continuously.

