Currently, neither Data Collector nor DPM offers scheduling. However, our engineers are working on it, and we will be adding scheduling functionality soon.

Generally, we suggest scheduling pipelines with cron or a similar utility, using the Data Collector CLI to start and stop them.

The documentation for Data Collector CLI is here:  https://streamsets.com/documentation/datacollector/latest/help/index.html#Administration/Administration_title.html#concept_ywx_d5x_pt

As a reminder, the columns in a crontab entry are, in order:

  • Minute: 0-59
  • Hour: 0-23
  • Day of month: 1-31
  • Month of year: 1-12
  • Day of week: 0-6 (0 is Sunday)
  • The command to execute

To run a pipeline on weekdays, at 1:00 am, your crontab entry might look like this: 

00 01 * * 1-5 bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db

This starts the pipeline at 1:00 am, Monday through Friday. Replace 1-5 with * to include weekends.
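If it helps to check which part of an entry lands in which column, a small shell function can split a crontab line the same way the column list describes. This is only an illustrative sketch: describe_cron_entry is a made-up helper name, not part of cron or Data Collector, and the pipeline name in the sample call is shortened to MyPipeline.

```shell
#!/bin/sh
# Hypothetical helper: split one crontab line into its five schedule
# fields and the command, then print them with labels.
describe_cron_entry() {
    # read splits on whitespace; the last variable receives the rest of
    # the line, so the command keeps its own spaces and arguments.
    read -r minute hour dom month dow command <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
    printf 'command=%s\n' "$command"
}

describe_cron_entry '00 01 * * 1-5 bin/streamsets cli -U http://localhost:18630 manager start -n MyPipeline'
```

The first five fields are the schedule; everything after them is passed to the shell as the command.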

Depending on your environment, you will likely need to adjust the path above: cron runs jobs with a minimal environment, so a relative path such as bin/streamsets usually has to be made absolute. A wrapper script that sets up the shell's environment and can start an arbitrary pipeline may make this more manageable.
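A wrapper along those lines might look roughly like the sketch below. Everything specific in it is an assumption to adapt: the function names, the SDC_HOME and SDC_URL variables, the /opt/streamsets-datacollector default, and the DRY_RUN switch are all hypothetical. The start invocation mirrors the crontab entry above; the stop subcommand is assumed to take the same form, so please check it against the CLI documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper: sdc-pipeline.sh <start|stop> <pipeline-id>
# SDC_HOME and SDC_URL are assumed names; the defaults are placeholders.
SDC_HOME="${SDC_HOME:-/opt/streamsets-datacollector}"
SDC_URL="${SDC_URL:-http://localhost:18630}"

# Print the CLI command line for an action and a pipeline ID.
sdc_command() {
    printf '%s/bin/streamsets cli -U %s manager %s -n %s\n' \
        "$SDC_HOME" "$SDC_URL" "$1" "$2"
}

# Validate the arguments, then run the CLI (or only print the command
# when DRY_RUN=1, which is handy for testing the crontab entry).
sdc_pipeline() {
    action="${1:-}"
    pipeline="${2:-}"
    case "$action" in
        start|stop) ;;
        *) echo "usage: sdc-pipeline.sh start|stop <pipeline-id>" >&2
           return 2 ;;
    esac
    if [ -z "$pipeline" ]; then
        echo "missing pipeline id" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        sdc_command "$action" "$pipeline"
    else
        "$SDC_HOME/bin/streamsets" cli -U "$SDC_URL" manager "$action" -n "$pipeline"
    fi
}

# Only act when arguments were given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    sdc_pipeline "$@"
fi
```

The crontab entry would then call the script by its absolute path, for example 00 01 * * 1-5 /path/to/sdc-pipeline.sh start followed by the pipeline ID, where /path/to is wherever you install the script.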

