
Any idea on how to use the committed offsets from one pipeline...

asked 2018-08-22 14:38:06 -0600

Jmbertoncelli

updated 2018-08-29 09:18:23 -0600

metadaddy

Any idea on how to use the committed offsets from one pipeline as initial offset of another pipeline?

I have a first pipeline that ingests a table or a full database. When it is done, that pipeline stops and another pipeline starts and looks for changes. This only works if the second pipeline's initial offset matches the point where the upstream pipeline stopped.



Edited your text into your question, since it wasn't an answer. I'm not following the problem here. If the first pipeline gets all data and stops, its offset should be correct for the second pipeline to pick up, shouldn't it?

metadaddy (2018-08-29 09:19:42 -0600)

Yes, that is what I would like to know. Thank you.

Jmbertoncelli (2018-08-29 10:08:44 -0600)

1 Answer


answered 2018-08-27 17:12:16 -0600

metadaddy

This is possible, but not recommended or supported! Data Collector stores the offset for each pipeline in a JSON file at ${SDC_DATA}/runInfo/<pipeline id>/0/offset.json. Here's an example for a pipeline that uses the Directory origin:

{
  "version" : 2,
  "offsets" : {
    "$com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.offset.version$" : "1",
    "nyc_taxi_data_01.csv" : "{\"POS\":\"-1\"}"
  }
}

You can copy this file from one pipeline's runInfo directory to another to set the offset for the 'target' pipeline.
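For illustration, here is a minimal Python sketch of that file copy. It assumes the `runInfo/<pipeline id>/0/offset.json` layout described above; the helper names `offset_path` and `copy_offset` are hypothetical, not part of Data Collector. It also assumes both pipelines are stopped while the file is copied, which seems the safe choice but is my assumption rather than documented behavior.

```python
import json
import shutil
from pathlib import Path


def offset_path(sdc_data, pipeline_id):
    """Build the offset file path, assuming the layout described above."""
    return Path(sdc_data) / "runInfo" / pipeline_id / "0" / "offset.json"


def copy_offset(sdc_data, source_id, target_id):
    """Copy the source pipeline's offset.json over the target pipeline's.

    Hypothetical helper: it only checks that the source file parses as
    JSON, then copies it byte for byte.
    """
    src = offset_path(sdc_data, source_id)
    dst = offset_path(sdc_data, target_id)
    json.loads(src.read_text(encoding="utf-8"))  # fail early on a corrupt file
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

You would call it as `copy_offset(sdc_data_dir, "<source pipeline id>", "<target pipeline id>")`, where `sdc_data_dir` is whatever `${SDC_DATA}` resolves to on your machine.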

You can also retrieve the same information via Data Collector's REST API. A GET request to /rest/v1/pipeline/<pipeline id>/committedOffsets will return the same JSON as above. You can then POST the JSON to the equivalent endpoint for the target pipeline.
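As a rough sketch of the REST approach, the following Python uses only the standard library to GET the committed offsets of one pipeline and POST them to another. The endpoint path is the one given above; everything else is an assumption: the function names are hypothetical, the base URL and authentication headers depend on your installation, and I believe Data Collector expects an `X-Requested-By` header on POST requests, so pass one in `headers` if your instance rejects the call.

```python
import json
import urllib.parse
import urllib.request


def offsets_url(base_url, pipeline_id):
    """Build the committedOffsets URL for a pipeline (path from the answer)."""
    quoted = urllib.parse.quote(pipeline_id, safe="")
    return f"{base_url.rstrip('/')}/rest/v1/pipeline/{quoted}/committedOffsets"


def copy_offsets(base_url, source_id, target_id, headers,
                 opener=urllib.request.urlopen):
    """GET offsets from the source pipeline and POST them to the target.

    `headers` carries whatever authentication your instance needs
    (assumption: supplied by the caller). `opener` is injectable so the
    function can be exercised without a live server.
    """
    get_req = urllib.request.Request(
        offsets_url(base_url, source_id), headers=dict(headers), method="GET")
    with opener(get_req) as resp:
        offsets = json.load(resp)

    post_headers = {**headers, "Content-Type": "application/json"}
    post_req = urllib.request.Request(
        offsets_url(base_url, target_id),
        data=json.dumps(offsets).encode("utf-8"),
        headers=post_headers,
        method="POST")
    with opener(post_req) as resp:
        return resp.status
```

A call might look like `copy_offsets("http://localhost:18630", "<source pipeline id>", "<target pipeline id>", {"X-Requested-By": "sdc", "Authorization": "Basic ..."})`, with the URL and header values adjusted for your setup.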

Although these techniques will work, please note that they are not supported. Offsets are considered Data Collector 'internals' and are not a public interface.

Control Hub will let you view the last-saved offset for a job, but does not allow you to modify it.

