How does offset work in StreamSets Data Collector?

asked 2018-05-27 23:37:49 -0600

Trans

updated 2018-05-28 09:34:45 -0600

metadaddy

Suppose I have a pipeline with Directory as the origin, some processors in between, and JDBC as the destination. If the pipeline crashes at some point in a processor, will SDC be able to process the failed data once the origin reads it again?

1 Answer

answered 2018-05-28 09:36:38 -0600

metadaddy

Yes. In the default 'at least once' mode, Data Collector saves the offset only after the data has been successfully sent to the destination. In the case of the Directory origin, the offset is the name of the file being processed and the byte offset into that file. When the pipeline restarts, the origin reads the last saved offset and resumes processing from that point, so data that was in flight when the pipeline crashed is read and processed again.
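To make the ordering concrete, here is a minimal toy sketch of the "write first, save the offset second" idea. It is not Data Collector's actual code: `run_pipeline`, `offset_store`, `CrashError` and the `crash_on` parameter are hypothetical names invented for illustration, and the offset is simplified to a list index rather than a file name plus byte offset.

```python
# Toy model of "at least once" offset handling; not Data Collector's real implementation.

class CrashError(Exception):
    """Stands in for a pipeline failure inside a processor (hypothetical)."""


def run_pipeline(records, offset_store, destination, batch_size=2, crash_on=None):
    """Read from the saved offset; save a new offset only after the write succeeds."""
    position = offset_store.get("offset", 0)  # resume from the last saved offset
    while position < len(records):
        batch = records[position:position + batch_size]

        # Processor stage: a failure here happens before anything is written.
        processed = []
        for record in batch:
            if record == crash_on:
                raise CrashError(f"processor failed on {record!r}")
            processed.append(record.upper())

        destination.extend(processed)      # 1. send the batch to the destination
        position += len(batch)
        offset_store["offset"] = position  # 2. only then save the offset
    return position


records = ["a", "b", "c", "d", "e"]
offset_store = {}
destination = []

try:
    run_pipeline(records, offset_store, destination, crash_on="c")
except CrashError:
    pass  # the batch containing "c" was never written, so its offset was never saved

# Restart: the origin resumes at the last saved offset and reads "c" and "d" again.
run_pipeline(records, offset_store, destination)
```

In this sketch the crash happens before the failed batch reaches the destination, so the rerun simply picks it up again. If a failure occurred after the write but before the offset was saved, the same batch would be written twice, which is why the mode is called "at least once" rather than "exactly once".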
