Where does Data Collector maintain offsets?

asked 2018-12-03 06:49:34 -0600

anonymous user


updated 2018-12-14 11:14:14 -0600

metadaddy

If a single Data Collector node fails, or if my entire cluster goes down, does StreamSets persist offsets to a file or to some other storage?

1 Answer

answered 2018-12-03 17:25:51 -0600

iamontheinet


If you are using a standalone Data Collector, the offsets are stored on disk in the data directory (SDC_DATA), as described in the documentation. If you are using Control Hub with jobs, the offsets are stored in Control Hub's internal database. For pipelines running in cluster streaming mode on either Mesos or YARN, the offsets can be stored on HDFS or AWS S3, as also described in the documentation.
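
For the standalone case, here is a rough Python sketch of how you could inspect what is persisted on disk. It assumes the offsets are kept as JSON files named offset.json somewhere under the data directory. That file name and format are assumptions on my part and not something stated above, so check them against your own installation. The function names are made up for illustration.

```python
import json
from pathlib import Path


def find_offset_files(data_dir):
    """Return every file named offset.json found anywhere under data_dir.

    The file name is an assumption; adjust it to match your installation.
    """
    return sorted(Path(data_dir).rglob("offset.json"))


def read_offsets(data_dir):
    """Map each offset file path to its parsed JSON content.

    Unreadable or malformed files are reported instead of raising, so one
    bad file does not hide the others.
    """
    result = {}
    for path in find_offset_files(data_dir):
        try:
            with open(path, encoding="utf-8") as handle:
                result[str(path)] = json.load(handle)
        except (OSError, json.JSONDecodeError) as exc:
            result[str(path)] = {"error": str(exc)}
    return result
```

You would call it as `read_offsets(os.environ["SDC_DATA"])` (after `import os`) on the machine running Data Collector. It only reads files, so it should be safe to run while pipelines are stopped or running.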

Cheers, Dash
