If you are using a standalone Data Collector, offsets are stored on disk in the data directory (SDC_DATA), as described in the documentation. If you are using Control Hub with jobs, offsets are stored in that application's internal database. For pipelines running in cluster streaming mode on either Mesos or YARN, offsets can be stored on HDFS or AWS S3, as described here.
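
For the standalone case, a minimal sketch of reading a pipeline's offset from disk might look like the following. The `runInfo/<pipeline id>/0/offset.json` layout under SDC_DATA is an assumption here and may differ between versions, so check it against your own data directory. The helper names `offset_file` and `read_offset` are hypothetical and not part of any StreamSets API.

```python
import json
import os
from pathlib import Path


def offset_file(pipeline_id, data_dir=None):
    """Return the assumed path of a pipeline's offset file under SDC_DATA."""
    # Fall back to the SDC_DATA environment variable when no directory is given.
    base = Path(data_dir or os.environ["SDC_DATA"])
    # Assumed layout (verify locally): <SDC_DATA>/runInfo/<pipeline id>/0/offset.json
    return base / "runInfo" / pipeline_id / "0" / "offset.json"


def read_offset(pipeline_id, data_dir=None):
    """Load the offset file as a dict, or return None if it does not exist."""
    path = offset_file(pipeline_id, data_dir)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `read_offset("my-pipeline-id")` with SDC_DATA set in the environment returns the parsed file contents, or `None` if no offset has been written yet. This only applies to standalone mode; offsets held by Control Hub or stored on HDFS or S3 are not reachable this way.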