How do multiple offsets used in JDBC Multitable Consumer function?

I am using two offset fields for my JDBC Multitable Consumer origin in one of my streaming pipelines. The two offset fields I have chosen to override my primary key are an userID (unique string) field and lastmodified (datetime) field.

My goal with this pipeline is to write records anytime a new userID is entered, or anytime a previous record has its lastmodified field updated to the present date.

Will this work with both of these fields set as offsets? From what I've seen, I cannot just use the lastmodified datetime field because duplicate records are created in the pipeline if the timestamps occur at the same time for record entry (this happens frequently). This is why I added the unique userID as an additional offset to help keep those records separate. But if only the lastmodified field changes (> than the offset), will this record be captured and written?

Thanks for any help provided!

edit retag close merge delete

Sort by » oldest newest most voted

Suppose there are two offset columns, A and B. The multitable consumer will effectively execute queries like the following:

SELECT * FROM table WHERE A > x OR (A == x AND B > y)

where x is the committed offset value for column A, and y is the committed offset value for column B. Given these semantics, and what you've described, I think you should be able to specify the offset columns as lastmodified, userID (in that order). This way, the origin will ensure all values are read for a given lastmodified value (which presumably should include all userID values). The main thing to watch out for is ensuring that all of the userID records for a given lastmodified value show up, before a new value for lastmodified is inserted. This would obviously depend on whatever system is inserting data into the table in question.

more