Data duplication possibly caused by erroneous offsets

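I am running a pipeline that uses the JDBC Multitable Consumer origin and noticed that duplicate records are being generated. After some investigation, I found that the offset.json file contains two entries for the same table (kds-assembly_product_details_archive): one with an empty partitionStartOffsets and one with partitionStartOffsets set to business_date=1545523200000.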

{
  "offsets" : {
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1545523200000",
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=business_date=1545523200000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1550880000000",
    "$com.streamsets.pipeline.stage.origin.jdbc.table.TableJdbcSource.offset.version$" : "2",
    "tableName=sap_sales_upload;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=txndate=1493596800000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "txndate=1543622400000"
  },
  "version" : 2
}
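For anyone who wants to check their own offset file for the same symptom, here is a minimal sketch. It assumes the key layout shown above (fields separated by `;;;`, with `tableName=` first); the function name and the command-line handling are made up for illustration and are not part of StreamSets.

```python
import json
import sys
from collections import defaultdict


def tables_with_multiple_offsets(offset_doc):
    """Return {table_name: [offset keys]} for tables that appear in more than one key."""
    keys_by_table = defaultdict(list)
    for key in offset_doc.get("offsets", {}):
        # Skip metadata entries such as the "$...offset.version$" key.
        if not key.startswith("tableName="):
            continue
        # Assumption: fields are ";;;"-separated and the table name comes first.
        first_field = key.split(";;;", 1)[0]
        table = first_field[len("tableName="):]
        keys_by_table[table].append(key)
    # Keep only the tables that have more than one offset entry.
    return {t: ks for t, ks in keys_by_table.items() if len(ks) > 1}


if __name__ == "__main__":
    # Usage: python check_offsets.py /path/to/offset.json
    with open(sys.argv[1]) as f:
        duplicates = tables_with_multiple_offsets(json.load(f))
    for table, keys in duplicates.items():
        print(f"{table}: {len(keys)} offset entries")
        for k in keys:
            print(f"  {k}")
```

This only reports the duplicate entries; it does not explain why they were written or change the file.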

How can one prevent this?