Data duplication possibly caused by erroneous offsets
I am running a pipeline that uses the JDBC Multitable Consumer origin, and I noticed that duplicate records are being generated. After some investigation, I found that the offset.json file contains two entries for the same table (kds-assembly_product_details_archive):
{
"offsets" : {
"tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1545523200000",
"tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=business_date=1545523200000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1550880000000",
"$com.streamsets.pipeline.stage.origin.jdbc.table.TableJdbcSource.offset.version$" : "2",
"tableName=sap_sales_upload;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=txndate=1493596800000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "txndate=1543622400000"
},
"version" : 2
}
How can one prevent this?
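To check other pipelines for the same symptom, here is a minimal sketch that scans a parsed offset.json document and reports tables with more than one offset entry. It assumes the key layout shown above (fields separated by `;;;`, each in `name=value` form) and that keys starting with `$` are metadata. The function name `find_duplicate_tables` is my own, not part of StreamSets. It only detects the condition; it does not explain or fix it.

```python
import json
from collections import defaultdict


def find_duplicate_tables(offset_doc):
    """Return {table_name: [offset keys]} for tables with more than one entry.

    Assumes each offset key is a ';;;'-separated list of name=value fields,
    as in the offset.json shown in this post.
    """
    keys_by_table = defaultdict(list)
    for key in offset_doc.get("offsets", {}):
        if key.startswith("$"):
            # Metadata entry such as the offset version marker, not a table.
            continue
        # Split each field on the first '=' only, since values may contain '='.
        fields = dict(part.partition("=")[::2] for part in key.split(";;;"))
        table = fields.get("tableName")
        if table:
            keys_by_table[table].append(key)
    return {t: ks for t, ks in keys_by_table.items() if len(ks) > 1}


# Sample document reproducing the entries from this post.
sample = json.loads("""
{
  "offsets": {
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "business_date=1545523200000",
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=business_date=1545523200000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "business_date=1550880000000",
    "$com.streamsets.pipeline.stage.origin.jdbc.table.TableJdbcSource.offset.version$": "2",
    "tableName=sap_sales_upload;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=txndate=1493596800000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "txndate=1543622400000"
  },
  "version": 2
}
""")

duplicates = find_duplicate_tables(sample)
for table, keys in duplicates.items():
    print(f"{table}: {len(keys)} offset entries")
```

For a real file, replace the sample with `json.load(open(path))`, where `path` points to the pipeline's offset.json.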