Data duplication possibly caused by erroneous offsets

asked 2019-02-26 19:24:39 -0600

Ronald M

I am running a pipeline that uses the JDBC Multitable Consumer origin and noticed that duplicate records are being generated. After some investigation, I found that the offset.json file contains two entries for the same table:

  "offsets" : {
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1545523200000",
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=business_date=1545523200000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "business_date=1550880000000",
    "$com.streamsets.pipeline.stage.origin.jdbc.table.TableJdbcSource.offset.version$" : "2",
    "tableName=sap_sales_upload;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=txndate=1493596800000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false" : "txndate=1543622400000"
  },
  "version" : 2
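
A quick way to confirm this condition is to group the offset keys by table name and report any table that appears more than once. The following is only a minimal sketch: the function names are hypothetical, the key layout (fields separated by `;;;`) is inferred from the excerpt above, and the `offsets` mapping is assumed to have already been loaded from the file.

```python
from collections import defaultdict


def parse_offset_key(key):
    """Split a key such as 'tableName=foo;;;partitioned=false;;;...' into a dict."""
    fields = {}
    for part in key.split(";;;"):
        # Split on the first '=' only, since values may themselves contain '='.
        name, _, value = part.partition("=")
        fields[name] = value
    return fields


def tables_with_multiple_offsets(offsets):
    """Return {table name: [offset keys]} for tables that have more than one entry."""
    by_table = defaultdict(list)
    for key in offsets:
        if key.startswith("$"):
            # Skip metadata entries such as the offset version marker.
            continue
        by_table[parse_offset_key(key)["tableName"]].append(key)
    return {table: keys for table, keys in by_table.items() if len(keys) > 1}


# The "offsets" mapping copied from the excerpt above.
offsets = {
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "business_date=1545523200000",
    "tableName=kds-assembly_product_details_archive;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=business_date=1545523200000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "business_date=1550880000000",
    "$com.streamsets.pipeline.stage.origin.jdbc.table.TableJdbcSource.offset.version$": "2",
    "tableName=sap_sales_upload;;;partitioned=false;;;partitionSequence=-1;;;partitionStartOffsets=txndate=1493596800000;;;partitionMaxOffsets=;;;usingNonIncrementalLoad=false": "txndate=1543622400000",
}

dupes = tables_with_multiple_offsets(offsets)
for table, keys in dupes.items():
    print(f"{table}: {len(keys)} offset entries")
```

This only detects the symptom; it does not explain why the second entry was written.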

How can one prevent this?

1 Answer

answered 2019-03-03 23:49:29 -0600

Shruthi

If your requirement is to avoid duplicate records in the destination, you can use the Record Deduplicator processor, which removes duplicate records. Attach it just before your destination.
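
To illustrate the idea behind deduplicating on selected fields, here is a rough sketch in plain Python. It is not StreamSets code and does not reflect how the processor is implemented; the function name, the field names, and the sample records are hypothetical.

```python
def deduplicate(records, key_fields):
    """Keep the first record seen for each combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable key from the fields that define a duplicate.
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            continue  # A record with the same key was already emitted.
        seen.add(key)
        unique.append(record)
    return unique


# Hypothetical sample data: the first two records share the same key.
records = [
    {"business_date": 1545523200000, "product_id": 1, "qty": 5},
    {"business_date": 1545523200000, "product_id": 1, "qty": 5},
    {"business_date": 1550880000000, "product_id": 2, "qty": 3},
]

result = deduplicate(records, ["business_date", "product_id"])
```

Note that this approach removes the duplicates downstream; it does not address the duplicated offset entry itself.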




Last updated: Mar 03 '19