Dealing with bulk updates from Salesforce CDC

asked 2020-02-25 12:52:34 -0500

updated 2020-03-02 11:58:04 -0500

I'm brand new to StreamSets, and I'm trying to build a pipeline with Salesforce CDC as its source. I built a basic flow that sends each record to a different file destination based on which fields were updated. That works fine when a single row is updated, but not for bulk updates.

When I do a bulk update of multiple rows in Salesforce (a common scenario for us), the LastModifiedDate value from the updated records comes into StreamSets as its own record. For example, if I update the Name field on three Account records, I get four records in StreamSets: the first has the LastModifiedDate and the next three have the changes to the Name field:

{"LastModifiedDate":1583171376000}
{"Name":"Robert"}
{"Name":"Test"}
{"Name":"Sally"}

How can I efficiently get the LastModifiedDate back into all the other records so I can create one JSON record for each updated Salesforce record?
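One possible workaround is to carry the timestamp forward: remember the value from each date-only record and copy it onto the change records that follow it. The sketch below does this in plain Python over JSON lines like the ones above (for example, the file written by a Local FS destination), outside StreamSets. It assumes the date-only record always arrives immediately before the change records it belongs to, as in the output shown; that ordering is an assumption, not something confirmed here. The function name `fill_last_modified` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def fill_last_modified(lines: Iterable[str]) -> Iterator[dict]:
    """Copy LastModifiedDate from a date-only record onto the records after it.

    Assumption: the date-only record of a bulk update is emitted directly
    before that update's change records.
    """
    current_date = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the file
        record = json.loads(line)
        if set(record) == {"LastModifiedDate"}:
            # Date-only record: remember the timestamp, don't emit it on its own.
            current_date = record["LastModifiedDate"]
            continue
        if "LastModifiedDate" not in record and current_date is not None:
            # Change record from a bulk update: attach the remembered timestamp.
            record = {"LastModifiedDate": current_date, **record}
        # Single-record updates already carry their own LastModifiedDate.
        yield record


sample = [
    '{"LastModifiedDate":1583171376000}',
    '{"Name":"Robert"}',
    '{"Name":"Test"}',
    '{"Name":"Sally"}',
]
for rec in fill_last_modified(sample):
    print(json.dumps(rec))
# {"LastModifiedDate": 1583171376000, "Name": "Robert"}
# {"LastModifiedDate": 1583171376000, "Name": "Test"}
# {"LastModifiedDate": 1583171376000, "Name": "Sally"}
```

Inside a pipeline, the same carry-forward logic would have to run in a processor that sees the records in order; if records from one bulk update can be split across batches, the remembered timestamp would also need to survive between batches.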


Comments

Can you connect the Salesforce origin to something like Local FS destination with JSON data format so I can visualize what you're seeing? Please edit your question to include the output data. Thanks!

metadaddy (2020-02-25 15:04:15 -0500)

Sorry. Yes. I've modified the question to show that. I had wanted to add an image when I created the question, but apparently I didn't have enough karma points at the time.

asturt (2020-03-02 11:54:44 -0500)

That is very strange. What do you have in your pipeline? When I try something similar, I see LastModifiedDate on each record. According to the Salesforce CDC docs, LastModifiedDate should be present on EVERY update notification.

metadaddy (2020-03-02 13:16:44 -0500)

My pipeline is a Salesforce origin and a Local FS destination. That's it. No transformations. I only get the LastModifiedDate as a separate record when I do a bulk update in Salesforce. If I update one record manually, I get this JSON: {"LastModifiedDate":1583171797000,"Name":"Bobby"}

asturt (2020-03-02 14:01:17 -0500)

Just to be clear, I'm not saying the LastModifiedDate is not present in the update notification. I'm saying that in the case of a bulk update the notification includes multiple records, one of which provides the LastModifiedDate for all the other records in the notification.
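If you would rather keep each notification together than flatten it, the same ordering assumption lets you group the records: a date-only record opens a new group, and the records after it are that group's changes. This is a minimal Python sketch over already-parsed records; `split_notifications` and the `changes` key are hypothetical names chosen for illustration, not fields produced by Salesforce or StreamSets.

```python
from typing import Any, Dict, Iterable, List


def split_notifications(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group change records under the LastModifiedDate that applies to them.

    Assumption: a date-only record precedes the change records of its
    bulk update, while single-record updates carry their own date.
    """
    notifications: List[Dict[str, Any]] = []
    for record in records:
        if set(record) == {"LastModifiedDate"}:
            # Start of a bulk update: open a new, empty group.
            notifications.append({"LastModifiedDate": record["LastModifiedDate"], "changes": []})
        elif "LastModifiedDate" in record:
            # Single-record update: it forms a complete group by itself.
            fields = {k: v for k, v in record.items() if k != "LastModifiedDate"}
            notifications.append({"LastModifiedDate": record["LastModifiedDate"], "changes": [fields]})
        elif notifications:
            # Change record: belongs to the most recently opened group.
            notifications[-1]["changes"].append(record)
        else:
            # No date seen yet: keep the change, but with an unknown date.
            notifications.append({"LastModifiedDate": None, "changes": [record]})
    return notifications
```

Each group can then be written out as one JSON record per Salesforce record by pairing its LastModifiedDate with each entry in `changes`.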

asturt (2020-03-02 14:33:45 -0500)