Ask Your Question
1

PostgreSQL CDC to Snowflake - Error

asked 2020-04-07 08:46:42 -0500

gprasaad gravatar image

updated 2020-04-09 13:33:42 -0500

metadaddy gravatar image

Hello,

I have a simple pipeline to capture CDC data from PostgreSQL (i have configured WAL-related settings) to send to Snowflake. I can see that the WAL2JSON is generating the CDC records and sending over to Snowflake but i am getting the error in the Snowflake stage - "SNOWFLAKE_24 - Record does not have CDC operation" - possibly because PostgreSQL CDC doesn't generate the same headers Snowflake Destination is looking for.

I was expecting this to work out of the box - What should i do? write a converter? or use some other in-stream processor?

I am running StreamSets Data Collector 3.13.0 locally for testing.

Here is the example output from the PostgreSql CDC Origin:

{
"xid": 1428432,
"nextlsn": "57/A6006F00",
"timestamp": "2020-04-07...",
"change": [
{
"kind": "update",
"schema": "public",
"table": "...",
"columnnames": [....],
"columntypes":[...]
"columnvalues":[...],
"oldkeys: ...
}

Please help. All i need to know is if PostgreSQL-CDC -> Snowflake is expected to work out of the box if i configure it correctly.

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
0

answered 2020-04-09 13:32:56 -0500

metadaddy gravatar image

Unfortunately, ingesting PostgreSQL CDC data is not quite that simple. From the docs:

Since each data change transaction can include multiple CRUD operations, the PostgreSQL CDC Client origin can also include multiple operations in a record. As a result, the origin does not write the CRUD operations to the sdc.operation.type record header attribute. Depending on your use case, you might use a scripting processor to convert the records as needed. Or, you might use a Field Pivoter and other processors to separate the data to create a record for each operation.

You will need to use the Field Pivoter and other processors to transform the data into a suitable form for Snowflake. You might find this blog entry useful: Transform Data in StreamSets Data Collector

edit flag offensive delete link more

Comments

thank you - your suggestion did pave the way for me to figure out a solution to my problem - it involved Raw Data (DEV) -> Pivoter -> Express. Eval -> Remover -> Flattener -> Renamer -? JDBC Producer (instead of Snowflake CDC destination).

gprasaad gravatar imagegprasaad ( 2020-04-13 12:17:13 -0500 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2020-04-07 08:46:42 -0500

Seen: 81 times

Last updated: Apr 09