Ask Your Question
1

JDBC Producer CDC - Change Log Format Error

asked 2018-03-13 17:35:25 -0500

Eilliar gravatar image

updated 2018-03-14 09:30:01 -0500

The idea behind my pipeline is to reflect changes from a MySQL to a PostgreSQL DB. In the future I'll also have a Oracle to PostgreSQL replication.

So, from this forum and SDC documentation, I saw that the right way to do it is to use a CDC origin. So I'm using a MySQL Binary Log. I was able to build a pipeline that process the 3 CRUD operations (INSERT, DELETE, UPDATE), but it uses several processors (Field remover, flattener, stream selector, field renamer and so on).

From what I saw in the config of the JDBC Producer, this destination should be able to process MySQL Binary Log directly from a Stream that reads from a MySQL Binary log Origin, right? Just setting the Change Log Format in the JDBC Producer to MySQL Binary Log.

But even though I do this, the pipeline runs with no error, but the data is NOT changed in the PostgreSQL destination.

Am I missing something? Is it necessary to process the stream from the MySQL Binary Log origin before sending it to a JDBC Producer? If so, what must be done?

I'd attach the print from my pipelines, but it seems that I don't have enough points yet.

edit retag flag offensive close merge delete

3 Answers

Sort by ยป oldest newest most voted
1

answered 2018-03-14 21:40:37 -0500

junko_urata gravatar image

You are right that JDBC Producer can process CDC records directly from MySQL Binlog Origin. What kind of records do you see when you run preview or take snapshot? Also do you see INSERT, DELETE, UPDATE in sdc.log??

edit flag offensive delete link more
0

answered 2018-07-13 10:14:39 -0500

todd gravatar image

Sounds like you may be experiencing https://issues.streamsets.com/browse/...

edit flag offensive delete link more
0

answered 2018-03-19 09:56:43 -0500

Eilliar gravatar image

updated 2018-03-19 10:03:13 -0500

So, when I run a preview the records are like this:

{
  "BinLogFilename": "mysql-bin-changelog.002346",
  "Type": "UPDATE",
  "Table": "regions",
  "ServerId": 4792629,
  "BinLogPosition": 411,
  "Database": "test_db",
  "OldData": {
    "region_id": 7,
    "region_name": "The Shire"
  },
  "Data": {
    "region_id": 7,
    "region_name": "Beleriand"
  },
  "Timestamp": 1520974070000,
  "Offset": "mysql-bin-changelog.002346:411"
}

Also, from the record header I can check the sdc.operation.type: 3 (for the above record), which perfectly matches all the CRUD operations, according to the SDC documentation.

I checked the sdc.log for the INSERT, DELETE and UPDATE keywords, but wasn't able to find it in this log, don't know if it is logging all the operations/records.

edit flag offensive delete link more

Comments

At DEBUG level, you;ll see "Generated single-row query:" or "Generated multi-row query:" with SQL commands and {} {} as parameters. Unfortunately PostgreSQL does not show the actual queries in log. One thing you can try is use MySQL as destination DB to see the queries and run them on postgreDB.

junko_urata gravatar imagejunko_urata ( 2018-05-22 15:30:20 -0500 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2018-03-13 17:35:25 -0500

Seen: 216 times

Last updated: Jul 13