Perform initial database snapshot

asked 2019-01-24 08:01:15 -0600

Dear all,

I want to use StreamSets' MySQL Binary Log Connector to integrate data from MySQL into BigQuery. Binary logging was not enabled from the beginning, so I first need to capture an initial snapshot of the database and then switch to log-based CDC. I am not sure what the recommended way to do this is, so any additional information on this topic would be highly appreciated.

Thank you. Best regards

2 Answers

answered 2019-01-24 10:47:37 -0600

For the initial load, create a separate pipeline and use the JDBC Multitable Consumer origin. Depending on the number of tables and records in MySQL, you can tune attributes such as the number of threads and the maximum number of records per batch. Once that's done, you can move on to your log-based CDC pipeline.

Hope this helps.

Cheers, Dash

answered 2019-01-25 01:33:58 -0600

Hi Dash,

Thank you for your answer. I thought about doing it the way you described. It is still not clear to me when to stop reading from the JDBC Multitable Consumer and when to start the MySQL CDC pipeline, i.e. how to configure the JDBC Multitable Consumer and the MySQL Binary Log Reader. Is there a best practice for avoiding data loss or duplicate records? The CDC pipeline should start exactly where the JDBC Multitable Consumer ends. How can I achieve this?

Thank you. Best regards Marcel

If it is an initial, one-time offload, start the first pipeline and wait until it has finished migrating all the data. Then start the second (CDC) pipeline. Please also read through the docs for both origins; they are pretty detailed in terms of configuration.
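If you would rather not watch the first pipeline by hand, the "wait, then start" sequence can be scripted. The following is only a minimal sketch, not an official StreamSets client: `get_status` and `start_pipeline` are hypothetical callables you would supply yourself (for example, thin wrappers around the Data Collector REST API or CLI), and the status strings are assumptions to check against your version. It also assumes the snapshot pipeline is configured to stop on its own once all tables are read; as far as I know this is usually done by routing the origin's no-more-data event to a Pipeline Finisher executor.

```python
import time
from typing import Callable

# Status strings assumed for illustration; verify what your Data Collector
# version actually reports for a finished or failed pipeline.
FINISHED = "FINISHED"
FAILED_STATES = {"RUN_ERROR", "START_ERROR", "STOPPED"}


def run_cdc_after_snapshot(
    get_status: Callable[[str], str],        # hypothetical: returns pipeline status
    start_pipeline: Callable[[str], None],   # hypothetical: starts a pipeline by id
    snapshot_id: str,
    cdc_id: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until the snapshot pipeline finishes, then start the CDC pipeline."""
    while True:
        status = get_status(snapshot_id)
        if status == FINISHED:
            break
        if status in FAILED_STATES:
            # Do not start CDC on top of an incomplete initial load.
            raise RuntimeError(f"snapshot pipeline ended in state {status}")
        sleep(poll_seconds)
    start_pipeline(cdc_id)
```

Passing the two callables in keeps the polling logic independent of how you talk to Data Collector, so it can be tried out with stubs before pointing it at a real instance.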

Yes, that's clear to me. But what is the best way to trigger the CDC pipeline right after the initial load has finished? Thank you. Marcel

Is starting the second pipeline manually not an option?

A trigger would be better, so that I do not have to work with the binary log position. I want to avoid duplicate records or data loss.
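On avoiding both gaps and duplicates: one common approach, offered here as a suggestion rather than anything confirmed in this thread, is to record the binary log coordinates before the snapshot starts (for example with `SHOW MASTER STATUS`), point the CDC pipeline's initial offset at that position, and make the writes idempotent by primary key. Changes made while the snapshot is running are then delivered a second time by the binlog, but replaying them does no harm. The sketch below only simulates this with in-memory dictionaries; the event format and the `apply_events` helper are made up for illustration, and it assumes the destination can upsert and delete by key.

```python
from typing import Dict, Iterable, Optional, Tuple

# A change event: (operation, primary key, row or None for deletes).
Event = Tuple[str, int, Optional[dict]]


def apply_events(target: Dict[int, dict], events: Iterable[Event]) -> None:
    """Apply change events idempotently, keyed on the primary key."""
    for op, key, row in events:
        if op == "DELETE":
            target.pop(key, None)   # deleting a missing row is a no-op
        else:                       # INSERT and UPDATE both become upserts
            target[key] = dict(row)


# Changes written to the binlog after the recorded position,
# i.e. while the snapshot is still running.
binlog = [
    ("INSERT", 3, {"name": "c"}),
    ("UPDATE", 1, {"name": "a2"}),
    ("DELETE", 2, None),
]

# The snapshot happened to read the table after the first two changes landed,
# so it already contains rows that the binlog will deliver again.
snapshot = {1: {"name": "a2"}, 2: {"name": "b"}, 3: {"name": "c"}}

target = dict(snapshot)
apply_events(target, binlog)        # replay from the recorded position
```

After the replay, `target` holds rows 1 and 3 only, which matches the source once all three changes are applied, and running `apply_events` a second time leaves it unchanged. Starting CDC from a position taken after the snapshot would instead risk missing changes made in between.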


