Perform initial database snapshot

asked 2019-01-24 08:01:15 -0600

anonymous user

Dear all,

I want to use StreamSets' MySQL Binary Log Connector to integrate data from MySQL into BigQuery. Binary logging was not enabled from the beginning, so I first need to capture an initial snapshot of the database and then switch to log-based CDC. I am not sure what the recommended way to do this is, so any additional information on this topic would be highly appreciated.

Thank you. Best regards

2 Answers

answered 2019-01-24 10:47:37 -0600

iamontheinet

Hi!

For the initial load, create a new, separate pipeline and use the JDBC Multitable Consumer origin. Depending on the number of tables and records in MySQL, you can tune attributes such as the number of threads and the maximum number of records per batch. Once the initial load is done, you can move on to your log-based CDC pipeline.

Hope this helps.

Cheers, Dash

answered 2019-01-25 01:33:58 -0600

MFCOM

Hi Dash,

Thank you for your answer. I had thought about doing it the way you described, but it is still not clear to me when to stop reading from the JDBC Multitable Consumer and when to start the MySQL CDC pipeline, i.e. how to configure the JDBC Multitable Consumer and the MySQL Binary Log Reader. Is there a best practice for avoiding data loss or duplicate records? The CDC pipeline should start exactly where the JDBC Multitable Consumer ends. How can I achieve this?

Thank you. Best regards Marcel

Comments

If this is an initial, one-time offload, start the first pipeline and wait until it has finished migrating all the data. Then start the second (CDC) pipeline. Please also read through the docs for both origins. They are pretty detailed in terms of configuration, etc.

iamontheinet (2019-01-25 10:25:54 -0600)
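The "wait until the first pipeline is done, then start the second" handoff could be automated with a small polling loop. The following is only a sketch under stated assumptions: `get_snapshot_status` and `start_cdc` are hypothetical callables that you would implement yourself (for example, as wrappers around whatever pipeline status and start operations your Data Collector version offers), and the status names are assumed values that should be checked against your installation.

```python
import time
from typing import Callable

# Assumed status names; verify them against your Data Collector version.
DONE_STATES = {"FINISHED"}
FAILED_STATES = {"RUN_ERROR", "START_ERROR", "STOPPED"}


def start_cdc_after_snapshot(
    get_snapshot_status: Callable[[], str],  # hypothetical: returns the snapshot pipeline's status
    start_cdc: Callable[[], None],           # hypothetical: starts the CDC pipeline
    poll_seconds: float = 30.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the snapshot pipeline and start the CDC pipeline once it has finished.

    Returns True if the CDC pipeline was started, False if the snapshot
    pipeline failed or did not finish within max_polls status checks.
    """
    for _ in range(max_polls):
        status = get_snapshot_status()
        if status in DONE_STATES:
            start_cdc()
            return True
        if status in FAILED_STATES:
            # Do not start CDC on top of an incomplete initial load.
            return False
        sleep(poll_seconds)
    return False
```

Passing the two operations in as callables keeps the loop independent of how the pipelines are actually controlled, so the same function works whether the calls go through an HTTP client, a command-line wrapper, or a test double.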

Yes, that's clear to me. But what is the best way to trigger the CDC pipeline right after the initial load has finished? Thank you. Marcel

MFCOM (2019-01-30 10:35:02 -0600)

Is starting the second pipeline manually not an option?

iamontheinet (2019-01-30 10:37:08 -0600)

A trigger would be better, so that I do not have to work with the binary log position. I want to avoid both duplicate records and data loss.

MFCOM (2019-01-30 10:40:37 -0600)
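On the duplicate/data-loss concern: one common pattern, which is not confirmed anywhere in this thread, is to enable binary logging, note the current binlog coordinates (for example with MySQL's `SHOW MASTER STATUS`) before the snapshot begins, and later start CDC from those coordinates. Changes made while the snapshot runs are then replayed rather than lost, and the replay does no harm if the destination applies each event as an upsert or delete keyed on the primary key. The sketch below only simulates that idea in memory. The operation names, the `Change` tuple, and the `<file>:<position>` offset format are illustrative assumptions, and it presumes each event carries the full row. How the upsert is realized in BigQuery is outside its scope.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# (operation, primary key, full row or None); names are illustrative only.
Change = Tuple[str, int, Optional[Row]]


def binlog_offset(file_name: str, position: int) -> str:
    """Format binlog coordinates as '<file>:<position>' (assumed offset format)."""
    return f"{file_name}:{position}"


def apply_changes(target: Dict[int, Row], changes: Iterable[Change]) -> Dict[int, Row]:
    """Apply CDC events in order as keyed upserts/deletes so that replays are harmless."""
    for op, key, row in changes:
        if op == "DELETE":
            target.pop(key, None)          # deleting a missing key is a no-op
        else:
            target[key] = dict(row or {})  # INSERT and UPDATE both become an upsert
    return target


# Changes committed after the coordinates were recorded.
changes = [
    ("UPDATE", 1, {"name": "a2"}),
    ("INSERT", 3, {"name": "c"}),
    ("DELETE", 2, None),
]

# The snapshot ran while those changes were happening: it already contains
# the update to row 1, but not yet the insert of row 3 or the delete of row 2.
snapshot = {1: {"name": "a2"}, 2: {"name": "b"}}

# Replaying everything from the recorded coordinates converges on the source state.
result = apply_changes(dict(snapshot), changes)
```

Because every event is keyed, replaying the same range a second time leaves `result` unchanged, which is what makes an overlap between snapshot and CDC safe in this model.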