
Any guide or template on how to stream MySQL CDC to Apache Hive?

asked 2017-09-28 01:07:57 -0500

casel.chen

I want to stream MySQL table changes by parsing the binlog into Kafka (JSON or Avro format?), and then sink them to HDFS/Hive in ORC format, partitioned by event time, so that I can query each day's changes in Apache Hive. Since the throughput is not very high, how can I balance HDFS file size against responsiveness? Many thanks!


1 Answer


answered 2018-01-30 00:27:56 -0500

updated 2018-01-30 00:37:47 -0500

For writing to Hive, you can refer to the link below: https://github.com/streamsets/tutoria...

For CDC, you can use the MySQL Binary Log as an origin: https://streamsets.com/documentation/...
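On the file size versus responsiveness part of the question, one common approach is to buffer change events per daily partition and close the file when either a size limit or an age limit is reached, whichever comes first. The sketch below is plain Python meant only to illustrate that idea; it is not StreamSets code, and every name in it (`partition_for`, `RollPolicy`, `Batcher`) as well as the default thresholds are hypothetical. It assumes each change event carries an event timestamp in epoch milliseconds and that partitions are daily in UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def partition_for(event_ts_ms: int) -> str:
    """Map an event timestamp (epoch milliseconds) to a daily UTC partition value."""
    day = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return f"dt={day:%Y-%m-%d}"


@dataclass
class RollPolicy:
    # Both defaults are illustrative assumptions, not recommendations.
    max_bytes: int = 128 * 1024 * 1024  # target size of one output file
    max_age_s: float = 15 * 60          # upper bound on how long data waits

    def should_roll(self, buffered_bytes: int, opened_at: float, now: float) -> bool:
        # Roll on whichever limit is hit first: size keeps files large,
        # age keeps low-throughput partitions from waiting indefinitely.
        return buffered_bytes >= self.max_bytes or (now - opened_at) >= self.max_age_s


@dataclass
class PartitionBuffer:
    opened_at: float
    records: list = field(default_factory=list)
    size: int = 0


class Batcher:
    """Groups events by event-time partition and flushes according to a RollPolicy."""

    def __init__(self, policy: RollPolicy):
        self.policy = policy
        self.open = {}     # partition value -> PartitionBuffer
        self.flushed = []  # (partition value, records) pairs; stands in for written files

    def add(self, event_ts_ms: int, payload: bytes, now: float) -> None:
        part = partition_for(event_ts_ms)
        buf = self.open.get(part)
        if buf is None:
            buf = self.open[part] = PartitionBuffer(opened_at=now)
        buf.records.append(payload)
        buf.size += len(payload)
        if self.policy.should_roll(buf.size, buf.opened_at, now):
            self.flush(part)

    def flush(self, part: str) -> None:
        # A real sink would write an ORC file under the partition directory here.
        buf = self.open.pop(part)
        self.flushed.append((part, buf.records))
```

Note that this sketch only checks the age limit when a new event arrives, so with very low throughput a real pipeline would also need a periodic timer to flush idle partitions. If the age limit still produces many small files, ORC files in a Hive partition can be merged afterwards, for example with `ALTER TABLE ... PARTITION (...) CONCATENATE`.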
