
Any guide or template on how to stream MySQL CDC to Apache Hive?

asked 2017-09-28 01:07:57 -0500

casel.chen

I want to stream MySQL table changes by parsing the binlog into Kafka (in JSON or Avro format?), and then sink them to HDFS/Hive in ORC format, partitioned by event time, so that I can query each day's changes in Apache Hive. Given that the throughput is not very high, how can I balance HDFS file size against responsiveness? Many thanks!


1 Answer


answered 2018-01-30 00:27:56 -0500

updated 2018-01-30 00:37:47 -0500

You can refer to the links below for writing to Hive:

For CDC, you can use the MySQL Binary Log as an origin.
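On the file size versus responsiveness trade-off raised in the question, one common approach is to buffer change events per daily partition and write a file when either a record-count threshold or a maximum age is reached, whichever comes first. The sketch below only illustrates that policy in plain Python. The event fields `ts_ms` and `op`, the class `PartitionBuffer`, and the threshold values are hypothetical names chosen for illustration, not part of any specific tool; actually writing ORC files and registering Hive partitions is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_for(event_ts_ms):
    """Map an event timestamp (epoch milliseconds, UTC) to a Hive-style daily partition."""
    day = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return "dt=" + day.strftime("%Y-%m-%d")


class PartitionBuffer:
    """Hypothetical buffer: flush a partition when it is full OR too old."""

    def __init__(self, max_records=1000, max_age_s=300):
        self.max_records = max_records  # larger value -> fewer, bigger files
        self.max_age_s = max_age_s      # smaller value -> fresher data in Hive
        self.rows = defaultdict(list)   # partition -> buffered events
        self.opened_at = {}             # partition -> time the buffer was opened

    def add(self, event, now_s):
        """Buffer one change event and return any partitions that are due."""
        part = partition_for(event["ts_ms"])  # "ts_ms" is an assumed field name
        self.opened_at.setdefault(part, now_s)
        self.rows[part].append(event)
        return self.flush_due(now_s)

    def flush_due(self, now_s):
        """Return {partition: events} for every buffer that hit a threshold."""
        flushed = {}
        for part in list(self.rows):
            full = len(self.rows[part]) >= self.max_records
            stale = now_s - self.opened_at[part] >= self.max_age_s
            if full or stale:
                flushed[part] = self.rows.pop(part)
                del self.opened_at[part]
        return flushed
```

With low throughput the age threshold will usually fire first, so it effectively sets the data latency, and small files produced that way can be compacted later (for example, once a day per partition).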

