
Any guide or template on how to stream MySQL CDC to Apache Hive?

asked 2017-09-28 01:07:57 -0600

casel.chen

I want to stream MySQL table changes to Kafka by parsing the binlog (in JSON or Avro format?), and then sink them to HDFS/Hive in ORC format, partitioned by event time, so that I can query each day's changes in Apache Hive. Since the throughput is not very high, how can I balance HDFS file size against responsiveness (how soon changes become queryable)? Many thanks!
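The trade-off in the question is usually handled with a "roll" policy: close the current file for a partition when it gets big enough *or* when it has been open too long, whichever comes first. Below is a minimal, library-free sketch of that policy, assuming records are already serialized to bytes and grouped by a Hive partition string such as `dt=2017-09-28`. `PartitionRoller`, `max_bytes`, and `max_age_s` are hypothetical names for illustration, not part of any specific tool.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class PartitionRoller:
    """Hypothetical sketch: buffer records per partition and close a file when
    it reaches max_bytes (fewer, larger HDFS files) or when its oldest record
    is max_age_s old (bounds how stale Hive query results can get)."""

    def __init__(self, max_bytes: int, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so the policy can be tested deterministically
        self.buffers: Dict[str, List[bytes]] = defaultdict(list)
        self.sizes: Dict[str, int] = defaultdict(int)
        self.opened_at: Dict[str, float] = {}

    def add(self, partition: str, record: bytes) -> List[Tuple[str, List[bytes]]]:
        """Buffer one record; return any (partition, records) batches now due."""
        if partition not in self.opened_at:
            self.opened_at[partition] = self.clock()
        self.buffers[partition].append(record)
        self.sizes[partition] += len(record)
        return self.poll()

    def poll(self) -> List[Tuple[str, List[bytes]]]:
        """Call periodically too, so quiet partitions still close on time."""
        now = self.clock()
        due = [p for p in self.opened_at
               if self.sizes[p] >= self.max_bytes
               or now - self.opened_at[p] >= self.max_age_s]
        return [self._close(p) for p in due]

    def _close(self, partition: str) -> Tuple[str, List[bytes]]:
        records = self.buffers.pop(partition)
        del self.sizes[partition]
        del self.opened_at[partition]
        return partition, records
```

With low throughput the age limit will trigger far more often than the size limit, so `max_age_s` effectively sets both latency and the typical file size. Note that the byte count here is of raw records; a compressed ORC file will be smaller. If many small files still accumulate, Hive's `ALTER TABLE ... PARTITION (...) CONCATENATE` can merge the ORC files within a partition after the fact.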


1 Answer


answered 2018-01-30 00:27:56 -0600

updated 2018-01-30 00:37:47 -0600

You can refer to the links below for writing to Hive:

For CDC, you can use the MySQL Binary Log origin.
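To make "each day's changes" queryable, one common approach is to land every change event as an append-only row in a change-log table partitioned by the event's date. The sketch below shows that mapping in plain Python. The event shape (`type`, `timestamp` in epoch milliseconds, `data`) is a hypothetical assumption for illustration, not the actual record layout of the MySQL Binary Log origin; check that origin's documentation for the real field names.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def to_changelog_row(event: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Flatten one change event into (Hive partition spec, row).

    Hypothetical event shape (an assumption, not a real origin's schema):
      {"type": "INSERT" | "UPDATE" | "DELETE",
       "timestamp": <epoch milliseconds of the change>,
       "data": {<column>: <value>, ...}}
    """
    # Partition by event time (in UTC here), not by arrival time, so late
    # events still land in the day they actually happened.
    ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    partition = "dt=" + ts.strftime("%Y-%m-%d")
    row = dict(event["data"])  # copy so the input event is not mutated
    row["_op"] = event["type"]  # keep the operation so deletes remain visible
    row["_event_time"] = ts.isoformat()
    return partition, row
```

A day's changes can then be read with a filter such as `WHERE dt = '2017-09-28'`. Day boundaries here follow UTC; convert to another time zone before formatting if the business day is defined differently.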

