Any guide or template on how to stream MySQL CDC to Apache Hive?

asked 2017-09-28

casel.chen

I want to stream MySQL table changes to Kafka by parsing the binlog (in JSON or Avro format?), and then sink them to HDFS/Hive in ORC format, partitioned by event time, so that I can query each day's changes in Apache Hive. Since the throughput is not very high, how can I balance HDFS file size against responsiveness? Many thanks!
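One common way to trade file size against latency is a size-or-age roll policy: buffer records per event-time partition and write a file when the buffer is either big enough or old enough. The sketch below only illustrates that decision logic under stated assumptions. `RollingPolicy`, `PartitionBuffer`, the `dt=YYYY-MM-DD` partition naming and the millisecond event timestamps are hypothetical, and the actual ORC/HDFS write is left to whatever sink you use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PartitionBuffer:
    """Records waiting to be written to one Hive partition (hypothetical helper)."""
    opened_at: float  # time the first record of this buffer arrived
    records: list = field(default_factory=list)
    size_bytes: int = 0


class RollingPolicy:
    """Hypothetical roll policy: flush a partition when its buffer reaches
    max_bytes OR when it has been open for max_age_s seconds."""

    def __init__(self, max_bytes: int, max_age_s: float):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.buffers = {}  # partition name -> PartitionBuffer

    @staticmethod
    def partition_for(event_ts_ms: int) -> str:
        # Daily partition by event time (UTC), e.g. "dt=2017-09-28".
        day = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
        return f"dt={day:%Y-%m-%d}"

    def add(self, event_ts_ms: int, payload: bytes, now: float) -> list:
        """Buffer one record and return any (partition, records) ready to write."""
        part = self.partition_for(event_ts_ms)
        if part not in self.buffers:
            self.buffers[part] = PartitionBuffer(opened_at=now)
        buf = self.buffers[part]
        buf.records.append(payload)
        buf.size_bytes += len(payload)
        return self.poll(now)

    def poll(self, now: float) -> list:
        """Remove and return every buffer that is big enough or old enough."""
        ready = []
        for part in list(self.buffers):
            buf = self.buffers[part]
            if buf.size_bytes >= self.max_bytes or now - buf.opened_at >= self.max_age_s:
                ready.append((part, buf.records))
                del self.buffers[part]
        return ready
```

The caller would also call `poll()` on a timer so that idle partitions still get flushed. `max_bytes` would typically be set near the HDFS block size and `max_age_s` to how stale the daily Hive view may be. With low throughput the age limit will usually fire first and produce small files; periodically merging them per partition (for example with Hive's `ALTER TABLE ... CONCATENATE` for ORC tables) is one way to recover larger files. Because buffers are keyed by partition, late events for an earlier day go to that day's partition rather than the current one.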

1 Answer

answered 2018-01-30

updated 2018-01-30 00:37:47 -0500

You can refer to the links below for writing to Hive:

For CDC, you can use the MySQL Binary Log as an origin.
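Once binlog changes arrive as JSON, each change event still has to be flattened into a row that Hive can store and query by day. The sketch below assumes a hypothetical envelope with `op` (`c`/`u`/`d`), `ts_ms`, `before` and `after` fields. The real layout depends on the CDC tool and the format (JSON or Avro) you choose, and `to_hive_row`, `OPS`, `cdc_op` and `cdc_ts_ms` are illustrative names only.

```python
import json
from datetime import datetime, timezone

# Hypothetical op codes in the change-event envelope; other codes raise KeyError.
OPS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE"}


def to_hive_row(event_json: str) -> tuple:
    """Flatten one hypothetical binlog change event into (partition, row)."""
    event = json.loads(event_json)
    op = OPS[event["op"]]
    # Deletes only have the old row image; inserts and updates have the new one.
    image = event["before"] if op == "DELETE" else event["after"]
    row = dict(image)
    row["cdc_op"] = op  # keep the change type so daily changes can be filtered
    row["cdc_ts_ms"] = event["ts_ms"]
    # Partition by event time (UTC day), matching a dt=YYYY-MM-DD layout.
    day = datetime.fromtimestamp(event["ts_ms"] / 1000, tz=timezone.utc)
    return f"dt={day:%Y-%m-%d}", row
```

Keeping the change type and event timestamp as ordinary columns means a query on one `dt` partition returns that day's inserts, updates and deletes, rather than only the latest state of each row.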
