Ask Your Question

Is changed data entered as a new record?

asked 2018-08-03 01:07:54 -0500

Shruthi gravatar image

updated 2018-08-09 00:15:43 -0500

My origin is Mysql Binary Log and destination is Hive. I am trying Change Data Capture(CDC). First i sent my full table data to hive and next i performed CDC. It worked fine but the changed data is entered as a new record in the table instead of overwriting the changed record with the new data

For eg:-

Imagine my table is,

1     ABC     10
2     DEF     12

Now i change 'ABC' to 'XYZ' and the CDC works like,

1     ABC    10
2     DEF    12
1     XYZ    10

This is entering as two records with same ID 1. How can i get the output like,

1      XYZ    10
2      DEF    12
edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted

answered 2018-08-09 11:58:48 -0500

sheraz Khan gravatar image

Hi, In hive you cannot update/delete the old record, it will always come as a new record. You need to add date timestamp column in the last and get the latest record using over (partition by ) query

sample query: select id, name count(*) over (partition by id order by date_column asc) as cnt from your_table;

edit flag offensive delete link more
Login/Signup to Answer

Question Tools

1 follower


Asked: 2018-08-03 01:07:54 -0500

Seen: 113 times

Last updated: Aug 09