Hive metadata - partition shows twice

asked 2017-06-02 03:08:23 -0500

gvd

updated 2017-06-02 03:50:27 -0500

We have the following pipeline configuration: [image description]

The table create statement: CREATE TABLE mytable (fields) PARTITIONED BY (MONTH_CAL_ID INT) STORED AS AVRO;

When the table is created, we only have one month_cal_id column.

After the first pipeline run, a second month_cal_id column shows up.

While this isn't an issue when querying with Hive, Spark throws an error: Reference 'MONTH_CAL_ID' is ambiguous, could be: MONTH_CAL_ID#69, MONTH_CAL_ID#70

Any suggestions?

1 Answer

answered 2017-06-02 07:56:26 -0500

updated 2017-06-27 17:52:02 -0500

LC

I believe the confusion comes from the fact that /MONTH_CAL_ID is present in the record itself, so it is treated as a regular column, while at the same time it is also used as the partition column.
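To make the suspected cause concrete, here is a minimal Python sketch. It is not StreamSets or Hive code; it only models a table schema as "record fields plus partition columns" and reports names that end up listed twice. The field names other than MONTH_CAL_ID are hypothetical, and the case-insensitive comparison is an assumption based on the question showing both MONTH_CAL_ID and month_cal_id.

```python
from collections import Counter


def duplicated_columns(data_columns, partition_columns):
    """Return the column names that appear more than once, ignoring case."""
    all_columns = [name.lower() for name in list(data_columns) + list(partition_columns)]
    counts = Counter(all_columns)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical record fields; only MONTH_CAL_ID is taken from the question.
record_fields = ["customer_id", "amount", "MONTH_CAL_ID"]
# The same name is also declared as the partition column.
partition_columns = ["month_cal_id"]

print(duplicated_columns(record_fields, partition_columns))
```

If the field is not part of the record body, the same check returns an empty list, which is the state the suggested fix aims for.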

I would recommend moving /MONTH_CAL_ID from the record to its header using the Expression Evaluator, then dropping /MONTH_CAL_ID from the record itself using the Field Remover, and finally changing the partition value expression to ${record:attribute('partition')} (or similar).
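The effect of these steps on a single record can be sketched in plain Python. This is only an illustration, not StreamSets code: the record is modelled as a dict with hypothetical "header" and "fields" keys, the function name move_field_to_header is made up for this example, and storing the header value as a string is an assumption.

```python
def move_field_to_header(record, field, attribute):
    """Return a copy of `record` with `field` moved from the body to a header attribute."""
    fields = dict(record["fields"])
    header = dict(record["header"])
    # Step 1 (the Expression Evaluator's role in the answer): copy the value to the header.
    # Assumption: header attributes are kept as strings.
    header[attribute] = str(fields[field])
    # Step 2 (the Field Remover's role): drop the field from the record body.
    del fields[field]
    return {"header": header, "fields": fields}


# Hypothetical record; only MONTH_CAL_ID comes from the question.
record = {
    "header": {},
    "fields": {"customer_id": 42, "amount": 9.5, "MONTH_CAL_ID": 201706},
}

result = move_field_to_header(record, "MONTH_CAL_ID", "partition")
print(result["header"])
print(sorted(result["fields"]))
```

After this transformation the partition value is read from the header attribute (the role of ${record:attribute('partition')} in the answer), and MONTH_CAL_ID no longer appears among the record's regular fields.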
