Hadoop FS: different schema than is used for the current file

asked 2018-05-22 08:52:54 -0500 by casel.chen

I have a data pipeline from Kafka to Hive. The Kafka messages are JSON; in the pipeline I flatten all the fields and generate a new 'yyyy-MM-dd' field for partitioning. When I run the pipeline it reports errors like the following. What does this error mean? Why does it happen, and how can I fix it?

AVRO_GENERATOR_04 - Record sit_CREDIT_INVOCATION_HISTORY::0::3902 has a different schema than is used for the current file. Current schema is '{"type":"record","name":"credit_invocation_history_table","namespace":"report_sit","fields":[{"name":"creditscore","type":["null","double"],"default":null},{"name":"occurtime","type":["null",{"type":"int","logicalType":"date"}],"default":null},{"name":"terminal","type":["null","string"],"default":null},{"name":"userid","type":["null","long"],"default":null},{"name":"riskprocessid","type":["null","long"],"default":null},{"name":"eventcode","type":["null","string"],"default":null},{"name":"executionid","type":["null","long"],"default":null},{"name":"input_name","type":["null","string"],"default":null},{"name":"input_certno","type":["null","string"],"default":null},{"name":"productcode","type":["null","string"],"default":null},{"name":"creditstrategyid","type":["null","long"],"default":null},{"name":"creditdetail_ratevalue","type":["null","double"],"default":null},{"name":"creditdetail_creditenddate","type":["null","long"],"default":null},{"name":"creditdetail_ratetype","type":["null","string"],"default":null},{"name":"creditdetail_compoundperiod","type":["null","int"],"default":null},{"name":"creditdetail_amount","type":["null","double"],"default":null},{"name":"creditdetail_creditdecision","type":["null","string"],"default":null},{"name":"creditdetail_creditstartdate","type":["null","long"],"default":null}]}' whereas the record schema is '{"type":"record","name":"credit_invocation_history_table","namespace":"report_sit","fields":[{"name":"creditscore","type":["null","double"],"default":null},{"name":"occurtime","type":["null",{"type":"int","logicalType":"date"}],"default":null},{"name":"terminal","type":["null","string"],"default":null},{"name":"userid","type":["null","long"],"default":null},{"name":"riskprocessid","type":["null","long"],"default":null},{"name":"eventcode","type":["null","string"],"default":null},{"name":"executionid","type":["null","long"],"default":null},{"name":"input_name","type":["null","string"],"default":null},{"name":"input_certno","type":["null","string"],"default":null},{"name":"productcode","type":["null","string"],"default":null},{"name":"creditstrategyid","type":["null","long"],"default":null},{"name":"creditdetail_ratevalue","type":["null","double"],"default":null},{"name":"creditdetail_creditenddate","type":["null","long"],"default":null},{"name":"creditdetail_ratetype","type":["null","string"],"default":null},{"name":"creditdetail_compoundperiod","type":["null","int"],"default":null},{"name":"creditdetail_amount","type":["null","double"],"default":null},{"name":"creditdetail_creditdecision","type":["null","string"],"default":null},{"name":"creditdetail_creditstartdate","type":["null","long"],"default":null},{"name":"input_phone","type":["null","string"],"default":null}]}'
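Comparing the two schemas in the message shows they differ only in the trailing field: the record schema carries an extra input_phone field that the schema of the currently open file lacks. The constraint behind the error is that an Avro container file embeds exactly one writer schema in its header, fixed at the moment the file is opened. A minimal sketch of that constraint in Python (hypothetical one-field schema, assuming the fastavro package; not part of the original post):

    # Illustration only: an Avro file stores a single writer schema in its
    # header, so a record with an extra field cannot be appended to a file
    # that was opened with the older schema.
    import io
    import fastavro

    schema = fastavro.parse_schema({
        "type": "record",
        "name": "credit_invocation_history_table",
        "namespace": "report_sit",
        "fields": [
            {"name": "userid", "type": ["null", "long"], "default": None},
        ],
    })

    buf = io.BytesIO()
    # The schema is baked into the file header as soon as we write.
    fastavro.writer(buf, schema, [{"userid": 1}])

    buf.seek(0)
    reader = fastavro.reader(buf)
    print(reader.writer_schema)  # the one schema every record in this file must match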

1 Answer

answered 2018-05-22 11:36:05 -0500 by metadaddy

This is happening because the schema changes between one record and the next: in your case the record schema has gained an input_phone field that the currently open file's schema does not have. Since an Avro file can hold only one schema, you need to set the roll header attribute to tell the Hadoop FS destination to close the current file and open a new one for the new schema. The Hive Metadata processor will set this attribute for you when it detects a schema change.
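StreamSets pipelines are configured in the UI rather than in code, but the behavior that rolling triggers can be sketched in a few lines of Python (hypothetical helper using fastavro; not StreamSets code): whenever an incoming record's schema differs from the schema the current file was opened with, flush and close that file and start a new one.

    # Sketch of roll-on-schema-change: each output file holds exactly one
    # schema; a schema drift closes the current file and opens the next.
    import fastavro

    def write_with_rolling(records_with_schemas, path_prefix):
        current_schema = None
        part = 0
        buffered = []

        def flush():
            nonlocal part
            if buffered:
                with open(f"{path_prefix}-{part}.avro", "wb") as fo:
                    fastavro.writer(fo, fastavro.parse_schema(current_schema), buffered)
                part += 1
                buffered.clear()

        for schema, record in records_with_schemas:
            if schema != current_schema:  # schema drift detected: roll the file
                flush()
                current_schema = schema
            buffered.append(record)
        flush()  # write whatever remains into the final file

In Data Collector itself this is configuration rather than code: the Hadoop FS destination's roll-attribute settings tell it to close the current file whenever a record carries the roll header attribute, and the Hive Metadata processor sets that attribute on the record where the schema change occurs.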
