Hive query raised java.io.IOException: Not a data file

asked 2018-07-10 02:27:25 -0600

casel.chen gravatar image

updated 2018-07-10 08:09:23 -0600

I setup two pipelines to move data from kafka to hive. Used "hive metadata (D)", "hive metastore" and "hadoop fs (D)" components. Then I found one of pipeline generated hive table can NOT be queried and reported "java.io.IOException: Not a data file", but the other one is queryable. The pipeline structure of two is almost the same, why? And where is the avro schema file located in hive destination? I can't find something like TBLPROPERTIES ( 'avro.schema.url'='http://schema_provider/kst.avsc') statement in create table script.

image description

image description

image description

hive> show create table cancelled_plan_repayment_table; OK CREATE EXTERNAL TABLE cancelled_plan_repayment_table(
certno string COMMENT '',
productcode string COMMENT '',
phone string COMMENT '', name string COMMENT '', eventtime string COMMENT '', loanendtype string COMMENT '', terminal string COMMENT '', riskprocessid bigint COMMENT '',
cancelledplanrepayment_planrepaymentinterest double COMMENT '',
cancelledplanrepayment_planrepaymentcapital double COMMENT '',
cancelledplanrepayment_planrepaymentamount double COMMENT '',
cancelledplanrepayment_planrepaymenttime date COMMENT '', creditstrategyid bigint COMMENT '', creditscore float COMMENT '',
creditdetail_creditdecision string COMMENT '') PARTITIONED BY (
tenantid string COMMENT '', dt string COMMENT '') ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION
'hdfs://ambari-agent1.sit.geerong.com:9000/charon/report/dev/cancelled_plan_repayment_table' TBLPROPERTIES (
'last_modified_by'='anonymous',
'last_modified_time'='1527731287',
'transient_lastDdlTime'='1527731287') Time taken: 0.371 seconds, Fetched: 31 row(s)

hive> select * from cancelled_plan_repayment_table limit 3; OK 362501526972249811 TEST 13972249811 钱1526972249811 2018-05-16 14:57:26 Completed GENERAL 1234567103 247.3086169553595 2242.142477096492 2489.451094051851 2018-05-17 NULL NULL NULL 29 2018-05-17 362501526972249484 TEST 15972249484 孙1526972249484 2018-05-16 14:57:27 Completed IOS 1234567123 192.63590681258972 2141.376646556727 2334.0125533693167 2018-05-17 NULL NULL NULL 29 2018-05-17 362501526972252013 TEST 13972252013 赵1526972252013 2018-05-16 14:57:27 EarlyPayoff ANDROID 1234567123 107.8744260369838 2417.085104155828 2524.959530192812 2018-05-17 NULL NULL NULL 29 2018-05-17 Time taken: 0.162 seconds, Fetched: 3 row(s)

hive> select * from risk_invocation_history_table limit 3; OK Failed with exception java.io.IOException:java.io.IOException: Not a data file. Time taken: 0.25 seconds

hive> show create table risk_invocation_history_table; OK CREATE EXTERNAL TABLE risk_invocation_history_table(
occurtime date COMMENT '',
strategysetname string COMMENT '',
creditstrategyrcid bigint COMMENT '', riskprocessid bigint COMMENT '',
output_indivdevicegeolongitude__role__applicant string COMMENT '',
output_indivipaddress__role__applicant string COMMENT '',
output_indivphone__role__applicant string COMMENT '',
output_indivdevicegeolatitude__role__applicant string COMMENT '',
output_individ__role__applicant string COMMENT '',
output_indivname__role__applicant string COMMENT '',
creditdetail_ratevalue double COMMENT '',
creditdetail_creditenddate bigint COMMENT '', creditdetail_ratetype string COMMENT '',
creditdetail_compoundperiod int COMMENT '', creditdetail_amount double COMMENT '',
creditdetail_creditdecision string COMMENT '',
creditdetail_creditstartdate bigint COMMENT '', strategysetid bigint COMMENT '', eventname string COMMENT '', eventid bigint COMMENT '', creditscore double COMMENT '', fraudscore double COMMENT '', decision string COMMENT '', productid bigint COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_0_rulename string COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_0_ruleid bigint COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_0_ruletemplateid bigint COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_0_ruledescription string COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_1_rulename string COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_1_ruleid bigint COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_1_ruletemplateid bigint COMMENT '',
antifrauddetail_hitrulesets_0_hitrules_1_ruledescription string COMMENT '',
antifrauddetail_hitrulesets_0_returnmsg string COMMENT '',
antifrauddetail_hitrulesets_0_decision string COMMENT '',
antifrauddetail_hitrulesets_0_rulesetname string COMMENT '',
antifrauddetail_hitrulesets_0_rulesetmode string COMMENT '',
antifrauddetail_hitrulesets_0_rulesetid bigint COMMENT '',
antifrauddetail_hitrulesets_1_hitrules_0_rulename string COMMENT '',
antifrauddetail_hitrulesets_1_hitrules_0_ruleid bigint COMMENT '',
antifrauddetail_hitrulesets_1_hitrules_0_ruletemplateid bigint COMMENT '',
antifrauddetail_hitrulesets_1_hitrules_0_ruledescription string COMMENT '',
antifrauddetail_hitrulesets_1_returnmsg string COMMENT '',
antifrauddetail_hitrulesets_1_decision string COMMENT '',
antifrauddetail_hitrulesets_1_rulesetname string COMMENT '',
antifrauddetail_hitrulesets_1_rulesetmode string COMMENT '',
antifrauddetail_hitrulesets_1_rulesetid bigint COMMENT '',
antifrauddetail_antifrauddecision string COMMENT '',
antifrauddetail_decisionreason string COMMENT '', terminal string COMMENT '', userid bigint COMMENT '',
admissiondetail_admissiondecision string COMMENT '',
admissiondetail_hitrulesets_0_hitrules_0_rulename string COMMENT '',
admissiondetail_hitrulesets_0_hitrules_0_ruleid bigint COMMENT '',
admissiondetail_hitrulesets_0_hitrules_0_ruletemplateid bigint COMMENT '',
admissiondetail_hitrulesets_0_hitrules_0_ruledescription string COMMENT '',
admissiondetail_hitrulesets_0_returnmsg string COMMENT '',
admissiondetail_hitrulesets_0_decision string ...

(more)
edit retag flag offensive close merge delete

Comments

Is your 'Data Format' tab in 'Hadoop FS' Avro? If not set Data Format : Avro , Avro Schema Location: In Record Header. Also in 'Output Files' tab specify your hive warehouse path in 'Directory Template' along with your tablename(eg;- /user/hive/warehouse/table_name),then data will come to your hive

Shruthi gravatar imageShruthi ( 2018-07-10 07:41:59 -0600 )edit

The 'Data Format' in 'Hadoop FS' is JSON. I added more screenshots. I verified HDFS json file generated, it seems no problem.

casel.chen gravatar imagecasel.chen ( 2018-07-10 08:08:01 -0600 )edit

You want it in JSON only? If not try the method once which i suggested. This will give the data in your hive table(as table) as well as in hdfs(avro format)

Shruthi gravatar imageShruthi ( 2018-07-10 08:21:19 -0600 )edit

Not every record has the same data structure, one may missed fields A,B and the other may missed C,D. So setting Avro Schema Location in Record Header is not work for me. And I don't want to hard code avro schema in pipeline configuration because the avro schema maybe evolved.

casel.chen gravatar imagecasel.chen ( 2018-07-10 08:28:37 -0600 )edit

I found the exception from Hadoop Fs when configured to use with confluent schema registry: DATA_FORMAT_201 - Cannot create the parser factory: java.lang.RuntimeException: Could not create DataFactory instance for 'com.streamsets.pipeline.lib.generator.avro.AvroDataGeneratorFactory': java.lang.NoCla

casel.chen gravatar imagecasel.chen ( 2018-07-10 10:19:01 -0600 )edit