Ask Your Question

Can't get proper data type of missing value field

asked 2017-11-03 05:47:46 -0500

casel.chen gravatar image

updated 2017-11-06 11:22:22 -0500

metadaddy gravatar image

I am doing replicate data from MySQL to Hive. There is a column named "loan_type" defined as INT without default value in MySQL, there are some rows don't have values, that is, null. When run StreamSets DC pipeline it reported the following exception, how to work around it without hard code column name because there are other unknown columns have null value?

com.streamsets.pipeline.api.base.OnRecordErrorException: HIVE_21 - Type Mismatch for column 'loan_type', Expected: INT, Actual: STRING

image description

edit retag flag offensive close merge delete


Any workaround solution suggest? The normal data looks like {"loan_type": 1}, while there are some rows doesn't have loan_type value, so the json looks like {"loan_type": ""}, when sink data into hive metadata, it can't determine which data type should it be, right? Should the data type be determine

casel.chen gravatar imagecasel.chen ( 2017-11-05 22:05:51 -0500 )edit

2 Answers

Sort by ยป oldest newest most voted

answered 2017-11-06 10:27:06 -0500

atka gravatar image

updated 2017-11-06 11:18:43 -0500

metadaddy gravatar image

You could try using a scripting processor to convert the fields to NULL_INTEGER eventually other NULL constants:

Available constants - they are to assign a type to a field with a value null.


For example, with Groovy evaluator, you can convert the fields with NULL value from STRING to INTEGER data type as follows:

if(record.value['loan_type'] == null)
    record.value['loan_type'] = NULL_INTEGER


if(sdcFunctions.getFieldNull(record, '/loan_type') == NULL_STRING)
    record.value['loan_type'] = NULL_INTEGER
edit flag offensive delete link more


Great, it works! But I wonder why streamsets dc can't get json schema from MySQL table description?

casel.chen gravatar imagecasel.chen ( 2017-11-06 20:12:34 -0500 )edit

The JDBC origin in the SDC can actually do that. I verified it and I created a MySQL table and added NULL value to INT column. The SDC shows the datatype of the field as INTEGER, even for the field with NULL value. If you run a preview, you can verify which stage converts NULL value to STRING.

atka gravatar imageatka ( 2017-11-06 21:05:15 -0500 )edit

I temp removed Groovy Evaluator and connect JDBC origin to Hive Metadata directly, then click preview button but failed with the same error message. I also notice the output of "loan_type" from JDBC origin is correct as INT, so I think the Hive Metadata reports the issue.

casel.chen gravatar imagecasel.chen ( 2017-11-06 21:35:23 -0500 )edit

I tried the same with SDC 2.7.2 and it was working fine for me. What version of SDC are you using?

atka gravatar imageatka ( 2017-11-09 14:48:43 -0500 )edit


casel.chen gravatar imagecasel.chen ( 2017-11-09 18:03:38 -0500 )edit

answered 2018-12-11 12:38:43 -0500

metadaddy gravatar image

Another solution is to use the Field Remover processor with a field path expression to remove all fields with a null value.

Set the Field Remover's Action to 'Remove Listed Fields' and Fields to /*[${f:value() == null}]

image description

Now any fields with a null value will be removed, the Metadata Processor will no longer report a type mismatch, and Hive will set the missing columns to null as the data is inserted.

edit flag offensive delete link more
Login/Signup to Answer

Question Tools

1 follower


Asked: 2017-11-03 05:47:46 -0500

Seen: 981 times

Last updated: Dec 11 '18