
Can't get proper data type of missing value field

asked 2017-11-03 05:47:46 -0500 by casel.chen

updated 2017-11-06 11:22:22 -0500 by metadaddy

I am replicating data from MySQL to Hive. There is a column named "loan_type" defined as INT without a default value in MySQL, and some rows have no value for it, i.e. null. When I run the StreamSets DC pipeline it reports the following exception. How can I work around it without hard-coding the column name, given that other, unknown columns also have null values?

com.streamsets.pipeline.api.base.OnRecordErrorException: HIVE_21 - Type Mismatch for column 'loan_type', Expected: INT, Actual: STRING



Comments

Any workaround suggestions? The normal data looks like {"loan_type": 1}, but some rows don't have a loan_type value, so the JSON looks like {"loan_type": ""}. When sinking the data into Hive Metadata, it can't determine which data type it should be, right? Should the data type be determined

casel.chen (2017-11-05 22:05:51 -0500)

1 Answer


answered 2017-11-06 10:27:06 -0500 by atka

updated 2017-11-06 11:18:43 -0500 by metadaddy

You could try using a scripting processor to convert the fields to NULL_INTEGER (or one of the other NULL constants, as appropriate):

Available constants, used to assign a type to a field whose value is null:

  • NULL_BOOLEAN, NULL_CHAR, NULL_BYTE, NULL_SHORT, NULL_INTEGER, NULL_LONG
  • NULL_FLOAT, NULL_DOUBLE, NULL_DATE, NULL_DATETIME, NULL_TIME, NULL_DECIMAL
  • NULL_BYTE_ARRAY, NULL_STRING, NULL_LIST, NULL_MAP

For example, with the Groovy Evaluator, you can convert a field with a NULL value from the STRING to the INTEGER data type as follows:

if(record.value['loan_type'] == null)
    record.value['loan_type'] = NULL_INTEGER

or

if(sdcFunctions.getFieldNull(record, '/loan_type') == NULL_STRING)
    record.value['loan_type'] = NULL_INTEGER
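Since the original question asks how to avoid hard-coding the column name, one way to generalize the snippets above is to keep a small map from column name to the matching NULL constant and loop over it for every record. This is only a sketch for the Groovy Evaluator: the typedNulls map and the loan_amount entry are illustrative assumptions, and you still need to know the intended Hive type of each nullable column.

```groovy
// Sketch for a Groovy Evaluator script. SDC supplies the `records` list,
// the `output`/`error` bindings, and the NULL_* constants; the typedNulls
// map below is an illustrative assumption listing the intended type of
// each column that may arrive as null or as an empty string.
def typedNulls = [
    'loan_type'  : NULL_INTEGER,
    'loan_amount': NULL_DECIMAL   // hypothetical second nullable column
]

for (record in records) {
    try {
        typedNulls.each { fieldName, typedNull ->
            def value = record.value[fieldName]
            // Replace a missing value or an empty string with a typed null
            // so the Hive Metadata processor sees the expected column type.
            if (value == null || value == '') {
                record.value[fieldName] = typedNull
            }
        }
        output.write(record)
    } catch (e) {
        error.write(record, e.toString())
    }
}
```

Iterating over the typedNulls map rather than record.value avoids mutating the record's field map while traversing it.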

Comments

Great, it works! But I wonder why StreamSets DC can't get the JSON schema from the MySQL table description?

casel.chen (2017-11-06 20:12:34 -0500)

The JDBC origin in SDC can actually do that. I verified it: I created a MySQL table and added a NULL value to an INT column, and SDC shows the data type of the field as INTEGER, even for the field with the NULL value. If you run a preview, you can see which stage converts the NULL value to STRING.

atka (2017-11-06 21:05:15 -0500)

I temporarily removed the Groovy Evaluator and connected the JDBC origin to Hive Metadata directly, then clicked the preview button, but it failed with the same error message. I also noticed the output of "loan_type" from the JDBC origin is correct as INT, so I think Hive Metadata reports the issue.

casel.chen (2017-11-06 21:35:23 -0500)

I tried the same with SDC 2.7.2 and it worked fine for me. What version of SDC are you using?

atka (2017-11-09 14:48:43 -0500)

SDC 2.7.1.0

casel.chen (2017-11-09 18:03:38 -0500)
Stats

Seen: 557 times
