CDC Postgres- Type mismatch (Expected Float, actual:INT)

asked 2019-09-23 10:39:01 -0500

Torkia Boussada gravatar image

updated 2019-09-27 13:19:53 -0500

metadaddy gravatar image

Hello, We have a pipeline (001 Load All VDM-Public Tables - Initial Load ) to do pre-load of tables from a Postgres database called vdm TO Hive database

Once pre-load completed, we have created a replication slot on vdm database using the below statement:

select 'init' from pg_create_logical_replication_slot('vdm_slot','wal2json')

Then we have another pipeline (VDM- Main Postgres CDC Pipeline) to capture the data changes and transfer them to edh

 When reading the changes via replicated slots, the numeric fields (example field qty_vl , type on postgres source database is numeric(4,0)) are read as INT  (example value=1). Data is not transferred to EDH because the expected type is Float. CDC Pipeline (VDM- Main Postgres CDC Pipeline) is giving type mismatch error at stage Hive Metadata 1 (error message: expect Float, actual:int). The reason is that the field has been created with Float type during the pre-load.

Question: We are trying to find a generic way to be able to convert the data type for certain fields from the source (instead of doing it by Field Name: see stage Field Type Converter 2). Also, if new fields are added in the future then the above will take care of new fields without making changes to the pipeline.

Pipelines attached: changed extension from .zip to .jpg Postgres database version is:9.5.14.  Streamsets version is: 3.7.1

Thanks in advance.

C:\fakepath\Field converter 1- 2019-09-27_9-11-48.png

abc.png

edit retag flag offensive close merge delete

Comments

Did you try changing the Field Type Converter Conversion Method to 'By Data Type'?

metadaddy gravatar imagemetadaddy ( 2019-09-24 10:03:48 -0500 )edit

Yes, I'm using Field Type Converter Conversion Method to 'By Data Type', (DECIMAL to FLOAT , DATE to STRING, DOUBLE to FLOAT).

Torkia Boussada gravatar imageTorkia Boussada ( 2019-09-27 08:44:19 -0500 )edit

The problem is that when doing full load with a streamset pipeline (JDBC Multi table), fields that are numeric(4,0) at Postgres source table, they have been created as float. But when capturing the changes through CDC Postgres stage(another pipeline), these fields type is captured are int.

Torkia Boussada gravatar imageTorkia Boussada ( 2019-09-27 08:46:04 -0500 )edit

I see. I think the JDBC Multitable consumer and PostgreSQL processor should behave consistently. Please file an issue at https://issues.streamsets.com/ . If you are a customer, you can open a support ticket.

metadaddy gravatar imagemetadaddy ( 2019-09-27 13:22:18 -0500 )edit

In the Jython Evaluator, I have added some code to convert numeric fields to float, bigint to long. fieldtype = record.value['change']['columntypes'][x]

Torkia Boussada gravatar imageTorkia Boussada ( 2019-10-02 14:25:25 -0500 )edit