How does the JDBC Multitable Consumer know which Hive table it should feed?

asked 2017-10-10 12:29:48 -0500

Roh

I need to read data from 7 tables in a Redshift database through the JDBC Multitable Consumer and write it to HDFS/Hive.

The JDBC Multitable Consumer has 7 table configs, each with a schema and table name. I didn't specify the offset column name. From my understanding, it will pick up the primary key from the Redshift tables. Is that true? How will it know the primary keys if I don't specify them? Does this also work with composite keys?

My second question: how does the JDBC Multitable Consumer know which table it has to write to in Hive? The tables in the Hive schema have the same names as the source tables; is that the deciding factor? Is there a special parameter we have to set if the table name is different in Hive?

answered 2017-10-10 14:58:29 -0500

jeff

Yes, if the tables have primary keys, then those will be used to track progress (i.e. as the offset columns). Composite keys (up to 3 columns) are supported, but multithreaded partition processing is only supported in the cases where there is a single numerically typed key column.
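The rules in the paragraph above can be summarized in a small Python sketch. This is only an illustration of the behaviour as described in this answer, not the origin's actual implementation: the `KeyColumn` type, the `describe_offsets` function, the list of numeric type names, and the handling of a table with no primary key are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of numeric SQL type names; the origin's real list may differ.
NUMERIC_TYPES = {"SMALLINT", "INT", "INTEGER", "BIGINT", "DECIMAL", "NUMERIC"}
MAX_OFFSET_COLUMNS = 3  # "up to 3 columns", per the answer above


@dataclass
class KeyColumn:
    name: str
    sql_type: str


def describe_offsets(primary_key: List[KeyColumn]) -> dict:
    """Summarize how a table's primary key would be used under the stated rules."""
    if not primary_key:
        # Assumption: without a primary key there is nothing to use as an offset
        # automatically, so the offset columns would have to be configured by hand.
        return {"offset_columns": [], "incremental": False, "partitionable": False}
    if len(primary_key) > MAX_OFFSET_COLUMNS:
        raise ValueError("more than 3 key columns is outside what the answer describes")
    # Multithreaded partition processing: single, numerically typed key column only.
    partitionable = (
        len(primary_key) == 1
        and primary_key[0].sql_type.upper() in NUMERIC_TYPES
    )
    return {
        "offset_columns": [c.name for c in primary_key],
        "incremental": True,
        "partitionable": partitionable,
    }
```

For example, a table keyed on a single `BIGINT` column would come out as both incremental and partitionable, while a two-column composite key or a single `VARCHAR` key would be incremental only.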

If you need to retrieve the table name associated with a record (ex: for your Hive stage or something else), it should be available in an SDC record attribute named jdbc.tables.
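As a rough illustration of how the `jdbc.tables` attribute could drive the choice of Hive table, here is a minimal Python sketch. It models a record's header attributes as a plain dictionary; the `hive_table_for` function and the `renames` mapping are hypothetical names introduced for the example and are not part of SDC. In a real pipeline the lookup would be an expression on the Hive stage's table-name property, likely something along the lines of `${record:attribute('jdbc.tables')}`, but check the docs for the exact syntax.

```python
from typing import Dict, Optional

TABLE_ATTRIBUTE = "jdbc.tables"  # attribute name given in the answer above


def hive_table_for(attributes: Dict[str, str],
                   renames: Optional[Dict[str, str]] = None) -> str:
    """Pick the target Hive table from a record's header attributes.

    `renames` is a hypothetical source-to-Hive name mapping for the case
    where the Hive table name differs from the source table name.
    """
    source_table = attributes[TABLE_ATTRIBUTE]
    # Fall back to the source table name when no rename is configured.
    return (renames or {}).get(source_table, source_table)


# Same name in Hive: the attribute value is used as-is.
same_name = hive_table_for({"jdbc.tables": "orders"})
# Different name in Hive: an explicit mapping overrides it.
renamed = hive_table_for({"jdbc.tables": "orders"}, {"orders": "orders_hive"})
```

When the Hive tables share the source table names, no mapping is needed; a mapping like `renames` is only one possible way to handle differing names.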

Please refer to the docs for more detail on what's outlined above.

Thanks, Jeff. If I have composite keys, or primary keys that have datatype as INT (other than string), I cannot run the multithreaded job?

Roh ( 2017-10-10 17:28:44 -0500 )

That's correct. You will still be able to use the origin, and read incrementally (i.e. tracking progress), but it will not support partitioning/multithreaded processing within the table.

jeff ( 2017-10-17 10:44:59 -0500 )
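To make "incremental but not partitioned" concrete, the sketch below reads a table in key order, one batch at a time, remembering the last key seen as the offset. It uses an in-memory SQLite database purely for demonstration; the `customers` table, its string key `code`, and the `read_incrementally` function are assumptions for the example, and the queries the origin actually generates against Redshift may look different.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple


def read_incrementally(conn: sqlite3.Connection, batch_size: int = 2
                       ) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches from the hypothetical `customers` table in key order."""
    last_key: Optional[str] = None  # stored offset; None means start from the beginning
    while True:
        if last_key is None:
            rows = conn.execute(
                "SELECT code, name FROM customers ORDER BY code LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            # Resume strictly after the last key already processed.
            rows = conn.execute(
                "SELECT code, name FROM customers WHERE code > ? ORDER BY code LIMIT ?",
                (last_key, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1][0]  # advance the offset to the last key seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c", "Cy"), ("a", "Ann"), ("b", "Bo")],
)
batches = list(read_incrementally(conn))
```

A string key still gives a usable ordering for resuming, which is why progress tracking works; what is lost is the ability to split the key range into numeric partitions that several threads can read at once.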

