How do i configure remote CDH server for loading data into HDFS via Hadoop FS

asked 2020-03-06 01:39:32 -0600

anonymous user


updated 2020-03-06 12:55:58 -0600

jeff

I have StreamSets installed on host A (a Linux 6+ server); my Hadoop cluster is CDH 5.14 and runs on a different server. I am able to authenticate against my Hadoop server B from A via Kerberos with a keytab file.
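For reference, keytab-based authentication like the above can be verified from host A with something along these lines (the principal name and keytab path are hypothetical placeholders, not values from this question):

```shell
# Obtain a ticket using the keytab -- principal and path are placeholders.
kinit -kt /etc/security/keytabs/sdc.keytab sdc/hostA@EXAMPLE.COM

# Confirm the ticket was granted.
klist
```

If `klist` shows a valid ticket for the expected principal, the Kerberos side of the setup is working independently of StreamSets.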

My requirement is to load data from an RDBMS (Oracle) into HDFS. I am able to read data from my Oracle table, having configured the Oracle JDBC jar, and can view the data.
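As a point of comparison, a working JDBC Query Consumer setup for Oracle typically looks something like the sketch below; the host, service name, table, and offset column here are hypothetical, and StreamSets substitutes `${OFFSET}` with the last stored offset value:

```
# JDBC Query Consumer (hypothetical values)
JDBC Connection String: jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1
SQL Query:              SELECT * FROM my_schema.my_table WHERE id > ${OFFSET} ORDER BY id
Initial Offset:         0
Offset Column:          id
```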

I have followed the tutorials and am trying to build the pipeline with the components below.

JDBC Query Consumer --> Hive Metadata --> Hadoop FS & Hive Metastore

When it comes to configuring the Hive Metadata processor, how do I make the config files from server B available on my StreamSets server A? Do core-site.xml, hdfs-site.xml, and hive-site.xml need to be configured? (I do not have access to these files, as they are part of the remote CDH cluster.)

Similarly, for the Hadoop FS destination, how do I specify the config files? Are they mandatory?
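For context on what these stages expect: they generally read standard Hadoop/Hive client `*-site.xml` files from a configuration directory on the StreamSets host, so one common approach (an assumption here, since the question says direct access is unavailable) is to ask the cluster administrator for the client configuration bundle that Cloudera Manager can export, and place those files in a directory the stage points at. At minimum, hand-written sketches would look something like this, with hypothetical hostnames:

```xml
<!-- core-site.xml: minimal sketch with a hypothetical NameNode host -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- hive-site.xml: minimal sketch with a hypothetical metastore host -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
```

The real exported client configs would also carry the Kerberos-related properties (e.g. security authentication settings) that a hand-written minimal file omits, which is why obtaining them from the cluster is usually preferable to writing them by hand.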
