Connect to HDFS with Kerberos and TLS

asked 2019-07-21 08:34:10 -0600

anonymous user


updated 2019-07-24 01:06:27 -0600

StreamSets is installed from parcels on a CDH 5.8.2 cluster (not enough points to attach a screenshot). I intend to use it standalone at this stage. The 5.8.2 cluster has neither Kerberos nor TLS.

My goal is to collect events from Kafka (the origin), which is also installed on the 5.8.2 cluster, and write them to another HDFS cluster (CDH 5.12.2) that has Kerberos and TLS enabled.

On the StreamSets instance:

  1. core-site.xml, hdfs-site.xml, and hdfs-site-refreshable.xml were copied from the 5.12.2 Cloudera Manager to /var/lib/sdc/resources/hadoop-conf/ (the files are symlinked to another directory).
  2. The keytab file path, the principal, and kerberos.client.enabled=true were set in the properties file: /opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR-3.6.0/etc/

  • Using klist, I can see that the keytab and principal are fine: there is an active ticket.
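
For reference, here is a minimal Python sketch for checking which authentication mode the copied core-site.xml declares. It assumes the file is a standard Hadoop configuration XML and reads the hadoop.security.authentication property. The function names are my own, not part of StreamSets or Hadoop, and whether this property has anything to do with the validation error is only a guess.

```python
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_path):
    """Return the <property> name/value pairs of a Hadoop *-site.xml as a dict."""
    root = ET.parse(xml_path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def declared_auth_mode(core_site_path):
    """Return hadoop.security.authentication; Hadoop defaults to 'simple' when unset."""
    props = load_hadoop_properties(core_site_path)
    return props.get("hadoop.security.authentication", "simple")
```

Calling declared_auth_mode("/var/lib/sdc/resources/hadoop-conf/core-site.xml") on the SDC host shows what the client configuration asks for. I would expect "kerberos" if the file really comes from the secured 5.12.2 cluster.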

Pipeline configuration:

  1. HDFS User contains the principal.
  2. Kerberos Authentication is checked.
  3. Hadoop FS Configuration Directory is set to the directory that contains the configuration files (core-site.xml, hdfs-site.xml, ...). It is the same directory that the files in the SDC resources directory are symlinked to.

Yet I get this error when I try to validate the pipeline:

HADOOPFS_01 - Validation Error: Failed to configure or connect to the 'hdfs://nameservice1' Hadoop file system: Provided Subject must contain a KerberosPrincipal

Comment 1: both the 5.8.2 and the 5.12.2 clusters use "nameservice1" as their nameservice.

Comment 2: while validating, I don't see any traffic from SDC to the 5.12.2 cluster on the firewall.

Q1: Would TLS affect writing to HDFS?

Q2: Which ports should be open between the clusters?
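
To narrow down the firewall question, here is a minimal Python sketch that lists the NameNode RPC endpoints that hdfs-site.xml defines for a nameservice and tries a plain TCP connection to each. It assumes an HA setup that uses the standard dfs.ha.namenodes.<nameservice> and dfs.namenode.rpc-address.<nameservice>.<id> properties. The function names are hypothetical, and the sketch covers only the NameNode RPC endpoints, not the DataNodes or the KDC.

```python
import socket
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_path):
    """Return the <property> name/value pairs of a Hadoop *-site.xml as a dict."""
    root = ET.parse(xml_path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def namenode_rpc_addresses(hdfs_site_path, nameservice):
    """Return [(host, port), ...] for every NameNode listed under the nameservice."""
    props = load_hadoop_properties(hdfs_site_path)
    ids = props.get("dfs.ha.namenodes." + nameservice, "")
    addresses = []
    for nn_id in (i.strip() for i in ids.split(",")):
        if not nn_id:
            continue
        value = props.get("dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id), "")
        if ":" in value:
            host, _, port = value.rpartition(":")
            addresses.append((host, int(port)))
    return addresses


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running namenode_rpc_addresses("/var/lib/sdc/resources/hadoop-conf/hdfs-site.xml", "nameservice1") on the SDC host and passing each result to can_connect would show whether the copied configuration points at the 5.12.2 NameNodes and whether they are reachable. Because both clusters use the name "nameservice1", the hosts it prints are worth comparing against the 5.12.2 cluster.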

I would appreciate your help in overcoming this HADOOPFS_01 error.
