
Need help fixing a user-related issue when connecting to Hadoop FS as a destination

asked 2018-12-21 01:34:58 -0500

Arnab

Hi. I am facing an issue while trying to establish a pipeline between a Kafka Consumer origin and a Hadoop FS destination. I am getting a user-related error at the Hadoop FS end. The error message is as follows:

HADOOPFS_59 - Recovery failed to rename old _tmp_ files: org.apache.hadoop.ipc.RemoteException(User: sdc is not allowed to impersonate hdfs)

Can anyone help?

Configuration details for the Hadoop FS destination:

Hadoop FS tab:
- Hadoop FS URI: hdfs://hadoop-hadoop-hdfs-nn-0.hadoop-hadoop-hdfs-nn.datalake.svc.cluster.local:9000/
- HDFS User: hdfs

Output Files tab:
- File Type: Text files
- Files Prefix: sdc-${sdc:id()}
- Directory Template: /tmp/out/${YYYY()}-${MM()}-${DD()}-${hh()}
- Validate HDFS Permissions: unchecked


1 Answer


answered 2019-04-03 07:52:56 -0500

swapnil2.sonawane

If your Hadoop cluster is Kerberized, you must have a Kerberos service principal for Data Collector. It is typically sdc/&lt;host&gt; (where &lt;host&gt; is the hostname where Data Collector runs), and the Data Collector user name for Hadoop is sdc.

If your Hadoop cluster is not Kerberized, the Data Collector user name for Hadoop is the Unix user name that started the Data Collector. This could be sdc if you are running it as a service, or your own user name.

It looks like your Data Collector user name for Hadoop is sdc, so I'll use that in the remainder of this answer.
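If you're unsure which Unix user your Data Collector is running as, a quick check on the Data Collector host is shown below. The process patterns are assumptions based on typical installs; adjust them to match yours:

```shell
# Print the Unix user that owns the Data Collector process.
# "streamsets" / "datacollector" are typical process-name patterns;
# change them if your installation uses a different name.
pid=$(pgrep -f 'streamsets|datacollector' | head -n 1)
if [ -n "$pid" ]; then
  ps -o user= -p "$pid"
else
  echo "No Data Collector process found" >&2
fi
```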

In the Hadoop FS destination, if you want to impersonate a Hadoop user other than the one running Data Collector (user sdc), set HDFS User on the Hadoop FS tab to the desired user. That is all you have to do on the Data Collector side.

Next, you'll have to configure the HDFS name node to allow the Data Collector user (sdc) to act as a proxy user for other users. You do that by setting the following properties in core-site.xml on your name node (the proxy-user properties belong in core-site.xml, not hdfs-site.xml), or in the corresponding safety valve if you're using Cloudera Manager:

<property>
  <name>hadoop.proxyuser.sdc.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sdc.groups</name>
  <value>*</value>
</property>

A value of * allows impersonation from any host and for any group; restrict these as appropriate for your environment. Remember, this assumes your Data Collector is using the Hadoop user name sdc.

Once you make those changes, you need to restart the name node.
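If a full restart is inconvenient, the proxy-user settings can usually be reloaded in place. This is a sketch assuming the hdfs CLI is available on the name node, the cluster uses simple (non-Kerberos) authentication, and the default Hadoop 3 WebHDFS port; the host placeholder is yours to fill in:

```shell
# Reload proxy-user (impersonation) settings on the NameNode without a
# full restart (run as the HDFS superuser).
hdfs dfsadmin -refreshSuperUserGroupsConfiguration

# Smoke-test impersonation through WebHDFS: authenticate as sdc but act
# as hdfs via the doas parameter. 9870 is the default Hadoop 3 HTTP
# port (50070 on Hadoop 2); adjust host and port for your cluster.
curl -s "http://<namenode-host>:9870/webhdfs/v1/tmp/out?op=LISTSTATUS&user.name=sdc&doas=hdfs"
```

If the first command succeeds and the curl returns a FileStatuses listing rather than an impersonation error, the proxy-user configuration is in effect.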




Last updated: Dec 21 '18