
Error in Hadoop FS Destination: "User: sdc is not allowed to impersonate hadoop"

asked 2017-05-15 21:15:56 -0500

metadaddy

I'm trying to send some files from a local directory to HDFS, but I'm getting the following error:

HADOOPFS_44 - Could not verify the base directory: 'org.apache.hadoop.ipc.RemoteException( User: sdc is not allowed to impersonate hadoop'


2 Answers


answered 2017-05-15 21:16:14 -0500

metadaddy

updated 2017-05-19 09:42:38 -0500 by LC

If your Hadoop cluster is Kerberized, you must have a Kerberos service principal for Data Collector. It is typically sdc/<HOST> (where <HOST> is the hostname where Data Collector runs), and in that case the Data Collector user name for Hadoop is sdc.

If your Hadoop cluster is not Kerberized, the Data Collector user name for Hadoop is the Unix user name that started the Data Collector. This could be sdc if you are running it as a service, or your own user name.

It looks like your Data Collector user name for Hadoop is sdc, so I'll use that in the remainder of this answer.

In the Hadoop FS destination, if you want to impersonate a different Hadoop user than the one running Data Collector (user sdc), set the HDFS User property on the Hadoop FS tab to the desired user. That is all you have to do in Data Collector.

Next, you'll have to configure the HDFS name node to allow the Data Collector user (user sdc) to be a proxy user for other users. You do that by setting the following properties in the core-site.xml of your name node, or the corresponding safety valve if you're using Cloudera Manager:
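A sketch of the proxy user entries, assuming the Hadoop user name is sdc (the permissive `*` values are placeholders):

```xml
<!-- core-site.xml (or the corresponding Cloudera Manager safety valve) -->
<!-- Allow the sdc user to impersonate other Hadoop users -->
<property>
  <name>hadoop.proxyuser.sdc.hosts</name>
  <!-- hosts from which sdc is allowed to impersonate -->
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sdc.groups</name>
  <!-- groups whose members sdc is allowed to impersonate -->
  <value>*</value>
</property>
```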


Remember, this is assuming your Data Collector is using the Hadoop user name sdc.

Once you make those changes, you need to restart the name node.

If you are running a production setup, make sure you configure the proxy user properties above as restrictively as possible for your usage (instead of using *, which means ALL).
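For example, a tighter configuration might restrict impersonation to the Data Collector host and a single group (the host and group names below are hypothetical; substitute your own):

```xml
<property>
  <name>hadoop.proxyuser.sdc.hosts</name>
  <!-- hypothetical: only the host where Data Collector runs -->
  <value>sdc-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.sdc.groups</name>
  <!-- hypothetical: sdc may impersonate only members of this group -->
  <value>etl-users</value>
</property>
```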

NOTE: If you leave the Hadoop FS destination's HDFS User configuration empty, your pipeline will interact with HDFS as the Hadoop user running the Data Collector (user sdc).



I note that when I do this (StreamSets v3.2.x.x) with Cloudera 5.9, the core-site.xml in the var/run/cloudera_scm_agent/currentHDFSConfig directory is updated, but the core-site.xml in /var/lib/sdc/resources/hadoop-conf/ is not. Since that path is returned by the SDC_RESOURCES env var, it should be.

badcat914 ( 2018-06-19 12:02:49 -0500 )

answered 2018-01-11 15:54:29 -0500

rupal

Please note that if you are using MapR, follow MapR's documentation on enabling impersonation, based on how your cluster is configured and whether it is a secure MapR cluster.



Seen: 4,592 times

Last updated: Jan 11 '18