
How to configure the Hadoop FS origin in cluster mode

asked 2020-08-19 09:32:17 -0500

anonymous user


Hi there! Currently I'm using the JDBC origin to read Hive data, but it is very slow.

So I would like to read the data with the Hadoop FS origin, run in cluster mode, and increase the batch size, hoping for faster results.

I'm having trouble configuring the Hadoop FS origin. Can anyone help me with a sample job, or with the required configuration to add before reading data from it?

FYI, the data is in HDFS, mostly in Parquet (almost 9,000 GB), with other files in Avro.




I'm using the right Hadoop config, but I still get this error for 'Hadoop FS Configuration Directory': Error HADOOPFS_29 - Hadoop configuration directory '/var/lib/sdc/resources/hadoop-conf' (resolved to '/var/lib/sdc/resources/hadoop-conf') is not inside SDC resources directory '/var/lib/sdc/resources'.

strem_dev ( 2020-08-20 15:59:30 -0500 )

1 Answer


answered 2020-08-20 22:06:30 -0500

iamontheinet


It's expecting a path relative to the SDC resources directory, so just hadoop-conf should work.

Cheers, Dash
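
To illustrate: a minimal sketch of that setup, assuming a default install where the SDC resources directory is /var/lib/sdc/resources and the cluster's Hadoop client configs live in /etc/hadoop/conf (both paths are assumptions; adjust to your environment):

```shell
# Assumed locations; change these to match your install.
SDC_RESOURCES=/var/lib/sdc/resources
HADOOP_CONF_SRC=/etc/hadoop/conf

# Place (or symlink) the Hadoop client configs inside the resources directory.
mkdir -p "$SDC_RESOURCES/hadoop-conf"
cp "$HADOOP_CONF_SRC/core-site.xml" "$HADOOP_CONF_SRC/hdfs-site.xml" \
   "$SDC_RESOURCES/hadoop-conf/"

# Then, in the Hadoop FS origin, set:
#   Hadoop FS Configuration Directory = hadoop-conf
# i.e. the name relative to the resources directory, not the absolute
# path /var/lib/sdc/resources/hadoop-conf (which triggers HADOOPFS_29).
```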



Hi @iamontheinet, I'm reading data with the Hadoop FS origin in cluster mode, but I have never configured cluster mode before. Could you please guide me on what needs to be added? I can preview the data, but when I start the job I get the error Unexpected error starting pipeline:

strem_dev ( 2020-08-21 11:13:57 -0500 )

'Error Unexpected error starting pipeline: java.lang.IllegalStateException: Timed out after waiting 121 seconds for cluster application to start. Submit command is not alive.' FYI, we are using Cloudera.

strem_dev ( 2020-08-21 11:14:53 -0500 )




Seen: 151 times

Last updated: Aug 20 '20