
How to configure the Hadoop FS origin in cluster mode

asked 2020-08-19 09:32:17 -0600

Anonymous

Hi there! Currently I'm using the JDBC origin to read Hive data, but it is very slow.

So I would like to read the data with the Hadoop FS origin instead, run the pipeline in cluster mode, and increase the batch size, expecting faster results.

I'm having trouble configuring the Hadoop FS origin. Can anyone help me with a sample job, or with the required configuration to add before reading data from it?

FYI, the data is in HDFS, mostly in Parquet files (almost 9000 GB), with some other files in Avro.

Thanks


Comments

I'm using the right Hadoop config, but I still get this error for 'Hadoop FS Configuration Directory': Error HADOOPFS_29 - Hadoop configuration directory '/var/lib/sdc/resources/hadoop-conf' (resolved to '/var/lib/sdc/resources/hadoop-conf') is not inside SDC resources directory '/var/lib/sdc/resources'.

strem_dev ( 2020-08-20 15:59:30 -0600 )

1 Answer


answered 2020-08-20 22:06:30 -0600

iamontheinet

Hi!

It's expecting a path relative to the SDC resources directory, so just hadoop-conf should work.
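For reference, a rough sketch of the fix for HADOOPFS_29. The helper name `stage_hadoop_conf` and the default paths (`/var/lib/sdc/resources`, `/etc/hadoop/conf`) are my assumptions about a typical install, not something prescribed by the product:

```shell
#!/bin/sh
# Sketch: copy the cluster's Hadoop client configs INTO the SDC resources
# directory, then point the origin at them with a relative path.
# Usage: stage_hadoop_conf RESOURCES_DIR HADOOP_CONF_DIR
stage_hadoop_conf() {
  resources="$1"
  conf="$2"
  # Create a subdirectory inside the SDC resources directory.
  mkdir -p "$resources/hadoop-conf"
  # Copy whichever client config files exist on this host.
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ -f "$conf/$f" ]; then
      cp "$conf/$f" "$resources/hadoop-conf/"
    fi
  done
}

# Typical invocation on a default install (paths are assumptions):
#   stage_hadoop_conf /var/lib/sdc/resources /etc/hadoop/conf
# Then set "Hadoop FS Configuration Directory" in the origin to the
# relative value hadoop-conf, not /var/lib/sdc/resources/hadoop-conf.
```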

Cheers, Dash


Comments

Hi @iamontheinet, I'm reading data from the Hadoop FS origin in cluster mode, but I've never configured cluster mode before. Could you please guide me on what needs to be added? I can preview the data, but when I start the job I get the error: Unexpected error starting pipeline:

strem_dev ( 2020-08-21 11:13:57 -0600 )

'Error Unexpected error starting pipeline: java.lang.IllegalStateException: Timed out after waiting 121 seconds for cluster application to start. Submit command is not alive.' FYI, we are using Cloudera.

strem_dev ( 2020-08-21 11:14:53 -0600 )