Tuning Hadoop FS origin (cluster mode)

asked 2018-10-16 08:51:54 -0600

Francisco

updated 2018-10-16 08:54:40 -0600

Hello all, I have a Hadoop FS origin (cluster mode) that reads multiple files from HDFS (11 GB in total). As the destination I use Trash, just to measure read performance. The whole process takes approximately 8 minutes, while the same task in Apache Pig takes approximately 1 minute 30 seconds. I have already tried increasing the workers' memory and changing the number of workers. Is there any tuning available in the Hadoop FS origin configuration?
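For scale, here is a rough sketch of the average read throughput implied by the timings above. It assumes "11 GB" means 11 × 1024 MB and that the wall-clock times include job startup, so the numbers are only an approximation of the gap.

```python
# Rough throughput comparison from the reported wall-clock timings.
# Assumptions: 11 GB = 11 * 1024 MB; times include job startup overhead.

TOTAL_MB = 11 * 1024  # total data read from HDFS

def throughput_mb_per_s(total_mb: float, seconds: float) -> float:
    """Average throughput in MB/s over the whole run."""
    return total_mb / seconds

sdc_seconds = 8 * 60       # Hadoop FS origin (cluster mode) -> Trash, ~8 min
pig_seconds = 1 * 60 + 30  # Apache Pig, ~1 min 30 s

sdc_rate = throughput_mb_per_s(TOTAL_MB, sdc_seconds)
pig_rate = throughput_mb_per_s(TOTAL_MB, pig_seconds)

print(f"Hadoop FS origin: {sdc_rate:.1f} MB/s")  # 23.5 MB/s
print(f"Apache Pig:       {pig_rate:.1f} MB/s")  # 125.2 MB/s
print(f"Pig is {pig_rate / sdc_rate:.2f}x faster")  # 5.33x
```

In other words, the pipeline reads at roughly a fifth of Pig's rate on the same data.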


