Tuning Hadoop FS origin (cluster mode)

asked 2018-10-16 08:51:54 -0600

Francisco

updated 2018-10-16 08:54:40 -0600

Hello all, I have a Hadoop FS origin (cluster mode) that reads multiple files from HDFS (11 GB in total). As the destination I use Trash, just to measure the read performance. The whole process takes approx. 8 min. Performing the same task with Apache Pig takes approx. 1 min 30 s. I have already tried increasing the memory of the workers as well as changing the number of workers. Is there any tuning available in the Hadoop FS origin configuration?

Thanks!
