Tuning Hadoop FS origin (cluster mode)
Hello all, I have a Hadoop FS origin (cluster mode) that reads multiple files from HDFS (11 GB in total). As the destination I use Trash, just to measure read performance. The whole process takes approximately 8 minutes, while the same task in Apache Pig takes approximately 1 minute 30 seconds. I have already tried increasing the workers' memory and changing the number of workers. Is there any tuning available in the Hadoop FS origin configuration?
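For reference, the approximate timings above correspond to the average read throughput computed below. This is a back-of-the-envelope sketch that assumes 1 GB = 1024 MB and treats each timing as the wall-clock time of the whole job, including any job startup overhead.

```python
# Rough throughput comparison using the approximate figures from the post.
# Assumptions: 1 GB = 1024 MB; timings are wall-clock for the whole job.

def throughput_mb_per_s(total_gb: float, seconds: float) -> float:
    """Average read throughput in MB/s for a job that read total_gb in `seconds`."""
    return total_gb * 1024 / seconds

TOTAL_GB = 11
SDC_SECONDS = 8 * 60   # Hadoop FS origin, cluster mode: ~8 min
PIG_SECONDS = 90       # Apache Pig: ~1 min 30 s

sdc = throughput_mb_per_s(TOTAL_GB, SDC_SECONDS)
pig = throughput_mb_per_s(TOTAL_GB, PIG_SECONDS)

print(f"Hadoop FS origin: {sdc:.1f} MB/s")
print(f"Apache Pig:       {pig:.1f} MB/s")
print(f"Pig is about {pig / sdc:.1f}x faster")
```

With these numbers the origin averages roughly 23.5 MB/s versus roughly 125 MB/s for Pig, a gap of a bit over 5x.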
Thanks!