Limit number of resources consumed by a batch job

asked 2018-01-21 03:48:40 -0600

Alessandro Negrini

Hi All,

I developed a StreamSets batch job that loads files from Hadoop (in JSON format), does some transformations, and finally writes the transformed fields back to Hadoop (in Avro format). I usually run jobs in Cluster YARN Streaming mode, and to limit the resources (memory and cores) I adjust the number of workers, which can be set directly in StreamSets.

However, when a job runs in Cluster Batch mode it is not possible to set the number of workers, so I cannot limit the resources it consumes. When I run it, the memory and core usage explodes.

How can I limit the amount of resources consumed?

Thanks, Alessandro

1 Answer

answered 2018-09-30 21:12:19 -0600

rupal

Cluster Batch spawns a MapReduce job. How you limit the resources of a MapReduce job depends on the job scheduler used by your distribution. Here's Cloudera's YARN tuning guide.
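
As a rough illustration of the kind of sizing involved, here is a minimal Python sketch that turns a memory/vcore budget into standard MapReduce job properties. The property names (`mapreduce.map.memory.mb`, `mapreduce.map.cpu.vcores`, `mapreduce.job.running.map.limit`) are regular Hadoop settings, but the helper `mapreduce_limits`, the example budget, and the idea that your pipeline or cluster configuration will pass these properties through to the spawned job are assumptions, not something confirmed in this thread. The running-map limit also needs a Hadoop version that supports it.

```python
def mapreduce_limits(max_memory_mb, max_vcores, map_memory_mb=1024, map_vcores=1):
    """Hypothetical helper: cap concurrent map tasks to fit a resource budget.

    The budget covers map containers only; the MapReduce ApplicationMaster
    uses one more container that is not counted here.
    """
    if map_memory_mb <= 0 or map_vcores <= 0:
        raise ValueError("container sizes must be positive")

    # How many map containers fit in the budget along each dimension.
    by_memory = max_memory_mb // map_memory_mb
    by_vcores = max_vcores // map_vcores
    running_maps = min(by_memory, by_vcores)
    if running_maps < 1:
        raise ValueError("budget is smaller than a single map container")

    return {
        "mapreduce.map.memory.mb": str(map_memory_mb),
        "mapreduce.map.cpu.vcores": str(map_vcores),
        # A value of 0 would mean "no limit", so it is never emitted here.
        "mapreduce.job.running.map.limit": str(running_maps),
    }


if __name__ == "__main__":
    # Example budget (assumed values): 8 GB and 4 vcores, 2 GB per map task.
    for key, value in mapreduce_limits(8192, 4, map_memory_mb=2048).items():
        print(f"{key}={value}")
```

Limits like these act per job. If the cap should hold no matter what a job requests, a scheduler queue with a maximum capacity is the more robust route, which is what the tuning guide covers.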

Asked: 2018-01-21 03:48:21 -0600

Seen: 293 times

Last updated: Sep 30 '18