Limit the resources consumed by a batch job

asked 2018-01-21 03:48:40 -0500

Hi All,

I developed a StreamSets batch job that loads files from Hadoop (in JSON format), applies some transformations, and finally writes the transformed fields back to Hadoop (in Avro format). I usually run jobs in Cluster YARN Streaming mode, and to limit the resources (memory and cores) I adjust the number of workers, which can be set directly in StreamSets.

However, when a job runs in Cluster Batch mode, it is not possible to set the number of workers, so I cannot limit the resources it consumes. When I run it, the memory and core usage explodes.

How can I limit the amount of resources consumed?

Thanks, Alessandro
