Streamsets - RAM not getting purged

asked 2019-10-29 03:57:36 -0600

abhinav

updated 2019-10-29 04:05:21 -0600

StreamSets memory usage keeps growing on one of our StreamSets deployments on RHEL.

Every time StreamSets runs, memory usage increases by about 1% of available RAM. I'm only using simple copy flows for large files, so it's worrying that memory is not decreasing or being purged even after several days.

After about 50 runs, memory usage is up to 70%. No memory is being de-allocated at all.

At startup, the StreamSets process used 1.8% of RAM.
After processing some files, it was using 2.2% of RAM.
After a third run of the same pipeline, it increased to 3.4%.
After a fourth run, it increased to 4.6% of RAM.

After several runs over a few days, memory usage increased to 70% of the total RAM available. Restarting StreamSets brings memory utilization back to 1-2%.

This is a serious worry as we do not want to restart streamsets every few days.

Has anyone else noticed this behavior? Can anyone provide any assistance? The Java heap is set to 1024 MB and GC is set to Mark and Sweep.

Thanks in advance.


When you say RAM here, are you referring to the resident memory size? Note that it's normal for the JVM to continue to use up to its total heap allocation, even in a no-leak scenario. Are you getting OutOfMemoryErrors?

jeff (2019-10-30 16:18:24 -0600)
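
To make the point about heap allocation concrete, here is a minimal sketch that prints used, committed, and maximum heap for the JVM it runs in. It uses only the standard `java.lang.management` API; the class name `HeapReport` and the helper `describe` are made up for illustration. It reports on its own JVM, so checking a running StreamSets process this way would require something like a JMX connection, which is an assumption and not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
class HeapReport {
    static final long MIB = 1024L * 1024L;

    /** Formats used, committed and max heap in MiB; max is shown as -1 when undefined. */
    static String describe(MemoryUsage heap) {
        long max = heap.getMax() < 0 ? -1 : heap.getMax() / MIB;
        return String.format("heap used=%d MiB, committed=%d MiB, max=%d MiB",
                heap.getUsed() / MIB, heap.getCommitted() / MIB, max);
    }

    public static void main(String[] args) {
        // Heap usage of the JVM running this program.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe(memory.getHeapMemoryUsage()));
    }
}
```

"Committed" is the amount the JVM has actually reserved from the operating system for the heap. It can stay high after a garbage collection even when "used" drops, which is one reason resident memory does not necessarily shrink between pipeline runs.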

Yes, resident memory size. We were getting some errors, but we have since eliminated them. Our heap size is 1024 MB as configured in the environment variables, so based on your statement, memory usage should not go beyond this figure.

abhinav (2019-10-31 06:56:48 -0600)

That is not a correct expectation. The heap size limits only the Java heap. The JVM itself always has some overhead on top of that, and any libraries (SDK or third-party) may be using off-heap memory. See here for more information.

jeff (2019-10-31 09:24:50 -0600)
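
As a rough illustration of memory that sits outside the heap limit, the sketch below reports non-heap usage and NIO buffer pool usage for the JVM it runs in, again using only the standard `java.lang.management` API. The class name `OffHeapReport` and its methods are hypothetical. Native allocations made by libraries through other means are not visible to these beans, so the numbers should be treated as a lower bound on off-heap usage.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
class OffHeapReport {
    /** Bytes used by JVM-managed non-heap areas (e.g. metaspace, code cache). */
    static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Bytes used by NIO buffer pools (direct and mapped buffers). */
    static long bufferPoolBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            long used = pool.getMemoryUsed(); // -1 when the value is unknown
            if (used > 0) {
                total += used;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("non-heap used bytes:    " + nonHeapUsedBytes());
        System.out.println("buffer pool used bytes: " + bufferPoolBytes());
    }
}
```

Neither figure is bounded by the maximum heap setting, so resident memory above the 1024 MB heap is possible without a heap leak.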