Question about thread management in StreamSets Data Collector

Hi all,

We are using StreamSets to migrate data from a local FS to the Azure Data Lake Store. The pipeline is very simple: a Directory origin followed by an Azure Data Lake Store destination.

On the origin, we increased parallelism by setting the number of threads to 10. As a consequence, SDC runs 10 pipeline runners.

Monitoring SDC from the metrics page, in particular the Threads graph, we noticed that the number of live threads increases constantly. In the thread dump, the majority of them are named "Data Lake Idle Close Thread"; moreover, it seems that SDC keeps such a thread alive even after the file it was watching has been closed.

Is this behavior normal? That is, is it normal for SDC to keep this thread alive even after the file is closed? And is there a way to kill these useless threads?
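
To put numbers on the growth, one option is to save two thread dumps a few minutes apart and count the threads per name. Below is a minimal sketch, assuming the dump has been saved as plain text in the usual HotSpot format, where each thread starts with a line beginning with its quoted name. The function name `count_threads_by_name` and the sample dump are made up for illustration; only the "Data Lake Idle Close Thread" name comes from our real dump.

```python
import re
from collections import Counter

# In a HotSpot-style thread dump, each thread header line starts with the
# thread name in double quotes; stack frames are indented and do not match.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def count_threads_by_name(dump_text):
    """Return a Counter of thread names found in a saved thread dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # Collapse a trailing number so "runner-1" and "runner-2"
            # are grouped together as "runner-#".
            name = re.sub(r"\d+$", "#", match.group("name"))
            counts[name] += 1
    return counts


# Hypothetical sample; a real dump would be read from a file instead.
SAMPLE_DUMP = '''\
"main" #1 prio=5 os_prio=0 tid=0x01 nid=0x01 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"Data Lake Idle Close Thread" #41 daemon prio=5 os_prio=0 tid=0x02 nid=0x02 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"Data Lake Idle Close Thread" #42 daemon prio=5 os_prio=0 tid=0x03 nid=0x03 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"runner-1" #43 prio=5 os_prio=0 tid=0x04 nid=0x04 runnable
"runner-2" #44 prio=5 os_prio=0 tid=0x05 nid=0x05 runnable
'''

if __name__ == "__main__":
    # Print the most frequent thread names first.
    for name, count in count_threads_by_name(SAMPLE_DUMP).most_common():
        print(count, name)
```

Comparing the counts from two dumps taken at different times should show whether the "Data Lake Idle Close Thread" entries are the ones accumulating.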

I'm asking because we hit an "OutOfMemoryError: unable to create new native thread" error and could not find anything online that solves the problem.

Thanks for the help.
Roberto
