Kafka Consumer (Cluster Mode)

asked 2017-12-12 02:59:06 -0500

Vivian Y

updated 2017-12-12 23:09:04 -0500

We are using a StreamSets pipeline with a Kafka Consumer as the origin. The execution mode is configured as Cluster Yarn Streaming, and the Kafka Consumer uses the CDH 5.10.0 Kafka Cluster 2.1.0 stage library. A Kafka topic has been created with 7 partitions and 3 replicas.

Here are the issues we see when starting the pipeline:

  1. The pipeline's Kafka Consumer is able to consume messages at the beginning.
  2. After the pipeline is left idle for some time (about 5 minutes), the Kafka Consumer is no longer able to consume any new messages.
  3. With kafka-console-consumer, all the new messages can be consumed.
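The check in step 3 can be reproduced from the command line. The broker host and topic name below are placeholders; substitute your own:

```shell
# Read the topic from the beginning to confirm new messages are arriving.
# broker1:9092 and my_topic are illustrative values.
kafka-console-consumer --bootstrap-server broker1:9092 \
  --topic my_topic --from-beginning
```

If this prints the new messages while the pipeline stays idle, the data is reaching Kafka and the problem is on the consuming (Spark/StreamSets) side.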

We checked the logs and found no errors; Kafka is simply not consuming any of the data.

How can we troubleshoot this issue?



Did you check that the batch wait time (ms) and max batch size are properly configured for your data flow?

Roh ( 2017-12-12 09:18:23 -0500 )

Yes, the batch wait time (ms) is 20000 and the max batch size is 10. We tried this in standalone mode and had no issues there. Do you have any idea where we can debug further?

Vivian Y ( 2017-12-12 22:44:59 -0500 )

How about spark.dynamicAllocation.executorIdleTimeout? When an executor has been idle for longer than this duration (default 60s), it is removed. I had the same issue and resolved it with this configuration.

junko_urata ( 2018-05-22 15:59:09 -0500 )
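For reference, one way to apply the fix from the comment above is via the Spark configuration used by the cluster streaming job. This is a sketch, assuming the settings are picked up from spark-defaults.conf (or wherever your cluster passes Spark properties to the job); the timeout value is illustrative:

```properties
# Keep idle executors (and their Kafka consumers) alive much longer
# than the 60s default, so sparse traffic does not tear them down.
spark.dynamicAllocation.executorIdleTimeout=3600s

# Alternatively, disable dynamic allocation for the streaming job entirely:
spark.dynamicAllocation.enabled=false
```

With the default 60s timeout, a pipeline that receives no data for a few minutes loses its executors, which matches the "works at first, stops after ~5 minutes idle" symptom described in the question.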