Kafka Consumer (Cluster Mode)

asked 2017-12-12 02:59:06 -0600

Vivian Yang

updated 2017-12-12 23:09:04 -0600

We are using a StreamSets pipeline with a Kafka Consumer as the origin. The execution mode is configured as Cluster Yarn Streaming, and the Kafka Consumer uses CDH 5.10.0 Kafka Cluster 2.1.0 Lib as its stage library. The Kafka topic was created with 7 partitions and 3 replicas.

Here is the issue we see after starting the pipeline:

  1. The pipeline's Kafka Consumer consumes messages normally at first.
  2. After the pipeline sits idle for a while (about 5 minutes), the Kafka Consumer stops consuming new messages.
  3. With kafka-console-consumer, all of the new messages can still be consumed.

We checked the logs and found no errors. The pipeline simply stops consuming any of the data from Kafka.

How can we troubleshoot this issue?
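One way to narrow this down is to check consumer lag: the gap between each partition's log-end offset and the offset the pipeline's consumer group last committed. If kafka-console-consumer sees new messages while the pipeline's committed offsets stay put, the lag grows on the affected partitions. `kafka-consumer-groups --describe` reports current offset, log-end offset and lag per partition (the connection flags depend on the Kafka version and on whether offsets live in ZooKeeper or Kafka; whether the pipeline commits offsets in a way that tool can see in cluster streaming mode is an assumption to verify). The sketch below is a hypothetical helper, `partitions_with_lag`, that takes those two offset columns as dicts and lists the partitions that are falling behind. The offsets in the example are made up for illustration.

```python
from typing import Dict, List, Tuple


def partitions_with_lag(end_offsets: Dict[int, int],
                        committed: Dict[int, int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (partition, lag) for every partition whose
    committed offset trails its log-end offset.

    Simplification: a partition with no committed offset is treated as if it
    had committed offset 0, i.e. lagging by its whole end offset.
    """
    lagging = []
    for partition in sorted(end_offsets):
        lag = end_offsets[partition] - committed.get(partition, 0)
        if lag > 0:
            lagging.append((partition, lag))
    return lagging


# Made-up offsets for a 7-partition topic where partitions 3 and 5 stalled.
end = {p: 100 for p in range(7)}
done = {**{p: 100 for p in range(7)}, 3: 60, 5: 90}
print(partitions_with_lag(end, done))  # → [(3, 40), (5, 10)]
```

If every partition shows growing lag, the consumer as a whole has stopped polling or committing; if only some do, the workers reading those partitions are the place to look.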


Comments

Did you check that the batch wait time (ms) and max batch size are configured properly for your data flow?

Roh (2017-12-12 09:18:23 -0600)

Yes. The batch wait time (ms) is 20000 and the max batch size is 10. We tried the same pipeline in standalone mode and had no issues. Do you have any idea where we can debug further?

Vivian Yang (2017-12-12 22:44:59 -0600)
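Roh's question rests on how the origin closes a batch: the batch wait time is how long the origin waits before sending a partial or empty batch, and the max batch size caps how many records a batch holds. The sketch below is a simplified, hypothetical model of that rule (it ignores processing time and any Spark Streaming batch interval used in cluster mode): a batch closes as soon as it is full or the wait time has elapsed, whichever comes first. With the values above (10 records, 20000 ms), even an idle topic should yield an empty batch roughly every 20 seconds, so if the pipeline's batch count stops advancing entirely while idle, the cause is probably something other than these two settings.

```python
from typing import List


def simulate_batches(arrivals_ms: List[int], max_batch_size: int,
                     batch_wait_ms: int, horizon_ms: int) -> List[List[int]]:
    """Simplified model of batching: each batch closes when it holds
    max_batch_size records or batch_wait_ms has elapsed since it opened.
    Returns the arrival times of the records in each batch (empty lists
    are empty batches sent because the wait time ran out)."""
    if max_batch_size <= 0 or batch_wait_ms <= 0:
        raise ValueError("max_batch_size and batch_wait_ms must be positive")
    pending = sorted(arrivals_ms)
    batches: List[List[int]] = []
    i = 0
    start = 0
    while start < horizon_ms:
        deadline = start + batch_wait_ms
        batch: List[int] = []
        # Take queued or newly arriving records until full or out of time.
        while (i < len(pending) and pending[i] < deadline
               and len(batch) < max_batch_size):
            batch.append(pending[i])
            i += 1
        batches.append(batch)
        if len(batch) == max_batch_size:
            # Full batch: the next one opens when the last record arrived.
            start = max(batch[-1], start)
        else:
            # Partial or empty batch: sent when the wait time runs out.
            start = deadline
    return batches


# 25 records arrive at t=0, then the topic goes idle for the next minute.
sizes = [len(b) for b in simulate_batches([0] * 25, 10, 20000, 60000)]
print(sizes)  # → [10, 10, 5, 0, 0]
```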