
Kafka Consumer origin fetching the same data in a loop

asked 2020-05-28 10:12:32 -0500

logan

updated 2020-05-29 00:43:00 -0500

iamontheinet

I am using StreamSets version 3.1.0. My origin is a Kafka Consumer with the Apache Kafka 2.0.0 stage library, and my destination is a SQL database. For some of my topics, the origin fetches the same data in a loop. For example, if I have 50 records in my topic, the origin fetches those 50 records again and again in a loop.

Sorry, the SDC version is actually 3.10.1. Config details:

1. Five broker URLs and five ZooKeeper URLs, specified in comma-separated format
2. Consumer Group: streamset datacollector
3. Auto Offset Reset: earliest
4. Max Batch Size: 1000
5. Batch Wait Time: 2000
6. Rate Limit Per Partition: 1000
7. Data Format: JSON

Note: I am seeing this anomaly in only a few of my pipelines, not in all of them.


Comments

Are you really using SDC version 3.1.0? Or, is that a typo? :) Can you also update your question with config details of your Kafka origin?

iamontheinet ( 2020-05-28 15:41:11 -0500 )

1 Answer


answered 2020-05-29 17:44:34 -0500

Mark Brooks

Please use the Kafka Multitopic Consumer rather than the older Kafka Consumer, even if you are only consuming from a single topic, as the Kafka Multitopic Consumer uses the new Kafka client that stores offsets in Kafka rather than ZooKeeper. In the older Kafka Consumer a ZooKeeper timeout could cause the issue you saw.

I also recommend an upgrade to the newest version of SDC as there have been improvements to SDC's Kafka Multitopic Consumer.

The loop you are reporting is typically caused by a Kafka rebalance, which happens when the broker believes the consumer has stopped responding. In that case, an SDC pipeline could be processing a set of messages but might not have committed the offset before Kafka rebalanced and allowed another SDC consumer thread to read the same messages. You should be able to see whether a rebalance occurred by looking in the Kafka server logs.
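To make the mechanism concrete, here is a minimal sketch of at-least-once delivery. This is a pure-Python simulation, not SDC or Kafka client code: the log, the committed offset, and `poll` are all stand-ins for what the broker and consumer actually do.

```python
# Simulated at-least-once delivery: uncommitted offsets are re-read after a rebalance.
# Self-contained sketch; no real Kafka involved.

log = [f"record-{i}" for i in range(50)]  # 50 records in the topic partition
committed_offset = 0                      # last offset committed to the broker

def poll(offset, max_batch=1000):
    """Fetch records starting at the given offset, like a consumer poll."""
    return log[offset:offset + max_batch]

# First consumer reads all 50 records...
batch = poll(committed_offset)
# ...but a rebalance happens before it commits, so committed_offset stays 0.

# After the rebalance, another consumer resumes from the committed offset
# and fetches the very same 50 records again.
batch_after_rebalance = poll(committed_offset)

print(batch == batch_after_rebalance)  # True: the same data is fetched in a loop
```

Until an offset commit lands, the broker's view of the group's position never advances, so every rebalance restarts the group at the same place.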

The main thing to avoid is pipelines that take too long to process messages. For example, if you have a pipeline that does a JDBC lookup that takes a long time, or has a slow JavaScript Evaluator, it is possible this appears to the Broker as a slow consumer. If you can't speed up the processing, try a smaller batch size so offsets are committed more frequently.
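As a back-of-the-envelope illustration of why batch size matters, the sketch below compares batch processing time against the poll-interval timeout. The per-record lookup cost is a made-up assumption, not a measurement from this pipeline; 300000 ms is Kafka's default `max.poll.interval.ms`.

```python
# Rough check: can the pipeline finish a batch before the broker's
# poll-interval timeout? The per-record cost below is an illustrative assumption.

per_record_lookup_ms = 50          # assumed cost of a slow JDBC lookup per record
max_poll_interval_ms = 300_000     # Kafka's default max.poll.interval.ms (5 minutes)

def batch_time_ms(batch_size):
    """Estimated time to process one batch at the assumed per-record cost."""
    return batch_size * per_record_lookup_ms

# A 1000-record batch at 50 ms/record takes 50 seconds -- well under the
# default limit, but risky if the lookup is slower or the timeout is tuned
# lower. Shrinking the batch shrinks the exposure between offset commits.
for batch_size in (1000, 250):
    t = batch_time_ms(batch_size)
    print(batch_size, t, t < max_poll_interval_ms)
```

The point is not the exact numbers but the ratio: a smaller batch commits offsets more often, so a rebalance re-delivers fewer records.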

Also, make sure SDC does not become CPU bound, as the Kafka heartbeat thread may be getting starved.

