Kafka Consumer performing poorly when decoding Avro data

asked 2018-01-16 11:23:11 -0500

aseemdomaini

I have a setup where I read Avro data from Kafka. The Kafka Consumer origin is configured to decode the Avro data, and the schema is hosted on a Confluent Schema Registry.
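For context when building a smaller test setup: values serialized against a Confluent Schema Registry use the Confluent wire format. Each Kafka value starts with a magic byte `0` and a 4-byte big-endian schema ID, followed by the Avro binary body. A registry-aware decoder typically resolves that ID against the registry (usually with a client-side cache) before decoding the body. The sketch below only splits off that header. The helper name `split_confluent_frame` is hypothetical, and it is a way to check, for example, that every message in a sample carries the same schema ID, not a replacement for the decoder.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of every Confluent-framed message


def split_confluent_frame(value: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload).

    Layout: 1 magic byte (0), 4-byte big-endian schema ID, Avro binary body.
    """
    if len(value) < 5 or value[0] != MAGIC_BYTE:
        raise ValueError("value is not in Confluent Schema Registry wire format")
    (schema_id,) = struct.unpack(">I", value[1:5])
    return schema_id, value[5:]


# Example: schema ID 42 followed by a (fake) three-byte Avro body.
print(split_confluent_frame(b"\x00\x00\x00\x00\x2a" + b"abc"))  # (42, b'abc')
```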

If I run a single pipeline whose Kafka Consumer origin writes to a "Trash" destination, I can process around 2,000 records per second. Duplicating the pipeline raises this to at most 3,000 records per second in total. Throughput maxes out at three to four pipelines, and adding more does not increase the processing rate. Hardware should not be the limit: it is a 64-core server with a lot of RAM. Kafka is not the limit either: without decoding, a single Kafka Consumer origin processes more than 10,000 records per second, and that rate also scales with additional pipelines.
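The plateau can be expressed as a scaling efficiency: aggregate throughput divided by the pipeline count times the single-pipeline rate. This short sketch uses only the figures above; the function and constant names are illustrative.

```python
def scaling_efficiency(single_rate: float, aggregate_rate: float, pipelines: int) -> float:
    """Aggregate throughput as a fraction of perfect linear scaling."""
    return aggregate_rate / (single_rate * pipelines)


SINGLE_PIPELINE = 2000  # records/s, one decoding pipeline (from the post)
PLATEAU = 3000          # records/s, best aggregate observed (from the post)

for n in (3, 4):
    print(f"{n} pipelines: {scaling_efficiency(SINGLE_PIPELINE, PLATEAU, n):.1%} of linear")
# 3 pipelines: 50.0% of linear
# 4 pipelines: 37.5% of linear
```

Efficiency dropping to 50% at three pipelines and 37.5% at four, while undecoded consumption keeps scaling, is consistent with the decoding path serializing on some shared resource. It does not by itself identify which one.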

From an initial investigation, it seems that threads are getting blocked in Jackson's "SerializerCache.getReadOnlyLookupMap". (I cannot share the actual thread dump: the server is not connected to any outside network, and the only way to copy anything off it is with pen and paper. I am working on setting up a smaller test environment.)
