How do you encode a Kafka Message Key as Avro?

I switched the Kafka Producer's Message Key Format setting to Avro on the Data Format tab. I need an Avro key because I cannot set the key to null (entered as a separate question here). If a key is present, the Confluent REST API and the Confluent Python library expect the key to be in Avro format when the value is in Avro format.

There is an option on the Kafka tab to set the Message Key Format to Confluent, which signals that this stage integrates with Schema Registry and should embed the proper schema ID when serializing the key as Avro. However, there is NO setting to provide the subject or schema ID, as there is for the value's Avro schema lookup. How does it get the schema ID, then? This seems like an oversight on StreamSets' part.
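For reference, this is what I mean by "embedding the schema ID". As I understand the Confluent wire format, a serialized key or value is a zero magic byte, then the schema ID as a 4-byte big-endian integer, then the Avro binary payload. Here is a minimal Python sketch of that framing. The helper names are my own (hypothetical), and this is not StreamSets code; it only shows the bytes I would expect a Confluent-aware consumer to look for.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # Confluent wire format marker (assumed to be 0)


def frame_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the magic byte and schema ID."""
    # ">bI" = big-endian, 1 signed byte (magic) + 4-byte unsigned int (schema ID)
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe_confluent(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a Confluent header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[5:]
```

So if the stage really does write Confluent-framed keys, the first five bytes of each key should identify a schema registered for the key, which is why I would expect a place to configure the subject or ID.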

When I switch to Avro format, the stage embeds a default function sequence in the Message Key field on the Kafka tab: ${avro:decode(record:attribute('avroKeySchema'),base64:decodeBytes(record:attribute('kafkaMessageKey')))}

Avro functions (avro:*) are not documented at all in the Functions docs. The Kafka Producer stage docs don't cover this either, nor does the Avro Data Format doc page, which focuses solely on value encoding, not key encoding.

Is this function sequence a workaround for the fact that I cannot set the Schema Registry subject or schema ID?

How does this mechanism work? Does it actually integrate with the Schema Registry?

It would be so much easier if StreamSets had native settings in the Kafka Producer stage for handling this.