
How to sync data from CSV file to Kafka Producer in Avro message with Confluent Schema Registry?

asked 2018-01-16 20:09:30 -0600 by casel.chen

updated 2018-01-18 18:33:09 -0600 by metadaddy

I want to read data from a CSV file (100 lines in total) and send it to a Kafka producer as Avro messages with Confluent Schema Registry, but the pipeline reports errors like "AVRO_GENERATOR_00 - Record 'zhima.csv::2255' is missing required avro field ''". How should I configure the pipeline?

The CSV file content looks like:

"1","Jack","IDENTITY_CARD","xxxxx32","","268800222596185591781808705","732","1","ZM201708113000000107000563167056","201708111830006580000000000025","3","8","Rose","2017-08-11 18:30:00","sys","2017-08-11 18:30:00","sys"
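For reference, a row like this can be parsed with Python's csv module and zipped onto the schema's field names. This is a minimal sketch; the column order is assumed to match the schema below, which the original post does not confirm:

```python
import csv
import io

# Field names taken from the Avro schema below; the assumption that
# the CSV columns appear in this exact order is NOT confirmed by the post.
FIELDS = ["id", "name", "cert_type", "cert_no", "phone", "open_id",
          "credit_score", "status", "biz_no", "transactionid",
          "tenant_id", "operator_id", "operator_name",
          "created_at", "created_by", "updated_at", "updated_by"]

row = ('"1","Jack","IDENTITY_CARD","xxxxx32","",'
      '"268800222596185591781808705","732","1",'
      '"ZM201708113000000107000563167056","201708111830006580000000000025",'
      '"3","8","Rose","2017-08-11 18:30:00","sys","2017-08-11 18:30:00","sys"')

# csv.reader handles the quoted fields; zip maps positions to names.
record = dict(zip(FIELDS, next(csv.reader(io.StringIO(row)))))
print(record["name"])  # Jack
```

Note the file has no header row, so the Data Collector's delimited parser will assign its own column names unless told otherwise, and those generated names must line up with the schema's field names.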

Here is a screenshot of the pipeline (image not reproduced).

Here is the Avro schema I used:

{"namespace": "sample.zhima.avro",
  "type": "record",
  "name": "Zhima",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "cert_type",  "type": "string"},
    {"name": "cert_no",  "type": "string"},
    {"name": "phone",  "type": ["null","string"], "default": null},
    {"name": "open_id",  "type": "string"},
    {"name": "credit_score",  "type": ["null","int"], "default": null},
    {"name": "status",  "type": "int"},
    {"name": "biz_no",  "type": "string"},
    {"name": "transactionid",  "type": "string"},
    {"name": "tenant_id",  "type": "string"},
    {"name": "operator_id",  "type": ["null","string"], "default": null},
    {"name": "operator_name",  "type": ["null","string"], "default": null},
    {"name": "created_at",  "type": "long"},
    {"name": "created_by",  "type": "string"},
    {"name": "updated_at",  "type": ["null","long"], "default": null},
    {"name": "updated_by",  "type": ["null","string"], "default": null}
  ]
}
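One quick sanity check is to parse the schema as JSON and list its field names for comparison against the pipeline's field paths. A minimal sketch (only the first two fields are inlined here for brevity):

```python
import json

# Abbreviated copy of the schema above -- only two fields shown.
schema = json.loads("""
{"namespace": "sample.zhima.avro",
 "type": "record",
 "name": "Zhima",
 "fields": [
   {"name": "id", "type": "long"},
   {"name": "name", "type": "string"}
 ]}
""")

names = [f["name"] for f in schema["fields"]]
print(names)  # ['id', 'name'] -- compare against the record's field names
```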


What do lines 2255 and 2256 of your CSV look like? Is one of them a blank line by chance?

tmcgrath ( 2018-01-17 07:40:23 -0600 )

I don't think the CSV file has a problem, because I could ingest it with my own program before using StreamSets Data Collector. BTW, the CSV file I used has only 100 lines.

casel.chen ( 2018-01-18 18:09:51 -0600 )

1 Answer


answered 2018-01-18 18:32:25 -0600 by metadaddy

It looks like the problem is case sensitivity - ID is not the same as id. You'll need to change your schema to match your data, or rename the fields in the pipeline to match the schema.
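The case-sensitive matching can be illustrated outside the pipeline. This is a minimal sketch, not StreamSets or Avro library code, using a hypothetical three-field subset of the schema:

```python
# Avro field names are case-sensitive: a record keyed "ID" does NOT
# satisfy a required schema field named "id". Field set is illustrative.
schema_fields = {"id", "name", "cert_type"}

def missing_required(record: dict) -> list:
    """Return schema fields absent from the record (case-sensitive)."""
    return sorted(schema_fields - record.keys())

print(missing_required({"ID": 1, "name": "Jack", "cert_type": "X"}))
# ['id'] -- the upper-case key "ID" leaves required field "id" unmatched
```

Renaming the record's fields (or lower-casing them) before the Avro data format stage would make this check pass.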

