How to sync data from CSV file to Kafka Producer in Avro message with Confluent Schema Registry?

asked 2018-01-16 20:09:30 -0600

casel.chen

updated 2018-01-18 18:33:09 -0600

metadaddy

I want to read data from a CSV file (100 lines in total) and send it to a Kafka Producer as Avro messages using Confluent Schema Registry, but the pipeline reports errors like "AVRO_GENERATOR_00 - Record 'zhima.csv::2255' is missing required avro field ''". How should I configure the pipeline?

The CSV file content looks like:

"1","Jack","IDENTITY_CARD","xxxxx32","","268800222596185591781808705","732","1","ZM201708113000000107000563167056","201708111830006580000000000025","3","8","Rose","2017-08-11 18:30:00","sys","2017-08-11 18:30:00","sys"

Here is the screenshot: [image]

Here is the avro schema I used:

{"namespace": "sample.zhima.avro",
  "type": "record",
  "name": "Zhima",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "cert_type",  "type": "string"},
    {"name": "cert_no",  "type": "string"},
    {"name": "phone",  "type": ["null","string"], "default": null},
    {"name": "open_id",  "type": "string"},
    {"name": "credit_score",  "type": ["null","int"], "default": null},
    {"name": "status",  "type": "int"},
    {"name": "biz_no",  "type": "string"},
    {"name": "transactionid",  "type": "string"},
    {"name": "tenant_id",  "type": "string"},
    {"name": "operator_id",  "type": ["null","string"], "default": null},
    {"name": "operator_name",  "type": ["null","string"], "default": null},
    {"name": "created_at",  "type": "long"},
    {"name": "created_by",  "type": "string"},
    {"name": "updated_at",  "type": ["null","long"], "default": null},
    {"name": "updated_by",  "type": ["null","string"], "default": null}
  ]
}

What do lines 2255 and 2256 of your CSV look like? Is either of them a blank line, by chance?

tmcgrath ( 2018-01-17 07:40:23 -0600 )

I don't think the CSV file has a problem, because I could ingest it with my own program before I started using StreamSets Data Collector. BTW, the CSV file I used has only 100 lines.

casel.chen ( 2018-01-18 18:09:51 -0600 )

answered 2018-01-18 18:32:25 -0600

metadaddy

It looks like the problem is case sensitivity: Avro field names are case-sensitive, so ID is not the same as id. You'll need to change your schema to match your data, or rename the fields in the pipeline to match the schema.
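
To illustrate the mismatch outside the pipeline, here is a minimal Python sketch. It is not StreamSets code. It uses an abbreviated copy of the schema from the question, and the upper-case record keys other than ID are an assumption, since the real field names are only visible in the screenshot. The helpers missing_required_fields and lowercase_keys are hypothetical names made up for this example.

```python
import json

# Abbreviated copy of the schema from the question (three fields only).
SCHEMA = json.loads("""
{"namespace": "sample.zhima.avro",
 "type": "record",
 "name": "Zhima",
 "fields": [
   {"name": "id", "type": "long"},
   {"name": "name", "type": "string"},
   {"name": "phone", "type": ["null", "string"], "default": null}
 ]}
""")


def missing_required_fields(record, schema):
    """Return schema fields without a default that the record lacks.

    The lookup is an exact, case-sensitive match on the field name.
    """
    return [
        field["name"]
        for field in schema["fields"]
        if "default" not in field and field["name"] not in record
    ]


def lowercase_keys(record):
    """Rename every field to lower case so the names match the schema."""
    return {key.lower(): value for key, value in record.items()}


# Assumed record: field names taken from an upper-case CSV header.
record = {"ID": "1", "NAME": "Jack", "PHONE": ""}

print(missing_required_fields(record, SCHEMA))                  # ['id', 'name']
print(missing_required_fields(lowercase_keys(record), SCHEMA))  # []
```

In the pipeline itself, the equivalent of lowercase_keys would be a renaming step placed before the Kafka Producer; the other option is to edit the schema so its field names match the incoming data exactly.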

