Schema Generator produces errors on default values

asked 2018-01-09 08:22:07 -0500

atajti

updated 2018-01-11 05:26:34 -0500

I tried writing Avro to the local FS, with the schema stored in the file header. The only data was a single boolean field coming from the __Dev Data origin__. The records went to error with the message AVRO_GENERATOR_03 - Record 'random:0:909' is missing required header 'avroSchema', which I resolved by adding the __Schema Generator processor__.

However, setting any default value for booleans in the __Schema Generator__ (1, true, TRUE, NULL, null, FALSE, false, 0) gave the error HADOOPFS_13 - Error while writing to HDFS: org.apache.avro.AvroTypeException: Invalid default for field bool_field: false not a ["null","boolean"], and stopped the pipeline. The only exceptions are true and TRUE, where the message was: false not a ["null","boolean"]
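For context on the message itself: in the Avro specification of that period, as far as I know, the default of a union-typed field has to match the first branch of the union, so a field typed ["null","boolean"] would only accept null as its default. The sketch below is a plain-Python illustration of that rule, not Avro's or StreamSets' actual code. The field definition in `FIELD_JSON` is a hypothetical reconstruction from the error text (the generated schema is not shown here), and `matches_type` and `default_is_valid` are made-up helper names.

```python
import json

# Hypothetical reconstruction of the field named in the error message;
# the schema actually produced by the Schema Generator is not shown in the post.
FIELD_JSON = '{"name": "bool_field", "type": ["null", "boolean"], "default": false}'


def matches_type(avro_type, value):
    """Return True if a JSON-decoded value fits a primitive Avro type.

    Only the two types from the error message are covered in this sketch.
    """
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {avro_type!r}")


def default_is_valid(field):
    """Check a field's default; for a union, only the first branch counts."""
    avro_type = field["type"]
    if isinstance(avro_type, list):  # a JSON array denotes a union
        avro_type = avro_type[0]
    return matches_type(avro_type, field["default"])


field = json.loads(FIELD_JSON)
print(default_is_valid(field))  # False: first branch is "null", default is false

field["default"] = None
print(default_is_valid(field))  # True: null matches the first branch

# Same field with the union branches reordered (assumption, for comparison only).
reordered = {"name": "bool_field", "type": ["boolean", "null"], "default": False}
print(default_is_valid(reordered))  # True: boolean is now the first branch
```

If that rule is what is being enforced, a boolean default combined with a null-first union would be rejected regardless of how the default is spelled, which would be consistent with the list of values tried above.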

How can I make it work? Is it a bug?

Here are the stages from the exported pipeline JSON:


"stages" : [ {
      "instanceName" : "DevDataGenerator_01",
      "library" : "streamsets-datacollector-dev-lib",
      "stageName" : "com_streamsets_pipeline_stage_devtest_RandomDataGeneratorSource",
      "stageVersion" : "5",
      "configuration" : [ {
        "name" : "dataGenConfigs",
        "value" : [ {
          "type" : "BOOLEAN",
          "precision" : 10,
          "scale" : 2,
          "field" : "bool_field"
        } ]
      }, {
        "name" : "rootFieldType",
        "value" : "MAP"
      }, {
        "name" : "headerAttributes",
        "value" : [ ]
      }, {
        "name" : "delay",
        "value" : 1000
      }, {
        "name" : "batchSize",
        "value" : 1000
      }, {
        "name" : "numThreads",
        "value" : 1
      }, {
        "name" : "eventName",
        "value" : "generated-event"
      }, {
        "name" : "stageOnRecordError",
        "value" : "TO_ERROR"
      } ],
      "uiInfo" : {
        "description" : "",
        "label" : "Dev Data Generator 1",
        "xPos" : 37,
        "yPos" : 82,
        "stageType" : "SOURCE"
      },
      "inputLanes" : [ ],
      "outputLanes" : [ "DevDataGenerator_01OutputLane15156695630840" ],
      "eventLanes" : [ ],
      "services" : [ ]
    }, {
      "instanceName" : "SchemaGenerator_01",
      "library" : "streamsets-datacollector-basic-lib",
      "stageName" : "com_streamsets_pipeline_stage_processor_schemagen_SchemaGeneratorDProcessor",
      "stageVersion" : "1",
      "configuration" : [ {
        "name" : "config.schemaType",
        "value" : "AVRO"
      }, {
        "name" : "config.schemaName",
        "value" : "bool"
      }, {
        "name" : "config.attributeName",
        "value" : "avroSchema"
      }, {
        "name" : "config.avroNamespace",
        "value" : null
      }, {
        "name" : "config.avroDoc",
        "value" : null
      }, {
        "name" : "config.avroNullableFields",
        "value" : true
      }, {
        "name" : "config.avroDefaultNullable",
        "value" : true
      }, {
        "name" : "config.avroExpandTypes",
        "value" : true
      }, {
        "name" : "config.avroDefaultTypes",
        "value" : [ {
          "avroType" : "BOOLEAN",
          "defaultValue" : "NULL"
        } ]
      }, {
        "name" : "config.precisionAttribute",
        "value" : "precision"
      }, {
        "name" : "config.scaleAttribute",
        "value" : "scale"
      }, {
        "name" : "config.defaultPrecision",
        "value" : -1
      }, {
        "name" : "config.defaultScale",
        "value" : -1
      }, {
        "name" : "config.enableCache",
        "value" : false
      }, {
        "name" : "config.cacheSize",
        "value" : 50
      }, {
        "name" : "config.cacheKeyExpression",
        "value" : null
      }, {
        "name" : "stageOnRecordError",
        "value" : "TO_ERROR"
      }, {
        "name" : "stageRequiredFields",
        "value" : [ ]
      }, {
        "name" : "stageRecordPreconditions",
        "value" : [ ]
      } ],
      "uiInfo" : {
        "description" : "",
        "label" : "Schema Generator 1",
        "xPos" : 446,
        "yPos" : 160,
        "stageType" : "PROCESSOR"
      },
      "inputLanes" : [ "DevDataGenerator_01OutputLane15156695630840" ],
      "outputLanes" : [ "SchemaGenerator_01OutputLane15156695836560" ],
      "eventLanes" : [ ],
      "services" : [ ]
    }, {
      "instanceName" : "LocalFS_01",
      "library" : "streamsets-datacollector-basic-lib",
      "stageName" : "com_streamsets_pipeline_stage_destination_localfilesystem_LocalFileSystemDTarget",
      "stageVersion" : "3",
      "configuration" : [ {
        "name" : "configs.uniquePrefix",
        "value" : "sdc-${sdc:id()}"
      }, {
        "name" : "configs.fileNameSuffix",
        "value" : null
      }, {
        "name" : "configs.dirPathTemplateInHeader",
        "value" : false
      }, {
        "name" : "configs.dirPathTemplate",
        "value" : "/tmp/out/${YYYY()}-${MM()}-${DD()}-${hh()}"
      }, {
        "name" : "configs.timeZoneID",
        "value" : "UTC"
      }, {
        "name" : "configs.timeDriver",
        "value" : "${time:now()}"
      }, {
        "name" : "configs.maxRecordsPerFile",
        "value" : 0
      }, {
        "name" : "configs.maxFileSize",
        "value" : 0
      }, {
        "name" : "configs.idleTimeout",
        "value" : "${1 * HOURS}"
      }, {
        "name" : "configs.compression",
        "value" : "NONE"
      }, {
        "name" : "configs.otherCompression",
        "value" : null
      }, {
        "name" : "configs.fileType",
        "value" : "TEXT"
      }, {
        "name" : "configs.keyEl",
        "value" : "${uuid()}"
      }, {
        "name" : "configs.lateRecordsLimit",
        "value" : "${1 * HOURS}"
      }, {
        "name" : "configs.rollIfHeader",
        "value" : false
      }, {
        "name" : "configs.rollHeaderName",
        "value" : "roll"
      }, {
        "name" : "configs.lateRecordsAction",
        "value" : "SEND_TO_ERROR"
      }, {
        "name" : "configs.lateRecordsDirPathTemplate",
        "value" : "/tmp/late/${YYYY()}-${MM()}-${DD()}"
      }, {
        "name" : "configs.dataFormat",
        "value" : "AVRO"
      }, {
        "name" : "configs.hdfsPermissionCheck",
        "value" : true
      }, {
        "name" : "configs.permissionEL",
        "value" : null
      }, {
        "name" : "configs.skipOldTempFileRecovery",
        "value" : false
      }, {
        "name" : "configs.dataGeneratorFormatConfig.charset",
        "value" : "UTF-8"
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvFileFormat",
        "value" : "CSV"
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvHeader",
        "value" : "NO_HEADER"
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvReplaceNewLines",
        "value" : true
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvReplaceNewLinesString",
        "value" : " "
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvCustomDelimiter",
        "value" : "|"
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvCustomEscape",
        "value" : "\\"
      }, {
        "name" : "configs.dataGeneratorFormatConfig.csvCustomQuote",
        "value" : "\""
      }, {
        "name" : "configs.dataGeneratorFormatConfig.jsonMode",
        "value ...

Comments

Can you edit your question and add a short schema that reproduces the problem?

metadaddy ( 2018-01-10 17:24:22 -0500 )