StreamSets DC pipeline: S3 origin stage error "Data read has a different length than the expected"

asked 2020-10-07 20:00:43 -0500

abdulsyed

My pipeline polls an S3 directory, reads JSON data, converts it to Avro, and finally pushes it to Kinesis. The flow is as follows:

S3(origin) --> Convert from Map(process) --> Schema Generator(process) --> Kinesis(Destination with Data format Avro)

I'm seeing stage errors in the S3 origin with the following exception: "Data read has a different length than the expected. Actual File size is less than expected". Because of this, the data is moved to the error folder in the origin. Can someone help me understand why I might be seeing this exception? I am currently out of ideas about what could cause it.
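To make the symptom concrete: the message reads like a length check, where the number of bytes actually read from an object is compared with the size that was expected for it. The sketch below only illustrates that idea in Python. The names `read_with_length_check` and `LengthMismatchError` are hypothetical, this is not StreamSets or AWS SDK code, and I am assuming the real check works roughly like this.

```python
import io


class LengthMismatchError(Exception):
    """Raised when the bytes read do not match the expected length."""


def read_with_length_check(stream, expected_length, chunk_size=8192):
    """Read a stream to the end and verify the total against expected_length.

    Hypothetical helper for illustration only; it mimics a reader that was
    told the object size up front and then receives fewer or more bytes.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        total += len(chunk)
    if total != expected_length:
        raise LengthMismatchError(
            f"read {total} bytes, expected {expected_length}"
        )
    return b"".join(chunks)


# Two JSON records, as in a multi-record JSON file.
body = b'{"id": 1}\n{"id": 2}\n'

# Expected size matches what is read: the data comes back unchanged.
data = read_with_length_check(io.BytesIO(body), len(body))

# Expected size is larger than what is read: the check fails, which is
# the situation the "Actual File size is less than expected" text describes.
try:
    read_with_length_check(io.BytesIO(body), len(body) + 5)
    mismatch_detected = False
except LengthMismatchError:
    mismatch_detected = True
```

If that reading is right, the question becomes why the expected size and the bytes delivered would disagree for these files.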

Configuration:
- StreamSets DC 3.16 in a Docker container running on an EC2 r5.xlarge instance
- Files in the S3 origin are multi-record JSON

Any help would be appreciated.
