Error uploading CSV data file (multipart/form-data) to Http Server Origin

asked 2020-04-27 19:36:20 -0500

ajviradia

updated 2020-04-30 12:03:33 -0500

I have a very basic pipeline in a Docker container (streamsets/datacollector:3.15.0) with an HTTP Server origin listening on port 8086. I am trying to upload a CSV file with "Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW". It seems that multipart form data is not supported.

Here is the content of the CSV file:

id,first_name,last_name,email,gender,ip_address
1,Alexandrina,Poure,apoure0@wired.com,Female,112.145.236.171
2,Elvis,Kembley,ekembley1@oakley.com,Male,213.108.128.122
3,Brennan,Lindenstrauss,blindenstrauss2@google.de,Male,35.238.29.17
4,Vidovik,Haveline,vhaveline3@webnode.com,Male,252.233.133.207

The HTTP Server origin's "Data Format" tab is configured as follows:

  • Allow Extra Columns: unchecked
  • Data Format: Delimited
  • Delimiter Format Type: Default CSV
  • Header Line: With Header Line
  • All other options are left at their defaults

I am getting the following error on the client when uploading the file:

HTTP/1.1 500 java.io.IOException: com.streamsets.pipeline.lib.parser.RecoverableDataParserException: DELIMITED_PARSER_01 - Unexpected number of columns at offset 143, contains 6 fields whereas only 1 are available in header

Connection: close
Date: Tue, 28 Apr 2020 00:23:22 GMT
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=iso-8859-1
Content-Length: 719
Server: Jetty(9.4.12.v20180830)

<html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> <title>Error 500 java.io.IOException: com.streamsets.pipeline.lib.parser.RecoverableDataParserException: DELIMITED_PARSER_01 - Unexpected number of columns at offset 143, contains 6 fields whereas only 1 are available in header</title> </head> <body>

HTTP ERROR 500

Problem accessing /. Reason:

    java.io.IOException: com.streamsets.pipeline.lib.parser.RecoverableDataParserException: DELIMITED_PARSER_01 - Unexpected number of columns at offset 143, contains 6 fields whereas only 1 are available in header


Powered by Jetty:// 9.4.12.v20180830

</body> </html>

Can someone confirm whether multipart data is supported by the HTTP Server origin? If so, can you help identify what the issue might be? Thanks for any help.
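
One possible reading of the error, offered as a sketch and not a confirmed diagnosis: if the origin hands the whole request body to the delimited parser, the first line it sees is the multipart boundary, not the CSV header. That would produce a one-column "header" followed later by six-column rows. The Python below only illustrates the shape of such a body. It uses Python's csv module as a stand-in for Data Collector's parser, and the form field name "file" and the filename "data.csv" are hypothetical, so the exact offset in the error is not reproduced.

```python
import csv
import io

# CSV payload as given in the question (CRLF line endings assumed).
CSV_DATA = "\r\n".join([
    "id,first_name,last_name,email,gender,ip_address",
    "1,Alexandrina,Poure,apoure0@wired.com,Female,112.145.236.171",
    "2,Elvis,Kembley,ekembley1@oakley.com,Male,213.108.128.122",
    "3,Brennan,Lindenstrauss,blindenstrauss2@google.de,Male,35.238.29.17",
    "4,Vidovik,Haveline,vhaveline3@webnode.com,Male,252.233.133.207",
])

# Boundary taken from the Content-Type header in the question.
BOUNDARY = "----WebKitFormBoundary7MA4YWxkTrZu0gW"


def build_multipart_body(csv_text, field_name="file", filename="data.csv"):
    """Build the raw body of a single-file multipart/form-data upload.

    field_name and filename are hypothetical; the question does not give them.
    """
    return (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n"
        "\r\n"
        f"{csv_text}\r\n"
        f"--{BOUNDARY}--\r\n"
    )


def column_counts(text):
    """Return the number of fields a plain CSV parser sees on each line."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [len(row) for row in reader]


if __name__ == "__main__":
    # Raw CSV: every line has the same number of columns.
    print("raw CSV:  ", column_counts(CSV_DATA))
    # Multipart body: the boundary line is parsed first, as a single column.
    print("multipart:", column_counts(build_multipart_body(CSV_DATA)))
```

If this is what is happening, sending the CSV as the raw request body instead of as a multipart form might avoid the error, but that is an assumption to verify against the origin's documentation.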


Comments

Just as a sanity check, can you paste this data into a file and try to read it with a directory origin instead? Configure the origin Data Format (delimited) in the same way. See if it's able to consume it correctly.

jeff ( 2020-05-01 12:45:48 -0500 )

Jeff, in fact, as a stopgap I have this data in a CSV (UTF-8) file and am using the Local FS origin to build the rest of the pipeline. The pipeline works fine with the CSV file. P.S. I don't have enough karma points to upload files or images to add color to this issue :(

ajviradia ( 2020-05-04 11:18:35 -0500 )