HDFS Standalone origin fails with "invalid char between encapsulated token and pipe"

The HDFS Standalone origin is failing with the error "invalid char between encapsulated token and pipe". Here are more details:

Source: pipe-delimited text file


  • Delimiter Character |
  • Escape Character \
  • Quote Character "

Test data -

1|Test data line 1
2|"Test" data line 2.

The file arrives exactly as shown above. How can I process it?

What else is in the pipeline (processors, destination)? And at what point do you see the error?

That is not valid delimited data:

1|Test data line 1
2|"Test" data line 2

If you quote a field, the quotes have to enclose the entire field, like this:

1|Test data line 1
2|"Test data line 2"
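To see the difference outside Data Collector, here is a minimal sketch that uses Python's standard `csv` module as a stand-in parser (it is not the origin's own parser). The dialect mirrors the settings from the question (`|` delimiter, `\` escape, `"` quote), and `strict=True` makes the reader reject a closing quote that is not followed by the delimiter. The helper names `parse` and `is_valid` are hypothetical.

```python
import csv

# Stand-in dialect mirroring the question's settings; strict=True makes the
# reader raise csv.Error on malformed quoting instead of guessing.
DIALECT = dict(delimiter="|", escapechar="\\", quotechar='"', strict=True)

def parse(lines):
    """Parse pipe-delimited lines; raises csv.Error on malformed quoting."""
    return list(csv.reader(lines, **DIALECT))

def is_valid(lines):
    """Return True if every line parses under the strict dialect."""
    try:
        parse(lines)
        return True
    except csv.Error:
        return False

# Quotes enclose the whole field: parses cleanly.
valid = ["1|Test data line 1", '2|"Test data line 2"']
print(parse(valid))  # [['1', 'Test data line 1'], ['2', 'Test data line 2']]

# The quote closes mid-field and is followed by a space, not "|": rejected.
invalid = ["1|Test data line 1", '2|"Test" data line 2']
print(is_valid(invalid))  # False
```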

If you want to read your data in regardless, you can set the quote character to a value that doesn't appear in the input data, for example \u0000. The data will then be read in, and the quote characters will be kept in the field value in Data Collector, i.e. "Test" data line 2.
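As a rough Python analogue of that workaround (an assumption for illustration, not Data Collector configuration), `csv.QUOTE_NONE` switches quote handling off entirely, which has the same effect as choosing a quote character that never occurs: every `"` is kept as an ordinary character. The helper `parse_unquoted` is hypothetical.

```python
import csv

def parse_unquoted(lines):
    """Read pipe-delimited lines with quote handling disabled, so any
    double quotes are kept literally in the field values."""
    return list(csv.reader(lines, delimiter="|", escapechar="\\",
                           quoting=csv.QUOTE_NONE))

rows = parse_unquoted(["1|Test data line 1", '2|"Test" data line 2'])
print(rows)  # [['1', 'Test data line 1'], ['2', '"Test" data line 2']]
```

The trade-off in this sketch: with quoting disabled, a field that genuinely needs quotes (one containing `|` or a newline) is no longer read as a single field.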

You only need to quote a field if it contains the delimiter character or a newline, e.g.:

1|"Test data | line 1"
2|"Test data
line 2"
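A minimal sketch of producing such a file, again using Python's `csv` module purely for illustration: the `csv.QUOTE_MINIMAL` mode quotes a field only when it contains a special character such as the delimiter, the quote character, or a line break, so plain fields stay unquoted. Reading the text back recovers the original values, embedded newline included.

```python
import csv
import io

# Field values taken from the example above.
rows = [["1", "Test data | line 1"], ["2", "Test data\nline 2"]]

buf = io.StringIO(newline="")
writer = csv.writer(buf, delimiter="|", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()
print(text)

# Round trip: the quoted fields come back as single values.
parsed = list(csv.reader(io.StringIO(text, newline=""),
                         delimiter="|", quotechar='"'))
print(parsed == rows)  # True
```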

Thank you! This worked!

