Unable to parse XML which has 'ampersand' in the data

asked 2019-10-07 06:08:45 -0500

Kumar

updated 2019-10-09 10:40:11 -0500

metadaddy

I am trying to parse an XML file using the "XML parser" processor. It works fine in most cases, but in one case the data in my file looks like <tag>&</tag> and <tag>&abc</tag>.

In these cases the XML parser is unable to parse the file. Could anyone please help me out?

Thanks in advance.


Comments

Please paste the full stack trace you are seeing. Or at least provide more details about what is happening.

jeff (2019-10-07 13:14:12 -0500)

XMLP_01 - Cannot XML parse the field '/text' for record '78378217 (1).xml::0': com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_03 - Can't parse XML: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,2081] Message: XML document structures must start and end within the

Kumar (2019-10-09 06:58:44 -0500)

Can you capture this particular record, either by sending error records to a file or by finding it in the input? Then run it through a tool like xmllint. It appears to be malformed.

jeff (2019-10-09 08:45:53 -0500)

1 Answer


answered 2019-10-09 10:38:09 -0500

metadaddy

The simple answer is that <tag>&</tag> and <tag>&abc</tag> are not legal XML. In XML, the & (ampersand) character begins an entity or character reference, and section 2.4 of the XML standard says:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
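To see the rule in action outside the pipeline, here is a minimal sketch using Python's standard-library parser rather than the StreamSets processor. The helper name is_well_formed is made up for illustration; any conforming XML parser should reject the same inputs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for a raw '&', an unterminated reference such as '&abc', etc.
        return False


if __name__ == "__main__":
    samples = [
        "<tag>&</tag>",                  # raw ampersand
        "<tag>&abc</tag>",               # looks like a reference, but no ';'
        "<tag>&amp;abc</tag>",           # escaped ampersand
        "<tag><![CDATA[&abc]]></tag>",   # literal '&' allowed inside CDATA
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```

The first two samples are the shapes from the question and fail to parse; the escaped and CDATA forms are accepted.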

You will need to fix the producing application to emit &amp; instead of a raw &, or pre-process the data to do the conversion.
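If fixing the producer is not possible, the pre-processing step could look roughly like the sketch below. It is written in Python for illustration and is not a StreamSets API; the function name escape_bare_ampersands is an assumption. It also assumes the data uses only the five predefined entities and numeric character references, and that it contains no CDATA sections or comments with a literal &, since a plain regex would rewrite those too.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does NOT start a predefined entity (&amp; &lt; &gt; &apos; &quot;)
# or a numeric character reference (&#38; or &#x26;).
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:amp|lt|gt|apos|quot);|#[0-9]+;|#x[0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace each raw '&' with '&amp;', leaving valid references untouched."""
    return _BARE_AMPERSAND.sub("&amp;", xml_text)


if __name__ == "__main__":
    for broken in ("<tag>&</tag>", "<tag>&abc</tag>"):
        fixed = escape_bare_ampersands(broken)
        # After escaping, the document parses and the text keeps its literal '&'.
        print(fixed, repr(ET.fromstring(fixed).text))
```

Running this over the raw text before it reaches the XML parser should turn the two failing cases into parseable records, while already-escaped input passes through unchanged.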
