Remove specific XML section?

Hello, I am trying to bring XML files into an SDC pipeline. However, when I set the Data Format of the Directory origin to XML, I get this error:

SPOOLDIR_01 - Failed to process file '....xml' at position '0': com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_02 - XML object exceeded maximum length: readerId '...xml', offset '0', maximum length '506384'

I've realized this happens because a repeating element in the XML document contains a lot of extra text that we don't need. Is there a way to remove a repeating XML section? I want to remove every <xmlelementtoremove> element in the document:

<top>
    <secondlevel>
        <thirdlevel>
            <xmlelementtoremove>lots of text</xmlelementtoremove>
        </thirdlevel>
        <thirdlevel>
            <xmlelementtoremove>more lengthy text</xmlelementtoremove>
        </thirdlevel>
    </secondlevel>
</top>
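One way to do this is outside SDC: strip the unwanted elements from each file before the Directory origin reads it. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming each file fits in memory and the element name is exactly `xmlelementtoremove` (taken from the sample above). The function names are hypothetical, not part of SDC.

```python
import xml.etree.ElementTree as ET

# Tag name taken from the sample document; adjust to the real one.
# If the document uses namespaces, ElementTree reports tags as "{uri}name".
TAG_TO_REMOVE = "xmlelementtoremove"


def remove_elements(root: ET.Element, tag: str = TAG_TO_REMOVE) -> None:
    """Remove every <tag> element (and its content) below root, in place."""
    # ElementTree elements do not know their parent, so visit every element
    # and remove matching *children*. Both lists are snapshotted first
    # because the tree is modified while we loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                # This also drops child.tail (the text right after the
                # closing tag), which in the sample is only whitespace.
                parent.remove(child)


def strip_elements(xml_text: str, tag: str = TAG_TO_REMOVE) -> str:
    """Return xml_text with every <tag> element removed."""
    root = ET.fromstring(xml_text)
    remove_elements(root, tag)
    return ET.tostring(root, encoding="unicode")


def strip_elements_file(src_path: str, dst_path: str, tag: str = TAG_TO_REMOVE) -> None:
    """Write a copy of the XML file at src_path without any <tag> elements."""
    tree = ET.parse(src_path)
    remove_elements(tree.getroot(), tag)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
```

Because this builds the whole tree in memory, it suits files of a few megabytes; for much larger files a streaming approach would be needed.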

I can't even get SDC to parse the XML in the first place, though, so I'm not sure how to manipulate it to remove those sections, unless there is a way to do it on the raw text before it gets parsed as XML. Thanks!
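Removing the elements from the raw text before any XML parsing, as suggested here, could look like the minimal sketch below. It uses a regular expression and assumes UTF-8 files, that `<xmlelementtoremove>` elements are never nested inside one another, and that the tag never appears inside comments or CDATA sections. The function names are hypothetical.

```python
import re

# Tag name taken from the sample document; adjust to the real one.
TAG = "xmlelementtoremove"

# Matches either a self-closing <TAG .../> or a <TAG ...>...</TAG> pair.
# \b stops it from matching longer names such as <TAGsuffix>. DOTALL lets
# ".*?" span newlines; the non-greedy match stops at the first closing tag,
# which is why nested <TAG> elements are not supported.
_PATTERN = re.compile(
    rf"<{TAG}\b[^>]*/>|<{TAG}\b[^>]*>.*?</{TAG}\s*>",
    re.DOTALL,
)


def strip_tag_text(raw: str) -> str:
    """Remove every <TAG> element from raw XML text without parsing it."""
    return _PATTERN.sub("", raw)


def strip_tag_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path (assumed UTF-8) with every <TAG> element removed."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = strip_tag_text(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

The cleaned copy would then be written to the directory that the Directory origin reads from; how the files are staged is up to you.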
