'Content is not allowed in prolog' error parsing XML
New to StreamSets, so I apologize in advance if I am doing something goofy. All I want to do is parse an XML file with the following format:
<?xml version="1.0" encoding="utf-8"?>
<ordata>
<row Id="2" Id2="1" Count="7" ... />
.
.
.
</ordata>
I've tried multiple combinations of the directory reader with the XML data format, including the XPath /ordata/row/ as the record delimiter, row as the record delimiter, and no record delimiter at all. I'm wondering if it's because all the fields are attributes, or because there's no explicit end tag. In preview all I get back is:
Event Record1 (new-file): {MAP}
filepath: {STRING} "/STREAMSETS/so/source/Data.xml"
The sdc log file contains the following error:
2017-11-06 17:29:32,251 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] INFO Pipeline - Processing lifecycle start event with stage
2017-11-06 17:29:32,254 [user:*admin] [pipeline:SO Input/SOInputfda19c9f-674f-4325-99b5-1e0533a68d4e] [runner:] [thread:preview-pool-1-thread-1] ERROR SpoolDirSource - Failed to process file '/STREAMSETS/SO/source/Data.xml' at position '-1': com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
at com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.produce(SpoolDirSource.java:652)
at com.streamsets.pipeline.stage.origin.spooldir.SpoolDirSource.produce(SpoolDirSource.java:510)
at com.streamsets.pipeline.configurablestage.DSource.produce(DSource.java:38)
at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:228)
at com.streamsets.datacollector.runner.StageRuntime$2.call(StageRuntime.java:222)
at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:180)
at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:249)
at com.streamsets.datacollector.runner.StagePipe.process(StagePipe.java:231)
at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runPollSource(PreviewPipelineRunner.java:315)
at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.run(PreviewPipelineRunner.java:214)
at com.streamsets.datacollector.runner.Pipeline.run(Pipeline.java:510)
at com.streamsets.datacollector.runner.preview.PreviewPipeline.run(PreviewPipeline.java:51)
at com.streamsets.datacollector.execution.preview.sync.SyncPreviewer.start(SyncPreviewer.java:206)
at com.streamsets.datacollector.execution.preview.async.AsyncPreviewer.lambda$start$0(AsyncPreviewer.java:94)
at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:249)
at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:245)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_00 - Cannot advance reader 'Data.xml' to offset '0'
at com.streamsets.pipeline.lib.parser.xml.XmlDataParserFactory.createParser(XmlDataParserFactory.java:80)
at com.streamsets.pipeline.lib.parser.xml.XmlDataParserFactory.getParser(XmlDataParserFactory.java:60)
at com ...
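
Since 'Content is not allowed in prolog' generally means the XML parser hit something other than markup before the first tag, one check worth making is what the first few bytes of Data.xml actually are. A byte order mark left by the exporting tool is a common culprit, though nothing above confirms that it is the cause here. Below is a minimal sketch of that check, assuming Python is available on the machine; describe_prolog is a hypothetical helper name for illustration, not part of StreamSets.

```python
import codecs

# Known byte order marks. UTF-32 LE must be tested before UTF-16 LE,
# because the UTF-32 LE mark begins with the same two bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def describe_prolog(head: bytes) -> str:
    """Describe what sits at the very start of an XML file.

    `head` is the first handful of bytes of the file (hypothetical helper,
    not a StreamSets API).
    """
    for bom, name in BOMS:
        if head.startswith(bom):
            return f"{name} byte order mark before the first tag"
    if head.startswith(b"<"):
        return "starts with markup"
    # Anything else (stray text, whitespace, control bytes) precedes the markup.
    return f"unexpected leading bytes: {head[:8]!r}"
```

To try it on the file from the log, read the first bytes in binary mode, for example `describe_prolog(open("/STREAMSETS/SO/source/Data.xml", "rb").read(16))`. If it reports a byte order mark or unexpected leading bytes, re-saving the file without them would be the next thing to test.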