Receiving java.util.NoSuchElementException with XML data from HTTP Client Origin

asked 2019-06-12 12:13:05 -0500 by DataAnalyst1029

updated 2019-06-13 12:35:14 -0500 by metadaddy

I'm running a fresh SDC image in a Docker container and am running into issues executing a pipeline.

I am attempting to pull an .xml file from a publicly accessible website into my pipeline, using an HTTP Client origin. No authentication is needed; simply navigating to the https://www.testsite.xml URL retrieves the document. In validation mode, I can see the 10 or so records coming from the site. However, when I attempt to run the pipeline, I see the following errors:

com.streamsets.pipeline.lib.parser.DataParserException: DATA_PARSER_02 - Parser error: 'java.util.NoSuchElementException'
    at com.streamsets.pipeline.lib.parser.WrapperDataParserFactory$WrapperDataParser.normalizeException(WrapperDataParserFactory.java:147)
    at com.streamsets.pipeline.lib.parser.WrapperDataParserFactory$WrapperDataParser.parse(WrapperDataParserFactory.java:107)
    at com.streamsets.pipeline.stage.origin.http.HttpClientSource.parseResponse(HttpClientSource.java:616)
    at com.streamsets.pipeline.stage.origin.http.HttpClientSource.produce(HttpClientSource.java:303)
    at com.streamsets.pipeline.api.base.configurablestage.DSource.produce(DSource.java:38)
    at com.streamsets.datacollector.runner.StageRuntime.lambda$execute$2(StageRuntime.java:295)
    at com.streamsets.pipeline.api.impl.CreateByRef.call(CreateByRef.java:40)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:243)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:310)
    at com.streamsets.datacollector.runner.StagePipe.process(StagePipe.java:219)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.processPipe(ProductionPipelineRunner.java:817)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.runPollSource(ProductionPipelineRunner.java:561)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunner.run(ProductionPipelineRunner.java:385)
    at com.streamsets.datacollector.runner.Pipeline.run(Pipeline.java:529)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipeline.run(ProductionPipeline.java:110)
    at com.streamsets.datacollector.execution.runner.common.ProductionPipelineRunnable.run(ProductionPipelineRunnable.java:75)
    at com.streamsets.datacollector.execution.runner.standalone.StandaloneRunner.start(StandaloneRunner.java:701)
    at com.streamsets.datacollector.execution.runner.common.AsyncRunner.lambda$start$3(AsyncRunner.java:151)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:226)
    at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:222)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:226)
    at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:222)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at com.streamsets.datacollector.metrics.MetricSafeScheduledExecutorService$MetricsTask.run(MetricSafeScheduledExecutorService.java:100)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException
    at java.util.LinkedList.removeFirst(LinkedList.java:270)
    at com.streamsets.pipeline.lib.xml.StreamingXmlParser.read(StreamingXmlParser.java:195)
    at com.streamsets.pipeline.lib.xml.OverrunStreamingXmlParser.read(OverrunStreamingXmlParser.java:82)
    at com.streamsets.pipeline.lib.parser.xml.XmlCharDataParser.parse(XmlCharDataParser.java:132)
    at com.streamsets.pipeline.lib ...

Comments

Could you edit your question to include the full stack trace from sdc.log?

metadaddy (2019-06-12 17:09:23 -0500)

I have added the full stack trace; I apologize if that is not what was asked for. I did a little more research and believe my issue may also stem from the source website being HTTPS, and from my not including the SSL/TLS certificate information in the origin stage.

DataAnalyst1029 (2019-06-13 11:24:17 -0500)

1 Answer

answered 2019-06-13 11:30:08 -0500 by metadaddy

It looks like the first 10 records are fine, but the pipeline encounters an issue further into the data. Looking at the source, there seems to be an end element without the corresponding start element.

I would use curl to retrieve the XML from the URL and inspect it to see if there is an obvious problem. Running it through xmllint should show whether it is well-formed.
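If curl and xmllint are not at hand, the same kind of check can be sketched in Python. This is only an illustration, not part of SDC: the helper names `fetch` and `first_xml_error` are made up for this example, and it tests well-formedness only (roughly what `xmllint --noout` reports), using the standard library's expat parser.

```python
import urllib.request
import xml.parsers.expat


def fetch(url: str) -> bytes:
    """Download the raw document, much like `curl URL` would."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def first_xml_error(data: bytes):
    """Return None if data is well-formed XML, else where the first error is."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {reason}"
    return None


# Example call, with the URL from the question substituted in:
# print(first_xml_error(fetch("https://www.testsite.xml")))
```

A mismatched or stray end tag would be reported with its line and column, which gives a place to start looking in the downloaded file.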


Comments

I cannot thank you enough for your response. I was debugging the wrong issue. It looks like you were right - the pipeline WAS processing data, and failing on a specific record. My steps to fix:
1. Switched On Record Error to 'Stop Pipeline'
2. Increased Max Record Size (char) from 5000 to 25000

DataAnalyst1029 (2019-06-13 12:00:28 -0500)
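To see ahead of time which records would exceed a limit such as Max Record Size, something like the following rough sketch could be run against the downloaded XML. It rests on assumptions: the function name `oversized_records` and the record element name are hypothetical, and measuring the serialized element length only approximates however SDC itself counts characters.

```python
import io
import xml.etree.ElementTree as ET


def oversized_records(data: bytes, record_tag: str, max_chars: int):
    """Yield (index, length) for each record_tag element longer than max_chars.

    record_tag is whatever element delimits one record in the feed; for
    namespaced documents it needs the "{namespace}name" form. Nested
    record elements are not handled.
    """
    index = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == record_tag:
            elem.tail = None  # leave out any text that follows the element
            length = len(ET.tostring(elem, encoding="unicode"))
            if length > max_chars:
                yield index, length
            index += 1
            elem.clear()  # drop children to keep memory use down
```

For example, `list(oversized_records(xml_bytes, "record", 5000))` would list the positions and approximate sizes of records above the original 5000-character limit, assuming the records are `record` elements.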