
MongoDB to Hive error

asked 2018-06-13 03:48:11 -0500


I'm trying to read MongoDB data into Hive in real time, and I get this error when I use the MongoDB Oplog origin:

com.streamsets.pipeline.api.base.OnRecordErrorException: HIVE_19 - Unsupported Type: MAP
    at com.streamsets.pipeline.stage.processor.hive.HiveMetadataProcessor.process(HiveMetadataProcessor.java:595)
    at com.streamsets.pipeline.api.base.RecordProcessor.process(RecordProcessor.java:52)
    at com.streamsets.pipeline.api.base.configurablestage.DProcessor.process(DProcessor.java:35)
    at com.streamsets.datacollector.runner.StageRuntime.lambda$execute$2(StageRuntime.java:245)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:195)
    at com.streamsets.datacollector.runner.StageRuntime.execute(StageRuntime.java:257)
    at com.streamsets.datacollector.runner.StagePipe.process(StagePipe.java:219)
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.lambda$runSourceLessBatch$0(PreviewPipelineRunner.java:337)
    at com.streamsets.datacollector.runner.PipeRunner.executeBatch(PipeRunner.java:136)
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runSourceLessBatch(PreviewPipelineRunner.java:333)
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runPollSource(PreviewPipelineRunner.java:315)
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.run(PreviewPipelineRunner.java:209)
    at com.streamsets.datacollector.runner.Pipeline.run(Pipeline.java:522)
    at com.streamsets.datacollector.runner.preview.PreviewPipeline.run(PreviewPipeline.java:51)
    at com.streamsets.datacollector.execution.preview.sync.SyncPreviewer.start(SyncPreviewer.java:214)
    at com.streamsets.datacollector.execution.preview.async.AsyncPreviewer.lambda$start$0(AsyncPreviewer.java:94)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:226)
    at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:222)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(SafeScheduledExecutorService.java:226)
    at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:33)
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.call(SafeScheduledExecutorService.java:222)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at com.streamsets.datacollector.metrics.MetricSafeScheduledExecutorService$MetricsTask.run(MetricSafeScheduledExecutorService.java:100)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

1 Answer


answered 2018-06-13 20:55:00 -0500 by metadaddy (updated 2018-06-14 09:05:01 -0500)

It looks like you need to flatten your input record to be able to write it to Hive. Use preview, or write the data as JSON to the Local FS destination to see how it's arriving from the MongoDB oplog origin, then use Field Flattener and the other processors to get it into the shape you need.
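For illustration, here is a minimal Python sketch of what's going on; the record shape and field names are made up for the example. A MongoDB oplog entry nests the changed document under "o" (and the update query under "o2"), so the record reaching the Hive Metadata processor carries MAP-typed fields, which is what triggers HIVE_19. Flattening promotes the nested values to top-level fields, roughly the way the Field Flattener processor does:

    # Hypothetical oplog-style record: "o" and "address" are nested maps,
    # i.e. MAP-typed fields that Hive Metadata cannot map to a column type.
    oplog_record = {
        "ts": 1528879691,
        "op": "i",
        "ns": "mydb.users",
        "o": {"_id": "id-123", "name": "Alice", "address": {"city": "Austin"}},
    }

    def flatten(record, prefix="", sep="."):
        """Recursively promote nested map entries to top-level fields
        with joined names -- the effect Field Flattener has on a record."""
        flat = {}
        for key, value in record.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            if isinstance(value, dict):
                flat.update(flatten(value, name, sep))
            else:
                flat[name] = value
        return flat

    print(flatten(oplog_record))
    # {'ts': 1528879691, 'op': 'i', 'ns': 'mydb.users',
    #  'o._id': 'id-123', 'o.name': 'Alice', 'o.address.city': 'Austin'}

Note that Hive column names can't contain dots; the Field Flattener's name separator is configurable, so you'd typically use something like "_" instead.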

This article might be helpful: Transform Data in StreamSets Data Collector

See also the tutorial and video: Ingesting Drifting Data into Hive and Impala


Comments

Thanks for your help! I tried the MongoDB origin instead of the MongoDB Oplog origin, following "Drift Synchronization Solution for Hive". I can see input and output data in the Hive Metadata processor, but no input data reaches the Hadoop FS and Hive Metastore stages, and I don't know what's wrong.

supersujj (2018-06-13 23:09:30 -0500)

Do you have an example?

supersujj (2018-06-13 23:10:20 -0500)

I added another useful link to the answer. You could export your pipeline to JSON, remove any passwords, and post it in a question to the Google Group at https://groups.google.com/a/streamsets.com/d/forum/sdc-user

metadaddy (2018-06-14 09:06:18 -0500)
