
How to specify table name with Drift Synchronization

asked 2017-12-20 19:13:56 -0600


My requirement is to read data from a Kafka topic and ingest it into HDFS and Hive. I'm using the Hive Drift Synchronization technique as per the SDC guidelines, but there is no case custody with this requirement.

My pipeline looks like this.

Kafka Consumer Origin -> Hive Metadata Processor -> Hive Metastore/HDFS -> MapReduce

At the Hive Metadata Processor stage I need to enter a table name, but my input dataset does not provide one. The table name is needed further down the pipeline so that the HDFS destination can store the files. I was wondering if I can use an Expression Evaluator right after the origin to create a record header attribute that supplies the table name.

Second question: the pipeline errors out while connecting to the Hive database, saying the StreamSets user "SDC" has no privileges. My cluster admin says I need to find a way to let SDC delegate to my user ID: HIVE_20 - Error executing SQL: DESCRIBE DATABASE `fedidentity_qa`

Error I'm getting

log report .. HIVE_23 - TBL Properties 'com.streamsets.pipeline.stage.lib.hive.exceptions.HiveStageCheckedException: HIVE_20 - Error executing SQL: DESCRIBE DATABASE fedidentity_qa, Reason:Error while compiling statement: FAILED: SemanticException No valid privileges User sdc does not have privile

Since I'm using the Kafka Consumer as the origin, the default consumer group "streamsetsDataCollector" uses the default StreamSets user "sdc" when executing the pipeline. I'm not sure how to supply specific user credentials to execute the pipeline. Please advise, thank you.


2 Answers


answered 2017-12-21 11:16:13 -0600

metadaddy

If the table name does not vary from record to record, you can just specify it directly in the Metadata Processor config, without using an expression:

[Screenshot: Hive Metadata Processor configuration]
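
A rough text sketch of the two options, in case the screenshot is hard to read. The property labels are as I recall them from the Hive Metadata Processor's Table tab, and both the table name `kafka_events` and the header attribute name `table` are made up for illustration; the database name is the one from the error message in the question.

```text
# Option 1: constant table name, typed directly into the stage
Database Expression:  fedidentity_qa
Table Name:           kafka_events

# Option 2: table name read from a record header attribute,
# e.g. one set by an Expression Evaluator placed after the origin
Table Name:           ${record:attribute('table')}
```

With option 1 no Expression Evaluator is needed; option 2 only matters when the table differs from record to record.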


Comments

Thank you MD (metadaddy), it worked out. But I ran into an issue: the pipeline errors out while connecting to the Hive database, saying the StreamSets user "SDC" has no privileges. My cluster admin says I need to find a way to let SDC delegate to my user ID: HIVE_20 - Error executing SQL: DESCRIBE DATABASE `fedidentity_qa`

Koraganti77 (2017-12-21 13:22:44 -0600)

log report .. HIVE_23 - TBL Properties 'com.streamsets.pipeline.stage.lib.hive.exceptions.HiveStageCheckedException: HIVE_20 - Error executing SQL: DESCRIBE DATABASE `fedidentity_qa`, Reason:Error while compiling statement: FAILED: SemanticException No valid privileges User sdc does not have privile

Koraganti77 (2017-12-21 13:25:56 -0600)

Since I'm using the Kafka Consumer as the origin, the default consumer group "streamsetsDataCollector" uses the default StreamSets user "sdc" when executing the pipeline. I'm not sure how to supply specific user credentials to execute the pipeline. Please advise, thank you.

Koraganti77 (2017-12-21 13:29:10 -0600)

@Koraganti77 I've temporarily added your comments to the question; please see my answer below for that part. I would suggest you post it as a separate question and remove the second part that I added here. That will help you get answers from other people, and it helps the community as well.

Roh (2017-12-22 12:41:57 -0600)

answered 2017-12-22 11:05:45 -0600

Roh

updated 2017-12-22 12:35:17 -0600

You can authenticate with keytabs. Also check whether you have added the Hadoop configuration XML files.

Kerberos authentication document

Refer to the Hive Streaming documentation and check whether you are missing something

Hadoop Fs destination properties and configuration files documentation
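
As a hedged sketch of what the keytab setup might look like on the Data Collector side: the property names below are the ones I know from `sdc.properties`, while the principal, realm, and keytab file name are placeholders to be replaced with your own. The impersonation line is an assumption about what the cluster admin means by letting SDC act as your user; whether it covers the Hive stages in your version is worth confirming in the documentation.

```properties
# $SDC_CONF/sdc.properties

# Enable Kerberos for Data Collector (principal and keytab are placeholders)
kerberos.client.enabled=true
kerberos.client.principal=sdc/_HOST@EXAMPLE.COM
kerberos.client.keytab=sdc.keytab

# Assumption: run Hadoop-related stages as the user who starts the pipeline
# instead of as the sdc service user
stage.conf_hadoop.always.impersonate.current.user=true
```

On the cluster side, impersonation normally also requires that `sdc` be allowed to act as a proxy user, through the `hadoop.proxyuser.sdc.hosts` and `hadoop.proxyuser.sdc.groups` entries in core-site.xml.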

Stats

Asked: 2017-12-20 19:13:56 -0600

Seen: 114 times

Last updated: Dec 22 '17