Drift Synchronization Solution for Hive on AWS

asked 2019-08-05 00:02:23 -0500

Satendra Tiwari

updated 2019-08-05 00:02:55 -0500

Is it possible to build the same solution on the AWS ecosystem?

  1. Instead of HDFS (Hadoop FS or MapR FS destination), is it possible to use AWS S3?
  2. Is it possible to use AWS Glue Metastore instead of Hive Metastore?

Is it possible to customize the existing solution? For example, are there certain operators that could help me achieve the same outcome, even if it takes more work? My constraint is that we already have a functional ingestion system, and I am doing a POC with StreamSets to see whether it can improve our existing solution.

Reference: https://streamsets.com/documentation/...

1 Answer

answered 2019-08-13 19:51:08 -0500

metadaddy

The Amazon EMR documentation says: "The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, integrating with Amazon EMR as well as Amazon RDS, Amazon Redshift, Redshift Spectrum, Athena, and any application compatible with the Apache Hive metastore." [my emphasis].

I would start with a working Hadoop/Hive pipeline, replace the HDFS destination with an Amazon S3 destination, then point the Hive Metastore destination at the AWS Glue Hive endpoint, and see what happens.
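
One way to get a Hive endpoint backed by Glue is an EMR cluster whose Hive is configured to use the Glue Data Catalog. The sketch below assumes EMR and its `hive-site` configuration classification; the function name `glue_hive_site_config` is made up for illustration, and the setup is untested against the StreamSets Hive stages.

```python
import json


def glue_hive_site_config():
    """Build an EMR configuration list that tells Hive to use the Glue Data Catalog.

    Assumption: the cluster is Amazon EMR, which reads this classification
    at cluster creation time.
    """
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                # Client factory that routes Hive metastore calls to Glue.
                "hive.metastore.client.factory.class": (
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
                )
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file and passed to EMR.
    print(json.dumps(glue_hive_site_config(), indent=2))
```

The printed JSON could be saved to a file and supplied when the cluster is created, for example with the `--configurations` option of `aws emr create-cluster`. The Hive stages in the pipeline would then point at that cluster's Hive, with the Amazon S3 destination writing the data files.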
