Avro to Parquet conversion

asked 2019-11-01 09:34:32 -0600

Gangi

updated 2019-11-08 13:10:21 -0600

I am working on creating a pipeline with the Hive Drift solution, using Snappy compression. I verified the Hive table details, but the table is being created without Snappy compression, and the HDFS file is created as /.avro/sdc-fa8c663d-b55a-11e9-b76d-d31fde076fc7_5150dff1-eada-499d-b226-49170a3ef6c7. Please help me with the below.

  1. Why is the table not being created with parquet.compression = SNAPPY?

  2. When I create a Hive table directly in Hive with Parquet, it stores the data in an HDFS file such as /hivetabledir/000000_0, whereas through StreamSets the file is /.avro/sdc-fa8c663d-b55a-11e9-b76d-d31fde076fc7_5150dff1-eada-499d-b226-49170a3ef6c7. Why are the HDFS files created in two different ways?
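For comparison, this is a minimal sketch of how a Snappy-compressed Parquet table can be declared and verified when created directly in Hive (the table and column names here are placeholders, not taken from the pipeline):

```sql
-- Hypothetical table; adjust name and columns to your schema.
CREATE TABLE example_table (
  id INT,
  name STRING
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');

-- Check whether the compression property is actually set on an existing table:
SHOW TBLPROPERTIES example_table;

-- Alternatively, compression can be set at the session level before inserting:
SET parquet.compression=SNAPPY;
```

If `SHOW TBLPROPERTIES` on the table created by the pipeline does not list `parquet.compression`, that would explain the uncompressed output.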

Attached screenshots: General.PNG, connection.PNG, outputfiles.PNG, laterecords.PNG, dataformat.PNG, complete_pipeline.PNG, MapReduce_General_Config.PNG, MapReduce_Config.PNG, Job_Config.PNG, AvroConversion_Config.PNG, Avrotoparquet_config.PNG, logs.PNG


Comments

Can you post a screenshot of the Hadoop FS destination config?

metadaddy ( 2019-11-01 10:20:26 -0600 )

Please find the requested details as attachment.

Gangi ( 2019-11-01 10:34:47 -0600 )

I don't see an attachment.

metadaddy ( 2019-11-01 10:39:57 -0600 )

Please let me know if you see them now.

Gangi ( 2019-11-01 10:48:57 -0600 )

I see the uploads - I edited the question to make them visible as images.

metadaddy ( 2019-11-01 10:52:19 -0600 )