
Unzip a file from SFTP to HDFS and store each extracted file separately

asked 2017-12-10 23:46:50 -0500


I have a zip file containing XML files on an SFTP server. I want to pick the file up from SFTP, unzip it, and save the extracted XML files to HDFS. I am unable to store them as individual files; all the files are merged into a single file in HDFS. Can someone please advise whether there is a solution or workaround for this problem?


1 Answer


answered 2019-12-23 01:35:49 -0500

Ranjith

updated 2019-12-23 01:36:57 -0500

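Hi, you can try a pipeline with an SFTP origin (Whole File data format) -> HDFS destination (Whole File data format, with File Name Expression set to ${record:value('/fileInfo/filename')} ). Then enable events on the destination and route them to a Shell executor. On the executor's Environment tab, add an environment variable FILE_NAME: ${record:value('/targetFileInfo/path')} and, on the Script tab, add an unzip command (e.g. hdfs dfs -cat ${FILE_NAME} | gzip -d | hdfs dfs -put - /tmp/ ). Note that gzip -d decompresses a single stream, so it suits gzip files or a zip with one entry; for a .zip that holds several XML files, the script would need to extract each entry and put it to HDFS as its own file.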
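For a zip with several XML entries, the script step could be sketched in Python roughly as below. This is only an illustration, not something from the answer above: the helper names (split_zip, hdfs_cat, hdfs_put, main), the DEST_DIR variable, and the /tmp/extracted default are all hypothetical. It assumes the hdfs CLI and python3 are available on the Data Collector host and that the archive is small enough to hold in memory.

```python
import io
import os
import posixpath
import subprocess
import zipfile


def split_zip(data: bytes):
    """Yield (file_name, content) for each regular file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Keep only the base name so an entry cannot escape the target
            # directory. Entries sharing a base name would overwrite each other.
            name = posixpath.basename(info.filename)
            yield name, archive.read(info)


def hdfs_cat(path: str) -> bytes:
    """Read a whole HDFS file into memory through the hdfs CLI."""
    result = subprocess.run(
        ["hdfs", "dfs", "-cat", path], check=True, stdout=subprocess.PIPE
    )
    return result.stdout


def hdfs_put(data: bytes, target: str) -> None:
    """Write bytes to a single HDFS file; 'put -' reads from stdin."""
    subprocess.run(["hdfs", "dfs", "-put", "-", target], check=True, input=data)


def main(src: str, dest_dir: str) -> None:
    """Extract every entry of the zip at src into its own file under dest_dir."""
    for name, content in split_zip(hdfs_cat(src)):
        hdfs_put(content, posixpath.join(dest_dir, name))


if __name__ == "__main__":
    # FILE_NAME is the environment variable set on the Shell executor.
    # DEST_DIR is an assumed, optional second variable for the output directory.
    zip_path = os.environ.get("FILE_NAME")
    if zip_path:
        main(zip_path, os.environ.get("DEST_DIR", "/tmp/extracted"))
```

If this were saved on the host as, say, extract_zip.py (a made-up name), the Script tab could simply run python3 with that file's path, and each XML entry would land as a separate file under the destination directory.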

