Extract files from zipped folder in StreamSets

asked 2017-09-29

prachi

I have an SFTP/FTP Client as the origin of my pipeline. The SFTP server contains CSV files inside zipped folders. What configuration (Data Format, etc.) should I specify to read the data from the CSV files?

Any solutions to this? I'm facing a similar problem.

ruchir.nishkam ( 2017-10-02 )

2 Answers

answered 2019-04-26

jeff

updated 2019-04-26 11:03:14 -0500

For .zip files, set Compression Format to Archive.
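Before pointing the origin at the server, it can help to confirm locally that the .zip really contains CSV entries (possibly inside folders) and that they parse. This is a minimal Python sketch using only the standard library. The function name read_csvs_from_zip and the UTF-8 default encoding are assumptions, and it is not how StreamSets itself reads archives.

```python
import csv
import io
import zipfile


def read_csvs_from_zip(zip_source, encoding="utf-8"):
    """Parse every .csv entry in a zip archive, at any folder depth.

    zip_source may be a path or a binary file object. Returns a dict that
    maps each entry name (e.g. "folder/file.csv") to its list of rows.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip folder entries and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # Decode the entry as text; newline="" is what the csv module expects.
            with zf.open(info) as raw, io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                results[info.filename] = list(csv.reader(text))
    return results
```

For example, `for name, rows in read_csvs_from_zip("data.zip").items(): print(name, len(rows))` lists each CSV entry and its row count ("data.zip" is a placeholder). If this finds nothing, the file is probably not laid out the way the pipeline expects.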

Hi, I tried the Archive option, but it doesn't seem to work. When the file was transferred as text it worked, but with this option it doesn't. Any clues? Thank you.

Anastasia ( 2020-04-03 )

answered 2017-10-02

metadaddy

updated 2017-10-04 14:35:12 -0500

Delimited data format with the Compressed Archive compression format will handle .tar.gz files:

[Screenshot of the origin configuration; image not available]

Can you use .tar.gz rather than .zip?
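If you control how the files are produced, one way to act on this suggestion is to repackage each .zip as .tar.gz before it reaches the SFTP server. This is a minimal Python sketch using only the standard library. The function name zip_to_tar_gz_bytes is hypothetical, and timestamps and permissions stored in the zip are not carried over.

```python
import io
import tarfile
import zipfile


def zip_to_tar_gz_bytes(zip_source):
    """Repackage every file entry of a .zip archive as a gzip-compressed tar.

    zip_source may be a path or a binary file object. Folder paths stored in
    the zip are kept in the member names; directory entries themselves are
    skipped. Timestamps and permissions are NOT copied (tar defaults are used).
    """
    out = io.BytesIO()
    with zipfile.ZipFile(zip_source) as zf, tarfile.open(fileobj=out, mode="w:gz") as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            member = tarfile.TarInfo(name=info.filename)
            member.size = len(data)  # addfile reads exactly this many bytes
            tf.addfile(member, io.BytesIO(data))
    return out.getvalue()
```

To write the result to disk, something like `with open("data.tar.gz", "wb") as f: f.write(zip_to_tar_gz_bytes("data.zip"))` would do; both file names are placeholders.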

I tried this configuration, but it's not working. I also tried checking the Process Subdirectories checkbox on the SFTP/FTP tab shown above, to no avail. Do I need to make any other changes besides the ones shown in the figure?

prachi ( 2017-10-03 )

I'm getting the following error: HTTP_00 - Cannot parse record: org.apache.commons.compress.compressors.CompressorException: No Compressor found for the stream signature. Am I missing a config change?

ruchir.nishkam ( 2017-10-03 )
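The exception comes from Apache Commons Compress failing to recognize a compressor from the file's leading bytes, which can happen when, for example, a .zip (an archive, not a compressed stream) or a plain CSV is read with a compression-based setting. One way to narrow this down is to check the file's magic bytes locally. This is a rough Python sketch; the layout labels and the suggest_compression_format mapping to the StreamSets options are assumptions based on the option names, not taken from the Data Collector source.

```python
import bz2
import gzip
import io
import lzma

# Leading magic bytes of single-stream compressors and a stdlib opener for each.
COMPRESSORS = [
    (b"\x1f\x8b", "gzip", gzip.open),
    (b"BZh", "bzip2", bz2.open),
    (b"\xfd7zXZ\x00", "xz", lzma.open),
]


def _is_tar(block):
    # POSIX and GNU tar headers both carry "ustar" at byte offset 257.
    return len(block) >= 262 and block[257:262] == b"ustar"


def detect_layout(source):
    """Classify bytes or a seekable binary file object by its leading bytes.

    Returns "zip", "tar", "<compressor>", "tar+<compressor>", or "plain".
    """
    f = io.BytesIO(source) if isinstance(source, (bytes, bytearray)) else source
    f.seek(0)
    head = f.read(512)
    if head[:4] in (b"PK\x03\x04", b"PK\x05\x06"):
        return "zip"
    if _is_tar(head):
        return "tar"
    for magic, name, opener in COMPRESSORS:
        if head.startswith(magic):
            f.seek(0)
            # Decompress just enough to see whether a tar header is inside.
            with opener(f) as inner:
                inner_head = inner.read(512)
            return f"tar+{name}" if _is_tar(inner_head) else name
    return "plain"


def suggest_compression_format(layout):
    """Hypothetical mapping to the StreamSets Compression Format options."""
    if layout in ("zip", "tar"):
        return "Archive"
    if layout.startswith("tar+"):
        return "Compressed Archive"
    if layout in ("gzip", "bzip2", "xz"):
        return "Compressed File"
    return "None"
```

For a downloaded copy of the file, `with open("data.zip", "rb") as f: print(suggest_compression_format(detect_layout(f)))` prints the suggested setting ("data.zip" is a placeholder). A truncated compressed file may raise an error while peeking inside it.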

Try the other compression formats; it may need Compressed File or Archive.

metadaddy ( 2017-10-03 )

I tried the different combinations, and also the actual file name in 'File Name Pattern'. The same error persists.

ruchir.nishkam ( 2017-10-03 )

Can you create a similarly formatted file with dummy data and share it? I can give it a try and see what the problem is.

metadaddy ( 2017-10-03 )
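If it helps to build such a test file, here is a small Python sketch that produces a .zip with CSV files inside a folder, similar to the layout described in the question. The function name make_dummy_zip, the folder and file names, and the columns are all made up for illustration.

```python
import csv
import io
import zipfile


def make_dummy_zip(folder="reports", files=2, rows=3):
    """Build an in-memory .zip holding `files` CSV files inside `folder`,
    each with a header row followed by `rows` rows of fake data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n in range(1, files + 1):
            text = io.StringIO()
            writer = csv.writer(text, lineterminator="\n")
            writer.writerow(["id", "name", "amount"])
            for i in range(1, rows + 1):
                writer.writerow([i, f"item_{n}_{i}", i * 10])
            # Store each CSV under the folder path inside the archive.
            zf.writestr(f"{folder}/sample_{n}.csv", text.getvalue())
    return buf.getvalue()
```

Writing the bytes out with `with open("dummy.zip", "wb") as f: f.write(make_dummy_zip())` gives a file that can be dropped on the SFTP server and shared ("dummy.zip" is a placeholder).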
