Filter files from a ZIP and re-zip the desired files

asked 2019-11-01 07:38:49 -0600

Peter Delaney

My goal is a pipeline that reads a whole ZIP file from a Directory origin, filters its entries by file name, and re-zips the ones I want into a new ZIP.

My pipeline: Directory Origin --> Custom Processor (unzipping) --> Filter Processor --> Re-ZIP contents

My Custom Processor unzips the contents into a local directory on the filesystem. It also creates a LIST in the record, called listOfFiles, of all the file names in the ZIP.
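To make the unzip step concrete, here is a minimal sketch of what it does, written in plain Python with the standard `zipfile` module rather than the StreamSets processor API. The function name `unzip_and_list` and its parameters are made up for illustration; only the idea of returning the file names (my listOfFiles) comes from the pipeline.

```python
import zipfile
from pathlib import Path


def unzip_and_list(zip_path, dest_dir):
    """Extract every entry of zip_path into dest_dir and return the file names.

    Hypothetical helper: it mirrors what the custom processor does, but it is
    not StreamSets code.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Skip directory entries so the list contains only files.
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

The returned list plays the role of listOfFiles: entry names relative to the extraction directory.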

I then filter for the files I am interested in. My next step is to re-zip those files and write the resulting ZIP to S3 or the local FS.
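The filter and re-zip steps I have in mind look roughly like the sketch below. It is plain Python using the standard `zipfile` and `fnmatch` modules, not StreamSets functionality. The function `rezip_selected`, its parameters, and the glob-style patterns are assumptions for illustration; `list_of_files` stands in for my listOfFiles.

```python
import fnmatch
import zipfile
from pathlib import Path


def rezip_selected(src_dir, list_of_files, patterns, out_zip):
    """Write the files from list_of_files that match any pattern into out_zip.

    Hypothetical helper: src_dir is the directory the ZIP was extracted to,
    patterns are glob-style strings such as "*.csv".
    """
    src = Path(src_dir)
    # Case-sensitive glob match; note that "*" also matches "/" in fnmatch.
    kept = [name for name in list_of_files
            if any(fnmatch.fnmatchcase(name, p) for p in patterns)]
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in kept:
            # arcname keeps the original relative path inside the new archive.
            zf.write(src / name, arcname=name)
    return kept
```

For example, `rezip_selected("/tmp/unzipped", list_of_files, ["*.csv"], "/tmp/filtered.zip")` would keep only the CSV entries; both paths here are placeholders. Uploading the resulting file to S3 would be a separate step.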

Questions:

Do I need to write a Custom Processor to Re-ZIP my identified files?

Is it possible to do this in one pipeline, or do I need to create a second pipeline whose origin is the unzip directory?

I was hoping re-zipping would be out-of-the-box functionality in StreamSets. Can this be achieved?
