
Reading files listed inside a file in StreamSets Transformer

asked 2020-04-12 05:56:02 -0500

anonymous user


Hi Team,

I have listed the unprocessed files inside a file. I'm trying to read that file to get the list of filenames. From that list, I want to read those files inside StreamSets Transformer. Please let me know how I can do this.

For example, I have a file a.txt which contains the filenames b.txt and c.txt. I want to read the content of the files b.txt and c.txt. These filenames will be dynamic.

I have tried using the Scala component to read it, but I am not able to get these values from the StreamSets Transformer component.


1 Answer


answered 2020-04-14 02:36:59 -0500

updated 2020-04-14 02:49:50 -0500

Hi, I don't think this is possible with StreamSets Transformer alone. In the end, a Transformer pipeline runs as a Spark job, and you have to drive that job with the files you want to process. There is no way to read a "control file" and then do a second "lookup" to fetch the file content for the data processing.

A few solutions I can think of (though I have not implemented them yet):

  • use StreamSets Data Collector to read the "control file" and move the listed files from the original input folder to the folder that StreamSets Transformer is listening on
  • use StreamSets Data Collector to read the "control file" and start the Transformer pipeline for each file over the REST API, using the file name as the parameter
  • use a job scheduler (such as Airflow) and start the Transformer pipeline for each file over the REST API, using the file name as the parameter
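
To make the idea concrete, here is a minimal Python sketch of what such a driver could do outside Transformer. It is untested against a real deployment and rests on assumptions: the control file holds one file name per line, and the folder names, the function names, the start URL and the parameter name FILE_NAME are all hypothetical. The exact REST path for starting a Transformer pipeline is not given here, so it is left as an argument, and the request is only built, not sent.

```python
import json
import shutil
import urllib.request
from pathlib import Path


def read_control_file(control_file):
    """Return the non-empty, stripped lines of the control file as file names."""
    text = Path(control_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def stage_listed_files(control_file, input_dir, watched_dir):
    """Move every listed file that exists in input_dir into watched_dir.

    watched_dir stands for the folder the Transformer pipeline reads from
    (an assumption; use whatever directory your origin is configured with).
    """
    input_dir, watched_dir = Path(input_dir), Path(watched_dir)
    watched_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in read_control_file(control_file):
        source = input_dir / name
        if source.is_file():  # silently skip names that are not present
            shutil.move(str(source), str(watched_dir / name))
            moved.append(name)
    return moved


def build_start_request(start_url, file_name, param_name="FILE_NAME"):
    """Build (but do not send) a POST request carrying the file name.

    start_url and param_name are hypothetical placeholders; take the real
    values from the REST API documentation of your StreamSets installation.
    """
    body = json.dumps({param_name: file_name}).encode("utf-8")
    return urllib.request.Request(
        start_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a.txt listing b.txt and c.txt, calling stage_listed_files("a.txt", "input", "watched") would move those two files and leave everything else in the input folder untouched. For the REST variants, the scheduler or Data Collector side would loop over read_control_file("a.txt") and send one start request per file name.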

Hope that helps. Guido

