Batch Job using pipeline

asked 2018-02-06 08:46:04 -0600

Vivian Yang

One of my use cases is running a batch job that loads transaction data from systems 1, 2, 3, ... N into a centralized database. Each system has its own transaction data, and I would like a batch job that transfers all of it into the centralized database. There are several pieces of information that need to be captured.

  1. Fix a time, for example 12:00 AM, to run a job that gets the latest system info and stores it in the database. The number of systems will grow, so I need master data that stores the current list of systems.
  2. Each system runs its own batch job and drops its files into a shared folder for StreamSets to consume, save into the database, and set a process flag to 'Yes' once completed.
  3. Once the shared folder is empty or processing is finished, cross-check against the master list from step 1 to confirm that every system has transferred its data. If any system's process flag is still 'No', send a notification email saying that system is not working correctly; otherwise send an email stating the job succeeded (see the sketch after this list).
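For reference, this is roughly the kind of cross-check I have in mind for step 3 if I had to script it by hand; the table name (system_master), columns, SQLite file, and SMTP details are just placeholders for illustration, not my real schema:

```python
# Rough sketch of the step-3 cross-check, assuming a hypothetical
# system_master table with (system_id, process_flag) columns.
import smtplib
import sqlite3  # placeholder; the real target is the central database
from email.message import EmailMessage

def check_and_notify(db_path="central.db"):
    conn = sqlite3.connect(db_path)
    # Find systems whose batch job has not flipped its flag to 'Yes'.
    pending = conn.execute(
        "SELECT system_id FROM system_master WHERE process_flag != 'Yes'"
    ).fetchall()
    conn.close()

    msg = EmailMessage()
    msg["From"] = "etl@example.com"   # placeholder addresses
    msg["To"] = "ops@example.com"
    if pending:
        ids = ", ".join(str(row[0]) for row in pending)
        msg["Subject"] = "Batch job incomplete"
        msg.set_content(f"These systems did not transfer data: {ids}")
    else:
        msg["Subject"] = "Batch job success"
        msg.set_content("All systems transferred their data.")

    with smtplib.SMTP("localhost") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)

if __name__ == "__main__":
    check_and_notify()
```

Ideally I would like the pipeline itself (or some StreamSets feature) to do this, rather than an external script.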

My end goal is to make sure all system data is centralized in one place, and that if anything happens to one of the systems (such as no files coming in, or a file failing partway through), StreamSets can alert and notify the users.

I would appreciate opinions or suggestions on how I can achieve this automation using StreamSets. I am stuck on how to start or stop a pipeline automatically, and on how I can trigger another pipeline.
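To frame the question: my current thinking is to trigger the next pipeline from outside (for example from a scheduler, or after the first pipeline finishes) by calling the Data Collector REST API, roughly as sketched below. The host, port, pipeline ID, and credentials are placeholders, and I am not sure this is the recommended approach:

```python
# Sketch: trigger another pipeline via the Data Collector REST API.
# Host, port, pipeline ID, and credentials below are placeholders.
import requests

SDC_URL = "http://localhost:18630"
PIPELINE_ID = "myNextPipelineId"   # hypothetical pipeline ID
AUTH = ("admin", "admin")          # default SDC credentials

def start_pipeline():
    resp = requests.post(
        f"{SDC_URL}/rest/v1/pipeline/{PIPELINE_ID}/start",
        auth=AUTH,
        headers={"X-Requested-By": "sdc"},  # header required by SDC REST calls
    )
    resp.raise_for_status()
    print("Pipeline start requested:", resp.status_code)

if __name__ == "__main__":
    start_pipeline()
```

Is chaining pipelines this way reasonable, or is there a built-in mechanism I should use instead?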

Is it possible for me to check whether the shared folder is empty? Do you have any suggestions on how I can handle this use case using StreamSets?
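Right now the only way I can think of to detect an empty folder is a small script outside StreamSets, something like the sketch below (the folder path is just an example), but I would prefer to do this inside the pipeline if possible:

```python
# Sketch: check whether the shared folder still has files to process.
# The path is an example; in reality it would be the shared folder.
import os

def folder_is_empty(path="/data/shared"):
    # Ignore subdirectories; only count regular files left to consume.
    files = [name for name in os.listdir(path)
             if os.path.isfile(os.path.join(path, name))]
    return len(files) == 0

if __name__ == "__main__":
    print("Shared folder empty?", folder_is_empty())
```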
