
How can I make sure StreamSets only processes files that have been completely transferred?

asked 2017-09-21 05:18:20 -0500

Vivian Y

updated 2017-09-21 15:54:13 -0500

metadaddy

I have a simple StreamSets pipeline that reads from a local directory (A) and writes to Local FS (B). I use FTP to transfer batches of files from my local machine to directory (A).

Before the transfer of those files completes, the running pipeline already starts moving files from (A) to (B), which causes the pipeline to go into an error state and retry.

Is there any way to ensure that the pipeline only processes files that have been completely transferred?


1 Answer


answered 2017-09-21 10:45:26 -0500

jeff

If I understand you correctly, you want the Directory origin to detect when a file is "ready" (i.e. the FTP process that is placing it has completed). Unfortunately, that is not a simple problem to solve in general. How do you know that a file has finished transferring? Is it based on the file not changing for X seconds? A network delay could trigger that condition prematurely, while the transfer is still in progress.
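To make the "not changing for X seconds" idea concrete, here is a minimal sketch of such a check. The function name `looks_stable` and the `quiet_seconds` threshold are hypothetical, and this is not something SDC does for you. It only looks at the file's modification time, so a stalled transfer that pauses longer than the threshold would still be reported as finished, which is exactly the weakness described above.

```python
import os
import time


def looks_stable(path, quiet_seconds, now=None):
    """Heuristic: True if the file was last modified at least
    quiet_seconds ago. This does NOT prove the transfer finished;
    a paused or slow transfer can look identical to a finished one."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= quiet_seconds
```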

The only reliable way to accomplish this is to have whatever process initiates the FTP transfer rename the file after the transfer completes. Only the final (renamed) file name should match the SDC Directory origin's file name pattern, so the origin won't pick the file up until it is ready.
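As a rough sketch of the upload-then-rename approach, assuming the sending side is a Python script using the standard `ftplib` module: the helper name `upload_then_rename` and the `.part` suffix are made up for illustration, and the only requirement is that the temporary name does not match the Directory origin's file name pattern (for example, a pattern like `*.csv`).

```python
from ftplib import FTP  # used in the usage example below


def upload_then_rename(ftp, local_path, remote_name, tmp_suffix=".part"):
    """Upload under a temporary name, then rename to the final name.

    `ftp` is a connected, logged-in ftplib.FTP instance (assumption).
    The temporary name must not match the pipeline's file pattern.
    """
    tmp_name = remote_name + tmp_suffix
    with open(local_path, "rb") as fh:
        # The file is only visible under the temporary name while uploading.
        ftp.storbinary("STOR " + tmp_name, fh)
    # The final name appears only after the upload has completed.
    ftp.rename(tmp_name, remote_name)
    return tmp_name


# Usage (host, credentials and paths are placeholders):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     ftp.cwd("/data/incoming")
#     upload_then_rename(ftp, "batch1.csv", "batch1.csv")
```

On most servers a rename within the same directory is effectively a single step, so the pipeline should see either no matching file or the complete one.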

The other option is to use the File Tail origin (rather than Directory). New records are then simply generated from the data as the file is populated, and SDC tracks the offset within that file.

