
Why are my new files not being read by my SFTP origin and sent through the pipeline?

asked 2020-04-01 07:57:19 -0500

Tomi

Pipeline details - Origin: SFTP/FTP/FTPS Client, Destination: Amazon S3

Background details - I built a pipeline that transfers files from an SFTP source to an S3 folder. The origin is configured to move whole files based on a file name pattern. The pipeline also tracks the data that has already been processed by saving the offset.

Issue - The first file uploaded to the SFTP server is picked up by the origin and processed through the pipeline. The next file that is uploaded has the same name, so it overwrites the previous file, but its last-modified timestamp is different. This new file is not picked up or processed.

My questions: Is this behaviour expected? Does the origin rescan and look for new files? Does it assume that a file with that name has already been processed, and ignore it because of offset tracking?

After some testing, I have found that giving the new files different names allows them to be picked up as you would normally expect.
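
To make the offset question concrete, here is a toy Python sketch of the behaviour I am guessing at. It is not how the SFTP/FTP/FTPS Client origin actually stores its offset; the function `pick_new_files`, the two key functions, and the file name `report.csv` are all made up for illustration. It only shows that a tracker keyed on file name alone would skip an overwritten file, while one keyed on name plus last-modified time would not.

```python
def pick_new_files(listing, seen, key):
    """Return the names of files whose tracking key is not in `seen`, and record them."""
    picked = []
    for entry in listing:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            picked.append(entry["name"])
    return picked


# Two hypothetical ways a scanner could remember what it has processed.
def by_name(entry):
    return entry["name"]


def by_name_and_mtime(entry):
    return (entry["name"], entry["mtime"])


# First upload, then an overwrite with the same name but a newer timestamp.
first_scan = [{"name": "report.csv", "mtime": 100}]
second_scan = [{"name": "report.csv", "mtime": 200}]

if __name__ == "__main__":
    for key in (by_name, by_name_and_mtime):
        seen = set()
        print(key.__name__,
              pick_new_files(first_scan, seen, key),
              pick_new_files(second_scan, seen, key))
```

If the origin behaves like the name-only case, that would match what I am seeing, including the fact that renamed files are picked up.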


2 Answers


answered 2020-05-13 23:57:38 -0500

uzumaki

updated 2020-05-13 23:58:02 -0500

I had a similar issue. What worked for me was archiving or deleting the source files after the transfer, and manually resetting the job offset with a Shell executor at the end of the pipeline.
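
As a rough sketch of the reset step, the script called from the executor could build a request like the one below. This is an assumption-heavy example rather than the exact script I used: the `/rest/v1/pipeline/<id>/resetOffset` path and the `X-Requested-By` header are what I understand the Data Collector REST API to expect, so check them against the RESTful API page of your own instance. The base URL and `myPipelineId` are placeholders, and authentication is left out.

```python
import urllib.request


def build_reset_offset_request(base_url, pipeline_id):
    """Build (but do not send) a POST request asking Data Collector to reset a pipeline's offset.

    The endpoint path and header are assumptions; verify them for your version.
    """
    url = f"{base_url.rstrip('/')}/rest/v1/pipeline/{pipeline_id}/resetOffset"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Requested-By": "sdc"},
    )


if __name__ == "__main__":
    # Placeholder host and pipeline ID; replace with your own values.
    req = build_reset_offset_request("http://localhost:18630", "myPipelineId")
    print(req.get_method(), req.full_url)
    # To actually send it (credentials not shown): urllib.request.urlopen(req)
```

The function only constructs the request so it can be inspected first; sending it is a separate, explicit step.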
