Does the Directory origin read only new files in a directory (files are written to the directory at regular intervals), or does it read all files present in the directory every time?

asked 2018-09-17 05:18:06 -0500

carlo.fisicaro

E.g., suppose we have 10 files and the origin reads them based on the Last Modified Timestamp. After a minute, a file is added to the specified directory. Does StreamSets import only the newly added file, or all the files within the directory (11 in our case)?


1 Answer


answered 2018-09-17 06:27:21 -0500

Maithri

updated 2018-09-17 07:15:28 -0500

Yes, it can read new files that are written to the specified directory. In your example, the first pipeline run fetches the existing 10 files. Then, when you add a new file, it fetches only the newly added file, based on the timestamp.

In my test, there were 23 files in the directory initially.

When a new file was added to the same directory while the pipeline was still running, the count increased by 1, from 23 to 24.

After the pipeline was stopped, I added a new file to the directory and ran the pipeline a second time; the count was 1. This means the second run fetched only the new file.
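The behavior observed in this test can be modeled with a small sketch. This is not StreamSets code: the `DirectoryReader` class, its `offset` field, and the file names are hypothetical, and the only assumption is that the origin remembers the newest timestamp it has processed and keeps that value between runs.

```python
from typing import Dict, List


class DirectoryReader:
    """Toy model of an origin that remembers the newest timestamp it processed."""

    def __init__(self) -> None:
        # Hypothetical saved offset; it survives a pipeline stop and restart.
        self.offset = float("-inf")

    def poll(self, files: Dict[str, float]) -> List[str]:
        """Return files newer than the offset, oldest first, and advance the offset."""
        new = sorted(
            (mtime, name) for name, mtime in files.items() if mtime > self.offset
        )
        if new:
            self.offset = new[-1][0]
        return [name for _, name in new]


# 23 files with increasing (made-up) modification times.
directory = {f"file{i:02d}.csv": float(i) for i in range(23)}
reader = DirectoryReader()

first = reader.poll(directory)    # first run: the 23 existing files
directory["file23.csv"] = 23.0    # added while the pipeline is running
second = reader.poll(directory)   # only the new file
directory["file24.csv"] = 24.0    # added while the pipeline is stopped
third = reader.poll(directory)    # second run: only the new file

print(len(first), second, third)
```

Because the offset is retained, the second run sees a single file rather than all 25.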


Comments

Ok, it works. Unfortunately, when the pipeline runs I get the WARN "File cannot be added to the queue:<file_name>; DirectorySpooler; directory-dirspooler-pool-20356-thread-1", the INFO "sending no-more-data event. records 202619 errors 0 files 1", and then the INFO "sending no-more-data event.

carlo.fisicaro ( 2018-09-17 07:29:25 -0500 )

I have already read the docs, but unfortunately I couldn't find a solution to this problem.

carlo.fisicaro ( 2018-09-17 07:48:12 -0500 )

It's likely a file ordering problem. For example, if you are using lexicographical ordering and you've already processed 005.csv, the pipeline will not process 004.csv, even if it is newer. The same applies to last-modified ordering if you drop in a file older than the last one processed.

metadaddy ( 2018-09-17 10:44:50 -0500 )
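The ordering rule described in the last comment can be illustrated with a minimal sketch. It only models the rule as stated there (a file is picked up only if it sorts after the last processed one); the `eligible` helper is hypothetical and is not part of StreamSets, and whether this rule is what triggers the WARN above is not confirmed.

```python
from typing import List, Optional


def eligible(candidates: List[str], last_processed: Optional[str]) -> List[str]:
    """Return the candidate names that sort strictly after the last processed name."""
    return sorted(
        name
        for name in candidates
        if last_processed is None or name > last_processed
    )


last_processed = "005.csv"          # already read by the pipeline
arrivals = ["004.csv", "006.csv"]   # files dropped in afterwards

picked = eligible(arrivals, last_processed)
skipped = [name for name in arrivals if name not in picked]

print("picked:", picked)
print("skipped:", skipped)
```

Under this rule, 004.csv is skipped no matter when it was written, so naming new files to sort after the existing ones (for example with a zero-padded sequence number or a timestamp prefix) avoids the problem.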