directory origin widget not picking up files
Hi, I have created a simple flow that starts with a directory origin. It is supposed to pick up the files in a directory and publish the records in each file to a Kafka publisher step. A second flow starts with a Kafka consumer origin, which reads all the records from the queue and sends them to an HDFS widget that writes the files to an HDFS directory.
I am able to start both flows, and they work end to end as designed. However, with the first flow I have to stop and start it about 3-4 times before it successfully picks up the files from the directory. Has anyone else encountered this?
What is the origin data format, and how have you configured the origin (e.g. max batch wait time, batch size)? When it does pick up the batches on the 3rd or 4th run, does it complete a "full" batch as per the batch size, or a partial one? What is the committed offset after each run (check in the history)?
I have the following set:
Batch size: 1000 records
Batch wait time: 5 seconds
Max files in directory: 1000
Buffer size: 128 KB
Now it is not picking up anything at all.
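To make the "full vs. partial batch" question above concrete, here is a minimal sketch of how a batch size and a batch wait time typically bound a batch. This is NOT the directory origin's actual code; `BatchSettings` and `make_batches` are hypothetical names, and the model is simplified (a batch only closes on the wait time when the next record arrives, whereas a real origin can also emit an empty or partial batch purely on a timer).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class BatchSettings:
    # Hypothetical stand-ins for the settings posted above.
    batch_size: int = 1000        # "Batch size: 1000 records"
    batch_wait_secs: float = 5.0  # "Batch wait time: 5 seconds"


def make_batches(records: Iterable[Tuple[float, str]],
                 settings: BatchSettings) -> List[List[str]]:
    """Group (arrival_time, record) pairs into batches.

    A batch closes when it holds batch_size records ("full"), or when a
    record arrives more than batch_wait_secs after the batch was opened
    (the earlier batch is then closed "partial").
    """
    batches: List[List[str]] = []
    current: List[str] = []
    opened_at: Optional[float] = None
    for arrival, record in records:
        # Wait time exceeded: close the partial batch before adding this record.
        if current and opened_at is not None and arrival - opened_at > settings.batch_wait_secs:
            batches.append(current)
            current, opened_at = [], None
        if not current:
            opened_at = arrival
        current.append(record)
        # Batch size reached: close a full batch.
        if len(current) >= settings.batch_size:
            batches.append(current)
            current, opened_at = [], None
    if current:
        batches.append(current)  # whatever is left is a partial batch
    return batches


if __name__ == "__main__":
    settings = BatchSettings(batch_size=3, batch_wait_secs=5.0)
    arrivals = [(0.0, "a"), (1.0, "b"), (2.0, "c"), (3.0, "d"), (10.0, "e")]
    print(make_batches(arrivals, settings))
```

With a batch size of 3, the first three records form a full batch, "d" ends up alone in a partial batch because "e" arrives 7 seconds later, and "e" is flushed at the end. Checking in the history whether the runs produce full or partial batches helps tell "no files found" apart from "files found but records trickling in".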
What have you configured as the data format? And can you share a sample of the data from the input files?
Is it configured to order files by last modified timestamp? Some issues were resolved in the latest release, 3.2, that may be related if it is configured to use the last modified timestamp (https://issues.streamsets.com/browse/SDC-8414). Otherwise, I would suggest changing the log level to TRACE and investigating the issue. If you don't mind, can you share your sdc.log here?
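As a general illustration of why ordering by last modified timestamp deserves a closer look, here is a minimal sketch of a timestamp-ordered reader that remembers a committed offset. It is a hypothetical model, not the origin's implementation, and whether SDC-8414 involves this exact behavior is an assumption; `next_files` and the `(mtime, name)` offset shape are invented for illustration. The point is that any file whose modified time sorts at or before the committed offset (for example, one copied in with `cp -p`, which preserves the original timestamp) is never picked up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical committed offset: (mtime, filename) of the last file fully read.
Offset = Optional[Tuple[float, str]]


def next_files(files: Dict[str, float], offset: Offset) -> List[str]:
    """Return file names still to read, ordered by (last modified time, name).

    Files that sort at or before the committed offset are treated as already
    processed and are skipped, even if they were never actually read.
    """
    ordered = sorted(files.items(), key=lambda kv: (kv[1], kv[0]))
    return [name for name, mtime in ordered
            if offset is None or (mtime, name) > offset]


if __name__ == "__main__":
    # "late.csv" arrives after b.csv was processed but keeps an older mtime.
    directory = {"a.csv": 100.0, "b.csv": 200.0, "late.csv": 150.0, "c.csv": 300.0}
    print(next_files(directory, None))             # fresh start: everything
    print(next_files(directory, (200.0, "b.csv")))  # late.csv is silently skipped
```

If the history shows a committed offset pointing at a file newer than the ones sitting in the directory, this kind of skipping is one thing the TRACE log can help confirm or rule out.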