directory origin widget not picking up files

asked 2018-01-31 00:07:25 -0600

danoomistmatiste

Hi, I have created a simple flow that starts with a directory origin. It is supposed to pick up the files in a directory and then publish the records in each file to a Kafka producer step. A second flow starts with a Kafka consumer origin, which reads all the records from the queue and sends them to an HDFS destination, which then writes the files to an HDFS directory.

I am able to start both flows, and they do work end to end as designed. However, I have to stop and start the first flow about 3-4 times before it successfully picks up the files from the directory. Has anyone else encountered this before?



What is the origin format, and how have you configured the origin (e.g., max batch wait time, batch size, etc.)? When it does pick up the batches on the 3rd or 4th run, does it complete a "full" batch as per the configured size, or a partial one? What is the committed offset after each run (check in the history)?

jeff ( 2018-01-31 10:41:23 -0600 )

I have the following set: Batch size: 1000 records; Batch wait time: 5 seconds; Max files in directory: 1000; Buffer size: 128 KB. Now it is not picking up anything.

danoomistmatiste ( 2018-01-31 12:39:32 -0600 )

What have you configured as the data format? And can you share a sample of the data from the input files?

jeff ( 2018-02-05 16:26:42 -0600 )

Is it configured to order by last modified timestamp? Some issues were resolved in the latest release, 3.2, which may be related if it is configured to use the last modified timestamp. Otherwise, I would suggest changing the log level to trace

Jisun ( 2018-02-12 01:12:42 -0600 )

and investigating the issue. If you don't mind, can you share your sdc.log here?

Jisun ( 2018-02-12 01:13:30 -0600 )
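
Since the comments above ask about the read order, one way to check from outside the pipeline which files the directory origin might be expected to see is a small script like the sketch below. It is not part of StreamSets and does not reproduce the origin's actual logic; the function names and the default `*` pattern are hypothetical. It only lists the files in a directory that match a glob pattern, ordered once by file name and once by last modified timestamp, so the two orders can be compared against the offset recorded in the pipeline history.

```python
import fnmatch
import os
import sys


def list_candidates(directory, pattern="*"):
    """Return (name, mtime) pairs for regular files in `directory` matching `pattern`."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories and names that do not match the glob pattern.
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
            entries.append((name, os.path.getmtime(path)))
    return entries


def by_name(entries):
    """File names in ascending lexicographic order."""
    return [name for name, _ in sorted(entries, key=lambda e: e[0])]


def by_mtime(entries):
    """File names ordered by last modified time, ties broken by name."""
    return [name for name, _ in sorted(entries, key=lambda e: (e[1], e[0]))]


if __name__ == "__main__":
    # Usage (hypothetical): python list_files.py <directory> [pattern]
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    pattern = sys.argv[2] if len(sys.argv) > 2 else "*"
    found = list_candidates(directory, pattern)
    print("by name: ", by_name(found))
    print("by mtime:", by_mtime(found))
```

If the two orders differ and the committed offset already points past some of the files in the configured order, that would be one possible explanation for files being skipped; the sdc.log at trace level remains the authoritative source.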