Multiple data files with the same timestamp

asked 2020-05-03 02:06:10 -0500

Sravya

updated 2020-05-04 20:49:45 -0500

jeff

I am reading data from Kafka and writing it to Local file. I have set the max records per file option to 1000. I have not used the file idle timeout option, because the file was not closing for a very long time, as records were arriving from Kafka one by one.

Now the issue is this: since I chose a record limit per file and I receive multiple records from Kafka at a time, multiple data files are created in the local file system with the same timestamp. From the local file system, I load the data into a table through a script driven by the event files, and the event files are now picked up in random order. Out of 10 files with the same timestamp, the oldest one should normally be processed first, but it now picks the 4th or 5th file at random.

We have a SQL statement in which only the latest records are processed into the table, so with the above behavior we lose data.
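
To illustrate the ordering I expect, here is a minimal sketch in Python. It is not the actual script or anything provided by SDC. It assumes the files can be listed from a directory, that the file system records modification time more finely than the one-second timestamp, and that file names are an acceptable tie-breaker. The function name `files_in_processing_order` is hypothetical.

```python
import os
from pathlib import Path


def files_in_processing_order(directory):
    """Return the regular files in `directory`, oldest first.

    Files are sorted by modification time in nanoseconds. When two files
    have exactly the same modification time, the file name decides, so the
    order is at least the same on every run.
    """
    entries = [p for p in Path(directory).iterdir() if p.is_file()]
    # Key is (mtime in ns, name): time decides first, the name only breaks ties.
    return sorted(entries, key=lambda p: (os.stat(p).st_mtime_ns, p.name))
```

The name tie-breaker only reflects creation order if the file names contain an increasing sequence, which may not be the case here.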


Comments

Can you clarify: what is the "script" you are referencing in the 2nd paragraph? Is it another SDC pipeline, or a shell script you have written?

jeff (2020-05-06 16:31:06 -0500)