Single CSV file creation - JDBC Origin

asked 2019-02-15 08:21:20 -0500

martinpella

I have a pipeline that extracts data from an Oracle table and sends it to a Kafka Producer. A second pipeline, with a Kafka Consumer origin and an S3 destination, writes the data (in many files) to a particular bucket. I need to create a single CSV file that contains all of the data.

To achieve that, I've written a Python script that does the job with Boto3 and Pandas.
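
For illustration only, here is a minimal sketch of what such a merge step could look like. It is not the actual script: the function name `merge_csv_objects` and the bucket, prefix, and key values are hypothetical, and it assumes every part file is a CSV with a header row and the same columns. The S3 client is passed in as an argument (in practice, something like `boto3.client("s3")`).

```python
import io

import pandas as pd


def merge_csv_objects(s3_client, bucket, prefix, output_key):
    """Concatenate every CSV object under `prefix` into one CSV object.

    Assumes each part file has a header row and identical columns.
    Returns the number of data rows written (0 if nothing was found).
    """
    frames = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key == output_key:
                continue  # do not re-read a merged file from an earlier run
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            frames.append(pd.read_csv(io.BytesIO(body)))

    if not frames:
        return 0

    merged = pd.concat(frames, ignore_index=True)
    s3_client.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=merged.to_csv(index=False).encode("utf-8"),
    )
    return len(merged)
```

A call might look like `merge_csv_objects(boto3.client("s3"), "my-bucket", "kafka-output/", "merged/all.csv")`. The sketch only merges whatever is in the bucket at the time it runs, so it still depends on knowing when the second pipeline has finished writing.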

I can't detect when all of the data ingested into the Kafka topic has been written to S3: when the first pipeline dispatches the no-more-data event, not all of the data has been written yet.

Any possible solution? Thanks.
