StreamSets Pipeline design

asked 2021-02-15 09:10:24 -0600

MuralidharKoti

Hi All,

I am looking for suggestions on how you have implemented "full data load" ingestion into a data lake with StreamSets. We would like to keep daily copies of the data source (say, SharePoint list data) in the destination (say, Amazon S3 raw), but move only the latest load into the reporting area (the Amazon S3 refined area). We currently have one pipeline from source to raw and one from raw to refined.

My question is: how can I fetch the latest folder from Amazon S3 raw so that I can configure the raw-to-refined pipeline?
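To make "latest folder" concrete, here is a rough Python sketch of the selection I have in mind. It assumes each daily load lands under a date-named prefix such as raw/sharepoint/2021-02-15/. That layout, the folder names, and the helper latest_dated_prefix are only illustrative, not our actual setup, and this is not StreamSets configuration.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_dated_prefix(prefixes: Iterable[str], fmt: str = "%Y-%m-%d") -> Optional[str]:
    """Return the prefix whose last path segment is the most recent date.

    Prefixes whose last segment does not parse with `fmt` are skipped.
    Returns None if no prefix is date-named.
    """
    best = None  # (date, prefix) of the newest folder seen so far
    for prefix in prefixes:
        # "raw/sharepoint/2021-02-15/" -> "2021-02-15"
        segment = prefix.rstrip("/").rsplit("/", 1)[-1]
        try:
            day = datetime.strptime(segment, fmt).date()
        except ValueError:
            continue  # not a date-named folder, ignore it
        if best is None or day > best[0]:
            best = (day, prefix)
    return best[1] if best else None


# Hypothetical folder names, e.g. as they might come back from an S3
# listing that groups keys by the "/" delimiter.
folders = [
    "raw/sharepoint/2021-02-13/",
    "raw/sharepoint/2021-02-15/",
    "raw/sharepoint/2021-02-14/",
    "raw/sharepoint/_tmp/",
]
print(latest_dated_prefix(folders))  # raw/sharepoint/2021-02-15/
```

What I am unsure about is how to get the equivalent of this result into the origin of the raw-to-refined pipeline.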

Regards, Murali
