How do I base S3 directory path on Oracle CDC schema and table?

asked 2018-07-30 15:16:52 -0500

Mehul

updated 2018-07-31 18:56:01 -0500

metadaddy

Hello Everyone,

I compared AWS DMS with StreamSets Oracle CDC. I noticed that the StreamSets Oracle full load and CDC create .csv files with sdc- as the prefix, but there is no way for us to identify the table from the file name. With AWS DMS, we get a schema bucket/<Table Bucket>/LOAD0001.csv file, and LOAD0001.csv holds the records. Is there a way in StreamSets to get the same directory structure?

Thanks, Mehul

What destination are you using?

metadaddy ( 2018-07-30 16:42:40 -0500 )

AWS S3 bucket.

Mehul ( 2018-07-31 11:30:53 -0500 )

answered 2018-07-31 18:55:26 -0500

metadaddy

You can configure the Amazon S3 destination's Partition Prefix with an expression. The Oracle CDC origin puts the table name in the oracle.cdc.table attribute and the schema name in oracle.cdc.schema, so, to get a directory structure similar to AWS DMS, you would set the Partition Prefix to:
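
A minimal sketch of such an expression, assuming the record:attribute() function from the StreamSets expression language and the two header attribute names given above:

```
${record:attribute('oracle.cdc.schema')}/${record:attribute('oracle.cdc.table')}
```

Under that assumption, a record from a hypothetical table EMPLOYEES in a hypothetical schema HR would be written under HR/EMPLOYEES/ in the bucket, with the sdc- object name prefix mentioned in the question.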


Set the Object Name Suffix to csv.

Where do I get a list of these attributes for other origins, such as Salesforce, SQL Server, etc.?

Mehul ( 2018-08-01 19:10:03 -0500 )

I found it myself. All I have to do is take a snapshot, and the preview window then shows the record header. I managed to get the details of a Salesforce object as well, and the pipeline automatically created a bucket in S3 for the object.

Mehul ( 2018-08-03 11:48:06 -0500 )