How do I base S3 directory path on Oracle CDC schema and table?

asked 2018-07-30 15:16:52 -0600

Mehul

updated 2018-07-31 18:56:01 -0600

metadaddy

Hello Everyone,

I compared AWS DMS with the StreamSets Oracle CDC origin. With StreamSets, both the full load and CDC write .csv files with an sdc- prefix, but nothing in the file name identifies the source table. With AWS DMS, we get a <Schema Bucket>/<Table Bucket>/LOAD0001.csv layout, where LOAD0001.csv holds the records. Is there a way to get the same directory structure in StreamSets?

Thanks, Mehul


Comments

What destination are you using?

metadaddy (2018-07-30 16:42:40 -0600)

AWS S3 bucket.

Mehul (2018-07-31 11:30:53 -0600)

1 Answer

answered 2018-07-31 18:55:26 -0600

metadaddy

You can configure the Amazon S3 destination's Partition Prefix with an expression. The Oracle CDC origin puts the table name in the oracle.cdc.table record header attribute and the schema name in oracle.cdc.schema. To get a directory structure similar to AWS DMS, set the Partition Prefix to:

${record:attribute('oracle.cdc.schema')}/${record:attribute('oracle.cdc.table')}

Set the Object Name Suffix to csv.
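
To illustrate how that expression maps records to directories, here is a minimal Python sketch that mimics the substitution outside of StreamSets. It is not StreamSets code: the function names, the HR/EMPLOYEES values, and the assumed object name layout (<partition prefix>/sdc-<unique id>.csv) are all illustrative assumptions.

```python
import re
import uuid

# Matches ${record:attribute('name')} as written in the Partition Prefix above.
_ATTR = re.compile(r"\$\{record:attribute\('([^']+)'\)\}")


def resolve_prefix(template, attributes):
    """Substitute record header attribute values into a prefix template."""
    return _ATTR.sub(lambda m: attributes[m.group(1)], template)


def object_key(template, attributes, name_prefix="sdc", suffix="csv"):
    """Build an example object key.

    Assumed layout (not taken from the StreamSets docs):
    <partition prefix>/<name prefix>-<unique id>.<suffix>
    """
    partition = resolve_prefix(template, attributes)
    return f"{partition}/{name_prefix}-{uuid.uuid4()}.{suffix}"


TEMPLATE = (
    "${record:attribute('oracle.cdc.schema')}/"
    "${record:attribute('oracle.cdc.table')}"
)

# Hypothetical header attributes for one CDC record.
headers = {"oracle.cdc.schema": "HR", "oracle.cdc.table": "EMPLOYEES"}

print(resolve_prefix(TEMPLATE, headers))  # HR/EMPLOYEES
print(object_key(TEMPLATE, headers))
```

Each record is routed by its own schema and table attributes, so records from different tables end up under different prefixes even though the file names themselves still start with sdc-.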


Comments

Where can I find the list of record header attributes for other origins, such as Salesforce or SQL Server?

Mehul (2018-08-01 19:10:03 -0600)

I found it myself. All I have to do is take a snapshot, and the preview window then shows the record header. I managed to get the details of a Salesforce object this way as well, and the pipeline automatically created the bucket in S3 for that object.

Mehul (2018-08-03 11:48:06 -0600)