
StreamSets with Oracle & Kafka

asked 2017-09-16 09:05:06 -0600

Amit Malhotra

updated 2017-09-20 13:06:03 -0600

LC

Hi all, first post to this group. We are trying to stream data from Oracle to Kafka. The data consists of intraday (event-based) data and end-of-day data (a batch process). We need to convert the data from a normalized data model to a denormalized one, which can then be used for customer reporting/analytics. There are two problem statements here: (1) data for a single transaction/record event can span 4-6 tables, and this then needs to be denormalized; (2) there are a lot of computations and Oracle DECODE functions that would need to be applied to the data. Has anyone in the group used StreamSets for a similar use case? Any suggestions or recommendations?
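For context on point (2): Oracle's DECODE compares an expression against a series of search/result pairs and returns the matching result, or an optional trailing default. If such expressions are ported out of SQL into a pipeline stage, the same behavior is easy to reproduce. A minimal sketch in Python (the function name and example values are invented for illustration):

```python
def decode(value, *args):
    """Oracle-style DECODE: (search1, result1, search2, result2, ..., [default])."""
    pairs, default = args, None
    if len(args) % 2 == 1:          # odd arg count => last arg is the default
        pairs, default = args[:-1], args[-1]
    for i in range(0, len(pairs), 2):
        if value == pairs[i]:
            return pairs[i + 1]
    return default

# DECODE(status, 'A', 'Active', 'I', 'Inactive', 'Unknown')
print(decode("A", "A", "Active", "I", "Inactive", "Unknown"))  # Active
print(decode("X", "A", "Active", "I", "Inactive", "Unknown"))  # Unknown
```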


2 Answers


answered 2017-09-18 20:30:28 -0600

jwood

updated 2017-09-20 11:44:24 -0600

LC

I've had a similar use case in the past, and ended up taking advantage of Advanced Queuing in Oracle (docs: https://docs.oracle.com/cd/B10501_01/...). A database trigger allowed us to enqueue a message onto the Oracle queue, including a complex transform using other data from within the database. StreamSets can consume this queue using the JMS Consumer, and from StreamSets you can easily push the stream to multiple destinations (including Kafka).

That's a pretty custom solution, but I wanted to share it since it sounds like you might also have some pretty unique requirements.

There are potentially other ways to go about it -- StreamSets has a JDBC consumer for one or more tables, a JDBC lookup processor, and it supports Oracle change data capture (CDC).

I think your ideal StreamSets configuration will partially depend on the relative cost/efficiency of different operations. If lookups are cheap, you might "follow" one table and enrich rows using lookups in the other tables.
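To illustrate the "follow one table, enrich via lookups" pattern: a toy Python sketch in which in-memory dicts stand in for the lookup tables (all table, field, and key names here are invented, not from the original question):

```python
# Stand-ins for related tables that a JDBC Lookup stage would query.
customers = {101: {"cust_name": "Acme Corp"}}
products  = {7:   {"prod_name": "Widget"}}

def enrich(txn):
    """Denormalize one row from the 'followed' transactions table
    by merging in fields looked up from the related tables."""
    flat = dict(txn)
    flat.update(customers.get(txn["cust_id"], {}))
    flat.update(products.get(txn["prod_id"], {}))
    return flat

txn = {"txn_id": 1, "cust_id": 101, "prod_id": 7, "amount": 25.0}
print(enrich(txn))
```

The trade-off jwood describes shows up here directly: each enriched row costs one lookup per related table, so this pattern only wins when those lookups are cheap (indexed, cacheable).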


answered 2017-09-19 22:16:25 -0600

Amit Malhotra

jwood - Thanks a lot for your answer. A couple of points: (1) Are we saying that StreamSets is in the same league as products like Apache Camel or Mule, i.e. used for routing and lightweight transformations from point A to point B?

(2) Will StreamSets be able to give us the same level of processing power that an Oracle stored procedure can? I guess not, since we would be comparing the computing power of Oracle vs. JDBC.

(3) From what little I have read, I gather that the real power comes when one can embed a Spark processing flow between two endpoints. If this is indeed the case, is data denormalization best handled inside Oracle through stored procedures, or can it be attempted in Spark?

Please let me know your view on this, any feedback would be appreciated.


Comments

(1) StreamSets works well doing the types of transforms that Camel and Mule can do. I've also had success with much more complex scripting in StreamSets (Jython/Groovy).

jwood (2017-09-20 19:01:44 -0600)
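On the Jython scripting point: a StreamSets Jython Evaluator script operates on a batch through runtime-injected bindings (`records`, `output`, `error`), so it cannot run standalone; keeping the actual transform as a plain function makes it developable and testable outside the pipeline. A minimal sketch, with an invented status-mapping transform as the example:

```python
def transform(value):
    """Plain-Python transform applied to one record's value (a dict).
    Derives a description field the way an Oracle DECODE might."""
    value["status_desc"] = {"A": "Active", "I": "Inactive"}.get(
        value.get("status"), "Unknown")
    return value

# Inside the Jython Evaluator stage, the surrounding loop would be roughly:
#   for record in records:
#       record.value = transform(record.value)
#       output.write(record)

print(transform({"status": "A"}))
```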

(2) Processing power depends more on hardware than anything else. CPU+RAM=power. If you use StreamSets cluster mode, almost any YARN/Spark cluster will have a LOT more resources than a typical Oracle installation.

jwood (2017-09-20 19:03:06 -0600)

(3) If you can join your tables and make big ugly records, StreamSets can clean them up. If lookups are cheap, you can follow your main table, and do lookups in the others to assemble the denormalized records. Personally, I try not to normalize data in the first place :)

jwood (2017-09-20 19:05:53 -0600)
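To illustrate the "big ugly records" idea from point (3): a SQL join repeats the parent columns on every child row, so the cleanup step groups those rows and nests the children under one denormalized record. A hypothetical Python sketch (column names invented; input must be ordered by the group key, as an ORDER BY'd join would return it):

```python
from itertools import groupby

# Rows as a join would return them: parent columns repeated per child row.
rows = [
    {"txn_id": 1, "cust_name": "Acme Corp", "item": "Widget", "qty": 2},
    {"txn_id": 1, "cust_name": "Acme Corp", "item": "Gadget", "qty": 1},
]

def collapse(rows):
    """Group consecutive rows by txn_id and nest child columns as a list."""
    out = []
    for txn_id, grp in groupby(rows, key=lambda r: r["txn_id"]):
        grp = list(grp)
        record = {"txn_id": txn_id, "cust_name": grp[0]["cust_name"]}
        record["items"] = [{"item": r["item"], "qty": r["qty"]} for r in grp]
        out.append(record)
    return out

print(collapse(rows))
```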
