Ingest very large database tables over JDBC

asked 2017-11-14 08:52:49 -0500 by Boris Tyukin

updated 2017-11-16 01:36:39 -0500 by metadaddy

I've been playing with StreamSets and love it! Let's say I have a 10-billion-row table in Oracle DB that is not partitioned and has a single-column primary key. Is there a way to load that table using multiple threads? Today I use Sqoop with 32 mappers, which takes about 5 hours. Once the initial load is done, I can load data incrementally using only one thread.

Is it possible to use StreamSets for that initial load, or is Sqoop still the best option? I have a few dozen very large tables like that, and using a single-threaded JDBC origin is not really an option.
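For reference, Sqoop's default way of parallelizing an import over a single-column integer key is to query the key's minimum and maximum and cut that interval into one contiguous range per mapper, each mapper then reading its own range with a `WHERE` clause. The sketch below shows only that range arithmetic in Python; `split_ranges` and `range_query` are hypothetical helpers, not part of Sqoop or StreamSets, and the sketch assumes integer keys and trusted table/column names.

```python
def split_ranges(min_key: int, max_key: int, num_splits: int) -> list[tuple[int, int]]:
    """Split the inclusive key interval [min_key, max_key] into at most
    num_splits contiguous half-open ranges [lo, hi)."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    if max_key < min_key:
        return []  # nothing to read
    total = max_key - min_key + 1          # number of possible key values
    num_splits = min(num_splits, total)    # never produce empty ranges
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = min_key
    for i in range(num_splits):
        # The first `extra` ranges absorb the remainder, one key value each.
        width = base + (1 if i < extra else 0)
        ranges.append((lo, lo + width))
        lo += width
    return ranges


def range_query(table: str, key: str, lo: int, hi: int) -> str:
    """Build the query one worker (mapper or thread) would run for its range.
    Hypothetical helper: table and key must be trusted identifiers."""
    return f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} < {hi}"
```

Assuming keys run from 1 to 10 billion, `split_ranges(1, 10_000_000_000, 32)` yields 32 ranges of 312,500,000 key values each, and the first worker would run `SELECT * FROM big_table WHERE id >= 1 AND id < 312500001`. Equal-width ranges only balance the work when keys are spread fairly evenly; gaps or skew in the key produce uneven splits.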

1 Answer

answered 2017-11-16 01:35:59 -0500 by metadaddy

The JDBC Multitable Consumer origin can run in multithreaded mode and works similarly to Sqoop. In fact, we recently released a tool that makes it simple to migrate from Sqoop to StreamSets Data Collector; see the blog entry "How to Convert Apache Sqoop™ Commands Into StreamSets Data Collector Pipelines".
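The idea behind any multithreaded JDBC read, Sqoop mappers or a multithreaded origin alike, is that each worker reads a disjoint key range over its own connection and streams rows in batches. The following Python sketch illustrates that pattern with the standard-library `sqlite3` and `concurrent.futures` modules standing in for Oracle and a pipeline's thread pool. It is not how Data Collector implements the JDBC Multitable Consumer, and `big_table`, `read_range`, and `parallel_read` are hypothetical names.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(db_path: str, lo: int, hi: int) -> int:
    """Read the half-open key range [lo, hi) on a dedicated connection
    and return how many rows were fetched."""
    conn = sqlite3.connect(db_path)  # each thread opens its own connection
    try:
        cur = conn.execute(
            "SELECT id, payload FROM big_table WHERE id >= ? AND id < ?",
            (lo, hi),
        )
        count = 0
        while True:
            batch = cur.fetchmany(1000)  # stream in batches instead of fetchall()
            if not batch:
                break
            # A real pipeline would hand each batch to its destination here.
            count += len(batch)
        return count
    finally:
        conn.close()


def parallel_read(db_path: str, num_threads: int) -> int:
    """Split big_table by primary key and read the ranges concurrently."""
    conn = sqlite3.connect(db_path)
    try:
        min_key, max_key = conn.execute(
            "SELECT MIN(id), MAX(id) FROM big_table"
        ).fetchone()
    finally:
        conn.close()
    if min_key is None:  # empty table
        return 0
    # Equal-width ranges, one per thread (ceiling division for the width).
    step = -(-(max_key - min_key + 1) // num_threads)
    ranges = [
        (lo, min(lo + step, max_key + 1))
        for lo in range(min_key, max_key + 1, step)
    ]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(lambda r: read_range(db_path, *r), ranges))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO big_table VALUES (?, ?)",
            ((i, f"row {i}") for i in range(1, 100_001)),
        )
        conn.commit()
        conn.close()
        print(parallel_read(path, 8))  # 100000
```

Running the script prints `100000`. The per-thread connection is deliberate: Python's `sqlite3` connections are by default restricted to the thread that created them, and in general separate connections are what let the database serve the range queries in parallel.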

@metadaddy Awesome, thanks! I posted a follow-up question on that blog post.

Boris Tyukin (2017-12-02 21:02:53 -0500)