
Pulling data from RDBMS

asked 2019-10-16 06:00:41 -0600

mspatil

Requirement:
1. Hit API1 and get the dataset list (along with credentials).
2. For each dataset's credentials, connect to the DB, pull the data, and save it in HDFS.


1 Answer


answered 2019-10-16 07:20:20 -0600

ahmed_alashrafy

1. Use the HTTP Client to request the API.
2. Use a Stream Selector to apply the condition for each dataset.
3. Use the JDBC Query Executor to run the query that you need.
4. Write the results to HDFS.

  • You have to install the required libraries, either for Hadoop or for the database.
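
The four steps above can be sketched outside SDC in plain Python to make the data flow concrete. This is only a rough sketch, not StreamSets code: the response shape (`datasets`, `name`, `host`, `port`, `database`, `user`, `password`), the MySQL-style JDBC URL, the assumption that each dataset name is a table name, and the `/data/<name>` HDFS layout are all made up for illustration.

```python
import json

# Hypothetical API1 response; the real field names depend on your API.
SAMPLE_RESPONSE = json.dumps({
    "datasets": [
        {"name": "orders", "host": "10.0.0.5", "port": 3306,
         "database": "sales", "user": "reader", "password": "secret"},
        {"name": "users", "host": "10.0.0.6", "port": 3306,
         "database": "crm", "user": "reader", "password": "secret"},
    ]
})


def plan_extractions(response_text):
    """Turn the dataset list into one extraction job per dataset."""
    jobs = []
    for ds in json.loads(response_text)["datasets"]:
        jobs.append({
            # Assumed MySQL-style JDBC URL; adjust for your database.
            "jdbc_url": f"jdbc:mysql://{ds['host']}:{ds['port']}/{ds['database']}",
            "user": ds["user"],
            "password": ds["password"],
            # Assumes the dataset name is also the table name.
            "query": f"SELECT * FROM {ds['name']}",
            # Assumed HDFS layout: one directory per dataset.
            "hdfs_path": f"/data/{ds['name']}",
        })
    return jobs


jobs = plan_extractions(SAMPLE_RESPONSE)
for job in jobs:
    print(job["jdbc_url"], "->", job["hdfs_path"])
```

Each job corresponds to one pass through steps 2 to 4: pick the dataset, run its query over JDBC, and write the result to its HDFS path.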

Comments

The API returns the DB connection parameters, and we need to use these parameters to connect to the DB ... is this a viable architecture?

mspatil ( 2019-10-16 07:34:24 -0600 )

You can store the API response in a file in the resources folder and read its values as parameters in the DB node, e.g. ${databaseIP}:${databasePort} ... https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Pipeline_Configuration/RuntimeValues.html

ahmed_alashrafy ( 2019-10-20 02:13:22 -0600 )
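
To illustrate what `${databaseIP}:${databasePort}` from the comment above resolves to once the values have been saved, here is a small Python sketch. It only mimics the substitution and is not how SDC evaluates parameters internally; the key=value file format and the sample values are assumptions.

```python
from string import Template


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
    return params


# Hypothetical contents of a file saved from the API response.
properties_text = """
# written from the API response
databaseIP=10.0.0.5
databasePort=3306
"""

params = parse_properties(properties_text)
# string.Template happens to use the same ${name} placeholder syntax.
address = Template("${databaseIP}:${databasePort}").substitute(params)
print(address)
```

For how SDC itself loads such values, see the Runtime Values page linked in the comment.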

Actually, we need to pass those connection parameters dynamically within the same pipeline to connect to the database. Which processor in SDC should I use for this?

mspatil ( 2019-10-22 04:20:41 -0600 )

You have to create two pipelines: one that extracts the configuration and writes it to the /resources directory, and another whose parameters are read from the /resources directory. These parameters are passed to the JDBC processor and are updated automatically from that file.

ahmed_alashrafy ( 2019-10-24 07:02:11 -0600 )
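
The two-pipeline handoff described above can be pictured as a writer and a reader sharing one file. The sketch below is plain Python, not SDC pipeline configuration: the file name `db_config.json`, the JSON format, and the temporary directory standing in for /resources are all assumptions.

```python
import json
import os
import tempfile


def write_config(resources_dir, config):
    """Stage 1: save the connection parameters fetched from the API."""
    path = os.path.join(resources_dir, "db_config.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(config, f)
    # Rename into place so a reader never sees a half-written file.
    os.replace(tmp_path, path)
    return path


def read_config(resources_dir):
    """Stage 2: load the parameters just before connecting."""
    with open(os.path.join(resources_dir, "db_config.json")) as f:
        return json.load(f)


# A temp directory stands in for the SDC resources directory.
with tempfile.TemporaryDirectory() as resources_dir:
    write_config(resources_dir, {"databaseIP": "10.0.0.5", "databasePort": 3306})
    loaded = read_config(resources_dir)

print(f"{loaded['databaseIP']}:{loaded['databasePort']}")
```

Writing to a temporary name and renaming is one way to keep the second stage from reading a partially written file while the first stage is still running.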