Pulling data from RDBMS

asked 2019-10-16

Requirement: 1. Call API1 to get the list of datasets (along with their credentials). 2. For each dataset's credentials, connect to the database, pull the data, and save it to HDFS.

answered 2019-10-16

1. Use the HTTP Client to request the API.
2. Use a Stream Selector to apply the condition for each dataset.
3. Use the JDBC Query Executor to execute the query that you need.
4. Write the results to HDFS.

  • You have to install the required libraries, whether for Hadoop or for the database.
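
To make the flow above concrete outside of SDC, here is a minimal Python sketch that turns a dataset list into one extraction job per dataset. The response shape and every field name (`dataset`, `host`, `port`, `database`, `user`, `query`) are assumptions for illustration, as is the `/data/raw` HDFS root; the real API1 response may look different.

```python
import json

# Hypothetical API1 response; the field names are assumptions, not a real schema.
SAMPLE_RESPONSE = """
[
  {"dataset": "orders", "host": "db1.example.com", "port": 3306,
   "database": "sales", "user": "reader", "query": "SELECT * FROM orders"},
  {"dataset": "users", "host": "db2.example.com", "port": 5432,
   "database": "crm", "user": "reader", "query": "SELECT * FROM users"}
]
"""


def plan_jobs(response_text, hdfs_root="/data/raw"):
    """Build one extraction job per dataset entry in the API response."""
    jobs = []
    for entry in json.loads(response_text):
        jobs.append({
            "dataset": entry["dataset"],
            # host:port/database is the part a JDBC connection string needs
            "target": f'{entry["host"]}:{entry["port"]}/{entry["database"]}',
            "user": entry["user"],
            "query": entry["query"],
            # each dataset gets its own directory under the (assumed) HDFS root
            "hdfs_path": f'{hdfs_root}/{entry["dataset"]}',
        })
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(SAMPLE_RESPONSE):
        print(job["dataset"], "->", job["hdfs_path"])
```

Each job corresponds to one pass through steps 2 to 4: select the dataset, run its query, write to its HDFS path.
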
The API returns the DB connection parameters, and we need to use these parameters to connect to the DB ... Is this a viable architecture?

You can store the API response in a file in the resources folder and read its values into the DB node as parameters: ${databaseIP}:${databasePort} ...
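
As a rough illustration of how `${databaseIP}:${databasePort}` placeholders get filled in, the sketch below uses Python's `string.Template`, which happens to share the `${name}` syntax. This is not SDC's own parameter mechanism; the `jdbc:mysql://` prefix and the `databaseName` parameter are assumptions added for the example.

```python
from string import Template

# Hypothetical connection string; only databaseIP and databasePort come from
# the answer above, the rest is assumed for illustration.
CONNECTION_TEMPLATE = Template(
    "jdbc:mysql://${databaseIP}:${databasePort}/${databaseName}"
)


def render_connection_string(params):
    """Fill the placeholders from a dict of parameter values.

    substitute() raises KeyError when a placeholder has no value, which
    surfaces a missing parameter early instead of producing a broken URL.
    """
    return CONNECTION_TEMPLATE.substitute(params)


if __name__ == "__main__":
    print(render_connection_string(
        {"databaseIP": "10.0.0.5", "databasePort": 3306, "databaseName": "sales"}
    ))
```
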

Actually, we need to pass those connection parameters dynamically within the same pipeline to connect to the database. Which processor in SDC should I use for this?

You have to create two pipelines: one that extracts the configuration and writes it to the /resources directory, and another whose parameters are read from the /resources directory. These parameters are passed to the JDBC processor and are updated automatically from that file.
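
The handoff between the two pipelines can be sketched in plain Python as a writer and a reader sharing one file. This only models the idea; it is not how SDC itself loads resources. The file name `db_params.json`, the JSON format, and both function names are hypothetical.

```python
import json
from pathlib import Path


def write_connection_params(resources_dir, params, filename="db_params.json"):
    """First pipeline's role: persist the connection parameters from the API."""
    path = Path(resources_dir) / filename
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(params))
    # Rename over the target so a reader does not pick up a half-written file
    # (the rename is atomic on POSIX when both paths are on one filesystem).
    tmp.replace(path)
    return path


def read_connection_params(resources_dir, filename="db_params.json"):
    """Second pipeline's role: load the current parameters before connecting."""
    return json.loads((Path(resources_dir) / filename).read_text())
```

Because the reader loads the file each time it runs, a later write by the first step is picked up on the next read, which is the "updated automatically" behaviour described above.
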

