How to add Scala code in Spark Evaluator/Executor and use in pipeline

asked 2019-10-25 13:22:45 -0500

RANNU

updated 2019-10-25 14:30:56 -0500

metadaddy

Hi,

How do I use Scala to pull bulk data from a source database? Can anyone share Scala code that connects to a source DB and pulls a large volume of data? I have the Scala code below, but I'm not sure what needs to be changed to point it at my source DB.

Also, where do I put this Scala code in a StreamSets pipeline?

    // Assumes an existing SparkSession `spark`, plus driver/url/credential
    // vals and the renameTablesAndColumns/replaceCharacters helpers defined elsewhere.
    val DFTable = spark.read.format("jdbc")
      .option("driver", driver)
      .option("url", url)
      .option("dbtable", srcDbSchema + "." + tableName)
      .option("user", srcDbUserName)
      .option("password", srcDbPassword)
      .option("fetchsize", 100000)
      .load() // was missing: actually executes the read and returns a DataFrame

    val newdf = DFTable.transform(renameTablesAndColumns)
    val newTableName = replaceCharacters(tableName).toString().toLowerCase()

    // Write a raw Avro copy and a Parquet table, both Snappy-compressed.
    newdf.write.format("avro").option("compression", "snappy").mode("overwrite").save(rawPath + srcDbSchema.toString().toLowerCase() + "_" + newTableName)
    newdf.write.format("parquet").option("compression", "snappy").mode("overwrite").saveAsTable(destDbName + "." + srcDbSchema.toString().toLowerCase() + "_" + newTableName + "_" + currentDate)
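One note on "large volume": a plain JDBC read like the one above runs as a single query on a single task. Spark can parallelize the read if you also supply the `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options on `spark.read`, which it turns into one `WHERE`-clause range per partition. The sketch below is a simplified, standalone approximation of that splitting logic (the column name `id` and the bounds are placeholder assumptions, and the real implementation lives inside Spark's JDBC data source), shown only to illustrate why a numeric partition column speeds up bulk pulls:

```scala
// Simplified sketch of how a partitioned JDBC read splits the key range
// [lowerBound, upperBound) into per-partition WHERE clauses.
// Not Spark's actual code; column/bounds are illustrative.
def partitionPredicates(column: String, lowerBound: Long, upperBound: Long,
                        numPartitions: Int): Seq[String] = {
  val stride = (upperBound - lowerBound) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lowerBound + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi OR $column IS NULL"          // first slice also picks up NULLs
    else if (i == numPartitions - 1) s"$column >= $lo"       // last slice is open-ended
    else s"$column >= $lo AND $column < $hi"
  }
}
```

In practice you would not write this yourself; you would add, for example, `.option("partitionColumn", "id").option("lowerBound", "1").option("upperBound", "10000000").option("numPartitions", "16")` (placeholder values) to the `spark.read` call above, and each of the resulting range queries runs on a separate executor.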

This is time sensitive; any urgent help will be appreciated.

Thanks, Rannu



Why are you trying to do this from Scala rather than using a JDBC origin? It looks like you can probably do this with the JDBC Query Consumer origin, Field Renamer, and JDBC Producer destination.

metadaddy ( 2019-10-25 14:33:23 -0500 )

Actually, my origin is JDBC. My source has millions of tables with millions of rows of data, so to speed up the load process I was advised to use a Scala program.

RANNU ( 2019-10-25 17:23:40 -0500 )