Handling DataFrame/Dataset with Spark Evaluator

asked 2019-09-20 03:05:11 -0500 by Anonymous

updated 2019-09-20 16:13:01 -0500 by metadaddy

We need to perform complex, ETL-style transformations in the Spark Evaluator. Our origin is the Kafka Multitopic Consumer (each topic is one table from Oracle). Doing this with the JavaRDD API alone seems impractical, so we would like to use Spark SQL, but we are unable to transform JavaRDD<Record> to a DataFrame/Dataset and back to JavaRDD<Record>.

Challenges:

  1. Handling the multitopic Kafka origin: the Spark Evaluator receives data in batches, so each batch must be split into per-topic DataFrames/Datasets before we can join and transform them.
  2. Converting JavaRDD<Record> to Datasets/DataFrames and back to JavaRDD<Record> (see the simplified sketch of our attempt below).

Any sample or skeleton code would be appreciated.
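
This is roughly where we get to before we are stuck (a simplified sketch; assumes Spark 2.x, and field names such as /ORDER_ID and /AMOUNT are placeholders for our actual Oracle columns):

    // Simplified body of our SparkTransformer.transform(JavaRDD<Record> records)
    SparkSession spark = SparkSession.builder().getOrCreate();

    // Record -> Row works: pull the needed fields out of each Record...
    JavaRDD<Row> rows = records.map(r -> RowFactory.create(
        r.get("/ORDER_ID").getValueAsLong(),
        r.get("/AMOUNT").getValueAsDouble()));

    // ...and pair them with an explicit schema to get a DataFrame.
    Dataset<Row> df = spark.createDataFrame(rows, new StructType()
        .add("order_id", DataTypes.LongType)
        .add("amount", DataTypes.DoubleType));

    df.createOrReplaceTempView("orders");
    Dataset<Row> totals = spark.sql(
        "SELECT order_id, SUM(amount) AS total FROM orders GROUP BY order_id");

    // This is where we are stuck: turning `totals` back into
    // JavaRDD<Record> so it can be returned in a TransformResult.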

1 Answer

answered 2019-09-20 11:10:38 -0500 by iamontheinet

Hi!

Not sure if you already saw them, but the StreamSets tutorials repo includes step-by-step Spark Evaluator (custom Spark transformer) tutorials in both Java and Scala; they walk through implementing SparkTransformer end to end.
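
In the meantime, here is a rough skeleton of the pattern you are after. Treat it as a sketch, not a drop-in implementation: it assumes Spark 2.x, the SparkTransformer/TransformResult classes from SDC's spark-api module, that the Kafka Multitopic Consumer sets a "topic" record header attribute, and illustrative topic and field names (orders, customers, /ORDER_ID, and so on). Since new Records can't easily be created from scratch inside the transformer, it collects the joined values back to the driver (fine for small micro-batches) and sets them as extra fields on the original records:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    import com.streamsets.pipeline.api.Field;
    import com.streamsets.pipeline.api.Record;
    import com.streamsets.pipeline.spark.api.SparkTransformer;
    import com.streamsets.pipeline.spark.api.TransformResult;

    import scala.Tuple2;

    public class MultiTopicJoinTransformer extends SparkTransformer {

      @Override
      public TransformResult transform(JavaRDD<Record> records) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // 1. Split the micro-batch by Kafka topic (assumes the origin
        //    sets the "topic" record header attribute).
        JavaRDD<Record> orders = records.filter(
            r -> "orders".equals(r.getHeader().getAttribute("topic")));
        JavaRDD<Record> customers = records.filter(
            r -> "customers".equals(r.getHeader().getAttribute("topic")));

        // 2. JavaRDD<Record> -> Dataset<Row>: extract the needed fields
        //    into Rows and pair them with an explicit schema.
        Dataset<Row> orderDf = spark.createDataFrame(
            orders.map(r -> RowFactory.create(
                r.get("/ORDER_ID").getValueAsLong(),
                r.get("/CUSTOMER_ID").getValueAsLong())),
            new StructType()
                .add("order_id", DataTypes.LongType)
                .add("customer_id", DataTypes.LongType));
        Dataset<Row> customerDf = spark.createDataFrame(
            customers.map(r -> RowFactory.create(
                r.get("/CUSTOMER_ID").getValueAsLong(),
                r.get("/NAME").getValueAsString())),
            new StructType()
                .add("customer_id", DataTypes.LongType)
                .add("name", DataTypes.StringType));

        // 3. Run plain Spark SQL across the per-topic "tables".
        orderDf.createOrReplaceTempView("orders");
        customerDf.createOrReplaceTempView("customers");
        Dataset<Row> joined = spark.sql(
            "SELECT o.order_id, c.name FROM orders o "
                + "JOIN customers c ON o.customer_id = c.customer_id");

        // 4. Back to JavaRDD<Record>: collect the joined values (small
        //    per-batch result; broadcast instead if it can grow large)
        //    and fold them onto the original Records as new fields.
        Map<Long, String> nameByOrder = new HashMap<>(
            joined.javaRDD()
                .mapToPair(row -> new Tuple2<>(row.getLong(0), row.getString(1)))
                .collectAsMap());
        JavaRDD<Record> result = orders.map(r -> {
          String name = nameByOrder.get(r.get("/ORDER_ID").getValueAsLong());
          if (name != null) {
            r.set("/CUSTOMER_NAME", Field.create(name));
          }
          return r;
        });

        // TransformResult carries the output records plus a pair RDD of
        // error records and error messages; none in this sketch.
        JavaSparkContext jsc =
            JavaSparkContext.fromSparkContext(spark.sparkContext());
        JavaPairRDD<Record, String> noErrors =
            jsc.parallelizePairs(Collections.emptyList());
        return new TransformResult(result, noErrors);
      }
    }

If the joined result per batch is not small, a cleaner variant is to keep a key column on each Row, convert the SQL output back with joined.javaRDD(), and pair it with the original records by key instead of collecting to the driver.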

On another note, it sounds like you could really benefit from the newly released StreamSets Transformer. It has built-in capabilities such as joining datasets from multiple sources/origins and performing complex transformations (aggregations, sorting, ranking, etc.) across the entire dataset. You can also extend its capabilities by writing custom Scala and PySpark code; see the StreamSets Transformer documentation for details.

Cheers, Dash
