Kafka Origin in Yarn Streaming Mode

asked 2018-01-29 01:56:50 -0600

germaneduardo

updated 2018-01-29 02:02:02 -0600

We are facing an issue with a pipeline configured to run in cluster YARN streaming mode. The origin of the pipeline is a Kafka Consumer. We are running SDC 2.6.1 on CDH 5.7. Through Cloudera Manager (version 5.8) we have configured our environment to use Kafka 0.10.

We have tried all the libraries available for the Kafka Consumer, and the pipeline always fails with the following error: java.lang.NoSuchMethodError: kafka.message.MessageAndMetadata.

I am almost sure the issue is that the Spark Streaming application that StreamSets launches to run this pipeline is not using the spark-streaming-kafka-0-10 library.

Am I missing some configuration? Can we configure the Kafka stage in some way to avoid the above error?

BTW: if we set the pipeline to Standalone, it works nicely, but our goal is to increase the parallelism. Also, we have set the variable SPARK_SUBMIT_YARN_COMMAND in the file to use Spark 2.1.

Any help is highly appreciated. Thanks in advance!

1 Answer

answered 2018-02-17 14:33:57 -0600

germaneduardo

updated 2018-02-17 14:34:35 -0600

To provide some feedback in case someone else faces the same issue:

1. We deleted the value of SPARK_SUBMIT_YARN_COMMAND in the file.
2. We restarted the StreamSets service.
3. StreamSets now uses Spark 1.6 to launch the Spark Streaming process.

It seems that StreamSets is not able to use Spark 2.1 for pipelines that run in cluster YARN streaming mode.
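For anyone who wants to double-check whether the override is still in effect, here is a minimal sketch in Python (not part of StreamSets). It only reports whether SPARK_SUBMIT_YARN_COMMAND is visible in the environment of the process that runs it. That the variable is exported into that environment is an assumption, and the function name is made up for illustration.

```python
import os

# Variable name taken from the post above. Looking it up in the process
# environment (rather than in a specific config file) is an assumption.
VAR_NAME = "SPARK_SUBMIT_YARN_COMMAND"


def describe_spark_submit_override(env=None):
    """Return a short message saying whether the override variable is set.

    `env` is any mapping of variable names to values; it defaults to the
    environment of the current process.
    """
    env = os.environ if env is None else env
    value = env.get(VAR_NAME, "").strip()
    if value:
        return f"{VAR_NAME} is set to: {value}"
    # An empty or missing value is treated the same way: no override.
    return f"{VAR_NAME} is not set"


if __name__ == "__main__":
    print(describe_spark_submit_override())
```

Run it as the same user, and with the same environment, as the Data Collector service; otherwise the result may not reflect what the service actually sees.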
