We are facing an issue with a pipeline configured to run in cluster YARN streaming mode. The origin of the pipeline is a Kafka Consumer. We are running SDC 2.6.1 on CDH 5.7. In Cloudera Manager (version 5.8), we have configured our environment to use Kafka 0.10.

We have tried every library version available in the Kafka Consumer stage, and the pipeline always fails with the following error: java.lang.NoSuchMethodError: kafka.message.MessageAndMetadata.

I am almost sure the issue is that the Spark Streaming application that StreamSets launches to run this pipeline is not using the spark-streaming-kafka-0-10 library.
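
One way to check this suspicion is to list the jar file names that the launched application actually ships (for example, from the YARN container's classpath listing) and see which Spark/Kafka integration artifact is among them. The following is a minimal sketch in Python, assuming the standard Maven artifact naming for these jars; the function name and the sample file names are hypothetical and only illustrate the idea.

```python
import re

# Matches the published artifact names, e.g.
#   spark-streaming-kafka_2.10-1.6.0.jar        (Spark 1.x naming, Kafka 0.8 API)
#   spark-streaming-kafka-0-8_2.11-2.1.0.jar    (Spark 2.x, Kafka 0.8 API)
#   spark-streaming-kafka-0-10_2.11-2.1.0.jar   (Spark 2.x, Kafka 0.10 API)
_JAR = re.compile(
    r"^spark-streaming-kafka(?:-(0-8|0-10))?(?:-assembly)?_(\d+\.\d+)-(.+)\.jar$"
)


def kafka_integration_jars(jar_names):
    """Return (jar, integration, scala_version, spark_version) for each match.

    `integration` is "0-8", "0-10", or "unversioned" for the Spark 1.x
    artifact name, which was built against the Kafka 0.8 API.
    """
    found = []
    for name in jar_names:
        match = _JAR.match(name)
        if match:
            integration = match.group(1) or "unversioned"
            found.append((name, integration, match.group(2), match.group(3)))
    return found


if __name__ == "__main__":
    # Hypothetical listing; replace with the real jar names from the container.
    sample = [
        "spark-streaming-kafka_2.10-1.6.0.jar",
        "kafka-clients-0.10.0.0.jar",
    ]
    for row in kafka_integration_jars(sample):
        print(row)
```

If only the "unversioned" or "0-8" artifact shows up while the brokers and client jars are 0.10, a binary mismatch like the NoSuchMethodError above would be a plausible outcome.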

Am I missing some configuration? Can we configure the Kafka stage in some way to avoid the above error?

BTW: if we set the pipeline to Standalone, it works nicely, but our goal is to increase the parallelism. Also, we have set the variable SPARK_SUBMIT_YARN_COMMAND in the file to use Spark 2.1.

Any help is highly appreciated. Thanks in advance!

To provide some feedback in case someone else faces the same issue:
1. We deleted the value of SPARK_SUBMIT_YARN_COMMAND in the file.
2. We restarted the StreamSets service.
3. StreamSets now uses Spark 1.6 to launch the Spark Streaming process.
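
For anyone who wants to script step 1, here is a minimal sketch in Python. It assumes the variable is set in a shell-style environment file with a line such as `export SPARK_SUBMIT_YARN_COMMAND=...`. Note that it comments the line out rather than deleting the value, which is a small variation on what we did by hand. The function name and the sample paths are hypothetical, and if the variable is managed through Cloudera Manager it should be changed there instead.

```python
import re

# An assignment line, with or without a leading "export".
# Lines that are already commented out do not match.
_ASSIGN = re.compile(r"^\s*(?:export\s+)?SPARK_SUBMIT_YARN_COMMAND=")


def disable_spark_submit_override(env_text):
    """Return env_text with every SPARK_SUBMIT_YARN_COMMAND assignment commented out."""
    lines = []
    for line in env_text.splitlines():
        if _ASSIGN.match(line):
            lines.append("# " + line)
        else:
            lines.append(line)
    result = "\n".join(lines)
    # Preserve a trailing newline if the input had one.
    if env_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    # Hypothetical file content; the real file and paths will differ.
    sample = (
        "export SDC_LOG=/var/log/sdc\n"
        "export SPARK_SUBMIT_YARN_COMMAND=/usr/bin/spark2-submit\n"
    )
    print(disable_spark_submit_override(sample), end="")
```

After writing the result back to the file, the StreamSets service still needs a restart (step 2) for the change to take effect.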

It seems that StreamSets is not able to use Spark 2.1 for pipelines that run in cluster YARN streaming mode.

