
Kafka Origin in Yarn Streaming Mode

asked 2018-01-29 01:56:50 -0500

germaneduardo

updated 2018-01-29 02:02:02 -0500

We are facing an issue with a pipeline configured to run in cluster YARN streaming mode. The pipeline origin is a Kafka Consumer. We run SDC 2.6.1 on CDH 5.7, and in Cloudera Manager (version 5.8) we have configured our environment to use Kafka 0.10.

We have tried every stage library available for the Kafka Consumer, and the pipeline always fails with the following error: java.lang.NoSuchMethodError: kafka.message.MessageAndMetadata.

I am almost sure the issue is that the Spark Streaming application that StreamSets launches to run this pipeline is not using the spark-streaming-kafka-0-10 library.

Am I missing some configuration? Can we configure the Kafka stage in some way to avoid the above error?

BTW: if we set the pipeline to Standalone, it works fine, but our goal is to increase parallelism. Also, in the file sdc-env.sh we have set the variable SPARK_SUBMIT_YARN_COMMAND to use Spark 2.1.

Any help is highly appreciated. Thanks in advance!


2 Answers


answered 2018-02-21 14:23:56 -0500

Aj262

I'm also trying to make StreamSets work in YARN streaming mode with a Kafka client that uses SSL. Did YARN streaming mode work for you with Spark 1.6? Was it using a Kafka client connection with SSL?

I tried with SSL using both Spark 1.6 and 2.2, but I get the error below. StreamSets doesn't seem to have a Kafka stage library that supports Spark 2. At least that is what the stage library jar name on the classpath suggests: file:$PWD/spark-streaming-kafka_2.10-1.6.0-cdh5.12.0.jar.

Error trying to invoke BootstrapClusterStreaming.main: org.apache.spark.SparkException: java.io.EOFException org.apache.spark.SparkException: java.io.EOFException
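
As a rough aid for the jar-name check mentioned above, the following sketch splits a spark-streaming-kafka jar file name into its artifact, Scala version, and library version. It assumes the usual `<artifact>_<scala version>-<version>.jar` naming; `parse_kafka_jar` and `JAR_PATTERN` are hypothetical names made up for this illustration, not part of StreamSets or Spark.

```python
import re

# Hypothetical helper: assumes the "<artifact>_<scala>-<version>.jar" convention.
JAR_PATTERN = re.compile(
    r"^(?P<artifact>spark-streaming-kafka(?:-[0-9]+-[0-9]+)?)"  # optional "-0-10" style suffix
    r"_(?P<scala>[0-9]+\.[0-9]+)"                               # Scala binary version, e.g. 2.10
    r"-(?P<version>.+)\.jar$"                                   # library version, e.g. 1.6.0-cdh5.12.0
)


def parse_kafka_jar(path):
    """Return a dict with artifact, scala and version, or None if the name does not match."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    match = JAR_PATTERN.match(name)
    if match is None:
        return None
    return match.groupdict()


info = parse_kafka_jar("file:$PWD/spark-streaming-kafka_2.10-1.6.0-cdh5.12.0.jar")
print(info)
```

For the jar quoted above, the version part comes out as 1.6.0-cdh5.12.0, which matches the observation that the launched application is built against Spark 1.6 rather than Spark 2.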


Comments

Hi, when we switched to Spark 1.6, we managed to launch the pipeline in cluster mode. We have not deployed SSL on the Kafka cluster.

germaneduardo ( 2018-02-22 08:06:07 -0500 )

answered 2018-02-17 14:33:57 -0500

germaneduardo

updated 2018-02-17 14:34:35 -0500

To provide some feedback in case someone else faces the same issue:
1. We deleted the value of SPARK_SUBMIT_YARN_COMMAND in the file sdc-env.sh.
2. We restarted the StreamSets service.
3. StreamSets now uses Spark 1.6 to launch the Spark Streaming process.

It seems that StreamSets (at least SDC 2.6.1 in our setup) is not able to use Spark 2.1 for pipelines that run in cluster YARN streaming mode.
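
To verify the first step, a small script along these lines could report whether sdc-env.sh still assigns a value to SPARK_SUBMIT_YARN_COMMAND. This is only a sketch: `spark_submit_override` is a hypothetical helper, the sample file content and the /usr/bin/spark2-submit path are made up for illustration, and the check is a plain text scan, not a shell evaluation.

```python
import re

# Matches "SPARK_SUBMIT_YARN_COMMAND=..." with an optional leading "export".
# Lines starting with "#" do not match, so commented-out assignments are ignored.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?SPARK_SUBMIT_YARN_COMMAND=(.*)$")


def spark_submit_override(env_text):
    """Return the last value assigned to SPARK_SUBMIT_YARN_COMMAND, or None if unset or empty."""
    value = None
    for line in env_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            # Drop surrounding whitespace and simple quoting.
            value = match.group(1).strip().strip("\"'")
    return value or None


# Hypothetical sdc-env.sh content; in practice, read the real file instead.
sample = (
    "export SDC_LOG=/var/log/sdc\n"
    "export SPARK_SUBMIT_YARN_COMMAND=/usr/bin/spark2-submit\n"
)
print(spark_submit_override(sample))
```

If the function returns None, no override is active in the scanned text, which corresponds to the state described in step 1.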

